A CPU Running on GPU
The nCPU project pushes the boundaries of useful absurdity: it's a complete processor in which every component is implemented as a neural network running on the GPU. Registers, memory, flags, program counter — everything is a PyTorch tensor.
How It Works
Each ALU operation is a trained model:
- Addition: uses the Kogge-Stone carry-lookahead algorithm, implemented as 8 neural passes
- Multiplication: learned lookup table for byte pairs
- Logical operations: vectorized neural truth tables
- Shifts: attention-based bit routing
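To make the "vectorized neural truth table" idea concrete, here is a minimal sketch. It uses NumPy as a stand-in for the project's GPU tensors, and the tables are hard-coded rather than learned; the function name `logic_op` is illustrative, not the project's API.

```python
import numpy as np

# Each 2-input logic op becomes a 4-entry table indexed by the bit pair
# (a_bit, b_bit) -> index 2*a_bit + b_bit. In the real project these
# entries would come out of trained networks; here they are hard-coded.
TRUTH = {
    "AND": np.array([0, 0, 0, 1], dtype=np.uint8),
    "OR":  np.array([0, 1, 1, 1], dtype=np.uint8),
    "XOR": np.array([0, 1, 1, 0], dtype=np.uint8),
}

def logic_op(a: int, b: int, op: str) -> int:
    """Apply an 8-bit logical op via one vectorized table lookup."""
    a_bits = np.unpackbits(np.array([a], dtype=np.uint8))
    b_bits = np.unpackbits(np.array([b], dtype=np.uint8))
    out_bits = TRUTH[op][2 * a_bits + b_bits]  # all 8 bit positions at once
    return int(np.packbits(out_bits)[0])

print(logic_op(0b11001100, 0b10101010, "XOR"))  # 0b01100110 = 102
```

The lookup is a single vectorized gather over all bit positions, which is why these ops parallelize so well on a GPU.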
The result? 100% accuracy on integer arithmetic, verified by 347 automated tests.
Performance Inversion
The most counter-intuitive finding: multiplication is 12x faster than addition.
In a classic CPU, MUL is almost always slower than ADD. Here, it's reversed: the lookup table for MUL (21 µs) has no sequential dependency, while the carry-lookahead adder (248 µs) requires O(log n) propagation stages.
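The asymmetry can be sketched in a few lines. This is a NumPy stand-in for the project's GPU tensors, with illustrative function names; the point is the shape of the computation, not the actual implementation.

```python
import numpy as np

# MUL: one precomputed 256x256 product table. Multiplying two bytes is a
# single indexed read with no dependency between bit positions.
MUL_TABLE = np.outer(np.arange(256), np.arange(256)).astype(np.uint16)

def mul_lookup(a: int, b: int) -> int:
    return int(MUL_TABLE[a, b])  # O(1): one gather

# ADD: Kogge-Stone parallel-prefix carries. The loop runs log2(width)
# stages, and each stage must wait for the previous one -- this is the
# sequential dependency that makes addition the slow path here.
def kogge_stone_add(a: int, b: int, width: int = 8) -> int:
    a_bits = (a >> np.arange(width)) & 1  # LSB-first bit planes
    b_bits = (b >> np.arange(width)) & 1
    g = a_bits & b_bits                   # carry generate
    p = a_bits ^ b_bits                   # carry propagate
    d = 1
    while d < width:                      # 3 dependent stages for width=8
        g = g | (p & np.concatenate([np.zeros(d, dtype=g.dtype), g[:-d]]))
        p = p & np.concatenate([np.ones(d, dtype=p.dtype), p[:-d]])
        d *= 2
    carries = np.concatenate([[0], g[:-1]])  # carry into each bit
    s_bits = (a_bits ^ b_bits) ^ carries
    return int((s_bits << np.arange(width)).sum() & (2**width - 1))

print(mul_lookup(13, 7))        # 91
print(kogge_stone_add(200, 100))  # 44 (300 mod 256)
```

Within each prefix stage everything is vectorized, but the stages themselves cannot be fused: stage k needs the generate/propagate values produced by stage k-1, so the adder pays a fixed number of sequential round trips that the multiply's single gather avoids.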
23 Models, 135 MB
The "CPU" includes 23 trained models totaling 135 MB:
- Arithmetic (ADD/SUB/CMP): 100% accuracy
- Multiplication: 100% accuracy
- Logic (AND/OR/XOR): 100% accuracy
- Shifts: 100% accuracy
- Math functions (sin, cos, sqrt, exp, log): trained
Overall Performance
- ~262 µs per instruction cycle
- ~4,975 instructions per second
- Model loading in 60 ms
This is obviously orders of magnitude slower than a real CPU. Performance isn't the point.
Why It's Interesting
This project is a thought experiment made concrete. It demonstrates that neural networks can exactly encode arbitrary fixed-width functions, and that those pieces compose into a complete processor.
It's also an exploration of boundaries between hardware and software, between deterministic and learned computation. The code is available on GitHub for those who want to explore this strange intersection.
