Scaling does not create new capacity. It only makes the model you already have more likely to converge if it can, or more likely to reveal its ceiling if it cannot.
This is a small warmup, but it demonstrates the principle cleanly.
The Warmup Setup
Two local experiments, no framework, CPU-only:
- A single perceptron on the logic gates AND, OR, NAND, and XOR.
- A minimal two-layer network on XOR only.
The goal was not to build a benchmark. The goal was to check a simple structural limit with real runs.
Result 1: Single Perceptron
Repeated trials confirm the classic limit:
- AND converges every time.
- OR converges every time.
- NAND converges every time.
- XOR converges zero times.
This is not a time problem. It is a model-class problem: a perceptron draws a single linear boundary, and no one line can put (0,1) and (1,0) on one side while keeping (0,0) and (1,1) on the other.
Evidence: repeat-results.json
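The single-perceptron runs can be sketched roughly as follows. This is a minimal reconstruction, not the post's actual code; the function name, epoch budget, and learning rate are all assumptions:

```python
import itertools

def train_perceptron(targets, epochs=100, lr=0.1):
    """Train one perceptron on a 2-input truth table.
    Returns True if it reaches zero errors within the epoch budget.
    (Illustrative sketch; hyperparameters are assumptions.)"""
    inputs = list(itertools.product([0, 1], repeat=2))  # (0,0),(0,1),(1,0),(1,1)
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in zip(inputs, targets):
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = t - y
            if err != 0:
                errors += 1
                # classic perceptron update rule
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
        if errors == 0:
            return True
    return False

AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
print(train_perceptron(AND))  # True: AND is linearly separable
print(train_perceptron(XOR))  # False: no epoch budget fixes XOR
```

No learning rate or epoch count changes the XOR outcome; the update rule can only move a single linear boundary.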
Result 2: Two-Layer Net
The smallest structural change flips the result:
- A tiny two-layer net can now learn XOR.
- In a stable run it converges 197/200 times.
That change did not come from more time. It came from a change in structure.
Evidence: repeat-results-stable.json
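The post's training code isn't shown; here is a minimal sketch of a two-hidden-unit sigmoid net trained on XOR with plain gradient descent. Hidden size, learning rate, and epoch budget are assumptions. Consistent with the stable run above, most random initializations converge, but not necessarily all:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(seed, hidden=2, epochs=5000, lr=0.5):
    """Train a tiny sigmoid net on XOR; return True if all four
    cases are classified correctly (output thresholded at 0.5)."""
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, t in data:
            # forward pass
            h = [sigmoid(W1[j][0]*x[0] + W1[j][1]*x[1] + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(W2[j]*h[j] for j in range(hidden)) + b2)
            # backprop for squared error
            dy = (y - t) * y * (1 - y)
            for j in range(hidden):
                dh = dy * W2[j] * h[j] * (1 - h[j])
                W2[j] -= lr * dy * h[j]
                W1[j][0] -= lr * dh * x[0]
                W1[j][1] -= lr * dh * x[1]
                b1[j] -= lr * dh
            b2 -= lr * dy
    def predict(x):
        h = [sigmoid(W1[j][0]*x[0] + W1[j][1]*x[1] + b1[j]) for j in range(hidden)]
        return sigmoid(sum(W2[j]*h[j] for j in range(hidden)) + b2) > 0.5
    return all(predict(x) == bool(t) for x, t in data)

successes = sum(train_xor(seed) for seed in range(20))
print(successes, "/ 20 seeds converged")  # typically most, not always all
```

The hidden layer is what changes the model class: two nonlinear units give the output neuron a representation in which XOR is linearly separable.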
The Real Takeaway
Scale raises confidence, not capacity.
If the structure cannot represent the function, more compute does not help. More runs only make the limit more obvious.
If the structure can represent it, scale makes convergence more reliable: the solution appears more often and sooner.
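The asymmetry can be stated numerically. If one run converges with probability p, then n independent runs produce at least one success with probability 1 - (1 - p)^n, which approaches 1 for any p > 0 but stays exactly 0 when p = 0. A tiny illustration (the helper name is hypothetical):

```python
def p_any_success(p, n):
    """Probability that at least one of n independent runs converges,
    given per-run convergence probability p."""
    return 1 - (1 - p) ** n

print(p_any_success(0.5, 3))   # 0.875: retries raise confidence when p > 0
print(p_any_success(0.0, 10))  # 0.0: retries cannot help when p = 0
```

The perceptron on XOR is the p = 0 case; the two-layer net is a high-p case where repetition pays off.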
Why This Matters
This is a small experiment, but it points to a practical rule:
Scaling is not a substitute for the right architecture.
It can confirm success. It cannot create it.
Appendix: Data + Code
— Dennis Hedegreen, trying to see the structure