Scaling does not create new capacity. It only makes the model you already have more likely to converge if it can, or more likely to reveal its ceiling if it cannot.

This is a small warmup, but it demonstrates the principle cleanly.

The Warmup Setup

Two local experiments, no framework, CPU-only:

  • A single perceptron on logic gates (AND, OR, NAND, XOR).
  • A minimal two-layer network on XOR only.

The goal was not to build a benchmark. The goal was to check a simple structural limit with real runs.
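
As a rough sketch of the shared setup (the names and exact encoding here are illustrative, not the appendix code), every gate uses the same four two-bit inputs and differs only in its targets:

```python
import numpy as np

# The four possible two-bit inputs, shared by every gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Target outputs per gate. Only XOR is not linearly separable.
GATES = {
    "AND":  np.array([0, 0, 0, 1]),
    "OR":   np.array([0, 1, 1, 1]),
    "NAND": np.array([1, 1, 1, 0]),
    "XOR":  np.array([0, 1, 1, 0]),
}
```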

Result 1: Single Perceptron

Repeated trials confirm the classic limit:

  • AND converges every time.
  • OR converges every time.
  • NAND converges every time.
  • XOR converges zero times.

This is not a time problem. It is a model-class problem: XOR puts (0,1) and (1,0) in one class and (0,0) and (1,1) in the other, and no single linear boundary can split the plane that way.

Evidence: repeat-results.json
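
A minimal trainer in that spirit might look like this (the learning rate, epoch budget, and convergence check are assumptions for illustration, not the recorded runs):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1, seed=None):
    """Classic perceptron rule with a bias term.

    Returns True if some epoch ends with zero misclassifications,
    i.e. the run converged within the epoch budget.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    b = rng.normal()
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)
            if pred != yi:
                # Perceptron update: nudge the boundary toward the miss.
                w = w + lr * (yi - pred) * xi
                b = b + lr * (yi - pred)
                errors += 1
        if errors == 0:
            return True
    return False
```

On the gate data above, AND, OR, and NAND reach the errors == 0 exit; XOR cannot, because no (w, b) pair classifies all four of its points, so a larger epoch budget only repeats the failure.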

Result 2: Two-Layer Net

The smallest structural change flips the result:

  • A tiny two-layer net can now learn XOR.
  • In the stable configuration it converges in 197 of 200 trials.

That flip did not come from more training time. It came from a change in structure.

Evidence: repeat-results-stable.json
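
A sketch of that two-layer version (hidden width, learning rate, and epoch count are illustrative; the appendix code is authoritative):

```python
import numpy as np

def train_xor_mlp(epochs=5000, lr=0.5, hidden=2, seed=None):
    """2-hidden-1 sigmoid network trained by backprop on XOR.

    Returns True if all four points end up classified correctly.
    """
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0.0], [1.0], [1.0], [0.0]])

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1, b1 = rng.normal(size=(2, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)

    for _ in range(epochs):
        # Forward pass through the hidden layer and output unit.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backprop of squared error through both sigmoid layers.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)

    return bool(((out > 0.5) == (y > 0.5)).all())
```

The hidden layer is what the perceptron lacks: it can carve two half-planes and let the output unit combine them, which is exactly the representation XOR needs. The few failed runs (3 of 200 above) are consistent with bad initializations stalling in flat regions, not with a capacity limit.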

The Real Takeaway

Scale raises confidence, not capacity.

If the structure cannot represent the function, more compute does not help. More runs only make the limit more obvious.

If the structure can represent it, scale makes convergence more reliable: the solution shows up more often and is found sooner.
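
Both halves of that claim fit in a few lines, reusing the sketches above (the trial count mirrors the 200 reported earlier):

```python
def convergence_rate(train_once, trials=200):
    """Fraction of independent random initializations that converge."""
    return sum(bool(train_once()) for _ in range(trials)) / trials

# Structure present: the rate sits near 1.0, and the estimate only
# tightens as trials grow.
convergence_rate(train_xor_mlp)

# Structure absent: the rate is pinned at 0.0 no matter how many
# trials or epochs you buy.
convergence_rate(lambda: train_perceptron(X, GATES["XOR"]))
```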

Why This Matters

This is a small experiment, but it points to a practical rule:

Scaling is not a substitute for the right architecture.

It can confirm success. It cannot create it.

Appendix: Data + Code

— Dennis Hedegreen, trying to see the structure