Lottery Ticket Hypothesis

How AI researchers accidentally discovered that everything they thought about learning was wrong

The answer emerged from an unexpected corner: a study of neural network pruning. In 2018, Jonathan Frankle and Michael Carbin at MIT were investigating pruning—removing unnecessary weights after training. Their discovery would provide an elegant solution to the scaling paradox.

Hidden within every large network, they found “winning tickets”—tiny subnetworks that could match the full network’s performance. They could strip away 96% of parameters without losing accuracy. The vast majority of every successful network was essentially dead weight.
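
Here is a minimal sketch of what that pruning step can look like in PyTorch. It is not the authors' code: the function name is illustrative, and it uses one-shot global magnitude pruning at 96% sparsity, whereas the original experiments prune iteratively over several train-prune-rewind rounds.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.96) -> dict:
    """Zero out the smallest-magnitude weights, keeping the largest (1 - sparsity).

    Returns {parameter name: binary mask} marking which weights survive.
    """
    # Pool every weight magnitude to pick a single global threshold.
    all_weights = torch.cat([p.detach().abs().flatten()
                             for name, p in model.named_parameters()
                             if "weight" in name])
    k = max(1, int(sparsity * all_weights.numel()))
    threshold = all_weights.kthvalue(k).values

    masks = {}
    for name, p in model.named_parameters():
        if "weight" in name:
            mask = (p.detach().abs() > threshold).float()
            p.data.mul_(mask)   # prune: surviving weights keep their values
            masks[name] = mask
    return masks
```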

But here lay the crucial insight: these winning subnetworks only succeeded with their original random starting weights. Change the initial values, and the same sparse architecture could no longer be trained to match the full network's performance.
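
The experiment behind that observation can be sketched as follows, reusing the `magnitude_prune` helper above. Here `train` and `evaluate` stand in for an ordinary training loop and a test-set evaluation, and the mask handling is simplified: a faithful version would also re-apply the masks to gradients at every optimiser step.

```python
import copy
import torch.nn as nn

def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights so only the subnetwork carries signal."""
    for name, p in model.named_parameters():
        if name in masks:
            p.data.mul_(masks[name])

def lottery_ticket_test(model, train, evaluate, sparsity=0.96):
    # Save the original random initialisation before any training.
    initial_state = copy.deepcopy(model.state_dict())

    train(model)                                # 1. train the dense network
    masks = magnitude_prune(model, sparsity)    # 2. identify the sparse subnetwork

    # 3a. "Winning ticket": rewind survivors to their ORIGINAL initial values.
    model.load_state_dict(initial_state)
    apply_masks(model, masks)
    train(model)
    winning_acc = evaluate(model)

    # 3b. Control: same sparse structure, but FRESH random initial values.
    for m in model.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    apply_masks(model, masks)
    train(model)
    reinit_acc = evaluate(model)

    # The hypothesis predicts winning_acc matches the dense network,
    # while reinit_acc falls well short of it.
    return winning_acc, reinit_acc
```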

The lottery ticket hypothesis crystallised: large networks succeed not by learning complex solutions, but by providing more opportunities to find simple ones. Every subset of weights represents a different lottery ticket—a potential simple solution with its own random initialisation. Most tickets lose, but with billions of tickets, winning becomes inevitable.
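
As a rough sense of the numbers involved (illustrative arithmetic, not a figure from the paper), the count of distinct sparse subnetworks hidden inside even a toy network is astronomical:

```latex
% Number of ways to keep k of a network's n weights; each choice is one "ticket".
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}

% A toy network with n = 1000 weights, pruned to 96% sparsity (k = 40 kept):
\binom{1000}{40} \approx 5.6 \times 10^{71} \quad \text{candidate subnetworks,}
% vastly more than "billions".
```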

During training, the network doesn’t search for the perfect architecture. It already contains countless small networks, each with different starting conditions. Training becomes a massive lottery draw, with the best-initialised small network emerging victorious whilst billions of others fade away.

This revelation reconciled empirical success with classical theory. Large models weren’t memorising—they were finding elegantly simple solutions hidden in vast parameter spaces. Occam’s razor survived intact: the simplest explanation remained best. Scale had simply become a more sophisticated tool for finding those simple explanations.

The implications transcend artificial intelligence. If learning means finding the simplest model that explains data, and larger search spaces enable simpler solutions, this reframes intelligence itself.

Consider your brain: 86 billion neurons, trillions of connections, massively overparameterised by any measure. Yet you excel at learning from limited examples and generalising to new situations. The lottery ticket hypothesis suggests this neural abundance serves the same purpose—providing vast numbers of potential simple solutions to any problem.

Intelligence isn’t about memorising information—it’s about finding elegant patterns that explain complex phenomena. Scale provides the computational space needed for this search, not storage for complicated solutions.
