> GPT-3 in 2020 makes as good a point as any to take a look back on the past decade. It’s remarkable to reflect that someone who started a PhD because they were excited by these new “ResNets” would still not have finished it by now—that is how recent even resnets are, never mind Transformers, and how rapid the pace of progress is. In 2010, one could easily fit everyone in the world who genuinely believed in deep learning into a moderate-sized conference room (assisted slightly by the fact that 3 of them were busy founding DeepMind).
> In 2010, who would have predicted that over the next 10 years, deep learning would undergo a Cambrian explosion causing a mass extinction of alternative approaches throughout machine learning, that models would scale up to 175,000 million parameters, and that these enormous models would just spontaneously develop all these capabilities?
> No one. That is, no one aside from a few diehard connectionists written off as willfully-deluded old-school fanatics by the rest of the AI community (never mind the world), such as Moravec, Schmidhuber, Sutskever, Legg, & Amodei? One of the more shocking things about looking back is realizing how unsurprising and easily predicted all of this was if you listened to the right people. In 1998, 22 years ago, Moravec noted that AI research could be deceptive... “AI research must wait for the power to become more affordable.”
> Affordable meaning a workstation roughly ~$1,930 ($1,000 in 1998); sufficiently cheap compute to rival a human would arrive sometime in the 2020s, with the 2010s seeing affordable systems in the lizard–mouse range.
> As it happens, the start of the DL revolution is typically dated to AlexNet in 2012, by a grad student using 2 GTX 580 3GB GPUs (launch list price of $683 ($500 in 2010), for a system build cost of perhaps $1,979 ($1,500 in 2012)). 2020 saw GPT-3 arrive, and as discussed before, there are many reasons to expect the cost to fall, in addition to the large hardware compute gains that are being forecast for the 2020s despite the general deceleration of Moore’s law.
> The accelerating pace of the last 10 years should wake anyone from their dogmatic slumber and make them sit upright. And there are 28 years left in Moravec’s forecast…
> Even in 2015, the scaling hypothesis seemed highly dubious: you needed something to scale, after all, and it was all too easy to look at flaws in existing systems and imagine that they would never go away and progress would sigmoid any month now, soon.
> ...the future arrived at first slowly and then quickly. Yet, here we are: all honor to the fanatics, shame and humiliation to the critics! If only one could go back 10 years, or even 5, to watch every AI researcher’s head explode reading this paper… Unfortunately, few heads appear to be exploding now, because human capacity for hindsight & excuses is boundless.
society will have a few waves of disruption for sure. i'm less sure how the 8 quoted paras support the argument. progress is impressive yes, this is not in dispute. but there have also been many AI summers and winters past.
I'd say the presumption that it'll level out before society is completely disrupted is more shameless.
https://gwern.net/scaling-hypothesis
I’m gonna steal “foomerism” on my own Substack. This is spot on.
You and swyx are the 2 best "AI" substacks right now.
please do! i did not even invent it (https://twitter.com/search?q=foomerism&src=typed_query) i just like the sound of it
Top quality article, thanks a lot!
Followup: Jason Cohen (of WP Engine fame) writes about the surprising lack of exponentials in business: https://longform.asmartbear.com/exponential-growth/ and coins the "Elephant Curve".
What a daft and pathetic take on Covid numbers. Do you know why the numbers hit an "invisible asymptote"? Because people heeded those warnings and a lot of people took it seriously.
Sure some fell for the conspiracy theories, perhaps too many, that it was simultaneously a scam and not a real pandemic and also genetically engineered in Chinese labs to kill us all. Then they waited for the numbers to drop and claimed, "See! It was all fake!".
The same nuance you preach when challenging lies, you don't apply to the anti-Covid conspiracy theorists. Sad.
Covid, shmovid.
The simple fact is that trees don't grow to the sky. Exponential growth runs into limits; therefore, growth curves are at BEST sigmoid (and at worst logarithmic).
The problem is that it's REALLY HARD to predict the inflection points of a sigmoid curve from limited data, which was Constance Crozier's point.
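A minimal sketch of that difficulty (illustrative only: the numbers are invented, and it assumes numpy and scipy are available): generate a noisy logistic series with a known ceiling and inflection point, then refit a logistic using only the data available up to various cutoffs. Estimates made before the bend is visible can land almost anywhere.

```python
# Illustrative sketch (invented numbers): how unstable a sigmoid fit is
# when you only have pre-inflection data. Requires numpy and scipy.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    """Logistic curve with ceiling L, growth rate k, and inflection point t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

rng = np.random.default_rng(0)

# "True" process: ceiling 1000, inflection at t = 50 (all values made up).
t = np.arange(0, 100, dtype=float)
true = logistic(t, 1000.0, 0.15, 50.0)
observed = true * (1.0 + 0.05 * rng.standard_normal(t.size))  # ~5% noise

# Refit the curve using only the data seen up to each cutoff.
for cutoff in (30, 40, 50, 60, 90):
    t_seen, y_seen = t[:cutoff], observed[:cutoff]
    (L_hat, k_hat, t0_hat), _ = curve_fit(
        logistic, t_seen, y_seen,
        p0=[2.0 * y_seen.max(), 0.1, float(cutoff)],        # rough starting guess
        bounds=([y_seen.max(), 1e-3, 0.0], [1e6, 1.0, 500.0]),
        maxfev=20000,
    )
    print(f"data up to t={cutoff:3d}: estimated ceiling ~{L_hat:9.0f}, "
          f"inflection ~ t={t0_hat:5.1f}   (truth: 1000 and 50)")
```

The exact figures depend on the noise seed, but the pattern is the familiar one: pre-inflection data is consistent with wildly different ceilings, and the estimates only settle down once the curve has visibly started to bend.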