OpenAI AI Strategy

2020-08-21

OpenAI's strategy seems to be to scale its way to AGI through simple but stupendously large neural networks. The approach seems to originate with Ilya Sutskever, OpenAI's chief scientist, who has repeatedly emphasized the power of large deep neural networks.

At some point in about 2010 or 2011, I connected two facts in my mind. Basically the realization was this. At some point we realized that we can train very large, I shouldn't say very you know they're tiny by today's standards, but large and deep neural networks end-to-end with backpropagation. [...] He trained a 10-layer neural network end-to-end without pre-training from scratch. When that happened, I thought, "This is it." Because if you can train a big neural network, a big neural network can represent very complicated function. [...] At some point Alex Krizhevsky wrote these insanely fast CUDA kernels for training convolutional neural nets and that was BAM, let's do this, let's get ImageNet and it's going to be the greatest thing.¹

A similar sentiment can be found in an article from 2015.

To elaborate further: it is well known that any algorithm can be implemented by an appropriate very deep circuit (with a layer for each timestep of the algorithm's execution – one example). What's more, the deeper the circuit, the more expensive are the algorithms that can be implemented by the circuit (in terms of runtime). And given that neural networks are circuits as well, deeper neural networks can implement algorithms with more steps -- which is why depth = more power.²
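The "one layer per timestep" picture in the quote above can be made concrete with a toy example. The sketch below is not from the quoted article: it uses odd-even transposition sort as an assumed stand-in for "an algorithm", and the names `layer` and `run_circuit` are hypothetical. Each layer is one fixed round of compare-and-swap gates, so the depth of the circuit equals the number of algorithm steps it can execute.

```python
def layer(values, parity):
    """One circuit layer: compare-and-swap disjoint neighbouring pairs.

    parity 0 pairs indices (0,1), (2,3), ...; parity 1 pairs (1,2), (3,4), ...
    """
    out = list(values)
    for i in range(parity, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def run_circuit(values, depth):
    """Apply `depth` layers in sequence, alternating even and odd pairings."""
    for step in range(depth):
        values = layer(values, step % 2)
    return values


data = [5, 4, 3, 2, 1]
shallow = run_circuit(data, 2)         # too few layers for this input
deep = run_circuit(data, len(data))    # n layers suffice for n elements

print(shallow)
print(deep)
```

With only two layers the reversed list is still unsorted, while a circuit as deep as the input is long sorts it. This is the quoted argument in miniature: adding layers lets the same kind of circuit carry out an algorithm with more steps.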

OpenAI Five and the GPT series demonstrate OpenAI's commitment to the scaling hypothesis. The next big step is probably a multimodal model that can ingest inputs from several modalities. They have already experimented with applying transformer architectures not only to natural language but also to audio and vision.

OA, lacking anything like DM's long-term funding from Google or its enormous headcount, is making a startup-like bet that they know an important truth which is a secret: "the scaling hypothesis is true" and so simple DRL algorithms like PPO on top of large simple architectures like RNNs or Transformers can emerge and meta-learn their way to powerful capabilities, enabling further funding for still more compute & scaling, in a virtuous cycle. And if OA is wrong to trust in the God of Straight Lines On Graphs, well, they never could compete with DM directly using DM's favored approach, and were always going to be an also-ran footnote. While all of this hypothetically can be replicated relatively easily (never underestimate the amount of tweaking and special sauce it takes) by competitors if they wished (the necessary amounts of compute budgets are still trivial in terms of Big Science or other investments like AlphaGo or AlphaStar or Waymo, after all), said competitors lack the very most important thing, which no amount of money or GPUs can ever cure: the courage of their convictions. They are too hidebound and deeply philosophically wrong to ever admit fault and try to overtake OA until it's too late. This might seem absurd, but look at the repeated criticism of OA every time they release a new example of the scaling hypothesis, from GPT-1 to Dactyl to OA5 to GPT-2 to iGPT to GPT-3… (When faced with the choice between having to admit all their fancy hard work is a dead-end, swallow the bitter lesson, and start budgeting tens of millions of compute, or instead writing a tweet explaining how, "actually, GPT-3 shows that scaling is a dead end and it's just imitation intelligence" - most people will get busy on the tweet!)³