Sébastien Bubeck, a machine learning researcher at Microsoft, woke up one night last September thinking about artificial intelligence—and unicorns.
Bubeck had recently gotten early access to GPT-4, a powerful text generation algorithm from OpenAI and an upgrade to the machine learning model at the heart of the wildly popular chatbot ChatGPT. Bubeck was part of a team working to integrate the new AI system into Microsoft’s Bing search engine. But he and his colleagues kept marveling at how different GPT-4 seemed from anything they’d seen before.
GPT-4, like its predecessors, had been fed massive amounts of text and code and trained to use the statistical patterns in that corpus to predict the words that should be generated in reply to a piece of text input. But to Bubeck, the system's output seemed to reflect far more than statistically plausible guesswork.
That night, Bubeck got up, went to his computer, and asked GPT-4 to draw a unicorn using TikZ, a relatively obscure programming language for generating scientific diagrams. Bubeck was using a version of GPT-4 that worked only with text, not images. But the code the model presented him with, when fed into TikZ rendering software, produced a crude yet distinctly unicorny image cobbled together from ovals, rectangles, and a triangle. To Bubeck, such a feat surely required some abstract grasp of the elements of the creature. “Something new is happening here,” he says. “Maybe for the first time we have something that we could call intelligence.”
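For readers unfamiliar with TikZ, a minimal, hypothetical snippet of the kind of code involved looks something like the following. The shapes and coordinates here are purely illustrative, not GPT-4's actual output; the point is that a unicorn has to be assembled from geometric primitives.

\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  % body: a wide oval
  \draw (0,0) ellipse (1.2 and 0.7);
  % head: a smaller oval, up and to the right
  \draw (1.4,0.7) ellipse (0.45 and 0.35);
  % legs: four narrow rectangles hanging from the body
  \foreach \x in {-0.8,-0.3,0.3,0.8}
    \draw (\x,-0.6) rectangle (\x+0.15,-1.5);
  % horn: a single triangle on top of the head
  \draw (1.55,1.0) -- (1.75,1.0) -- (1.65,1.5) -- cycle;
\end{tikzpicture}
\end{document}

Compiled with a LaTeX engine such as pdflatex, a file like this yields a stick-figure animal. What struck Bubeck was that a text-only model could compose such shapes, sight unseen, into something recognizably unicorn-like.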
How intelligent AI is becoming—and how much to trust the increasingly common feeling that a piece of software is intelligent—has become a pressing, almost panic-inducing, question.
After OpenAI released ChatGPT, then powered by GPT-3.5, last November, it stunned the world with its ability to write poetry and prose on a vast array of subjects, solve coding problems, and synthesize knowledge from the web. But awe has been coupled with shock and concern about the potential for academic fraud, misinformation, and mass unemployment—and fears that companies like Microsoft are rushing to develop technology that could prove dangerous.
Understanding the potential and risks of AI’s new abilities means having a clear grasp of what those abilities are—and are not. But while there’s broad agreement that ChatGPT and similar systems give computers significant new skills, researchers are only just beginning to study these behaviors and determine what’s going on behind the prompt.
While OpenAI has promoted GPT-4 by touting its performance on bar and med school exams, scientists who study aspects of human intelligence say its remarkable capabilities differ from our own in crucial ways. The models’ tendency to make things up is well known, but the divergence goes deeper. And with millions of people using the technology every day and companies betting their future on it, this is a mystery of huge importance.
Sparks of Disagreement
Bubeck and other AI researchers at Microsoft were inspired to wade into the debate by their experiences with GPT-4. A few weeks after the system was plugged into Bing and its new chat feature was launched, the company released a paper claiming that in early experiments, GPT-4 showed “sparks of artificial general intelligence.”
The authors presented a scattering of examples in which the system performed tasks that appear to reflect more general intelligence, significantly beyond previous systems such as GPT-3. The examples show that unlike most previous AI programs, GPT-4 is not limited to a specific task but can turn its hand to all sorts of problems—a necessary quality of general intelligence.