Riskgaming

First impressions of OpenAI’s new GPT-4 AI model

Description

ChatGPT has overtaken the cultural zeitgeist faster than any consumer service in the history of technology, with some analysts estimating that it has already been used by more than 100 million people. So when OpenAI, ChatGPT’s creator, live-streamed the launch of its new AI model GPT-4, there was a rush of excitement reminiscent of the Apple product launches of the past.

It’s been about 24 hours since GPT-4’s public launch, and all of us here at Lux have already extensively played around with it, so it seemed apt for a rapid response “Securities” episode on our very first impressions. Joining host Danny Crichton is Lux Capital partner Grace Isford, who not only has been playing around with ChatGPT, but also Anthropic’s new bot Claude, which was a bit overshadowed between SVB’s situation and OpenAI’s announcement.

We talk about GPT-4 and what’s new, its new frontiers of performance, the increasingly impenetrable black box OpenAI is establishing around its company and processes, the company’s competitive dynamics with big tech, and much more.

Transcript

This is a human-generated transcript; however, it has not been verified for accuracy.

Danny Crichton:
It's like not only is the black box a black box, but then the people building it don't want to tell you how the black box was made in the first place.

Grace Isford:
Yeah.

Danny Crichton:
Hello, and welcome to Securities. I'm Danny Crichton and they say, or at least William Gibson says that, "The future is already here and it's just not equally distributed." And yesterday we saw the future getting distributed because OpenAI finally released its GPT-4. It sounds like they've been beta testing for some folks. Some folks even got to see early demos, but now it is open for wide distribution on Chat Plus, I guess, similar to every other streaming service on the internet. We've played with it here at Lux Capital. I thought Grace Isford here is going to join us to talk a little bit more about what we've already experienced in the last 24 hours since its launch. Grace, welcome to the program.

Grace Isford:
Thanks Danny. Great to be here.

Danny Crichton:
So you logged in, you've been using Chat Plus. I think we used our corporate credit cards to pay the ridiculous $20 a month fee, apparently better than HBO these days, on Chat text. But what has been your experience so far with GPT-4?

Grace Isford:
This was a highly anticipated launch. OpenAI's been hyping it up for a while. ChatGPT was the fastest growing consumer app in history. So I do think OpenAI had a lot of pressure around that launch. Overall, I do think the model is meaningfully better in performance quality, the big launch around multimodal. You can now take a picture and then directly import that into the model, and then you could say okay, now take a picture of this and make it a website. Really almost take away a lot of the complexity of translating a text or an image to a physical product on your computer. So I think that to me was one of the bigger things. The other example they used online was that you take a picture of your countertop, you have eggs, you have flour, you have oil, what can you make?

I do think the integration with higher reasoning is pretty meaningful as well. The tax code example they used was pretty interesting in the live demo where you could actually give a very complex tax problem and then they can research based on the internet and other documents into what the desired outcome should be for a pretty complex example. The mathematical and quantitative and qualitative reasoning is a much higher thing that went viral on Twitter. It scored like 90th percentile on the bar. Aced all the APs and SATs.

One example that I thought was really cool, Josh and I were actually just talking about this morning. It can be quite inventive in drawing and reasoning various connections. We were talking about one where you put in a prompt, "Oh, give me a very dramatic music piece." And so it'll give Beethoven's this, something this. I can even read up here what it said, "Symphony No. 5 in C minor," right? And then you could type in and say, "Okay, what about that was unique? Or what about that music?" And it talked about the precise frequency. And then you can say, "Okay, but what about that frequency and how would someone who has synesthesia interpret that?" Right? And so see how I just went from a lay person's understanding of music to a very complex musical symphony, to a completely different domain, with that heightened level of both precision of reasoning and interdisciplinary aspect. Just one example of some of the things we're starting to see.

Danny Crichton:
I think it's been incredible. Obviously, the demo day was yesterday or I guess the live stream. I don't know what your opinion was, but it feels to me like this is hitting that kind of Apple, big debut Steve Jobsian feel where it's highly anticipated. People are signing in, they're downloading the app to watch, and a lot of really cool stuff.

So to backtrack, GPT-3 and then ChatGPT, which is the layer on top of that was released end of November, early December. It's been about four or five months in launch as you pointed out, fastest growing consumer app. One firm estimating they had a hundred million users in about 90 to 120 days. So nothing has ever grown that fast. But what's interesting is GPT-3 was sort of the single model. It had text tokens, translated what you were writing into outputs. GPT-4 though, has changed that and it's going towards a multimodal architecture. Why does that matter?

Grace Isford:
Because it means that AI is increasingly becoming omnipresent in our lives. It's not just reacting to one of our senses. It's going to be increasingly reacting in a part of all of them. I think it's very intentional, a step towards AGI, which is what Sam Altman has been preaching about. Sam is a master storyteller, and part of his very intention with the launch yesterday and how it was presented is to create a bit of an allure and a brand that people want to aspire to work with and be a part of. And trying to create this aspirational Apple-like brand is where they've at least been going to date.

Danny Crichton:
You look at the iPhone. Year after year, you go from 12 to 13, 14, it's sort of evolutionary improvements. The screen's getting better, the notch got smaller, the camera's getting a little bit better. If you skip one, you probably don't miss out. I feel like we're in this extraordinary speed because I played a lot with ChatGPT, with GPT-3. It was good, but I quickly reached the limits. I try to throw things that had never been written before and I always use this example of using Michel Foucaultian analysis on the VC industry and it's like, vaguely it kind of knew what I was getting at but it didn't really have any original thoughts. Now, I mean just six months later with a newly trained model, that same prompt is so much more detailed, more precise. It actually seems to be reasoning, although it's not technically reasoning, but it feels like it's reasoning.

And then Josh and I were at lunch today and we were talking about the stuff around the prompts, but when you start to integrate this with technologies like ElevenLabs and doing this with voice, we're starting to see this with Runway and video. The speed at which both every single model is getting better, and they can start to be combined combinatorially, like they're actually able to take that voice or as you said, and this was in the live stream, a picture of a website where you have a graphic at the top, A button here is able to translate that straight to code. That's happened in just a couple of months, and I don't think we've ever seen that speed of technology progress in any field in the last couple of years.

Grace Isford:
And one more thing I'd add to that is just the feedback loop is compounding, the number of data points that OpenAI is getting every time you and I use ChatGPT to feed into that model. Or Runway is getting every time someone is leveraging technology is compounding those model strengths as they fine-tune. So that's important. I would say and push back, we're still not quite there on full reasoning.

And if you actually look at the AP scores, it tested poorly on, English literature and composition and reasoning were the ones, I think it was like a two. So I even was trying to draft something this morning on a blog post based on past blog posts I had written on the API economy and how it's reflected in the current Python SDK world, and Anthropic API, and OpenAI API, and how that's all reflecting. And it was okay. I don't think it was anything that profound or analytical or unique about it, but it was very good at spitting out a task and doing what I said. And so that second order of depth of human complexity we're moving towards for sure, and it's getting better. But I still don't think we're there. As GPT-4 continues to improve, is that second or third order thinking, particularly based on a given prompt or text or image input.

Danny Crichton:
Now we were talking about a little bit earlier, obviously this got released yesterday. Yesterday being Tuesday, March 14th. What was important about the timing there? Why did you think that that was a unique time given everything else going on in the AI industry?

Grace Isford:
I think there were a few reasons why it was a very important day. One, Google, for those who didn't see it, launched a whole suite of developer APIs, generative AI for Google Docs and a lot of their Drive tooling, which is a pretty meaningful advancement. So I think partially that. I think partially OpenAI just always wants to be first, so I think they're pushing to be ahead, not just ahead of Google, but of any other AI player on the market trying to get more people leveraging the technology as soon as they can.

Danny Crichton:
And then of course, I think there was SVB, which obviously with the SVB situation over the weekend, either delayed or it felt like, I always feel like these get released on a Friday so everyone gets to play with them over the weekend. And I think delayed everything, distracted, and probably everything got merged on the classic Tuesday for all product launches.
But what I find interesting is you were also playing around or maybe you haven't played around yet with Anthropic's Claude.

Grace Isford:
I have been. Yep.

Danny Crichton:
Is it Claude?

Grace Isford:
Claude.

Danny Crichton:
Claude. And I do find it interesting that Claude, which is such a human, or Sydney, the Bing version, we have this anthropomorphizing of AI. And then the most popular one is the one that sounds the most engineered name, ChatGPT. It's an acronym, it's a letter and everything, but how did your experience with Claude compare with what you've experienced with ChatGPT?

Grace Isford:
Yeah, so I've interfaced with Claude as a Slack bot, so it's very similar Q&A functionality. I found it pretty comparable and similar to ChatGPT-3. I think GPT-4's reasoning is a bit better, although I know Anthropic is imminently launching a more sophisticated chatbot and also imminently launching more sophisticated models. But I would view Anthropic as the really obvious number two right now in this space. I don't think there is a clear other company that has a large model put together, trained on a massive data set.

Danny Crichton:
And one of the interesting things here, obviously you mentioned OpenAI wants to be first. They're getting aggressive, which is very different from where the company started. What I find interesting is 2015 start, I was just looking it up, so it's been seven years, almost eight that the company has been around, started as an open research lab, that's why it's called OpenAI. Bring a lot of researchers to experiment, tried a bunch of different products. In some ways modeled after DeepMind in this idea of, you bring the smartest PhDs and smartest researchers around the table, you discover what's interesting and what goes on.
But the company's culture seems to be changing and changing really fast. So it used to be open, the models, the parameters, how it was built, how it was trained, all that was open for inspection. It was created with the hope that the AI community would learn from each other and everyone can build on top of it, but that changed a lot, it looks like, with GPT-4.

Grace Isford:
I actually just looked up on LinkedIn how many employees they have. According to LinkedIn, there's 634 employees at OpenAI today. I actually think it's a higher number and I've been hearing that a lot of folks are no longer putting it on their LinkedIn. I also think something notable, there's no authors of the GPT-4 paper, the technical paper they launched in tandem. Which is very untraditional for any academic paper you've seen today, any arXiv paper you're going to go look up, you're going to have five, six, seven, eight authors. So it's almost very much this cloaked entity. And so I think OpenAI has truly had a full evolution.

Danny Crichton:
Yeah, I mean as you pointed out, no authors. Which by the way, even as they have more employees, in the astrophysics world it's not uncommon when people work on satellites and you get something like the Hubble telescope, but there might be 10,000 or 15,000 authors on a paper and it's sort of ridiculous. But you're like everyone at the telescope built this out and got the data and they all contributed to the project. So on one hand, avoiding poaching. Second, the company doesn't release any information about how its model is coming to be. There's no training data, there's no sense of how they built the pipeline to actually train it. There's no discussion of even how many parameters were built into it. We actually know nothing about the training model whatsoever anymore, which is different from ChatGPT-3 and earlier models.

From my perspective, this is where I feel like the company is making a transition from academic research lab where you felt like it was a university spinoff, to its commercialization now. It is a race to beat others. There's fear of maybe by Anthropic, all the big tech companies coming after you. And there's just this open sharing that's built in the name of OpenAI, it's really become closed AI. It's become a black box where even how the black box is built, we don't even know anymore.

Grace Isford:
I totally agree. And that's probably the biggest critique of GPT-4 today and OpenAI in general. It is the transparency at tension with this vision for AI for everyone, and AGI for everyone. When there's not that much transparency into how that model is even being constructed, all the way down from the compute to the data sets, up to the actual model parameters and deployment.

Danny Crichton:
I just think it's funny because Elon Musk was one of the co-founders of OpenAI, and obviously was focused on Open. He's actually been criticizing the company over the last couple of months and I rarely agree with Elon Musk, but on this one I do. It is becoming closed. It's actually quite Orwellian that the closed AI company is literally called OpenAI. But again, I think they feel a lot of pressure from big tech. I don't know, what's your impression? I mean is big tech coming after them? I mean obviously Facebook has launched its own model, which I guess also leaked, leading to its own chaos and controversy.

Grace Isford:
Well, I think the unique thing about the Facebook publication of the LLaMA model for those who didn't see it was, it was very transparent relative to OpenAI. They were very clear on the data sets they were using, relatively clear on the modalities they were using. And so OpenAI is completely closed. Facebook was like 75% or 80% open, I would say. And so that was a stark difference.

I think every cloud provider's taken a different tack. Microsoft with OpenAI or closed AI, which may inherently benefit them. And so it's not in their interest to open. AWS is more of the helping folks equip them with compute, et cetera, but not actually providing the major foundation model contracts. But it really gets to the bigger tension between open versus closed in AI today. I think we are early investors in the company called Hugging Face. We led the Series A there. They have always intentionally been very open. They're quite like the hub for everyone to come and play and contribute and grow, the contrary of these more closed models where folks don't know what's going on and can't just pick up or plug it into their IDE in a way you might be able to on Hugging Face.

I don't think it's clear which way will win. It may be more oligopolic, where both are parts of the world going forward. But the transparency and lack of transparency piece I think will be increasingly important to grapple with as we get AI in production. So if you're a major bank, a major Fortune 500, you got to know a little bit more about what's going on under the hood. And so I think we're going to see that reckoning probably within the next 6, 12, 18 months as folks think more about these large language models in production.

Danny Crichton:
I couldn't agree more. What's interesting is if you think about the coding environment for just software the last couple of decades, it started closed. Everyone had their own proprietary software and then it became more and more open given the open source movement, and folks trying to make libraries accessible. It's interesting that AI is going open to closed. And I actually think that there's going to be quite an allergy to this. People are used to the idea that they can inspect, can figure out what's going on. It's already a black box enough. To make it even another layer of black boxes, a coffin inside of a coffin, inside of a coffin, to me seems the wrong direction and probably not the right strategy to go. But better for commercialization and particularly when you're competing directly with other folks who are also well funded. Sort of understandable why they're doing it.

But that's our early take on ChatGPT, with Plus with GPT-4. God, there's so many languages. Claude is so much easier. Claude seems like a friendly person.

Thanks so much for joining us. We'll talk again soon.

Grace Isford:
Thanks Danny.
