431. Web 3.0 Database Dominance, How to Trust Black-Box ML Models, Google’s Ad Business in an LLM-First Search World, and Lessons from Looker, Monte Carlo, and MotherDuck (Tomasz Tunguz)

Tomasz Tunguz of Theory Ventures joins Nick to discuss Web 3.0 Database Dominance, How to Trust Black-Box ML Models, Google’s Ad Business in an LLM-First Search World, and Lessons from Looker, Monte Carlo, and MotherDuck. In this episode we cover:

  • Blockchain and Web3 Databases, with a Focus on Ethereum and Its Potential Dominance
  • AI, Data Analytics, and Their Applications in Business
  • Machine Learning Challenges and Opportunities
  • AI Innovations in Robotics, Self-Driving Cars, and Programming
  • Using AI to Personalize Sales Pitches and Improve Response Rates
  • AI’s Impact on GDP Growth, Productivity, and Profitability

The hosts of The Full Ratchet are Nick Moran and Nate Pierotti of New Stack Ventures, a venture capital firm committed to investing in founders outside of the Bay Area.

Want to keep up to date with The Full Ratchet? Follow us on social.

You can learn more about New Stack Ventures by visiting our LinkedIn and Twitter.

Are you a founder looking for your next investor? Visit our free tool VC-Rank and we’ll send a list of potential investors right to your inbox!

Transcribed with AI:

0:18
Tomasz Tunguz joins us today from Palo Alto. He is Founder and General Partner at Theory Ventures, an early-stage venture fund investing in software companies that leverage technology discontinuities into go-to-market advantage. Tom has invested in companies including Looker, Monte Carlo, MotherDuck, Mysten, and Arbitrum. Of course, he is also author of the ever-popular blog tomtunguz.com. Prior to Theory, Tom worked as a managing director at Redpoint Ventures. Tomasz, welcome to the show!
0:51
Thanks for having me, Nick, it’s a privilege to be here.
0:54
So bring us up to speed. It’s been, I don’t know, five-plus years since you were on the show. Obviously, you’ve launched your own firm at this point. Walk us through the path from Redpoint to Theory.
1:03
Yeah, we have built a little team; we have seven people now at Theory. And the idea is to build a concentrated portfolio where we do a lot of research into the areas where we invest. It’s a real thing. I have a lot of empathy for founders, and you know the same thing, right? Getting off the ground, it’s a marvellous thing to be able to take an idea and actually see it come to fruition. And it’s an incredible position of privilege to have people bet that it will work, whether it’s the incredible people who work at Theory or the limited partners who’ve invested along the way. We started about a year ago. As I said, we’re about seven people now, and we’ve invested in seven companies; it should be eight on Monday. And we invest in everything to do with data. So there’s the modern data stack, which we love, and then applications and infrastructure of AI. And then we also look at blockchains as databases. We think Ethereum is one of the most valuable database companies of all time, if not the most valuable database company of all time, and as the price-performance characteristics of those databases improve, they will be much more ubiquitous than they are today.
2:10
Very interesting. And what stage is the best entry point for you?
2:13
At Theory, we typically become involved either at the seed or the A. Two of the investments we’ve made so far are formation stage, and then about four Series As and one Series B.
2:25
Okay, and how many deals will you do for the first fund?
2:28
Around 12 to 15. So not very many.
2:32
Okay, so average check size, ballpark?
2:35
It’s bimodal. Seed right now is around three and a half to four. And then the As are around 13, 13.5.
2:41
Okay, very good. Awesome. So I think we covered the thesis. Let’s talk a bit about Web3. You mentioned Ethereum. I think there are different viewpoints on Ethereum. Can Web3 databases be valuable as standalone businesses?
2:55
So I would say yes. I look at Ethereum: it’s worth 350 billion in market cap, it’s roughly seven times Snowflake’s valuation, it’s worth 60% more than Salesforce. And a lot of people will say that’s tulips. And there’s some truth to that, right? Many of these tokens have very small float, trade at massive multiples, and have no appreciable revenue. The reality is Ethereum as a business (it’s really a project, but let’s call it a business for now) produced almost 400 million in net income in Q1, which made it the most profitable software company, if we can call it that, in Q1 on a percentage basis, about a 40%, 45% net income margin. And if you were to take just the aggregate dollars, 400 million, and you consider it a software company, it’s number six in Q1 in terms of total net income production, total profit dollars, after Microsoft, Adobe, and Zoom and a handful of others. So it’s hard to argue there isn’t a real business there. I won’t tell you that the net income margin has been as beautiful as that forever for that business. But the reality is most of the software companies that are publicly listed on the NASDAQ still aren’t profitable. And so here you have a company that’s trading at a very high multiple, producing a lot of cash.
4:18
How much of that net income do you think is derived from real utility and real applications?
4:24
Yeah, I think stablecoin volume is now larger than Visa. In the first two months of this year, there was 2 trillion of stablecoin issuance. And that’s more than the Visa network processes. Now, I would argue some fraction of that is real, right? Then there’s a lot of next-generation finance applications. And it’s easy to deride those financial applications as not being meaningful, or as not having the clout or the class of the New York Stock Exchange or the NASDAQ. But the reality is, there are businesses that are built trading those assets. Our estimate is that there’ll be somewhere between 150 games over the next two years that are built on top of blockchains that are producing real income. And so it’s early, right? But we know of one Fortune 5 company that has built its accounts payable and accounts receivable system on a blockchain, because the database is immutable; it’s effectively a double-entry ledger. And then Reddit: in the S-1, they declared that all of their cash and short-term equivalents are held in stablecoins. They’re not held in a bank account. So there are these little points of, okay, there’s an innovation here, and that’s a real company. And there’s an innovation here, and that’s a company where transaction volumes are going up, and it’s really big. And so yes, okay, let’s discount it by like 50, 60, 70%, just because it’s wild, but it’s real.
5:54
Which is the Fortune 5 company?
5:57
Oh, I don’t want to say. It’s not public. No, it’s not... I don’t know if it’s public. And so I don’t want to.
6:04
Alright, so there are various debates about which of the Web3 databases is going to win. And there are some issues with Ethereum, naturally, but it is the largest Web3 database by market cap. Do you think it will be the dominant database going forward? Or is there a challenge to Ethereum’s model that will allow others to overtake it?
6:24
I think so. In the Ethereum ecosystem, you have Ethereum as a base layer, and then you have the next-generation L2s, which are now processing more in transaction count and in total value most days. And so Arbitrum has about half of that market share, and there are others. I think those layers on ETH will become the dominant ones. And at the end of the day, when everything needs to be settled and written to a permanent ledger, and you can pay for a higher transaction cost, let’s say, it will be written to ETH. Most of the transactions will happen on the L2 or even the L3 layers. You can think about it as like a database caching layer. And then an L3 in the Ethereum world is an application-specific database with different parameters. Then you have Solana, which has had its ups and downs. Solana during FTX was ripping, and then it really kind of fell off as FTX imploded. Recently, it’s had a resurgence. And there are some technical challenges associated with it. It is a more centralised model; the Rust programming language is a big appeal. And then you have the whole Cosmos ecosystem, which is more predominant in Asia but has some really compelling attributes, particularly around compatibility. And I think what we’ll see is there’ll probably be three or four majors here, just the way that we have three or four major clouds. And they’ll have different parameters: one will be the most developer-centric, one will be the one that has the highest performance for very demanding workloads. Maybe that’s Solana, maybe it’s another technology. But I think you’ll see three or four that ultimately emerge as the big clouds, let’s call them.
8:04
What’s the future for Web3 databases in software development?
8:08
So I think there are a couple of different ones. But our long-term view is driven by the price performance of these databases. Let’s just level-set for a second. Three years ago it was a million times more expensive to write to a Web3 database than RDS. And today, it’s about 1,000 times more expensive. So in three years, we’ve cut three orders of magnitude off that cost. And at some point, it will asymptote, right? Web3 databases, because of their decentralised nature, will always be more expensive. But the reality is many software companies with 70 to 80% margins won’t care. And for really sensitive data where it has to be secure, you will likely write it to a Web3 database, because you can cryptographically prove that the data wasn’t... I mean, how many times has your social security number been hacked, Nick? How many?
9:01
Never, to my knowledge.
9:01
Oh my gosh. So my social security number has been hacked eight or nine times. It’s a hospital system that leaked it, the credit card company, the auto finance business. I mean, I get these notices every three or four months that say, here, we’ve enrolled you in a free credit monitoring programme. And for 10 years, I’ve never needed to pay, because somebody leaked a bunch of stuff once and it just keeps recirculating. And so, funny story, my kids went...
9:29
I don’t know if you’re a bigger target or me after this discussion.
9:33
Yeah. So my kids went to the dentist. And the dentist asked for my social security number. And I said, no, you can’t have that. And the reason they do it is because they look up your policy. But the reality is, who knows what the security policies are of a local dentist’s office? And if a big local hospital chain cannot secure it, what hope does a dentist have? And so I think we’re going to be moving to this ecosystem, or sort of an architecture design, where major corporations want control over their own data. We’re seeing that with Iceberg and Snowflake for different reasons. And users will want to control their own data. And so the sensitive information will very likely move to Web3 databases, and we’re talking here like a five- or ten-year period. This is not happening tomorrow. But that’s our belief. And so the price-performance characteristics have to get there. There’ll be increasing regulatory pressure and costs associated with storing data: GDPR, CCPA, and every state will have their own. I think there are 40 states in 2025 that will have some form of regulation. German data locality rules. And so all those things will really drive a lot more demand for Web3 data. But the reality is, the APIs for these databases will look like a Mongo API, or they’ll look like an RDS API. Nobody will know. They’ll just be marketed as cryptographically secure, can’t leak HIPAA data.
10:50
Consumers get more control over their data and access to the data. Do you think that creates headwinds for tech?
10:57
No, I think there’s this broad sentiment about not being in control of your own data. When I was at Google, there was anti-Google sentiment, because Google just knows so much about you. So I think there’s that underneath the water. A lot of consumers don’t prioritise it enough to care. Some companies are finally starting to care about it. Large language models and the use of the information that you have will totally accelerate it. I mean, I think the next-generation architecture for lots of large language models will be running on device. And we’re already seeing it with Google: the voice transcription model is running locally, and some of the generative search, I think, is running locally. And so I think people will want this to happen. And I think at some point, the companies that are processing huge amounts of data will also want to move to this kind of architecture, because compliance costs will be enormous.
11:52
What do you think happens to Google’s advertising business in an LLM-first search world?
12:00
Yeah, great question. One of my predictions for this year is that half of search queries by the end of the year are generative. And the ad model at the end of this year... yeah, I know, that’s really aggressive. It probably won’t be right.
12:13
I don’t know. My parents are 70, and every time I go over there, my dad is telling me about ChatGPT. And so if he’s using it, that gives some confidence to your prediction.
12:25
ChatGPT is a top-10 app on the App Store on my phone. For most of the queries that I see, Google generates a paragraph on the top. Initially, I thought that the searches would be fully generative. But I do think we’ll probably end up seeing a hybrid where at the top there’s a summary and the Q&A box, which Google has now, and then a bunch of links at the bottom. And I think in this model, two things happen. The first is Google continues to win, and continues to have incredible market share. And the second is the search ads business is worse. In other words, it’s less profitable. Why is that? Right now, I don’t know, 60 to 70% of a search results page is ads. And just because there’s now a generative text box at the top that is taking up ad space, the revenue has to decrease. And so I think Google continues to win, but the search ads business is actually worse, because it’s at least 10 times more expensive to run a generative search query than a standard query. That number will go down with time. And the ads at, like, 80 to 90% of the revenue? I think it’s less than that. YouTube is pretty enormous. The content ads business, which is the ad network, is a pretty enormous business. And then there’s the Display Network. It’s been a long time since I’ve looked at the relative share, but search is the vast majority of profits, because there’s no rev share, whereas on the content network, you have to pay the publishers for running your ads. So I think Google wins, but it’s less profitable.
13:52
Back on sort of data, what areas or subcategories within the modern data stack present the most whitespace and opportunities for startups in the coming decade?
14:02
Yeah, we’re excited about the BI layer. So I was involved with Looker. We backed a company called Omni, which is a handful of the ex-Looker core team that are building a new model, a new way of building data, one that allows a business analyst to say, I want to define cost of customer acquisition. They create that metric, and there are approval workflows to centralise it.
14:22
So it’s amazing that talent from Looker is spinning out and doing great things. I remember having you on the show talking about Looker very early on.
14:30
It’s actually a wonderful collection of people, and they understand the ecosystem really well. So I’m thrilled to be involved with that team. The second area is the databases themselves, right? So I’m involved with MotherDuck. Jordan, who was the tech lead at BigQuery, came to this realisation that 80% of BI workloads are smaller than 10 gigs in size, which means on your or my MacBook Pro, we can analyse that data. And so he has built a company called MotherDuck that’s commercialising that database. And what’s really interesting about that technology is not only is it really fast, but you can embed it. So if you have a really big data set and you want to analyse and visualise it, you only take a gigabyte of it, put it in the cache in your browser, and then actually run a WASM container with DuckDB there, and then analyse the data really fast. And so it’s a new architecture. It’s like the new edge. Yeah, it’s a new edge. And so you can take advantage of the fact that the laptops that we have are super powerful. It increases your margin, and the latencies are single digit. I mean, most of the DuckDB queries are less than 10 milliseconds, even on pretty large datasets. And the user experience is meaningfully different from any other browser-based application. And then the last category is data transformation. So what we hear from buyers today is there are lots of different data pipelines, and large language models are stressing those data pipelines. Everything downstream of the cloud data warehouse used to be offline, right? BI, exploratory analytics, offline customer segmentation, churn prediction. But now those data pipelines are actually being fed into training models. And so there’s a lot of stress there. And we have made an investment in that category, which we haven’t announced yet. But I think there’s a lot of innovation around data movement and ETL pipelines.
16:21
Yeah. What is the primary constraint on the future in an AI-first world? Is it going to be bandwidth? Is it going to be processing?
16:29
I think the scale of data is one, yes. And then the cost associated with the compute is the other. And this is why, with Snowflake, you have separation of storage and compute, in the sense that a lot of the major enterprises that are Snowflake customers want to store their data in Iceberg tables inside of S3, manage their own data, and then allow different query engines like Snowflake or DuckDB or Spark to hit them for different applications. And that’s driven first by cost. But also, there are many more users demanding access to that data. And so that’s a really big one. And then the last is these things can be really expensive to run. For the leaders that we speak to, setting aside AI initiatives, cost is the single biggest driver. Snowflake used to have 177% net dollar retention; it’s down to 127, last time I looked. And so it’s real that people are starting to pay attention to, how do I segment workloads as a function of cost?
17:29
You mentioned to me that you seek out applications of machine learning where the feedback loops are really short. Explain why you’re interested in short feedback loops.
17:38
So I’ll give you an example. If you use Google, and you click on the first link, and then you come back to the search results page within five seconds, and then you click on the second link, and Google never sees you again, that’s a really fast feedback loop for Google to understand that the first link shouldn’t be the first link for that search, especially at scale. Imagine if it took Google a year to figure that out; the search results would be much worse. But because the feedback loop is basically instant, it’s really good. And so what we’re looking for with AI are very similar feedback loops. If you use Bard or Gemini and you ask for a result, there’s a little G button that says verify the results. So we were looking for an SUV with three rows, and I was really curious about the third-row legroom across the most common SUVs in America. And Gemini produced a table for me, and then I hit the verify button. And for two of the models, it was correct, and for two of the models, it was not. But that feedback loop goes and presumably retrains the model. And so what we’re looking for is products that are able to capture that feedback loop. These are broad numbers, but the way we think about it is it’s pretty easy to get a machine learning system to 80% accuracy. Think about the self-driving car. It’s really hard to get to that marginal 15% to 90% accuracy. The faster a product or service can achieve that level of accuracy, the more they’ll differentiate. It’s basically exponential in terms of difficulty. And so faster and faster feedback loops allow you to climb that curve ahead of your competitors.
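The search example above can be made concrete with a small sketch. It is illustrative only and does not describe how Google actually ranks results: the `ClickFeedback` class, the five-second threshold, and the smoothing constants are all assumptions introduced here. It counts how often a click on a result is followed by a quick return to the results page, and moves results with a high quick-return rate further down.

```python
from collections import defaultdict

# Assumed threshold, borrowed from the "within five seconds" in the example.
QUICK_RETURN_SECONDS = 5.0


class ClickFeedback:
    """Hypothetical tracker that demotes results users bounce back from quickly."""

    def __init__(self):
        self.clicks = defaultdict(int)
        self.quick_returns = defaultdict(int)

    def record(self, url, dwell_seconds):
        """Log one click; dwell_seconds is None when the user never came back."""
        self.clicks[url] += 1
        if dwell_seconds is not None and dwell_seconds < QUICK_RETURN_SECONDS:
            self.quick_returns[url] += 1

    def bounce_rate(self, url):
        # Smoothed rate (assumed prior of one quick return in two clicks),
        # so a single click does not dominate the estimate.
        return (self.quick_returns[url] + 1) / (self.clicks[url] + 2)

    def rerank(self, urls):
        """Stable sort: lowest quick-return rate first."""
        return sorted(urls, key=self.bounce_rate)


feedback = ClickFeedback()
for _ in range(20):
    feedback.record("first-link", dwell_seconds=3.0)    # users come straight back
    feedback.record("second-link", dwell_seconds=None)  # users never return

print(feedback.rerank(["first-link", "second-link"]))
```

Because every click updates the counters immediately, the ordering can change after a handful of sessions rather than after a long offline review, which is the property the answer is pointing at.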
19:12
Humans and computers, I would contend. Tom, are you more interested in machine learning at the infrastructure or the application layer?
19:19
I think we’re more interested right now at the application layer. There are just many more opportunities. So the way we think about it is in Web2, there are three companies that are the largest clouds; they’re worth about 2 trillion collectively. In order to get to that level of market cap at the application layer, you need 100. We’re talking about Snowflake and Netflix and many others. And so we think there’s something very similar that will happen here with the capital intensity of the large language models. Amazon was recently quoted saying that the next generation of models will cost a billion dollars per training run to execute. There are not many companies that can afford to pay a billion dollars. And I feel for the engineer who has to hit the go button, if they’ve made a mistake on something that expensive. So I think that the capital intensity will really benefit just a handful of players at the infrastructure layer. Now, that being said, I think there’s a bunch of developer tools that will be really important to simplify some of these architectures, whether it’s query routing, or improving vector computation, or analytics and evaluation. Those categories, I think, will lend themselves to startups more than the large language models will. But broadly speaking, in terms of the total count or the addressable opportunity for startups, the application layer is much bigger.
20:36
One of the challenges with machine learning is that as it gets more sophisticated, technical leaders often don’t know how or why a model makes decisions. So the future can be unpredictable and present scenarios that have no precedent from past data. How can CTOs with high-stakes tech infrastructure get comfortable implementing a system where they don’t understand and can’t explain how it works?
21:00
So this is, aside from security, one of the biggest questions. These models have two attributes to them. I mean, one, they’re fancy word prediction machines, and I know I’m simplifying, but they’re fancy word prediction machines. And they have two characteristics. The first is that they’re non-deterministic. And the second is that they’re chaotic. What does that mean? If I’m typing into a chatbot and I ask, how do I reset my password, question mark, and then I ask, how do I reset my password, space, question mark, those are two different inputs, and they will very likely receive two different outputs. The other challenge is, compare it to a classic machine learning classifier: is this a photo of a dog, or is this a photo of a cat? There’s two different potential inputs, and there’s two different potential outputs. The universe of potential inputs into an LLM is enormous, right? I can ask, how do I reset my password, in any number of different ways, in any number of different languages, with all kinds of creative punctuation. And so consequently, the universe of potential outputs is also equally large. We’re kind of measuring orders of infinity. And so as a result, if I’m building a product, it’s really difficult. I can’t take an average result and say the average result is pretty good, because the distribution is so broad. So there’s a couple of different ways to solve this, or maybe not solve, but manage. The first is to run what people are doing now, which is evaluations. You just have example queries, and just like a test suite that you might run on software, you just go through those queries. The second level of evaluation is to watch the user queries in an analytics platform, and then take the output of that and make them the test cases, because generating the test cases themselves is a lot of work.
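The two levels of evaluation described above can be sketched as a tiny test harness. This is a minimal illustration under stated assumptions, not any particular vendor’s evaluation tool: `fake_model` is a hypothetical stand-in for a real LLM call, and `EvalCase`, `run_evals`, and the substring check are names and choices made up for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    must_contain: str  # a phrase the answer is expected to include


def fake_model(query: str) -> str:
    """Hypothetical stand-in for an LLM call; it only handles password resets."""
    normalized = query.lower().replace(" ", "")
    if "resetmypassword" in normalized:
        return "Open Settings, choose Security, then click Reset password."
    return "Sorry, I am not sure."


def run_evals(model, cases):
    """Run every case through the model and return (pass_rate, failed_cases)."""
    failures = [
        case for case in cases
        if case.must_contain.lower() not in model(case.query).lower()
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


# Level one: hand-written example queries, including the punctuation variant.
cases = [
    EvalCase("How do I reset my password?", "reset password"),
    EvalCase("How do I reset my password ?", "reset password"),
]

# Level two: promote logged user queries into new test cases.
logged_queries = ["I forgot my password, what now?"]
cases += [EvalCase(query, "reset password") for query in logged_queries]

pass_rate, failures = run_evals(fake_model, cases)
print(f"pass rate: {pass_rate:.0%}")
for case in failures:
    print("FAILED:", case.query)
```

In this toy run the hand-written cases pass and the logged query fails, which is the point of the second level: real user phrasing exposes gaps that hand-written cases miss.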
The third thing that people are starting to do is narrow the models and use smaller models that are built for purpose. So a large language model is really good at handling any kind of incoming query, because it knows so much about so many different things. If you have a very frequent query, like how do I reset my password, or things like that, a company may ultimately end up training a very small model that is excellent at handling only that task. And so as a result, the inputs narrow and then the outputs narrow. But even in those scenarios, even turning down the knobs of temperature, which is a way that you control how creative a model is, or how wide the distribution of its outputs would be, you still have the risk of hallucinations or just outright wrong answers. I mean, did you see the bereavement, Canadian airline? No? So there was a man who unfortunately suffered a loss in his family, and the Canadian national airline has a chatbot. And he asked the chatbot, what is the bereavement policy for the airline? And the chatbot provided him a refund policy. He took a screenshot, saved it, booked the flight. And I think it was a 50% reduction in overall ticket price, because he needed to fly with some urgency. He applied to the customer support team and said, I’d like my 50% back. And they said, sir, this is not our refund policy. It was a complete fabrication. And the government, the judicial branch, ruled that the airline needed to respect the policy that was produced by the chatbot. Wow. Yeah. So there are real-world ramifications. And there’s another example.
24:20
I didn’t follow. Where was the chatbot originated? Was it a Canadian-based chatbot?
24:25
I think so. Okay, yeah. Yeah, it makes sense. And there is another example where a car buyer negotiated with an online chatbot to buy a car for $1. And the chatbot said, agreed. And so, right, because you can manipulate it: you can ask it to write a poem, you can ask it to write a song, you can ask it to negotiate or pretend to be someone, and it will respond to you, because the universe of potential inputs is so large. And anyway, so I think there are real ramifications, and we will see many more examples of these where humans, nefarious or not, are manipulating the model, just the way that, you know, to negotiate a good deal or...
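The temperature knob mentioned earlier in this answer can be illustrated with the standard softmax-with-temperature calculation. This is a generic sketch, not any specific model’s sampling code: the three logits are made-up scores for three candidate next tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the highest-scoring token, which narrows the spread of outputs. It does not make that token correct, which fits the point that wrong answers remain possible even with the knob turned down.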
25:03
So the examples you cited here, the stakes feel low to me. They’re monetary, and they’re on the order of maybe, with the vehicle, a couple tens of thousands of dollars. With something like self-driving, now the stakes are a little bit higher, right? Tesla recently launched their FSD 12. At the core of FSD 12 is a shift from traditional programming to neural-network-based decision making, or imitation learning, where raw video from the cameras on the vehicle is processed, and the tech translates what it sees directly into driving actions, mimicking human cognitive processes more closely than ever before. So rather than programming all the individual edge cases, engineers can train the car to function like a human solely by providing numerous examples of humans driving cars. So this kind of gets back to the point: this is more high stakes, and a hallucination in this context could have significant ramifications. So I’d like your general take on what Tesla’s done here with FSD 12. I’d also like your opinion on what this means for programming, if we’re getting away from these logic-based if statements and going to sort of a brave new world of how programming gets done.
26:24
It’s a great question. So within the world of robotics, and I would put self-driving cars in the world of robotics, there’s a lot of innovation happening that has come from Stanford and from Google Brain, where we are teaching robots how to do things using reinforcement learning. So we’re doing things with robots. As opposed to teaching them rules or having them look at data and learn from that data, we’re doing it alongside them. And if you Google Mobile ALOHA, you’ll see some of these videos of a two-armed robot cooking a piece of shrimp or folding some clothes or putting items into a grocery bag. And this is holding a lot of promise. I’m not that deep within the world of self-driving cars, but I used to study control systems. I would guess that what’s happening inside the self-driving car, and especially with these robots, is there’s some machine learning system that is suggesting particular actions. And then there are control systems, which are deterministic, that have certain parameters. The generative component, or the next-generation machine learning component, will suggest an action that will go into a classic control system that will say, no, if we were to move the steering wheel that fast, the acceleration forces on the passengers would be too great. Or, the probability that this is the right decision is too low to accept, so let’s slow the car down. I don’t know if you saw, in San Francisco, on one of the on-ramps, there was one Waymo taxi that was trying to merge, and there were a bunch of cones. And as a result, it was confused, so it just stopped. And then there were six other Waymo taxis behind it. Anyway, that’s an example of a brilliant control system, right? The cognition systems or the sensory systems on the taxi didn’t know what to do. And so it failed in a beautiful way.
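The split described above, a learned component that suggests and a deterministic controller that accepts, clamps, or rejects, can be sketched in a few lines. This is a guess at the general pattern and not how Tesla or Waymo implement it: the `Proposal` fields, the limits, and the halve-the-speed fallback are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    steering_rate_deg_s: float  # how fast the model wants to turn the wheel
    confidence: float           # the model's own probability that this is right


# Illustrative limits only; real systems derive these from vehicle dynamics.
MAX_STEERING_RATE_DEG_S = 30.0
MIN_CONFIDENCE = 0.8


def gate(proposal: Proposal, speed_mps: float):
    """Deterministic check applied to every learned proposal.

    Returns (steering_rate, target_speed). A rejected proposal never
    raises; it degrades to a slower, straighter command.
    """
    if proposal.confidence < MIN_CONFIDENCE:
        # Too unsure: hold the wheel steady and slow down.
        return 0.0, speed_mps * 0.5
    # Clamp the steering rate into the allowed band.
    rate = max(
        -MAX_STEERING_RATE_DEG_S,
        min(MAX_STEERING_RATE_DEG_S, proposal.steering_rate_deg_s),
    )
    return rate, speed_mps


print(gate(Proposal(steering_rate_deg_s=90.0, confidence=0.95), speed_mps=20.0))
print(gate(Proposal(steering_rate_deg_s=10.0, confidence=0.40), speed_mps=20.0))
```

The gate has no learned parts, so its behaviour can be enumerated and tested exhaustively even when the component feeding it cannot.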
Nobody was hurt; people were a little bit encumbered with some delays. But that’s the end. So you have really great sensory systems, and then you have very strong control systems, like autopilot systems for aircraft. That’s what I studied; it was one of my final exams in grad school. Those are classical mechanisms, and autopilots have been working in aircraft for 50 years very safely. So I think that’s how we balance these systems. There are generative systems that suggest, and then there are control systems that control. In terms of what it means for software programming, I think these are two separate paradigms. It is possible to teach programming through reinforcement learning, but quite frankly, the Copilot systems and Code Llama, they’re just really good. I mean, you look at Devin, which is a fully automated software engineer. And there are four panes, right? There’s GitHub, there’s a terminal window, there’s a web browser, and I forget the fourth. But it’s tasked with creating a website: it creates an account at AWS, it downloads the right GitHub repositories, it creates the user accounts, and then it writes the code. Pretty good. I mean, it’s still early days, but it’s doing a lot.
29:29
Are you using this in your own workflows at this point? What are some of your favourite recent uses of AI?
29:35
So I love it. I’ve just been testing Llama 3 locally, the 8 billion parameter model, and that works really well. I love dictating because I type a lot. There are lots of messages even
29:47
doing years dictation, although with the help of humans, I think that’s
29:51
Right. Yeah, initially it was Dragon, but the new models are so sophisticated that you rarely have to correct an error. And so most of what I publish and most of what I write is dictated; the reason I use an Android is actually for the dictation. One really fun workflow that’s new for me is this idea of a living document. So what do I mean by that? Let’s say I were about to interview you, Nick, and I was collecting a list of questions, and I wanted to know about your background. I would start a document. I’d say, what is Nick’s work history? And it would produce that for me. And I’d say, here are 10 of Nick’s recent blog posts, and here are five of Nick’s recent podcasts, remember them all, and then put together a list of questions. And it would produce a list of questions. And then you would tweet something. And then I would say, great, here’s the tweet, add a question about this. And the LLM is actually managing the document. Now it’s no longer in a Google document. It’s in a ChatGPT session or Gemini session, where that document lives. And as there’s new information coming in, the LLM is actually updating it. It’s not perfect. But it’s a completely different workflow around information management, where I’m not editing the document, I’m not organising the outline, I’m just telling the computer, do this, or change this question, put this section about his background earlier, and let’s finalise a lightning round or whatever thing I want to do. And I really enjoy doing it that way. I have to remember less.
31:15
I mean, it would be amazing if it was doing that on the fly as we’re speaking, like, Tom, what you just said contradicted what you said four years ago on the show, in real time. That would be pretty good. That would be pretty good. Yeah, it’s funny because we are incorporating that a bit in our outbound flow. So we have this platform that scrapes a bunch of stealth deals that people don’t know about. And then it goes out to the profiles, it finds the Twitter or finds the LinkedIn and finds similarities. And so without me even having to go to email, I can just click on a startup in our platform, and it will pull up a note that says, hey, I see that you went to a Big Ten school and you’re an athlete, I was a swimmer at Indiana, I see you’re building a startup, it would be great to connect. And so it can find some similarities between my profile and a founder’s profile to customise the message and increase the hit rate.
32:09
It’s the future. And my understanding is that the response rates to the machine-generated emails are very similar. There’s maybe a 5, 6, 7 percentage point difference in response rate, but at some point, they will be the same.
32:23
And so much faster. For me to do that research, it would take me, I don’t know, five to 10 minutes.
32:30
The hard part now, though, is you have to remember that context coming into the meeting somehow.
32:36
Using the email thread, remember, because this is, yeah, that’s right.
32:39
It is an email thread. But if the machine does more and more, what I find is I remember less and less, and so I have to go reread and recontextualize.
32:47
This is part of my issue. I take notes on all the meetings I have with startups. And now I’ve got Circleback and other tools that are taking these incredible notes for me and summarising them. But I don’t remember the conversations as well as when I took the notes.
33:03
Totally interesting. Yeah. There’s something to it.
33:07
So Tom, how has your investment approach been impacted by the overall sentiment from enterprise buyers to consolidate their tools and try to reduce costs?
33:16
So this has been the dominant narrative within organisations, especially when the Fed increased rates from effectively zero to 500 basis points. I’ll give you an example. We were talking to one enterprise buyer and she said, do not sell me another tool. That was the first thing she said on the call. We were just interviewing her about how she’s thinking about budgets. I do not need another tool. And I think that’s broadly true. People feel that way. There’s a desire to consolidate and there’s a desire to see more value. We saw that in Palo Alto’s most recent earnings, where enterprise customers are saying, I’ve bought the 15th or the 16th security solution, but I’m not actually more secure than I was in the past. And so it’s incumbent upon the salesperson. I think the challenge of convincing them to add a new budget line item, or the marginal SaaS offering, is much higher than it was three or four or five years ago. People buy software really for two reasons. One is to increase revenue. The second is to reduce cost. And during the zero interest rate environment, you could pitch something further and further away, with a less and less tangible connection to one of those two value propositions. And now it’s back to basics. It’s really, how are you increasing my revenue or how are you meaningfully reducing my cost? We were on a call yesterday, one of my partners, Andy, and I, and we were talking to a head of engineering at a very large company, and we were asking him about code auto-completion. We asked him a question around latency: how important is it that the code auto-completion responds in two seconds as opposed to five seconds? And I think the dominant narrative within the Valley is that it matters, those three seconds. For Google, I mean, 100 milliseconds of latency on the click really matters. So a three-second difference?
And his answer was, look, if it takes three more seconds for a printer to print out a piece of paper, am I buying a new printer every year because the sum of all those little epsilons in terms of time ultimately aggregates to some huge number of hours, which you then multiply by the hourly rate of the average engineer? No, they can wait. And so I think that has changed, where it’s not about squeezing as much juice from the lemon as possible and maximising productivity. It’s more about big blocks. It has to be blatantly obvious and completely transparent at the beginning of the pitch that this really will move revenue or reduce costs in a pretty meaningful way.
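The back-of-the-envelope calculation this buyer is declining to be persuaded by, small delays summed into hours and then multiplied by an hourly rate, looks roughly like the sketch below. Every input is a made-up placeholder rather than a figure from the conversation, and `annual_wait_cost` is a hypothetical helper.

```python
def annual_wait_cost(
    extra_seconds: float,
    events_per_day: int,
    engineers: int,
    hourly_rate: float,
    workdays_per_year: int = 230,
) -> tuple[float, float]:
    """Return (hours lost per year, cost per year) summed over the whole team."""
    # Sum the small per-event delays over every event, workday, and engineer.
    total_seconds = extra_seconds * events_per_day * workdays_per_year * engineers
    hours = total_seconds / 3600
    # Price the lost hours at the average hourly rate.
    return hours, hours * hourly_rate


if __name__ == "__main__":
    # Placeholder inputs: 3 extra seconds, 200 completions a day,
    # 1,000 engineers, $100 an hour.
    hours, cost = annual_wait_cost(3.0, 200, 1000, 100.0)
    print(f"{hours:,.0f} hours, ${cost:,.0f} per year")
```

The arithmetic can produce a large number, but the point of the anecdote is that a buyer in this environment may not treat summed seconds as real savings in the way they would treat a direct revenue or cost line.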
35:57
How do you think about whether a new workflow or capability should be built into an existing platform versus standalone?
36:09
It’s a great question. I mean, Salesforce 20 years ago, 25 years ago, perfected this playbook of starting with one particular workflow inside of a CRM, which was sales force automation. And they went after Siebel and Oracle. And after that, they expanded. Now it’s worth 250 billion. And if we review almost every single business since then that started as a software company, they focused on a single workflow, made it brilliant, and then, if they were able to get to scale, they tried to broaden out, some more successfully than others. Rippling would be a great counterexample of a company that was started as a compound startup from day one. They just raised a lot of money and hired individual product managers to build out, I think last time I looked, 27 or 28 different products that were all integrated. There is a desire within the buyer ecosystem to find products that look like that, because of these cost dynamics. On the other hand, it is exceptionally difficult to do, both because of the capital requirements that are needed and also the strategic vision of a CEO to say, this is the way that I want to build a business, and to have that level of market acceptance. So I think if a buyer had a choice, they would take the suite or the compound strategy. But I think in reality, the Salesforce path will likely be the vast majority of startup efforts, because of much faster time to feedback, lower capital requirements, and it’s just easier in a certain sense.
37:36
Tom, how do you think AI will impact GDP?
37:40
So I think the US GDP is growing faster than anybody expected. I saw a statistic two days ago that the US GDP is far outpacing any other G7 economy. I know this is gonna sound ridiculous. I think a bunch of it has to do with AI already. I mean, you think about the time compression from some of these AI tools. Look at the legal profession, look at the research for sales development reps. I was chatting with the founder of a publicly traded CRM company. And I was asking him, what do you think will happen with AEs and SDRs over the next couple of years? And he said, and this is an extreme view, that some significant fraction of BDRs and SDRs will no longer have a job because of the automations that we talked about, right? Because of the research. And so they’ll become junior AEs as opposed to becoming SDRs. And so it’s probably too aggressive to say that it’s having a pretty meaningful impact on GDP. But I’m optimistic, and I think you’ll see the US grow significantly faster than everybody expects, and see materially better profitability for a lot of these companies. Microsoft is reporting 75% improvement in productivity from their software engineers, ServiceNow is saying 50%, Klarna cut two thirds of their headcount in customer support. And so I think you’ll see this one-time cascade of profitability over the next couple of years at lots of these companies, and the productivity per person, which has basically been flat for 15 to 20 years, will have a huge surge. So I’m really bullish.
39:09
I mean, is it reasonable to say that over the past few decades software has been one of the primary causes and drivers of GDP growth, and now this is kind of the next extension that could make the slope of that line even more severe?
39:25
I think that’s right. I went back and I looked maybe six months ago at the impact of the personal computer on GDP growth, and it was relatively trivial. And so was most of software. I think a lot of us have been selling software that we contend should meaningfully improve productivity. But I think this wave will actually fulfil that promise in a way that previous generations of software haven’t. I mean, instead of listening to a podcast, I can put it into an LLM and in 30 seconds have a summary of it. It saves me an hour, right? Same for a YouTube video. And instead of drafting some long document, I can have the LLM start it. And it’s far easier to edit it than it is to start with a blank sheet of paper. And so I think there are enough of those examples where we’re saving 30, 40, 50% of our time that we should see it in the GDP. We should see better growth and more efficient growth, and salespeople hitting 5x their quota in a way that just wasn’t possible two or three years ago.
40:26
I agree with that. But I refuse to believe that software hasn’t had a meaningful impact on GDP. I mean, I’m not sure how the analysis was done. But the efficiency, outside of AI efficiency, that one can get, and the data access, the flow of information, the ability to transact in a healthier way with fewer local arb opportunities. I don’t know, the world has become more flat. And I feel like it’s been a rising tide for all countries.
40:58
I think it’s very fair. I think there’s no doubt that it has had an impact. And I need to go back and look at the data. But I was surprised it wasn’t a whole percentage point a year. Do you know what I mean? US GDP is worth 23, 24 trillion, so it should be adding 200 billion in GDP a year as a result of software. I think we’ll see something like that from AI. Maybe it’s 20 billion a year from software, something like that.
41:25
What are your thoughts on Reddit selling their data, and how many internet business models are going to transform going forward?
41:31
So I think we could see, and this is low-probability guessing, but I think there’s a probability that the business model of the internet changes. Today, mostly the internet’s commercialised through ads. And I spent some time helping build out some of the systems at Google. The core underlying technology that powers ads is a cookie, and the cookie is going away. Facebook lost $10 billion in revenue a quarter because Apple removed the mobile equivalent of the cookie, called the IDFA. It changed the way that people can target. And with third-party cookies also going away, the ultimate performance of the ad market will go away. And so there’s this pressure, because the publishers, the people who produce websites, are making less and less money as a result of more and more privacy within the ad ecosystem. At the same time, these large language models require more and more data. So Llama 3, the 8 billion parameter model, was trained on 15 trillion tokens. And we were talking about it internally. One of the questions we asked is, how many tokens are there in the world, how many different words or subsets of words are there? And it’s got to be something like 15 to 20 trillion, because we think Google is around 20. So in order to improve these models, Google is paying Reddit, in order to access their data, 60 million a year. We could ask, is that a lot of money or a little? I’d say it’s probably underpaid. And Adobe is paying $3 per minute of video footage in order to train their models to be able to compete. So there’ll be this huge business development effort from all of these businesses that have proprietary data, or data that’s really recent, or data of a very particular kind. They’ll sell it. And for Reddit, right now, it’s about 6, 7, 8 percent of their revenue, but it’s really high margin, right? It’s 100% margin, or thereabouts. And so I think we might see a change where the publisher has a far more predictable revenue source selling their data.
The publisher then specialises in a particular kind of content, then approaches Google and says, I see your model is weak in the following areas, we built this product feature that will produce a lot of data to solve this challenge that you have with the machine learning model. And the big publishers are already spending a billion dollars a run on training, so it’s really valuable to have the most accurate search engine or LLM-based search. And so they’ll pay a premium for unique data or very recent data. And then the other dynamic that you have is people selling products want their information to be accurate in the LLM. So if you have a large language model search product, and let’s say there’s a next-generation soft drink, and all the reviews are negative, but they’re misinformed, the soft drink maker wants to influence those results. They might pay for the inclusion of certain kinds of data within, let’s say, the context window and say, here are some nutrition facts about this particular kind of soda, balance that against the negative reviews on Reddit. And so you might have this completely different model. Instead of ads, the marketers of the products are paying to inject content within the context window. And instead of ads being run on Reddit, Google is paying Reddit just for the data. And if that happens, then you have a pretty different internet experience. The last point I’ll make on this is, five years ago, it would have been crazy to pay for search as a consumer the way that we pay for Netflix. But today there are hundreds of thousands, if not millions, of people paying for ChatGPT or Claude or Gemini. And we pay $20 a month for it because we want a better experience. Imagine if it looks like television.
And right, yeah, the unbundling of television: you buy AMC because you really like British police procedurals, like I do, or you buy access to Bloomberg because you really care about financial information, or NFL Sunday Ticket. Imagine the same thing in the search experience, where I pay $20 a month for search. And then I really care about financial information, so I buy the Wall Street Journal upgrade, and then I buy the Bloomberg upgrade. And that premium data is now fed into my version of my LLM. And now all of a sudden, I’m paying $50, $60, $70, maybe $200 a month for an enterprise, for a very particular kind of search. And there’s still a free experience, which doesn’t have that data, but there’s this premium search experience. So maybe that’s where this all goes. It’s still kind of a twinkle in our eyes, so to speak, but you can see it.
45:48
If the experience is that much better. And people are already paying for it. I mean, we had Adrian Aoun from Forward on the podcast, and he was talking about a 10x better healthcare experience. And his analogy was, do you pay for Spotify, or do you listen to old radio? Do you pay for Netflix, or do you use an antenna for free broadcast? I mean, if the experience is 10x better, people will pay for it. It calls into question, though, what is the future for publishers? Right? Will I have a website? Or will I just have effectively a database where I put the appropriate information that gets called in an effective way, and then I get paid by ChatGPT every time it’s requested? I don’t know.
46:34
Yeah, I mean, Substack is sort of that way already, right? Where we aggregate content. And Google in a certain sense is that way. Some publishers will have either uniqueness of content or an existing audience large enough to be able to build directly, but there’s probably greater, it’s like public access television versus Netflix, right? It’s possible, but I think it might be harder and harder.
46:56
Tom, if we can feature anyone here on the show? Who do you think we should interview? And what topic would you like to hear them speak about?
47:04
Boy, this is such a phenomenal question. I think I would really like, I think, Lenny Rachitsky. I think he has built a marvellous business for content marketing, particularly focused on one initial segment. And as we think about what the future of large language models means for publishers, picking up on that part of the conversation, he would have some brilliant insights. And in addition, for all the product people in the audience, his depth of understanding of that particular discipline and domain is unparalleled within the ecosystem.
47:38
Amazing, Tom, give us a book, article or video that you would recommend to listeners.
47:43
I recently read a book called The New Map, which is written by Daniel Yergin, who won the Pulitzer for another book about the history of oil. And The New Map is a book that talks about how the US becoming a net exporter of energy, both natural gas and oil, has completely changed geopolitics. And this is a book that was published in 2018 or 2019. It presages the Ukraine conflict. And it’s just a brilliant summary of how the world will be different in the next 10 years as a result of that one change, that we no longer rely on external sources of energy.
48:23
Amazing. I want to plug it into ChatGPT and get the... Tom, do you have any habits, tactics or techniques that are a secret weapon?
48:34
Persistence. I was a rower in college, and our freshman year coach taught us to love the sport. And he taught us to love it because it just requires a lot of repetition. But if you start with a love of something, you will continue and continue. And so, just repetition. How did he teach the love? He put us on the water. He put us in a boat, he took us to beautiful places, he built camaraderie in the boathouse, and he loved it. And so, I mean, it’s like when one of my family members really loves classic rock. And he spent two or three years educating me about classic rock. I knew nothing about it. And now I can tell you my favourite guitar solo happens to be Prince at the 2012 induction for the Hall of Fame. And so if somebody shows you what they see and why they love something, it’s gonna work every time. But his name is Joe, and Joe really spoke to me, and I take a lot from that experience, really, that sort of long-term commitment. And there’s this great Michael Phelps ad, which I’ll never forget. It was in the Olympics, and it was: what you do in the dark determines what you do in the light. In other words, what you do in your basement, working really hard, is ultimately how you get to be where you are. And there’s no one for whom that’s more true than Phelps.
49:54
Good stuff. And then finally here, Tom, what’s the best way for listeners to connect with you and follow along with Theory?
50:00
Yeah, there’s a blog at tomtunguz.com, or follow us on LinkedIn, where we publish all of our content. And feel free to reach out through LinkedIn messaging. We’re quite responsive.
50:10
Very good. And he is Tom Tunguz. And the firm is Theory Ventures. Tom, thanks so much.
50:15
It was a pleasure. The pleasure was all mine, Nick. Thanks again for having me.
50:18
All right, sir. Thank you.
50:25
All right, that’ll wrap up today’s interview. If you enjoyed the episode or a previous one, let the guests know about it. Share your thoughts on social or shoot them an email, let them know what particularly resonated with you. I can’t tell you how much I appreciate that some of the smartest folks in venture are willing to take the time and share their insights with us. If you feel the same, a compliment goes a long way. Okay, that’s a wrap for today. Until next time, remember to over-prepare, choose carefully and invest confidently. Thanks so much for listening.