A product leader's guide to AI, part 1: Finding the right problems to solve
I’ve migrated to Substack. Read my first AI article on where ChatGPT came from here, and my article on the question of single player dominance vs. generative AI as a new technological platform here.
It is very hard to be practical in the middle of a revolution. Everyone seems to be running around, proclamations of world-altering change are everywhere, and it can be challenging to separate what’s real and valuable today from what may or may not arrive in a year or more. Those of us who have worked in AI for a while might see the explosion of capabilities in LLMs as a logical extension of the technology’s trajectory over the last few years, which is not to underplay the significance of the last six months. But the hype cycle means there is pressure for everyone to have an opinion on what to do and what comes next.
This is hard enough for companies like Google that are expected to win in this market, or the scores of AI-focused startups that are suddenly flush with venture money. But let’s say you’re an executive at Procter & Gamble, or Citibank, or Coca-Cola, or Caterpillar–basically any firm in the 90% of the US economy that isn’t explicitly categorized as “tech”. It is natural to be cautious. The last big technology fad was crypto, and in retrospect everyone trying to create their own NFTs seems kind of silly, or at least premature. Should all business leaders have a plan for AI too?
Yes. Putting aside the claims that AI will transform the world in some uncertain future, if all AI advancement ceased today, what has already been developed is transformative. And that isn’t even limited to generative AI (GPT-4, Midjourney v5, Claude, GitHub Copilot, etc.), which has been getting all the buzz recently. Generative AI is merely one application within the larger AI space, and most companies have only scratched the surface of its capabilities.
I’ve been working with industry leaders to help them understand what this technology is and how to think about folding it into what they do. So here is the first of a series of articles on how to think about building up an AI function in your organization. The first topic we’ll address is how to identify the right kind of problems. We’ll break this down into classical machine learning and then discuss generative AI.
An important note: I’m assuming that when it comes to “traditional AI”, your company will create its own models to solve a specific business challenge, so the rules I lay out are focused on the sort of general problems AI is especially well-suited to solve. In the “generative AI” section, the more likely approach is applying or fine-tuning an existing model (such as an LLM), so we’ll discuss when these human-emulating agents are going to help you, or just cause a headache.
Traditional AI and machine learning: Optimization problems and identifying patterns
Historically, AI has existed behind the scenes, invisible to a company’s clients or users, its advantages manifested in improved operating metrics. Though much of it shares architecture with ChatGPT-style AI, the more common term of art was “machine learning” (ML)–a fuzzy distinction, but a useful one, since the majority of “AI” applications were not, until recently, trying to emulate human intelligence. Instead, they were trying to solve seemingly intractable optimization problems.
I spent five years working on the machine learning models that power the Facebook News Feed. Until ChatGPT came along, Facebook was probably the largest and most complex application of ML or AI in the world, though you’d forgive the average user for being ignorant of this fact.
The Facebook ML models simply control what posts you see, and in what order you see them. The scale of how this seemingly mundane problem is solved is staggering. Simply serving a single user with a list of posts–some from friends or pages the user follows, some the algorithm just thinks they might like–involves a massive triangulation of data points on what Facebook knows about the habits and affinities of the user, the authors of the content, and the entire Facebook ecosystem. You can probably imagine an easier way to solve this choosing-and-sorting problem that doesn’t involve ungodly amounts of computing power, but it turns out that the ML models do a much, much better job of it, at least as measured by metrics that Facebook cares about, such as how often the user engages with the content presented to them. In fact, once these models were introduced, the numbers were so good that the ML-powered Feed became Facebook’s secret weapon, and honed the app into an attention-driving machine. And as businesses and publishers came to rely on Facebook for their content distribution, any change to the algorithm–by, say, giving less airtime to “hard news” stories that tended to bum people out and bring down metrics–could upend the digital media industry.
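To make the choosing-and-sorting step concrete, here is a deliberately tiny sketch in Python. The feature names and the `predict_engagement` placeholder are my own inventions; the real Feed combines thousands of signals and many models, so treat this as an illustration of the shape of the problem, not of how Facebook actually does it.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    author_affinity: float   # how often this viewer interacts with the post's author
    content_score: float     # predicted topical match for this viewer
    recency_hours: float     # how old the post is

def predict_engagement(post: Post) -> float:
    """Stand-in for a trained model that predicts how likely the viewer is
    to click, like, or comment on this post."""
    # A real system would call a learned model here; this weighted sum is
    # only a placeholder so the example runs end to end.
    return 0.5 * post.author_affinity + 0.4 * post.content_score + 0.1 / (1.0 + post.recency_hours)

def rank_feed(candidates: list[Post], limit: int = 10) -> list[Post]:
    """Score every candidate post and keep the top `limit` by predicted engagement."""
    return sorted(candidates, key=predict_engagement, reverse=True)[:limit]

feed = rank_feed([
    Post("friend_photo", author_affinity=0.9, content_score=0.4, recency_hours=2),
    Post("news_article", author_affinity=0.2, content_score=0.8, recency_hours=1),
    Post("page_promo", author_affinity=0.1, content_score=0.3, recency_hours=30),
])
print([p.post_id for p in feed])   # posts in the order they would appear in the feed
```

The value of the real system comes from learning those weights, and far subtler relationships, from billions of examples rather than hand-tuning them.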
Even if you’re not creating the world’s biggest distribution service, the Facebook example helps illustrate common characteristics shared by problems that are good candidates for machine learning:
1/ First, and perhaps most obviously, there is a whole lot of data. Lack of data is not usually a problem these days. Google has indexed hundreds of billions of web pages. Netflix streams more than 140 million hours of video each day. Genomics research is projected to generate over a quintillion bytes of data within the next decade. It’s not just technology players–in a survey of companies across sectors, the mean number of data sources per organization was 400.
The data may or may not come from within the company building the AI product. Broadly speaking, data is either created by the organization itself or sourced from outside to enable new capabilities. Netflix can build a recommendation system based purely on its own users’ behavior, so it doesn’t need to look beyond what it tracks every day. But when Deepgram wanted to develop its speech recognition tech, it needed gigabytes of hand-transcribed audio to train its models, and it had to source them from elsewhere.
2/ There are no explicit programming rules. This is a subtle point, so let’s start with an example. Some games have no known algorithmic solution, meaning there is no clearly defined strategy that will guarantee a win. There is no winning formula for Go or Chess, for instance, while games like Tic-Tac-Toe and, surprisingly, Connect Four have sure-fire strategies (in the case of Tic-Tac-Toe, to force a tie). Chess engines eventually beat the best humans through brute-force search, but building a computer program that could reliably beat the best Go players in the world required combining deep neural networks with reinforcement learning–that is, allowing a machine learning model to play itself over and over again to build its understanding of likely outcomes given the set of all possible moves.
That’s not to say you can’t come up with an algorithm that does a reasonably good job in place of AI. You could brainstorm a simple set of rules to power Netflix recommendations (e.g. “present thumbnails for the top 10 most watched shows in the genre the user has spent the most hours watching”, sketched in code below); it’s just that the complexity machine learning models can capture tends to produce superior results.
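Here is what that hypothetical rules-based recommender might look like. The field names and data shapes are invented for illustration; the point is how little nuance a few hand-written rules capture compared with a learned model.

```python
from collections import defaultdict

def rule_based_recommendations(watch_history, catalog, top_n=10):
    """Naive baseline: find the genre this user has watched the most hours of,
    then return that genre's most-watched titles across all users."""
    hours_by_genre = defaultdict(float)
    for item in watch_history:                 # e.g. {"genre": "Drama", "hours": 7.5}
        hours_by_genre[item["genre"]] += item["hours"]
    favorite_genre = max(hours_by_genre, key=hours_by_genre.get)

    in_genre = [t for t in catalog if t["genre"] == favorite_genre]
    in_genre.sort(key=lambda t: t["total_watch_hours"], reverse=True)
    return [t["title"] for t in in_genre[:top_n]]
```

A learned model would instead weigh thousands of signals about the user, the titles, and everyone else’s behavior, which is exactly the kind of complexity hand-written rules can’t keep up with.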
3/ A corollary of the last rule: relationships between variables aren’t self-evident. ML models have a tendency to uncover things that elude human researchers. For instance, Princeton’s DeepSEA model learned to predict how noncoding DNA sequence variants affect activity in specific cell types, helping uncover new connections between genetic variation and diseases like lupus and celiac disease.
4/ Success is easy to measure. In the examples we’ve talked about so far, when a machine learning model is performing well, it’s pretty obvious. You run an A/B test with and without the ML model running the show, and in the ML-driven case your key metrics go up: a user spends more time on your platform, conversions increase, costs are reduced, etc. But here is an important break between what we’ve called “traditional AI” and generative AI tools like ChatGPT: how “good” you find a ChatGPT response to be is fairly subjective, assuming it passes a minimum bar of readability and relevance. In that case, all you have to go on is whether your users seem to respond positively, which is only part of the story.
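To show how mechanical that measurement can be for traditional ML, here is a minimal sketch of reading out one key metric from an A/B test. The numbers are made up, and a real analysis would also check statistical significance before declaring victory.

```python
def conversion_rate(group):
    """Fraction of users in the group who converted."""
    return sum(user["converted"] for user in group) / len(group)

control = [{"converted": c} for c in (0, 1, 0, 0, 1, 0, 0, 0, 1, 0)]    # heuristic ranking
treatment = [{"converted": c} for c in (1, 1, 0, 1, 0, 1, 0, 1, 1, 0)]  # ML-driven ranking

lift = conversion_rate(treatment) - conversion_rate(control)
print(f"control: {conversion_rate(control):.0%}  ml: {conversion_rate(treatment):.0%}  lift: {lift:+.0%}")
```

There is no equivalently crisp readout for whether a chatbot’s answer was “good”, which is why evaluating generative AI is a harder problem.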
ChatGPT, Midjourney and their ilk: Generating human-like content
Strictly speaking, we should treat “generating human-like content” as a specific application of “identifying patterns”, since that is fundamentally what chatbots like ChatGPT are doing, even if that feels overly reductive. But this is a framework for thinking about use cases, and for most people, ChatGPT writing a poem about traffic lights feels pretty different from Netflix recommendations, even if the underlying architecture is surprisingly similar.
Now is not the time to weigh in on the debates on AI in the classroom, or in TV writers’ rooms, or whether we should make a law that all AI-generated content identify itself as such. Let’s simply lay out two requirements for generative AI to be effective:
1/ The output isn’t rocket science, but a lot of human time is wasted in creating it. Consider customer service, an area that has been experimenting with AI chatbots for a while and where they may finally become useful (there are a number of startups like Ada whose whole business is AI-driven customer interactions). Customer service agents answer the same questions repeatedly, and the set of facts they need at their fingertips is small (relatively speaking) and usually documented in a centralized place–perfect for training a language model, as the sketch below illustrates.
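As a rough sketch of why this setup suits a language model, here is a toy support bot that grounds its answers in a small documented FAQ. The `call_llm` function is a stand-in for whichever model or vendor API you actually use, and the keyword-overlap retrieval is deliberately simplistic.

```python
FAQ = {
    "How do I reset my password?": "Go to Settings > Security and choose 'Reset password'.",
    "What is your refund policy?": "Refunds are available within 30 days of purchase.",
    "How do I cancel my subscription?": "Open Billing and select 'Cancel subscription'.",
}

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; swap in your provider's API here."""
    return f"[model response to a {len(prompt)}-character prompt]"

def retrieve_documentation(question: str) -> str:
    """Pick the documented answer whose question overlaps most with the customer's wording."""
    words = set(question.lower().split())
    best_match = max(FAQ, key=lambda q: len(words & set(q.lower().split())))
    return FAQ[best_match]

def answer_customer(question: str) -> str:
    context = retrieve_documentation(question)
    prompt = (
        "Answer the customer using only the documentation below. If it does not "
        "cover the question, say you will escalate to a human agent.\n\n"
        f"Documentation: {context}\n\nCustomer: {question}"
    )
    return call_llm(prompt)

print(answer_customer("I forgot my password, what do I do?"))
```

The important design choice is constraining the model to documented answers and giving it a graceful escalation path, rather than letting it improvise.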
But beyond service jobs, which are assumed to be fairly mechanical, it turns out a lot of white-collar professions involve grunt work that can be automated. Ask a lawyer how much time they spend on repetitive tasks, or a data scientist how much goes into routine analysis. Even college professors are thinking they should just get ChatGPT to write their student recommendation letters for them.
I think the headline is that these workers–both on the service and professional side–are going to be made more productive by AI, rather than obsolete. The fear that a new technology will replace humans is an old one, but there are plenty of counterexamples where new tech ended up a net good. There were more bank tellers after the widespread rollout of ATMs, because banks expanded what they offered and their employees were able to focus on services that required a human touch. My former professor Erik Brynjolfsson found that call center operators became 14% more productive when they used AI as a partner, with gains of over 30% for the least experienced workers.
For another example, let’s talk about software developers. It turns out ChatGPT can churn out fairly good code based on plain language input. The real skill, hard to replicate with AI, is the inspiration, the idea, or the architecture, but the execution–the actual code writing–is a great candidate for automation. This is why, of all areas that generative AI seems poised to disrupt, software developers seem the most excited about being disrupted.
2/ Potential errors are low-risk or easily caught. We may someday get to the point where AI can write perfect code, execute it, and cut the engineer out of the process entirely. But we’re not there yet, and maybe there’s no hurry to get there. Even if AI-written code isn’t perfect (and it rarely is on the first go), the good news is that it takes very little time to verify its correctness–just run the code and see what happens, then fix (or get the AI to fix) the errors that come up, as in the toy example below.
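In practice, that verification can be as light as a handful of assertions. In this sketch, `dedupe_preserving_order` stands in for something an AI assistant might have generated from a plain-language request; the checks at the bottom are the part the human still owns.

```python
# Imagine an assistant produced this from the prompt:
# "remove duplicates from a list while preserving order"
def dedupe_preserving_order(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# The engineer's job shifts toward checking the output before shipping it.
assert dedupe_preserving_order([3, 1, 3, 2, 1]) == [3, 1, 2]
assert dedupe_preserving_order([]) == []
assert dedupe_preserving_order(["a", "a", "a"]) == ["a"]
print("All checks passed.")
```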
You can imagine other functions where there is a buffer between the AI and the end product in the form of an actual human–copywriting, marketing campaigns, education, and so on. If skilled employees can outsource monotonous work, freeing them up to focus on work that is more interesting and intellectually challenging, you have an environment where AI adoption feels like an accelerant, as opposed to a robot takeover.
Given AI’s known biases and tendency to hallucinate, it remains too risky in most cases to entrust these generative models with too much of your reputation or revenue potential. A number of AI startups are tackling narrow use cases that purposely limit the types of questions a model can answer or the content it can create, which may help minimize that risk.
The best approach is to treat AI as a tool rather than a surrogate employee. Maybe I’ll eventually be proved too conservative in my thinking, and we really are on the cusp of a fundamental disruption of work and life. But in the meantime, build new models where you can, integrate generative AI where it makes sense, but be sure to ask questions and question answers.