Episode 3

Do Soon Kim
Navigating the convergence of tech and pharma

In this episode

Do Soon Kim is Head of Product at Xaira Therapeutics. He brings a unique perspective at the intersection of AI and biology, with roles spanning AI product development at AWS and drug discovery at Eli Lilly and other biotechs.

In this episode of Models & Molecules, Do Soon Kim discusses his journey in the biotech and AI sectors, emphasizing the importance of effective communication and teaching in fostering collaboration between different scientific disciplines. The conversation explores the potential of machine learning to reshape our understanding of intelligence, moving beyond the limits of human intuition to decode complex biology.

Full interview transcript

Nicola: Hey, Do Soon, it’s great to have you here at the show today. You are now building Xaira, which I think is one of the hottest and most interesting AI-first companies out there. But before that, you also worked in top labs, early startups, and also in larger corporations, in different roles. Is there, let’s say, an underlying curiosity that pushed you forward on this path?

Do Soon Kim:

I have many underlying curiosities. I don’t think there’s one that I can point to. I’ve done lots of different things at lots of different places. But I think the two common themes that I’ve always chased are: if, at the end of my career, I’ve somehow enabled better therapeutics to come to market and people to have improved health because of my work, I’ll feel very fulfilled. I’ve found different angles to try to improve what I consider healthcare and life sciences, but that’s been one underlying theme.

The second, in terms of what you would consider a top lab or a big-name company: what I really focus on is “who are the people I want to surround myself with?” I believe there are good people in both startups and big companies. And I believe there are great people in both recognizable places and not. At this point in my career, I care way less about the name on the resume and much more about who the small group of people I’m working with is, and whether they are all better than I am.

So I typically… a phrase I use a lot in every job is “can you explain this to me like I’m a golden retriever?” or “explain to me like I’m five”. Because I think the mark of a true expert is the ability to explain a very complex concept at very different levels of granularity and to do it very well.

So I’ve sort of prided myself on being the dumb one in the room, which maybe is why I’ve gotten to take tours at well-known companies and labs. But it’s really been about surrounding yourself with the best people you can find, and then also contributing to this larger mission of improving human health.

And then the flip side of how I’ve been able to get here is just a healthy dose of luck, and then working really hard when luck comes my way. I would be remiss if I didn’t mention the great mentorship and guidance that I received in getting to where I am.

Nicola: I can totally relate to that, because I think that getting surrounded by the right people – well, there is a certain definition, I think, for each of us of what “the right people” means – is critical for everything: for both your professional life and your personal life. Because of course, at some point this also becomes a big part of your motivation to move forward and to do meaningful things in your life. You need, at the same time, the right support and the right people that push you forward. I totally see that.

If we look at the content: basically, you’re saying “I feel the motivation to bring new therapies to patients, to people that need them”. This is, in a way, another part of your career and how you see yourself moving forward. How are you actually trying to achieve that right now? What do you think is innovative about what you’re doing right now with respect to that mission?

Do Soon Kim:

So I think there are two aspects to it. The first is what I do, and the second is where I’m doing it. The where is the easier part: I just go to the highest-density talent team that I can find, with very kind people. I’m very grateful to have landed at Xaira, where I believe there are some really brilliant people building on some excellent work. And there are also a lot of, I call them grown-ups, that I can learn from. So, I’m very fortunate to be here.

So currently, I am a product manager at Xaira; I lead Product. And I think being a product person in a therapeutics / AI company is basically an existential crisis in resume form, because it is really unclear what a product manager should do at these companies; these companies just haven’t existed for all that long. I was previously a product manager at AWS, and if you’re a product manager at AWS, Apple, or Google, those roles have existed for long enough that there are very clear examples of people who have been in those positions and have excelled. They’ve done it in different ways, and it’s not so cut and dried, but there are at least lots of examples to pull from.

The world where I live, and the chaos I decided to jump into, is a blend between cutting-edge advancements in tech and biology. You sort of want to borrow things that worked well from tech and bring them to this incredibly messy world of biology, while at the same time convincing the biologists that a PM, a product manager, is actually a useful person. What I am doing has changed a lot based on what the company needs at the time.

I think one key skill of product managers is the ability to connect the dots. So, I accept, and sleep well at night knowing, that basically everyone at the company is, as an individual contributor on a technical level, much better than I am. And instead of dwelling on not being a technical contributor, I think: wow, how lucky I am to be surrounded by this brilliant technical staff. What is the dot that I can connect for them that’ll make them much more effective?

So, I’ve decided that I really enjoy connecting dots between sort of disparate ideas. I try to bring that. And a nice thing about having an ambiguous and maybe poorly defined title is that at any given moment, with the support of a very good manager, of course, you can go work on the highest impact thing that you think is facing the company.

And I think I’ve enjoyed my time in startups for that reason: what you do is really not defined by your title at all. What you do is measured by the impact you have on the bottom line of the company.

I’ve worn lots of hats, and I think product manager is just ambiguous enough that I can continue wearing lots of hats, especially at a startup.

Nicola: Yeah. And I find it very interesting because, exactly as you say, for a biotech company the role of the product manager is not really a common role at all. It’s much more a tech-company role, where it is much more common, or actually much more needed.

And that brings me to another topic, which is something that I’ve seen, and probably you, living in that reality, have seen as well: that tech and pharma are sort of converging. Or, let’s say, on a sort of collision course. They’re not two parallel lines; they’re definitely getting there. And I think there is as much pull from one side as there is from the other. There is definitely an interest from tech in joining pharma and the opportunities that are out there. And on the other end, a need at this point for pharma to join the tech side of things. And I think the fact that a role like yours now exists in a biotech company is a true embodiment of that necessity. So how do you see that?

Do Soon Kim:

When I think about tech companies and pharma companies, when I think about any company, actually, I think about how it generates revenue. A technology company will typically generate revenue selling ads or data. They don’t actually, you know, sell… I guess there are some companies that sell access to APIs doing some complex technology.

But really, if you boil a company down to what its performance will be measured on, it’s sort of never the underlying technology; it’s the output and what people will pay for. If you think about companies through that lens of what the revenue generation model is going to be, then I sort of don’t worry about tech companies wanting to merge with biotech companies.

When Google decides they want to get into the life science business, they spin out companies like Verily and Calico, right? They do this in a way, but really… and this was one of my key reasons for joining Xaira… you want to be proud of what you’re measured on. And a biotech company is measured on the therapies it can bring to market and the number of lives it can impact, right?

From a revenue perspective, I have no problem saying that a biotech company is where I want to be. If you go one step deeper into how the company operates: biology has typically been a field of late adopters of what we consider the modern tech stack, because it was a field that was mostly driven by human intuition and human ingenuity, actually, and not necessarily by the efficiency gains that modern technology allows.

They are obviously adopters of technology when it makes obvious sense and it’s been de-risked by lots of people. But, generally speaking, I think on average they’ve been late adopters of technology.

And I think the thing that is accelerating that tech adoption, and increasing their risk tolerance, is machine learning.

And I think it’s doing that in two ways. One is, like I mentioned, pharma and biotech and therapeutics in general is a field that has thrived on human ingenuity and human intuition. People see science as a field where machine learning can have a pretty outsized impact, because we think the insights and abilities of these really large models will surpass human intuition very, very quickly.

Nicola: In a way, you are saying, if I understand correctly, that of course right now there are lots of companies that build models to improve molecules, and all of that. But I think you’re drawing a picture at a bit higher level than that and saying that perhaps machine learning is going to help us understand biology better, beyond what human intuition can grasp. Is this what you’re thinking?

Do Soon Kim:

In its final form, or in some very near future, machine learning is going to reshape how we think about intelligence. Intelligence for a very long time really originated with the human; it started from human intuition, right? Whether the human decides to take the neural networks in their brain and write a novel, solve a mathematical proof, conduct an experiment, write music, or perform, in all those cases intelligence started with the human. Machine learning is moving at a pace such that it is possible to imagine re-conceptualizing our view of intelligence, and really what we perceive as intelligence in the big picture.

And to go back to my first answer, the reason I think pharma has been relatively quick to jump into this, even though the technology really hasn’t been de-risked yet, is that, at least in my lifetime, the last time we saw technology moving this fast was probably the internet and web development. Except the internet reshaped how information is shared, and eventually technology worked on how information is searched and retrieved, through indexing and search. But I think machine learning is moving at a pace so fundamental to how, like I said, intelligent or innovative work gets done that there’s a strong sentiment that if you do not start adopting it now as part of your core platform, there won’t be a way to catch up in the future.

It’s not like you pass paper notes and then you decide email is better. Whether you adopt email today or tomorrow or maybe next year, email has been, you know, more of the same thing. It’s easier to write more emails now, and it’s easier to find emails and stuff like that. But email is there, it’s gonna be there. It’s not gonna change all that much.

And I think most technology has this like rapid ascent and then some stagnation once it’s become like good enough for everyday use. I think machine learning is maybe one of those fields where it is really unclear whether it will ever stagnate.

Nicola: I love what you’re saying now, because I think one core message that I hear you saying is that indeed pharma, but I guess also other types of industries, but, you know, knowing pharma, that’s also what I hear from talking with people: there is a need to own that technology at the core of the business. So, it’s not a side thing; it’s not indeed like email, right? It’s not a little thing on the side that you can eventually jump on if you discover that you need it. You need to embrace it, and especially you need to embrace it early on if you want to leverage the exponential curve that that technology can allow.

And then it comes back to why there might be that exponential acceleration. And then we go back to what you were saying about the fact that there is something at the core of it, which we really don’t fully understand yet in terms of where it’s going to be. But there is a different way to think about intelligence.

We have been thinking about intelligence in terms of humans. That’s one way to think about intelligence. There is a quote from Dijkstra, a famous Dutch computer scientist who won the Turing Award: “Asking if a machine can think is the same as asking whether a submarine can swim”. I love it, in a way, because it’s a bit philosophical, but that’s the point, right? You can achieve certain things, and perhaps a machine can think about those things in a way that we cannot, that helps us advance in a way that otherwise, alone, we would not have been able to, just with our own intuition. So, I think I totally embrace what you’re saying there.

You were saying that in your role of product manager a lot of it is connecting dots, perhaps linking between what a biologist says and what a machine learning scientist says. But can you, if it’s possible, give us an example of what that means, you know, day by day?

Do Soon Kim:

The problem that I’m focused on solving as a product manager is making sure that we measure ourselves in the right way. There’s not enough publicly available data for machine learning models, regardless of some amount of scaling. The models get bigger. But I don’t think there is enough data out there for us to achieve clinical success with a molecule that comes out of a computer chip today.

So, you can live in one of two camps. The first camp is: machine learning models, without any real-world data relabeling, will one day be able to achieve just zero-shot clinical candidates. The other is: with all the data in the world, unless we do something special, we’re not going to make that fundamental breakthrough to clinical candidates out of a computer chip.

So, I’m in the second camp, where I think we’re not there, obviously, and I don’t think we’ll get there with publicly available data. Obviously, the variable there is that someone could end up releasing a massive public data set that changes this; that’d actually be great for humanity. Some people would have to think about what that means for business models, but it would be great overall for humanity.

Being in that camp means that clinical success, which, if you’re a therapeutics company, is sort of how you should measure yourself, is going to depend on successful translation of computer outputs, machine learning outputs, into the wet lab. And conversely, successful translation of wet lab results back into the models that you are using to generate these molecules.

There are a lot of hows in how people do that. You can think about: is this pre-training? Is this reinforcement learning? Is this post-training? There are lots of hows. But I think that matters way less than recognizing the importance of the interface between the dry and the wet lab. And for why I think this is a problem that machine learning has not yet really faced, I’ll use large language models as an example.

Between GPT and GPT-4, the goal of the models was to sound as good as a human. The gold standard in large language models for a long time, in image recognition as well, was the performance of a human on a given task. So, when GPT spits out a sentence, does it sound like something a human would say? Once we got past being human-like and achieved parity with humanness, we moved on to: how accurate is the information? Is it able to recall? But you could probably find a human that was as knowledgeable as GPT on some given topic, if not more knowledgeable. So, it was very clear that the advantage was the vast amount of information, the variety of questions it could answer. But the gold standard was still the human.

Outside of LLMs, in other areas of machine learning, beating humans became very, very clear from the early days, like in chess, right? You have models that can beat humans in chess. AlphaGo beat Lee Sedol in Go. In the world of therapeutics, what we’d like machine learning to do is actually exceed human performance: design molecules that humans, given reasonable amounts of time, would not be able to find and design. Because we think that is going to lead to therapeutic abundance.

So, the goalpost is actually quite different in the world of therapeutics and biology in general, because your gold standard is no longer the human. Your gold standard is the wonderful but painful complexity of the natural laws of physics, chemistry, and biology. So, because of that, your success depends so much on being able to translate very well between the wet and the dry labs.

Translation between those two is really critically important. So my job is focused on making sure that importance stays top of everyone’s mind, and the dream scenario is that people don’t see themselves as a wet lab scientist or a dry lab scientist. The dream scenario is that we are pioneering a new way of doing science.

What does that mean from day to day? Lots of taking notes, writing down ideas, talking to people, learning. But the one activity that I do much more than maybe other product managers in a narrower field is actually teaching. I see this day-to-day activity of teaching the two teams, which speak very different languages, to understand and appreciate the other side as maybe one of the most important things I do from an organizational standpoint.

Nicola: That’s super interesting, actually, because I’ve heard before, you know, from people especially in your position, that there is a need to translate for the wet lab and the dry lab, a need for these two teams to speak the same language. But I think this is the first time I see that the role is also about teaching that, which I do believe is very important, because how do these people get to know the other language? How do they do that? And what you’re saying there makes a lot of sense to me: the person in the middle, the person that connects the dots, is also the person that is teaching the different groups how to make that connection, how to speak with each other.

I think that’s critical. So, I totally see how that is a very interesting aspect of your role. And in that respect, I can imagine that, you know, in doing that teaching and that translation, sometimes things go wrong. What do you think are the mistakes? Is there a mistake in doing that type of job that comes to mind as something you shouldn’t do?

Do Soon Kim: Gosh, I make a lot of mistakes.

Nicola: Not necessarily you, right? But in general.

Do Soon Kim: Well, like anyone in this job. Yeah, yeah. I am, luckily, because of, you know, who I surrounded myself with and who I got to learn from. And I think this applies to teaching more broadly too, but I grew up bilingual in that sense, keeping up with both machine learning research and synthetic biology research.

So when I go into teaching mode, the most dangerous mistake I think you can make is assuming that the person you’re about to teach already understands the key fundamental differences between the two fields. The same word used in the different fields can have vastly different meanings. And you have to start with definitions, actually.

Any machine learning researcher understands what an eval should do; yeah, it’s really easy, you don’t have to explain it to them at all. When you describe evals to a biologist, they’re like, yeah, of course, that’s so obvious. And then, I mean, it’s sort of like describing learning to a biologist. They’re like, yeah, of course you learn from data. What else would you be doing?

Because I think in biology the human is so closely tied into the loop, and we underestimate how good the human is at doing all those things very implicitly. Whereas in machine learning, in any sort of computer programming, I think the key difference between a tight human-in-the-loop process and programming a machine is that you learn to understand the importance of being explicit. And what you even mean by “explicit” means very different things in computer science and biology.

Using words without really being empathetic to how people have used those words in the past, and instead just sticking to how you used them in the most recent meeting, is, I think, often where you lose people very, very quickly in that teaching moment. After that, they have to dig themselves out of this vocabulary before they can even get to thinking about the concept behind it. So I would say the biggest mistake is over-assuming that words have the same meaning in both fields, and not defining them very crisply when going into it.

Nicola: I relate to that, right? Sometimes the words are even the same, or just slightly correlated, and that creates a bias or, you know, a misunderstanding early on that is very hard to correct.

Do Soon Kim: Ironically, it comes down to a language problem.

Nicola: Yeah, very ironically indeed! That’s the funny part about it, kind of an inside joke. But maybe coming back to what we were saying regarding the role of machine learning, both in, let’s say, therapeutics, but also in understanding biology, and to the fact that, as you mentioned before, we are missing a critical mass of data to be able to get there. I’ve seen Xaira actually making some progress on that and releasing some important data sets in that respect. Do you think that it would be possible at some point to have an understanding that goes beyond the molecule, maybe more at the cellular level, therefore more about the biology, maybe a virtual cell or something like that? Or is that just hype?

Do Soon Kim:

No, I think that absolutely should be the goal, and there’s actually a lot of progress towards it. This has been the challenge of biology for a very long time. Physics by definition is human descriptions of natural phenomena, but the nature of physics is that it’s a field able to precisely define the scale at which a phenomenon occurs. And chemistry is also a very bottom-up kind of discipline, in that humans can build very isolated and controlled reactions to study very specific phenomena.

Biology has a hard line between life and not life. And there’s biochemistry: you can do lots of stuff in test tubes and make it look like biology, and hopefully it maybe mimics biology. A gray area is whether a virus is alive or not. But I’m going to ignore that “is a virus alive or not” argument and just go into: if you’re a researcher working in biology, what is so hard about it?

Even the smallest unit that people would consider to have life, which is a cell, even the simplest cell, is enormously complex. A complex synthetic pathway in chemistry might have a hundred steps and maybe 400 reagents. That just pales in comparison to the complexity of a cell. So, we inherit this almost unimaginable amount of complexity by deciding to work in biology, because biology is a very complex system that has evolved over billions of years.

A good one-line summary of biology is like “context matters”, right?

Nicola: Absolutely.

Do Soon Kim:

What you described: if we want to be successful in applying machine learning to biology, what I consider success is understanding that context. So X-Orion, the dataset that was made open by Xaira, is really focused on the effect of gene perturbations on the rest of the cellular transcriptome.

So, I think absolutely the goal should be to go to the virtual cell, the virtual animal, the virtual organ. And it really should be to contextualize the network effects of biology, because that is kind of like what biology is.

Nicola: I totally believe in that, but at the same time, I think I’m old enough to have seen, when was that, around 2010 or something, a big hype around systems biology, let’s call it that, in which people were actually trying to do these sorts of things, right? People were using whatever tools they had, something running on the computer or a set of ODEs. But eventually the idea was: let’s try to describe all these pathways, let’s try to describe biology, maybe a cell. And that fell a bit short of expectations back then, for good reasons, I assume.

What do you think are the differences that might, you know, now make us more optimistic?

Do Soon Kim:

I think the biggest differences since then boil down to the key technologies that have changed the volume of data that you can get, and the amount of available compute. I think you absolutely have to have both of those things. So, if I compare back to systems biology: there was a point at which we used to say, and this is the human trying to make sense of something very complex, that a cell is just a collection of chemical processes. We know how to model a single kinetic chemical process through kinetic equations. We know how to solve those using ODEs. So now we’re going to model a cell using a system of ODEs. And then it comes out to a matrix operation.

Turns out that’s actually too reductionist. And if you are missing some critical ODEs, the model can sort of fall apart, right?
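The “cell as a system of ODEs” view described here can be made concrete in a few lines. This is purely an illustrative sketch, not any real cellular model: a toy two-step kinetic chain A → B → C with made-up first-order rate constants, written as dx/dt = Kx and integrated with forward Euler, so that each step is literally a matrix operation.

```python
# Toy illustration of modeling kinetics as a linear ODE system dx/dt = K @ x.
# Species A -> B -> C with illustrative (not measured) rate constants.

k1, k2 = 0.5, 0.3  # hypothetical first-order rate constants

# Rate matrix K for concentrations x = [A, B, C]
K = [
    [-k1, 0.0, 0.0],  # dA/dt = -k1*A
    [k1, -k2, 0.0],   # dB/dt =  k1*A - k2*B
    [0.0, k2, 0.0],   # dC/dt =  k2*B
]

def step(x, dt):
    """One forward-Euler step: x <- x + dt * (K @ x), done by hand."""
    dx = [sum(K[i][j] * x[j] for j in range(3)) for i in range(3)]
    return [x[i] + dt * dx[i] for i in range(3)]

x = [1.0, 0.0, 0.0]  # start with all mass in A
for _ in range(1000):
    x = step(x, 0.01)  # integrate out to t = 10

# Mass flows A -> B -> C; total mass is conserved
print(x)
```

The point of the toy is the limitation Do Soon names: drop one reaction (one row/column of K) and the trajectory changes qualitatively, and for a real cell the matrix would be enormous, which is where the data and compute constraints come in.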

Nicola: In a way I get the context, right?

Do Soon Kim:

Yeah, and also, it turns out that to solve really large matrices, you need a lot of compute. So those two themes come out. So, what’s changed now? Well, between then and now, the amount of sequencing we can do has improved quite a bit. Other kinds of insights into the cell have improved quite a bit too: proteomics, single-cell genomics, spatial genomics.

So, there have been all these improvements in technology. Meanwhile, compute has gotten exponentially more powerful as well. So the size of the matrices we can even imagine manipulating inside a computer chip has grown exponentially, as has the data.

So, we’ve already tried this. There was a lot of hype around it. What did we get out of it?

I love history, and I actually think the hype cycles are good. Because the hype cycles make us invest more, and even if the investment doesn’t fulfill the very lofty promise, if you look at the technologies that come out of that hype-cycle investment, they were clearly a great stepping stone towards where we are today. If we had never invested in systems biology in the 2010s, even though systems biology maybe didn’t fulfill the lofty pitches the PIs made when they were writing proposals and such, it trained a generation of scientists who now understand the limitations and understand what we can do now. I view it as an overall societal positive.

Nicola: Yeah.

Do Soon Kim:

And it’s just cyclical, I think. Something similar is happening with machine learning: there’s a lot of hype and people are going to invest in it.

Current architectures and models are never going to fulfill this lofty goal we have set for ourselves. But what we’ll get is a great group of humans who have been trained in this world, who will make the next breakthrough. And eventually, if you take the long-term approach, there’s never been a better time to be alive than now. Humans have just shown the ability to improve life in the long run over time. So, I’m always an optimist when I take the long view.

Nicola: Absolutely. And I like that approach as well. And I believe the same: even if systems biology did not really deliver on all the promises that, you know, were made back then, in a way we learn from errors, right? It was important to make some errors there to be better now at what we’re doing. Plus, you have a set of scientists who were trained through that and are now able to push forward. I totally believe it.

Maybe on to hot takes, because you mentioned that hype can be good. What do you think is a bit too hyped today, and what do you believe is a bit too underrated?

Do Soon Kim:

Yeah, I mean, maybe this is spitting in my own face, but I think there’s a lot of hype in machine learning. I think there’s too much hype in machine learning. Make no mistake, machine learning is a transformative technology. There are two ways to frame it. One is: we’re going to apply AI to everything, it’s going to change it all. I think 10–15 years from now, that risks sounding a lot like “I have this really big hammer, and I think it is a really good hammer, probably the most generalist hammer I’ve ever met in my life and probably ever will, and I’m just going to go find every single nail I can. Is it the best thing to apply to it? I don’t care. I have a big hammer.”

I think that’s the danger of the hype, and I think we’d be better off. I don’t disagree at all with the amount of capital and talent being invested in AI, in machine learning. But I think we should be much more problem-driven and ask: “what are the biggest problems that we’re facing today that we think are best addressed by machine learning?” I’d rather we pull resources towards those very important problems. Therapeutics is the one that I feel passionate about. I think climate is another very important one. I think human access to information and civil rights is important.

So, I’d rather we as a field, if I can be considered part of the field, be much less “we have a hammer, we’re going to find every single nail”, and much more focused on the nails, treating machine learning as a transformative tool to address each problem. Because what I worry about is that if the bubble bursts, people will lose sight of how much progress we made in that short amount of time thanks, in part, to the hype. I try to be a great optimist about this.

Nicola: So, we might throw away the nails just because at some point the hammer was not fit for that nail. But the nail was actually still a useful one to have; maybe we just did not use the right hammer for it, because we were all just looking to use the hammer that we built. Something like that.

Do Soon Kim:

Yeah, exactly. Agriculture is a very important problem that we need to solve because nutrition is a fundamental human need, right? And over time, you see lots of different technologies being applied to agriculture. The importance of agriculture will never fade. The importance of human health, hopefully, will never fade, and animal health will never fade. The importance of civil rights for people will also never fade. I'd like society to stay focused on the problems instead of talking so much about the tool, because if you talk too much just about the tool, two things happen. One, there are people who come and talk about the tool because it gives them attention; they actually have no idea, they're just spinning a bunch of stuff, and I don't love that. And two, we lose sight of what's actually important. We start measuring ourselves on how good the tool is at some obscure thing that isn't valuable. I want us to focus on the societal problems that I think are really going to move the needle for the next generation.

Nicola: And I think that makes a lot of sense. Indeed, therapeutics and biology is probably one of those areas that is worth investing time and energy in.

Maybe I just wanted to explore a concept with you, coming back to biology and understanding biology, and hear your opinion on it. You mentioned physics and chemistry before, and how those have what I'd call compositionality to them: you can take a formula, put it into another one, and the system works; you can take a reaction, put it into another one, and the system still works, right?

I think, however, that compositionality is missing in biology. Partly it's probably because of context: we are missing too much of the context to actually see the compositionality. Maybe it's also due to a lack of the correct abstraction. For instance, I remember I was working on something like downregulation, gene downregulation. Downregulation of proteins can happen in multiple ways; it can happen via proteolysis, it can happen through different processes.

And to me, I have this sense that a lot of this can actually be abstracted. You can think: okay, at some point my process needs downregulation, and that can be implemented with a set of different possible implementations. If we thought about biology a bit more in that abstract sense, a bit like in computers, where you go from high-level languages down to lower-level languages and eventually to the hardware and the machine, I could imagine that at some point we could have a system in which we think about biology as a composable system, so to speak, in which we can implement different processes with different types of machinery, different types of molecules, and eventually achieve functions. So, what's your take on that? Is that something machine learning can help us with?

Do Soon Kim:

Yeah, so this goes back to the first thing I talked about today, which was that I hope and think machine learning will reshape what we think of as intelligence. You used the word abstraction, which I think is the right word. When you describe composability in chemistry and physics and how the formulas don't conflict with each other, the formulas themselves are just human descriptions of natural phenomena. The laws governing physics and the laws governing chemistry exist whether humans understand them or not, right? Biology must work the same way. Things inside of a cell must be self-consistent, because they are.

I think the first step is to accept that the laws that govern biology will be there whether we understand them or not, and whether or not we can decompose them into a set of human-interpretable laws and systems.

And I think that's uniquely why machine learning is actually very good for this: you can decouple the need to make something human-interpretable from the objective function. Take the classical model of drug discovery: you identify a pathway that you think, when perturbed, results in disease; humans do lots of elegant experiments on that pathway to see if a functional change can be made and whether it results in a disease phenotype.

Already, there are three steps of human interpretability in there that have to be satisfied before we go after that target as a drug target. And then you start to discover a drug against it. Again, humans at some point have to understand why that drug works, or how well it works. It relies so much on human interpretation, on the ability of humans to conceptualize it.

For therapeutics, what if instead all we care about is that the therapy, in whatever form, gets us back to the state we want to be in? Can we imagine a world in which that understanding is captured somewhere in the latent space of a model, and we accept not being able to understand it in a human-intuitive way, but we accept the outcome?

I think this is the next level of intelligence we might have to get comfortable with in machine learning, because the outputs of models can sometimes just not make sense, but we should really measure them on the results.

And people have no problem with this elsewhere. Take AlphaGo, for example. When it played Lee Sedol, there's this very famous move, move 37, where people said, "wow, a human would never have done that," right? And it turns out AlphaGo ended up winning that match.

We have no problem accepting it, and the lack of interpretability around it, when it's happening in a game. But when it comes to biology or therapeutics or science in general, we're much more careful about it. In the future, though, I think we'll have to relinquish some control and be comfortable with machines having developed what we think of as understanding, as long as we're very precise about how we measure it and what goal we're driving towards.

Nicola: So, in a way, at some point we will probably need to accept the humility of not being able to grasp how the whole process works, and just recognize that the outcome is what we wanted. Maybe the humility to not fully appreciate how the machines were able to conceive all of that, but be cognizant that the output still matches our expectations, and that should be good enough.

Do Soon Kim:

Yeah, I think so. I mean, do we understand why it is that dogs can smell drugs or track people? No, we don't. But we certainly take advantage of it. There are many examples of what I would consider superhuman ability that humanity is more than happy to take advantage of. But with machine learning, it's the fact that humans brought it about that gives people a lot of pause. And I think they're right to be very cognizant of responsible use. But there's a cultural-acceptance piece that we should get comfortable with, because I think it is coming.

Nicola: Right, and I think that's a very interesting and very cool discussion. We're heading towards the end of the episode, and at the end I always like to ask all the guests the contrarian question. That is: what do you currently believe to be true in the industry that most of your peers would not agree with you about?

Do Soon Kim:

When I think about something that is true, I think about it as true in the natural world. In that sense, for things that are actually true, I couldn't care less whether I have a view on them or other people do. I can answer a version of this, which is: what do I disagree with most people on? The other thing is, who do I consider my peers? That's also really hard for me.

So, what is the contrarian take I have that I think most people in the field of machine learning and biology for therapeutics might not agree with, or that might not be a popular take? I think the fundamental one is this: there is a belief that the same things that worked for large language models will also work for models meant to interpret the physical world.

If I look at the trends in the kind of research being done on models that are supposed to design therapeutics, they mirror a lot of what I saw in the large language model world probably 5-7 years ago. I'm not totally convinced that's the right approach, because the large language model world followed a few research breakthroughs achieved on actually not that much compute, followed by rapid scaling, to the point where you hear about different frontier companies committing tens of billions of dollars of compute towards getting the next breakthrough. They entered this scaling phase, and then reinforcement learning became very popular again.

If I think specifically about therapeutics, there's a lot of belief that scale is the answer: we'll take advantage of scaling laws like the LLMs did, and we'll get there.

I think people are failing to appreciate the nuances of biology and the physical world when they say that. So I actually think the thing that gets us to what we discussed, either a machine or a human understanding of biology, and better therapies, is not limited by scaling; it's limited by some fundamental research breakthrough that has not yet happened.

So, if I look at the investment happening in the life sciences field, so much of it is just going towards scaling. We just believe it's scale, scale, scale. I actually think investment in fundamental research is where the greatest short-term gains will come from. The people I consider my peers might agree with me, because my peers are researchers, but the investment and the trends in the industry are certainly all for scaling, because people believe there's something to be unlocked there.

Nicola: Indeed. What you're saying is that we need those breakthroughs in fundamental research to really unlock the next level, and that's what we need to work on in the coming years.

Do Soon Kim:

I think so. I think people are already forgetting what a breakthrough AlphaFold 2 was. The PDB didn't change that much between AlphaFold 1 and AlphaFold 2; it wasn't like the underlying data changed by a lot. It is a bigger model, so scale has a lot to do with it. But really, it was a lot of human ingenuity and fundamental research. I think we have forgotten what that great leap was built on.

Nicola: Do Soon, thank you very much for joining us today at the show. I enjoyed the chat. Hopefully we’ll talk again and have a nice day.

Do Soon Kim: Awesome. Thanks so much.

Never miss an episode. Subscribe for monthly insights from the biologics R&D leaders shaping drug discovery.
