In this episode
Jiwon Kim is a Product Lead at AWS AI Life Sciences. An engineer and scientist by training, he brings over a decade of experience across hardware, software, and machine learning to build and deliver AI solutions for the life sciences industry.
In this episode of Models & Molecules, Jiwon discusses the digital transformation happening inside pharma organizations: cloud is becoming essential infrastructure, driven by the need for better data management and tighter machine learning integration. At the same time, software is shifting from a support function to a core component of the drug discovery process. Jiwon explains why the field is heading toward zero-shot models, and why lab-in-the-loop systems are the necessary bridge until those models are robust enough to propose candidates with minimal context. The conversation also explores the rise of agentic AI, helping scientists navigate complex data to support target identification and hypothesis generation.
Finally, Jiwon argues that the broader impact of AI/ML depends on democratizing access to these tools beyond only the largest pharma companies, while acknowledging a key practical bottleneck: generating high-quality wet lab data fast enough to continually improve ML models.
Key takeaways
- Cloud and software are becoming the backbone of pharma infrastructure, and a core strategy across life sciences. The shift is accelerated by growing data management and ML integration needs.
- Even with plenty of GPUs, progress hinges on unlocking overlooked negative data and solving the bottleneck of generating high‑quality wet‑lab data fast enough to feed back into ML models.
- Zero-shot models will revolutionize drug discovery, but until we have full confidence in their performance, lab-in-the-loop is the necessary intermediate path.
- Agentic AI can help shift drug discovery to an AI-first approach by making powerful tools accessible and usable for all scientists, not just specialists.
- Democratizing software and AI tools will empower more researchers and ultimately lead to better patient outcomes.
Full interview transcript
Nicola: Hey, Jiwon. Welcome to the show. It’s a pleasure to have you here.
Jiwon Kim: Thanks for having me. Excited to be here.
Nicola: I remember when I started ENPICOM around 2014-2015 that it was not a given for a pharma organization to work with the cloud, or sometimes even with much software. I think a lot has changed, and now it's much more comfortable. It's almost obvious for these organizations to work with the cloud, and their attitude towards cloud and software has changed a lot. So can you tell us a bit about your experience?
Jiwon Kim: Yes, here at AWS, we have a strong presence of pharma customers. As you may know, 19 out of the top 20 global pharma companies are built on AWS. So I was able to get some firsthand and secondhand exposure to customers in this specific segment over the three and a half years I've been here. As you pointed out, there was a lot of skepticism around 2014-2015 due to regulatory anxiety and legacy infrastructure. But the global life science cloud market is now at about $26 billion, roughly a fivefold increase from the 2015 baseline, with a more than 83% adoption rate. This market is expected to continue to grow and is projected to hit about $105 billion by 2034, based on some of the most recent market research reports.
As you can imagine, the biggest driver is machine learning integration: 68% of pharma companies have identified AI as the technology with the highest potential impact. And as you've seen from the news announced over the last 2 to 3 months, you're hearing about all these big deals between AI model providers and global pharma companies, striking multi-year agreements to train models on the proprietary data that pharma companies own, and to use them in actual drug discovery pipelines.
So, for example, Chai with Eli Lilly, or Boltz with Pfizer. Those are some of the examples showing how quickly pharma is shifting to adopt AI as a core strategy.
And as a result of that, we are hearing a lot from pharma customers trying to figure out how to digitize their infrastructure and their data itself. One of the anecdotes we keep hearing is: "we are sitting on all this data, but it's not in a format or data type that can actually be used for training purposes". Some of it is notes written decades ago, or locked in PDF format, et cetera. So they have all these questions and needs that the cloud can serve by building the right infrastructure, working towards a concept like the Connected Lab, meaning integrating their LIMS systems into the cloud environment and directly connecting all the lab equipment. That way, whenever data is generated, it is transformed into a format that can be used to train models, or even for quick analysis, so the overall workflow and process can be more efficient and secure.
So overall, cloud and AI are becoming a core strategy, not just for pharma customers; based on my observations, the entire life science segment is heading that way.
Nicola: I think this is interesting, and you touch upon a point that deserves a bit of reflection, because you mentioned that they're looking for a solution to manage all the data they're sitting on, and of course also the data they will start generating. There's been a discussion regarding legacy data versus data that will be produced in the future. Some people believe legacy data may not be so reusable for AI/ML models because it lacks certain key elements, like homogeneity or proper metadata. In your experience, have you seen a trend of companies trying to leverage more legacy data, or are they preparing more for the new generation of datasets?
Jiwon Kim: So, I think there's interest in both. First of all, there are now rapidly evolving large language models and other evolving technologies, not just bio foundation models or models that predict protein sequences and structures, for example.
I think it's very important to provide the right context to these LLMs. Companies are seeing the benefit of providing the right context, in a certain structure of data, to these LLMs or other AI tools to really drive the right scientific insights, which can be used in future iterations of the drug pipeline or to modify the targets. So you're seeing some of these examples coming out, for example Google Co-Scientist, or Edison Scientific. They show that if you feed in the right context and connect to the right data sources, these language models can actually read and understand the context across the entire portfolio of the disease the researchers were studying, and give you recommendations like: "these are some of the hypotheses you should consider when first starting your drug discovery pipeline", or "here are some targets we think you should look at based on the information you gave me", and even "I have the ability to do research on my own". So they will go and search the literature and come back with solid reasoning on why these are targets you should look into.
So, I think that's why pharma companies are seeing all this benefit, because I'm sure there is a ton of data they're sitting on that has never been published. One important change in trend is the growing recognition of the importance of negative data. Publications talk about the positive data, all the great outcomes and breakthroughs. But when it comes to training or giving context to these AIs, it's very important for them to also understand what led to failure, to really give the right context or the right hypothesis to tackle. Companies are sitting on that data, and until this point it has been overlooked, without further investigation into how to leverage it.
But they're realizing the value of those data points and figuring out how to digitize them in a way that can be used properly. So I think it's definitely both. For the future, they're interested in how to modernize and digitize their infrastructure so they have a well-oiled machine that continues to optimize and improve the process. In parallel, they're asking "how do I monetize, or maximize the use of, the data I'm currently sitting on, and how do I incorporate it into all the future work that's going to happen down the road?"
Nicola: Right, indeed. We touched a bit upon the data, and I think we will talk more about that and what it means for agentic models. But that is one component. What's your take on software? Do you think the way pharma looks at software is also changing, so that it's no longer just support but becomes core to certain processes, like discovery processes? What's your take?
Jiwon Kim: So, the question of whether pharma companies can still treat software as a support function can be answered with some examples of the profound transformation that has happened over the last decade. The emergence of TechBio companies like Recursion, Insilico Medicine, or Insitro are clear examples of how software has transitioned to become a discovery engine itself. These companies are founded on computational-first principles, where software platforms generate drug candidates rather than simply assisting human researchers. Take Recursion, for example: they were founded in 2013 with what was a really radical hypothesis at the time, that you can train artificial intelligence on cellular images at massive scale to understand disease biology, then use those insights to identify therapeutic candidates. They call this the Recursion operating system, Recursion OS for short, functioning as an end-to-end platform spanning target identification to clinical trial enrollment. This isn't an example of software supporting discovery; it's software performing discovery. So there's a fundamental shift in the role of software in drug discovery itself. Another example I can think of right now is Genentech's Lab-in-the-Loop, published in partnership with Stanford.
So as a lot of you know, Genentech has been one of the traditional pharmaceutical powerhouses under Roche, and they came up with the concept called the Lab-in-the-Loop initiative. This approach essentially creates a continuous learning cycle: the AI models generate predictions about drug candidates, lab experiments test those predictions, and the results refine the model continuously and iteratively. Through these loops of learning, you can get to the ultimate outcome of candidates that are clinically relevant. They have shown that over 4 cycles of these loops, they got to about 1,800 antibody variants against clinically relevant targets like EGFR, HER2, and others, producing binders up to 100 times better and lead candidates in the therapeutically relevant range of about 100 picomolar.
They also shared anecdotes from their researchers, like: "these are candidates I would never have picked using traditional methods, just looking at the sequences and the structures."
And here is the software outperforming experts who have been in the field for decades or more. I'm not trying to diminish the expertise that human researchers bring to the table, because we still need a human in the loop: there could be hallucinations, or the AI may predict something that doesn't even make sense from a physics or biology standpoint.
So, we still need a human expert in the loop, checking whether things are really heading in the right direction. But Genentech's lab-in-the-loop proposal is essentially that you can develop a platform, or software, that functions as a discovery engine without a lot of human oversight.
Pharma companies are feeling that way too. That's why some of them are now developing their own platforms, or partnering with well-known model providers, to adapt to this new norm of software and platforms acting as drug discovery engines, whereas before they were perceived as just tools. AI is, of course, still one of the tools to be leveraged in research and discovery, but software is becoming a more dominant force, leading discovery with human researchers in the loop providing feedback, whereas before it was the other way around: human researchers led the way and used the tools available to make discoveries. So I think this shift towards software taking a lead role in discovery is clearly observed across our field.
Nicola: I think we are definitely observing the same, and indeed, you're mentioning lab-in-the-loop, which I think is one of the core ideas around this process. Do you have an idea of the most common pitfalls companies run into when trying to implement lab-in-the-loop? Or is it something you've experienced?
Jiwon Kim: I can't say I have firsthand experience of how lab-in-the-loop is being used across the field. But what I'm noticing, based on a lot of the literature and publications coming out, is that lab-in-the-loop versus zero-shot are two completely different phenomena happening at the same time.
I truly believe that zero-shot models are the future and the way to go down the road, so that we don't need a lot of wet lab validation before any candidate is taken to the next step, whether pre-clinical trials or in vivo studies.
But until zero-shot models can really give us confidence, where with no context you just do a zero-shot prediction and get your candidates, I think we need an intermediate solution, which is lab-in-the-loop. The models may not be perfect at this point, but they can learn over time. It's very similar to how humans learn. For example, I have a six-year-old son who's just starting to learn how to play soccer. He's not going to be the most amazing player in the beginning; he's not going to play like Messi. But over time, he's going to learn from his mistakes, learn how to play with his teammates, learn new skills, and become a better player. So until the models are so far beyond human intelligence that we can say, "okay, we don't need any validation, we don't need any human in the loop", lab-in-the-loop has to be the intermediate step. You take open-source models, or models you can get access to yourself, and try them. By feeding in the actual ground-truth data, the wet lab data that's been generated, to tell the model "these predictions are truly right and these are wrong", it learns and self-optimizes, and you see the coefficient of determination between the predictions and the actual wet lab ground-truth data improve.
You will continue to see improvement. The Genentech paper shows that after four cycles of that, they finally got therapeutically relevant candidates, binders in the picomolar range.
I do have a lot of hope, and I do think that is the future: Isomorphic, Chai, or some of the other zero-shot model providers are going to get there, and it's going to truly transform how this research is done, without any validation. But until that point, I think we need an intermediate solution like lab-in-the-loop.
And I think this is where software can play a huge role, because you need some sort of software to host the model, run the model, take input from the wet lab, feed that into the system so the model can learn from it and be fine-tuned, and then repeat another set of experiments based on the optimized or modified configuration, or different model weights, et cetera.
I don't think I have seen an actual product launched in the market yet that says "we are the lab-in-the-loop platform" per se, though there could be one; I'm just speaking based on what I know as of today. But I think there is going to be a lot more effort to create this lab-in-the-loop phenomenon until we really have full confidence in the zero-shot models.
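The cycle Jiwon describes, where software proposes candidates, collects wet lab measurements, feeds them back to fine-tune the model, and tracks the coefficient of determination (R²) between predictions and ground truth, can be sketched in a few lines. This is purely an illustration on toy numbers: the model, the "assay", and the fixed four-cycle budget are all hypothetical, not any vendor's API.

```python
def true_affinity(x):
    """Hypothetical wet-lab ground truth the model is trying to learn."""
    return 2 * x + 1

class ToyModel:
    """Stand-in for an ML affinity model with a correctable systematic bias."""
    def __init__(self):
        self.bias = 5.0  # starts badly miscalibrated

    def predict(self, x):
        return true_affinity(x) + self.bias

    def fine_tune(self, xs, ys):
        # shift predictions toward the measured wet-lab values
        errors = [self.predict(x) - y for x, y in zip(xs, ys)]
        self.bias -= 0.5 * sum(errors) / len(errors)

def r_squared(y_true, y_pred):
    """Coefficient of determination between measurements and predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def lab_in_the_loop(model, cycles=4, batch=8):
    """One round per cycle: predict, assay in the wet lab, fine-tune."""
    history = []
    for _ in range(cycles):
        candidates = list(range(batch))                     # proposed variants
        predicted = [model.predict(c) for c in candidates]
        measured = [true_affinity(c) for c in candidates]   # wet-lab assay
        history.append(r_squared(measured, predicted))
        model.fine_tune(candidates, measured)               # feed results back
    return history

history = lab_in_the_loop(ToyModel())
# agreement between predictions and wet-lab data improves each cycle
```

On this toy, R² climbs from negative territory to above 0.95 over four cycles; the point of the pattern is that the software layer, not the model alone, owns the propose/measure/retrain loop.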
Nicola: I totally relate to what you're saying. Indeed, zero-shot is often still met with quite some skepticism, and maybe for good reasons as well. So I do believe the step in between is lab-in-the-loop, and software is going to play a big role in that. Also because there is a whole change of mindset needed in how these companies operate. If software becomes a more central part of how discovery is done, the type of skills needed by the people who operate those cycles needs to change too. And maybe there is also a role for software there: making sure it's adapted to the people who are going to operate it, who are not necessarily the tech people you find in other industries. These are people who work in the lab, with a certain mindset and background skills, which are very relevant. So there needs to be a bridge between the software and these types of roles and skill sets, I assume. Is that something you've been working on in your role as a product manager? Can you tell us a bit about what you do as a product manager at AWS?
Jiwon Kim: Yeah, I think that's a great question, and it is something we are thinking a lot about. As you probably know, AI agents are being discussed as one of the solutions. And as you pointed out, software has been perceived as an expert area that can only be used by, and requires training from, experts in software development, if you will.
But to your point, for this software to become the mainstream way of doing drug discovery, there's a clear need for all scientists to be able to quickly and easily use and adopt it as a mainstream mechanism to get to their discoveries. So there's an educational component and a translational component to consider here. If you look at some of the interviews given by our Senior Vice President, Colleen Aubrey, there's a concept of AI teammates. It's very similar to the concept of AI agents performing tasks on your behalf; a lot of the examples you see in the news are AI agents doing shopping or booking flights and hotels for you.
But I think there are more use cases to consider beyond that, because if you're a researcher who has never touched in silico methods or AI models before, you can't really expect that person to come into the software and adopt lab-in-the-loop. It has to be super intuitive. One of the methods we are thinking hard about is: how can we leverage AI agents to help you get to that point? Imagine you are just starting out and there is a configuration agent figuring out what kind of models to put together to meet your research objective. Then maybe there's another agent interpreting the results and giving you a baseline of which candidates to consider for your next step. And maybe the researcher only has to come in, look at the intermediate results, and say "I like these candidates" or "I don't like the approach you're taking".
It's more like a simple dialogue, like the one you and I are having; a human researcher can have such a conversation with multiple agents and ultimately guide them to the outcome. For example, for human researchers it might take 8 to 12 weeks to finish an initial cycle, after being trained on the software. Now imagine agents, or teammates, that have already iterated 20 cycles of experiments on your behalf while you're doing something more important, thinking through the more intellectually heavy exercises. Then you come back and see what the outcome is based on those 20-plus iterations. That either gives you the confidence to move forward to the next stage, which could be pre-clinical or in vivo studies, or you can say in natural language, "actually, here are some things I'd like to see done differently or better", and the agents will go back, perform those tasks, and come back. So I think the ultimate shift is in how researchers interact with AI models. Agents can be the intermediate layer, educating you on how to use AI in your research. At the same time, as teammates, they can help you get to a successful outcome quicker, easier, and more efficiently, because they are already doing some of these tasks on your behalf while you're doing something else.
I think it will take a lot of time if we just rely on the current paradigm of software with user guides, FAQs, or how-to videos. It's really challenging for researchers to read all that and come back feeling fully trained on the software, able to adopt it as a day-to-day way of doing research. I think we can accelerate that paradigm shift by bringing in AI agents that really interact with the human in the loop and get to the ultimate outcome quicker. So having that agentic layer is super important for pharma companies, TechBios, or any company in life sciences to be able to shift their paradigm to an AI-first approach. That's where I see a lot of the vision and future of leveraging the agentic experience.
Nicola: If I understand correctly, you're saying that software needs to change too, and you see a big role for agentic AI applied in pharma companies, perhaps at different stages. That's something I would like to challenge you on a bit: where do you see the best use of agentic AI today, across the different stages? For instance, in target discovery, I can imagine leveraging agentic AI to support your research, trying to identify potential targets and collecting all kinds of information from the literature. But there are perhaps other applications of agentic AI in the discovery process as well, where it can inform you on what type of experiments or follow-ups you might want to perform. Do you think one or the other is the most useful at this stage, or can it be applied across the board? Where do you see the most concrete application of agentic AI today in pharma organizations?
Jiwon Kim: Yeah, I think that's a great question, and I don't think it only applies to pharma companies, but across the life science sectors. I do see a benefit of the agentic experience being applied across the entire drug discovery process. As you pointed out, target identification is a big field, because typically scientists are dedicated to studying a disease mechanism or disease biology for several years before they can understand which target to even consider for the next set of phases. And that requires a lot of understanding of the data, generating the data, et cetera. One of the early examples we're seeing on target identification specifically is Google Co-Scientist. They used their Gemini large language models, and with all the right context provided, they came up with five hypotheses. Researchers then went and evaluated those hypotheses to see the outcomes and whether they actually had therapeutic relevance.
And surprisingly, they even came up with new hypotheses that had never been investigated in the field before, which showed promising results that could be taken into the next set of experiments. So, starting with target identification within the drug discovery pipeline, that's one big opportunity people are pursuing. And not just Google: Anthropic, for example, announced Claude for Life Sciences. It's a similar idea. They interconnect with Benchling, for example, because that's the ELN, the Electronic Lab Notebook, where a lot of researchers host their experimental data. By giving Claude for Life Sciences the actual context of what previous experimental data shows, it can come up with guidance on what to consider in the next set of experiments to run, et cetera.
So on the target discovery side, this is where I see a lot of promise: agents coming in, doing research for you, understanding the context of previous results and experiments, and coming up with really novel hypotheses that humans have never considered before. And this also translates into other parts of drug discovery, right? There's the hit-to-lead and lead optimization process within drug discovery. It's not just that you have a target to start with; you also have to go through multiple cycles of iteration to understand whether that target is indeed the right one. If you look at previous data, more than 50% of failures occurred in the clinical trial stages because of having the wrong target. For us to really say that AI is changing the paradigm of drug discovery, or the entire drug development process, we have to make sure there's a right target to start with, with really high confidence. Otherwise, we lose all that time, effort, and energy getting to the clinical trial, and then we're back to square one because we have to figure out what the right target is.
That's where I see a lot of opportunity. As I mentioned, even assuming you have the right target, you still have to go through the hit-to-lead and lead optimization process, which is where a lot of these models currently focus. For example, AlphaFold, Boltz, or Chai, and various design and scoring functions: these are the models out there to help you understand sequences and structures and how mutations either improve the properties or make them worse. So all those AI/machine learning tools are out there to help accelerate the hit-to-lead or lead optimization process and get to the ultimate outcome, which is reaching in vivo or pre-clinical stages.
And agents can play a role all along the way, because you need the ability to pick the right model and to optimize across multiple properties, not just binding, for example; you also need to look at liability, developability, and humanness to ultimately find the right antibody candidate. Agents that come in, run the right model, execute, and bring the results back to the user, so the user can interact and understand which candidates to take to the next stages of discovery, would be super helpful.
Nicola: In a way, indeed, we can see that agentic system as a sort of assistant to the process. Do you think we will reach a point at which, rather than being the assistant, it becomes, let's say, the boss, and the human in the loop is just the checker who says "okay, I trust you on this", while the system does most of the process autonomously? Do you think this is a reachable point, or is it something very far in the future?
Jiwon Kim: I love this question because that’s what I envision and that’s what I dream about.
So, there's this concept of vibe coding in software right now. You just use natural language, and you can build an app in a matter of minutes, or even create a new game based on a new storyline or new ideas. It has really democratized the ability to build new software and new applications for essentially everyone who wants to, or who has an idea. I think the future I envision is very similar. Imagine you can just vibe code a protein by stating "here are my interests in a disease and I want to approach it this way: X, Y, and Z", explaining it to your agent or cohort of agents. Ultimately, they'll come back and tell you, "here's how I'm approaching this problem, here are the hypotheses I validated through X, Y, Z approaches, and here is the ultimate outcome". Then humans will be able to say "okay, I like the idea" or not.
I think ultimately we'll get to a point where we can democratize access to these tools for all researchers, really getting the benefit to patients down the road, because you won't need millions of dollars to use proprietary models that are currently confined within certain companies. You won't need a lot of money to work with Isomorphic, Chai, and others; you won't need to strike a multi-year deal. I'm sure costs will ultimately come down to allow broader access to those proprietary models; I have no doubt about that. But in the current state of the art, it's mostly big pharma that has access or is able to work with some of those companies.
But I think in order for us to truly democratize and put the power of these evolving AI/machine learning tools in the hands of all scientists, that's when it will make a real difference in society, and to the people suffering in hospitals. To share a personal story: I was diagnosed with cancer two years ago. I went through two surgeries and one radioactive treatment. While I was at the hospital, I saw too many people suffering. That's when I decided "this is something I'd like to dedicate myself to", and that's why I joined this organization: I wanted to work at the intersection of biology and AI, leveraging some of my educational and professional background, if you will.
So, I ultimately want to see transformation in how drug discovery is done. It's similar to how software development was done before vibe coding: it was very confined, almost a privileged expertise reserved for software development engineers, to build an app or software, et cetera.
But I think we are now at a point where we can make a difference in the drug discovery field by giving this powerful tool to all scientists. Whether you’re an academic researcher or you work at a research institution without a big budget, you should be able to access some of these tools to get to your ultimate outcome. That’s what I envision, and that’s what the future will entail when it comes to drug discovery: ultimately making a huge difference for patients down the road.
Nicola: Oh Jiwon, that’s a very inspiring story, starting this journey under that motivation. And I also very much share the concept of democratization. I think this is one of the biggest opportunities of AI in general, but definitely also in the life sciences: to give power to the powerless, in a way, basically enhancing people’s capabilities. And if we can democratize AI, meaning giving access to the tools to as many people as possible, we further amplify the effect of AI. And I understand this is also a bit of your mission at AWS, right? So, in that respect, maybe also for the people that would like to pursue a journey similar to yours, can you tell us something about, you know, your lessons learned in your job, how you got there, and what you can share from a mentor’s perspective?
Jiwon Kim: I’m not sure I can call myself a mentor, because I’m still learning every day and making mistakes. Similar to lab-in-the-loop, I’m also a human learning system, if you will, learning from my mistakes and trying to get better every day. But one thing I do want to say is that I never really had a clear North Star. I know a lot of people say “you need to have a North Star for your career” or “you need to have the ambition to get to where you need to be by X date”, and some people plan out their next 10 years to make sure they’re on track to get to that point. But I was never that person. As you can see from my track record, I was a PhD student at the University of Michigan, you know, studying how to create artificial organs using human tissue and biomaterials.
And then I went to work for Procter & Gamble, doing large-scale manufacturing for Swiffer, and then became a brand manager for Dawn and Gain, doing commercial innovation, actually working as the CEO of the Gain brand, managing the entire P&L, and growing the brand 50% year over year. And then I came to Amazon and ultimately to AWS. So, if I had gone back as a grad student at the University of Michigan and tried to plan out how to get to where I am right now, I don’t think I could have planned it out the same way. My career path has been dictated by what it is that I want to learn next. When I went to Procter & Gamble, I was sitting at the plant because I was developing a new way of manufacturing heavy-duty soap.
And I was seeing these thousands of boxes piling up at the plant, and that got me wondering who’s making the decision about what to make and deliver to customers, because as a global process engineer, you’re just told what to build. I was curious about that, and that intellectual curiosity got me to something called brand management. And when I got there, I started to do a lot of hardware innovation and commercial innovation.
And ultimately, it came to another question: “I wonder what’s going on in software, because I’ve done hardware innovation and commercial innovation, and I’d like to understand how innovation is done in software”.
And that’s when I decided to join tech companies, and I started at Amazon. This intellectual curiosity, understanding what it is that you want to learn to grow as an expert and as a person, has been the driving force and decision factor in getting to where I am. And the ultimate decision point was: I was at AWS EC2, building a really successful product at the time, but I decided to leave that team and join the team I’m working on right now because of that personal calling and personal experience. Having that experience really reshaped my perspective, because I want to make a contribution to society and be able to be a proud dad, obviously, telling my kids they can live in a better world down the road, with less suffering and more possibilities and outcomes that will help them through a life that can be harder sometimes.
Focus on what intellectual curiosities lead you to the next step. I would not be so focused on “oh, I need to have a plan for the next 10 years”. I don’t think anyone has that level of control over anything as a human being. So, I would say let your intellectual curiosity lead you to the next step, and ultimately you’ll find you end up at the right place.
Nicola: I think that’s an inspiring story, Jiwon. Like probably many, many others, my family has unfortunately been touched, so to speak, by cancer, and I totally relate to what you’re saying: the calling of trying to do your part in solving something that touches the lives of so many people. In your particular personal journey, I find it interesting that you went from hardware to software. This brings me to maybe my next question, because looking at the future of pharma, one of the critical components is probably automation, which touches a bit upon the hardware, right? If we want to generate significantly more data, we probably need to start thinking about, and people are already doing this, automating as much as possible certain lab processes, perhaps with the support of agentic systems that orchestrate that type of solution. Have you seen anything like that? And what are your thoughts about the future of pharma in this respect?
Jiwon Kim: Even as AI/machine learning continues developing, there’s no way to generate data of the right quality at the rate it needs to be generated. For example, if you’ve already generated 100,000 molecule designs using your AI/machine learning, but it takes four to five weeks, or four to five months, of waiting to get the data back from the wet lab to be ingested back into the system, then ultimately, if you look at it from a bird’s-eye view, that doesn’t really accelerate the timeline as much as you would like.
There needs to be hand-in-hand development on both the AI/machine learning side and on how to quickly develop a method or mechanism to generate those data sets. There are players like Lila Sciences, for example, that are saying they’re trying to get to scientific superintelligence by building these AI factories, to generate this data in customized ways so you can quickly feed it back into the AI/machine learning model and get a prediction, or an idea of how to get better over time. In a similar way, the Eli Lilly and NVIDIA partnership is another example you should probably be reading and learning about, because it shows that Lilly is also investing in building a massive AI factory, if you will, along with NVIDIA.
So, there’s all this investment both in building more infrastructure to get to a better model, et cetera, which the NVIDIA and Lilly partnership is an example of. On the other side, there are players like Lila Sciences or Periodic Labs that are saying it is so important to build the capability and infrastructure to generate this data quickly enough, and at the quality it needs to be. Those are also going to be super important for us to scale up and ultimately get to the results we need to get to.
That’s why some players like Insilico Medicine are building a lot of these capabilities across the globe, not just within the United States; some countries like China, for example, or East Asian countries like Korea, also have a really good capability to create biologics very quickly. And that’s why I think it’s so important to also invest in the infrastructure to come up with a really quick way to generate this wet lab data, at the quality it needs to be, to feed back into the AI/machine learning models being developed. There are enough data centers and enough GPUs being created as part of this AI phenomenon, if you will. But in order for us to really, truly start to see a difference, and to make lab-in-the-loop the norm across the whole of drug discovery, we have to solve the bottleneck of generating high-quality wet lab data: actually synthesizing custom targets or complex biologics, generating that data quickly, and feeding it back to the model. That will be the big hurdle we have to overcome as a field to ultimately get the outcome we desire, which is super accelerated drug discovery.
Nicola: And I think you brought up a very interesting point that I wanted to ask you about, because as you said, there are big companies like Lilly that are investing massively to build (together with NVIDIA and others) their own, let’s call them AI factories, data centers and such.
At the same time, of course, they are big users of the public cloud. So, there is a lot of on-premise, so to speak, going on, and a lot of public cloud usage at the same time. Where do you think the balance is between the two? Do you see that one will dominate, or do you see that both will coexist?
Jiwon Kim: Yes, that’s a great question. Just to be transparent, I’m sharing my perspective as a person who’s been in this field rather than representing the AWS organization. There’s a need for both; I think cloud will never go away. As I mentioned at the beginning of this interview, cloud is becoming a core part of the equation, whether you’re in pharma, life sciences… Eli Lilly, which I gave as an example, is built on AWS, and they are one of our core AWS customers. So, there are all these separate initiatives, and I think pharma will explore what is ultimately the best way and come down to a decision like “okay, we’re going to stick with the cloud, or invest in something else, something separate dedicated to AI”.
So, there’s a limitation on the GPUs in the world, demand keeps increasing, and sometimes it’s really hard to keep supply up with demand. That’s why the NVIDIA stock keeps going up, making all these splashes about how many GPUs they’re selling every year, et cetera. So, there’s an urgent need, almost a fear of missing out, to secure enough GPUs by themselves rather than relying on the different cloud providers to provide them. I think that’s probably why they went ahead and said “let’s do this experiment with NVIDIA and see how it goes”. But the cloud providers, AWS, GCP, Azure, are going to continue investing; they’re continually building data centers. So, they’re going to continue supporting any need that any customer has, not just pharma.
A lot of the conversation we had today is very much focused on AI/machine learning, because this is Models & Molecules. But overall, the cloud is playing an essential role. Ultimately, the cost keeps coming down, so that’s where this democratization will continue to happen: GPUs are going to be more affordable down the road, there are going to be more AI/machine learning models to leverage, and easier access to agents. Ultimately, all of this plays a role in getting to democratization and the next evolution of research, coming out with a new drug that can actually make a difference to the patient.
Nicola: I agree with you, and I don’t think cloud is going anywhere; it’s not disappearing anytime soon. I think it’s going to be a critical asset in the future as well. And indeed, I have reflected myself on why companies like that started their own, let’s say, AI factories, data factories. Probably, as you said, it’s also to guarantee access to critical resources at a time in which those resources are scarce, and there is definitely a coexistence of both solutions for different purposes. So, I totally align with what you’re saying, Jiwon.
So, I think we are heading towards the end of the episode for today. And as usual, I have one last contrarian question to ask you. The contrarian question for you is: is there anything that you strongly believe to be true when it comes to AI and digital transformation in pharma that many of your peers don’t necessarily agree with?
Jiwon Kim: I don’t know if I can say there’s a lot of disagreement amongst the peers…
Nicola: Fair enough.
Jiwon Kim: But I think there’s a healthy debate on zero-shot versus lab-in-the-loop. As I mentioned, I truly believe there’s a world where we don’t even need a wet lab, because these models are becoming so good that whatever prediction they hand out is actually true. And there are already some examples out there of models outsmarting or outperforming some of these people called experts, because some of the anecdotes say: “oh, I would never have considered this as a candidate to take to the next level”. But we are already seeing AI/machine learning do that. And I think zero-shot will ultimately get us there.
But until that point, I truly believe lab-in-the-loop is the right approach in the meantime to really leverage all the models out there. I mean, if you look at GitHub, you can easily find 150 to 200 models just for protein design, for example. So, there’s all this effort and excitement around the field, creating new models, et cetera. The question is, how do we translate that energy and effort into something more tangible: having the right context, using the model in the right way, or even getting some help from AI agents to ultimately get to the outcome you want, which is to make a difference to patients.
Maybe my contribution, in a way, is advocating for lab-in-the-loop to those people who are actually working on zero-shot. I’m not saying anything against zero-shot by any means; I’m just proposing lab-in-the-loop as an intermediate solution, more like a bridge to get to zero-shot models. And I’m sure there’s going to be a world with easy access to all the amazing zero-shot models for everyone down the road. But in order for us to really, truly revolutionize the field, that aspect of democratization needs to be in place: how can we leverage open source and get to similar outcomes, with the help of agents or by providing more data points to these open-source models? There are definitely different approaches you can take. But ultimately, I want to help these researchers get to the outcome they desire with whatever resourcing or funding they have, whatever limitations they may have.
Nicola: And I agree. I think the debate between lab-in-the-loop and zero-shot will continue. But for now, we close this episode, and I would really like to thank you for participating in Models & Molecules. It’s been great chatting with you, and I hope we’ll stay in touch and talk soon.
Jiwon Kim: Sounds good. Thanks for having me.
Nicola: Thank you so much.