In this episode
Steve Comeau is a Senior Scientist working in large pharma. Having built in silico antibody workflows across Dyax, Shire, Vaccinex, and other life sciences organizations, he brings a deep industrial perspective on what it actually takes to move AI-designed binders from promising algorithms to real impact in drug discovery and development.
In this episode of Models & Molecules, Steve makes the case that we are entering a third wave of antibody generation — the in silico wave. He maps out the space across two axes: generative versus predictive models, and single-body problems (e.g. developability, CMC) versus multi-body problems (e.g. binding), and argues that you can't have a successful generative model unless it understands the predictive nature of the system it's trying to model.
The conversation also tackles pharma's persistent data challenges, and why the harder problems are still ahead: T-cell engagers, activatable drugs, protein degraders, and other complex modalities that go well beyond antibodies will demand new data strategies and experimental design.
Key takeaways
- You can't have a successful generative model unless it understands the predictive nature of the system it's trying to model. The best models translate between generative and predictive really well — generation without validation is incomplete.
- Library design is undergoing a paradigm shift: from pan-target to antigen-specific. Where in vitro display libraries were built to bind a plethora of antigens, in silico design now enables generating libraries against a specific antigen.
- The real opportunity for AI in pharma isn't just molecules, it's also processes. Automating data processing, scripting routine tasks, and streamlining experimental planning can deliver tangible gains while molecular design models continue to mature.
- An ML model is only as useful as the assay it's trained on, and if that small-scale assay doesn't translate to development-scale behavior, the model won't either.
Full interview transcript
Nicola: Hi Steve, great to have you here at Models & Molecules.
Steve Comeau: Thank you, Nicola. Thank you for having me.
Nicola: I want to start by talking a bit about your background, because I think it’s quite unique. You come, of course, from protein docking, but then you moved more towards machine learning. That’s quite unique and might give you some unique insights. So, I would be curious, first of all, to understand how the transition happened, if you can call it a transition, and then, afterwards, what insights that background gives you that others might not have.
Steve Comeau: I think the transition really happened when I went into industry. So, as you mentioned, I spent my grad school days in the Vajda Lab at Boston University, learning from Sandor Vajda and Carlos Camacho, who are really pioneers in that docking field. And during my time there, I was actually the first author on ClusPro, which was the world’s first protein-protein docking web server.
So, I spent a lot of time thinking in grad school about protein-protein interactions, binding events, how to think about binding events structurally, and designing platform technologies to enable this type of research.
After grad school, I went into industry, I joined a company called Dyax, and I saw in one of your earlier episodes that you had Rene Hoet up there, and he briefly talked about phage and some of the bioinformaticists that were working on our informatics systems, and I was one of those guys.
But at Dyax, they had one of the leading technologies for phage display. So, they were leaders in the antibody space. And learning from Andy Nixon and Bob Ladner, a couple of titans in the field, I picked up a lot of antibody sequence structure function analysis.
After Dyax was acquired by Shire in 2016, I found myself at a small biotech in Rochester, New York, called Vaccinex for a couple of years. But after that, I ended up at Boehringer back with Andy again. And here I was actually able to apply a lot of my structural biology stuff from grad school with my antibody sequence structure function and engineering knowledge to think about computational ways of bringing antibodies forward. So really, it’s more of the antibody piece that I think is linking the computational to my research there.
Nicola: I can see that. And also, very interesting that indeed you went back, you know, like this mentor relationship that at some point comes back. I think that’s also interesting to see it happening more than once with different people that I’m talking to. You had that specific background, right? So, in protein docking, and I’m just curious, where do you see that experience that you made there can be applied and can give you some extra insights when you’re thinking nowadays about machine learning in industry perhaps at Boehringer or others?
Steve Comeau: Thinking about binder design and antibodies, I come from a background of learning about antibodies and thinking of them from a structural standpoint, thinking about library design, and really thinking about that antibody sequence structure function relationship. Spending time thinking about library design and naive libraries and how to generate human binders became a passion of mine. And now that we’re getting into an area where we’re thinking, you know, beyond the in vitro approaches, beyond the in vivo approaches, and getting into the in silico approaches, I think it’s really important to realize that we have this third wave of antibody generation that’s coming in.
So back in 2012, Andy asked me, if I were to think about generating antibodies in silico, how would I go about doing it? And I went back to my days in the Vajda lab and thought about how they were trying to do small-molecule docking. And what they did was fragment-based discovery: they would dock small fragments of relevant pieces of molecules to an antigen and then try to string them together. And thinking about that from the antibody standpoint, I said, what I would probably do is dock amino acid side chains to the surface of an antigen and then try to find ways to dock antibody structures into that network.
And interestingly enough, as this field was picking up pace, I wrote an algorithm internally at Boehringer a couple of years ago. I called it ALGEBRA: atomic-level, geometry-based, rational antibodies.
Nicola: It’s a long name.
Steve Comeau: It is, you know, I tried to fit it into the acronym.
Nicola: But it’s cool. Yeah, it’s definitely a cool acronym for sure.
Steve Comeau: Yeah, so I’ll be presenting that at ACS later this month. And really, what it is, it’s a proof-of-concept framework to show that you can generate novel antibody sequences given the structure of an antigen. From a pseudocode standpoint, it really looks a lot like what David Baker was doing before diffusion with his RIFdock algorithm, or what Po-Ssu Huang was doing with his Sculptor algorithm, but now with antibodies instead of helical bundles and those types of scaffolds. And while we didn’t really have success with it internally, what we were trying to get at was: what are the signals that we’re trying to chase within the antibody realm?
Thinking about antibodies themselves, I think there are a lot of known knowns that we can leverage into these types of designs. We know a lot about antibody sequence and, you know, humanness. We know that the sequence space is incredibly large. I think the sequence space is often described as larger than 10 to the 50, five zero, which is, you know, the number of stars in the universe squared, right? Like it’s a really big number. So, trying to compare and contrast what we know about antibody structure against the vast sequence space, I think, is the primary challenge of what we’re trying to get at.
And those machine learning models that are now generating novel antibodies, what are they actually doing? I think we can take a look at the diffusion-based architectures and see what signals they’re trying to chase. What David Baker released with RFantibody was a two-step process. The first step was diffusion, where you would generate novel backbone structure. And then the second part of it was MPNN, where, given a structure, you would generate sequence. And obviously, there are going to be some dependencies between the two steps. And one of the things that I thought was really neat about ALGEBRA was that we were actually trying to solve multiple residue positions at one time, multi-position optimization.
But really, what they’re trying to model here is biophysics. And I have questions about the impact of novel CDR design. We can take a look at the CDRs of antibodies and say heavy CDR3 is kind of like the wild, wild west, right? Like we know that there’s a lot of sequence and structural diversity there. But outside of heavy CDR3, I think that the structures are often really well-defined and correlated to the germlines that you’re using. And I think the work out of Roland Dunbrack’s lab, first with Ben North and then with Jared Adolf-Bryfogle on PyIgClassify, really shows the correlation there.
When you’re thinking about generating novel structure, but some of these CDR structures are really tied to the germlines, what’s the importance there? What are we hallucinating? What are we trying to get at? And I think the answer is, a lot of it is we want it to look human. We want it to be along those lines of those known knowns. So, chasing those signals, I think is going to be really important. And I think the models that are being successfully used right now are trying to model biophysics more than they’re trying to model data.
Nicola: I think you touched upon a series of, I think, at least to me, quite interesting concepts, right? You started off with the old sequences, this huge sequence space that we are looking at. My understanding, at least, is that space is huge and is mostly a plane with a few peaks, very sparse, into that large search space. So that makes it, of course, very difficult. And how do we explore that space? I think it’s a question. And maybe that goes back to also what are the signals that we should use to explore that space and climb maybe in that space. So that’s one thing that I think you mentioned before. Do you have any idea regarding that? What would be your go-to there?
Steve Comeau: I think when you’re taking a look at a search space that’s so large, the question is, are you going to invest efforts into mining through that data to find the antibody that you’re looking for? Or is that really just telling you that, you know, from a design standpoint, you have a lot of freedom to operate? And I think it’s more the latter. I think you can traverse that search space a lot more when you’re thinking about it from a design standpoint, you’re not trying to sift through millions or billions of sequences saying, you know, are you my mother to the antigen over and over again.
And I think that that relationship between the sequence and the structure, you know, we can actually take this back to docking, right? The two fundamental questions of protein docking were, do these two proteins interact? And if so, how? And I think docking gets to that second question of the “if so, how” and it’s not answering that question of “do two proteins interact”. With the antibody space and no co-evolution between antibodies and their antigens, I think this pushes us towards the co-folding space a lot more. So, if we think about the algorithms that we’re using in drug discovery, we’re often breaking them into two different categories, right? Like, I mean, I know there’s a lot of nuance to this, but models are often generative or predictive. And I mean, predictive, we can get into classifier or discrimination-based; I’m going to lump it in largely for simplicity.
But when we think of the properties that we’re trying to find, right? We have single body problems, which I think relate to the antibody, and we can be talking about developability issues, or we can talk about multi-body problems, which I think is the binding piece.
So, you know, if we think about those two aspects in a two-by-two matrix, generative versus predictive and multi-body versus single-body problems, the multi-body problem in the co-folding space is that predictive binding piece. What does predictive binding look like? You know, here we’re going to be taking samples of antibodies, maybe from a next-gen sequencing experiment, and trying to put them into a co-folding model and see how well they do against their antigen. What are the confidence metrics? Can we predict whether these two things bind? Can we rank order them? Can we actually predict an affinity? I think there’s some level of accuracy that the field is trending towards, but we’re not quite there yet.
So, as we’re applying this to real world problems, you know, we can break part of it down into the generative binder piece, but I think the predictive binding piece is just as important because that’s validation. And I think the best models that we’re using actually translate between generative and predictive really well. I think you can’t have a successful generative model unless it really understands the predictive nature of the system which you’re trying to model.
Nicola: And going back to the co-folding, right, because that’s interesting to me. Maybe before we actually get there, I think co-folding is one of the key principles that brought us the big advances that we have seen in the past years, at least in terms of folding prediction and structure prediction and such. Maybe just a question: did the advancement that we have seen since, you know, AlphaFold 2 surprise you? Because I know a lot of people who have been working in that field for a long time, and they were quite surprised. So, I’m just curious if that came as a surprise for you or if you saw that coming.
Steve Comeau: I saw it coming. I think it makes a lot of sense the direction that we’re going. And I think the important question, though, is, you know, how well are these models modeling the biophysics, and what are the data that they’re relying on? I think a lot of the initial co-folding was really dependent on multiple sequence alignments and homology, and with antibodies, as I said before, they don’t co-evolve with their antigens. I think a lot of the co-folding pieces actually lagged behind on antibody-antigen complexes for that reason, probably because they were focused more on data than biophysics at that point. But I think as these models are advancing, we’re going to see less reliance on those multiple sequence alignments and better physics-based descriptors.
Nicola: I think that’s exactly true, right? That’s also why, at least some time ago, the different models that, for instance, the Charlotte Deane lab and others brought into the space were actually better than the different AlphaFold versions, mostly because, indeed, there was that principle that antibodies don’t really co-evolve with their antigens, because they’re not there to start with in that evolution space. So, I think that was definitely the case then.
But right now, indeed, you mentioned that there are new models that are coming up, some that come to mind, right? There is Boltz, there is Chai, there are a few of those ones coming up. And I think maybe one interesting part about that is that there is a tendency for these models to become open source. So also, let’s say the use of those models, I guess, is being democratized. What do you think about that? So, would that actually improve and give us an acceleration?
Steve Comeau: I definitely think it should help accelerate the rate at which we’re discovering novel binders. I do think that the models that are doing really well are probably going to be commoditized for commercial purposes. I think it makes a lot of sense.
You know, I think I would ask the question though, like what is going to be the ultimate value impact to biopharma once these models are getting deployed? How good are they going to be? What processes are they going to impact? If we’re thinking about the triangle of better, faster, cheaper, I actually think that ML and AI are one of those rare industrial opportunities where we can push in all three of those directions.
The way that I see this unfolding first is probably more on the speed aspect of things. Can we generate tool molecules to probe biology faster? How well are these models going to be in delivering better molecules? I think that’s a separate tangential ask, right? Like you can generate for biology, you know, that’s the binder thing. And I think that goes with the two-by-two matrix that we set up in the last piece, where, now, trying to engineer for quality and get that translation from research to development, I think that’s going to be important. But yeah, the access and the cost, I think that’s something that we’re seeing play out in real time right now and with numerous companies coming up with new technology, it’s a rapidly evolving landscape.
Nicola: So, we need to see where these models that are coming out, or are already out, really add value to pharma. And to me, at least, this plays out around two aspects. One is how pharma can improve on them, right? So, what type of data can be used to improve them? And, on the other end of the spectrum, what these models can add in terms of value in the end, which, as we were saying, is what ultimately matters for pharma. About these two topics, what are your ideas? What do you think we can use to improve those models, and where does the value lie in the end?
Steve Comeau: What can we do to improve the models? That is going to be a data question, right? And you are in the software world and very comfortable with production environments, development environments, and test environments. And I would say that, you know, looking at the way that we operate in pharma, we have something very similar. I would say that our production environment is our pipeline. And I think those are the existing processes that we use to deliver drugs for patients.
What would a development environment look like? I think that this is where you start bringing in the ML and the AI, and I think you let it help you see: is it as good as your current process? Probably not, right? You’re going to want to have opportunities where this can fail without consequence. If you get something out of it that ultimately helps a project, then I think that’s where you start achieving your acceleration in your timelines. So, I’m thinking about it more in terms of how you deploy it within an organization as opposed to on the data piece of things.
And you could probably speak to this a whole lot more, you know, working with the companies that you do, on how data is structured, how organizations have problems with their data, tracking down old data in Excel files or PowerPoints, a lack of data integrity across time. You know, it’s really dependent on the people that are uploading the data. There’s missing metadata associated with experiments. And I think that’s really hard to learn from within a pharma environment to try and make models better.
How can the existing data be used if it’s not necessarily formatted or retrievable in a usable way? If the existing data that pharma companies have is not in that type of format, I think that that’s an appropriate spot for a testing environment. Go back to old projects and take a look and see how this new tech that you’re trying to bring in would perform on things that you’ve already done. And so now you’ve set up an environment where you can bring in new technologies, you can design experiments, see how they work, see how they impact your pipeline. You can go back to old data cases and take a look and see, you know, are you doing any better? And then, you know, once you’re sure that it is at least on par with your technology, then you can bring it up into your production environment and use it, you know, as part of your pipeline.
Nicola: Regarding the data, I totally agree, right? That’s something that we also see all the time. Let’s put it like this: I think that one of the things AI did is, of course, accelerating, but it’s also forcing pharma to think differently about data and how to process and manage it, right?
I think this is definitely one of the things that we are seeing in the industry overall. And for the better, I think, because it’s actually making sure that the data that is there is more valuable today, but also for tomorrow. Because one thing that we have seen is that, you know, if we look back, we had discussions in the past with people saying, well, how valuable is the data that we historically have, because it’s missing a lot of the metadata. And there are a lot of questions about how much that data is going to be reused in the future. And I think right now everybody’s very cognizant of that. And hence, the data that’s generated today is going to have all that information for the future as well. So, I think that’s definitely a trend.
The other part of the question that I was asking is indeed, OK, let’s suppose that we have this beautiful model. Where do you see really the value of those models eventually when they get applied? Because I think, again, talking with different people, I hear these questions, right? So sometimes I hear people saying, “well, actually, the objective of this year for me is to understand which type of data I need to generate again and which type of data I don’t need to generate in the lab. But I can actually either predict or I can rely on in silico to actually skip those data sets”. So that’s something that I hear out there on the one hand. And on the other hand, I hear people that are saying, “well, actually, we don’t want to rely too much on, let’s say, predictions. We want to use them as a guide for selection”. So, it’s more like a guiding process and an acceleration in that respect rather than a full blown down to the number prediction. So, what’s your take on those aspects?
Steve Comeau: I probably fall in that second camp. You know, when we take a look at machine learning and AI, I mean, really, what are they? They’re deep probabilistic models. And what are we trying to do in industry? Minimize risk. Ultimately, we’re trying to move molecules forward, but every decision that we make is effectively a risk-based decision. Is this the right sequence to move forward? Is this going to behave properly?
I view a lot of the in silico as a companion for the real-world data. And I’m probably one of the more skeptical modelers out there. I’ve always used it as a tool to help, you know, guide design of experiments. I’m less familiar with the quality-based things, but I do understand that the translation of research to development is a really big topic.
Nicola: What actually makes you really skeptical? I’m just very curious.
Steve Comeau: Well, from my background of protein docking, that’s a cool technology and maybe you can get some insights into how your two molecules interact, but the success rates are not great and you’re often comparing against a co-crystal structure. And if you have that co-crystal, why do you need to do the docking? That’s a little bit of why I’m skeptical about it. I do think it’s a good design tool and a good piece to have to visualize things, but I’ve always been more about the real-world aspect of things.
So going back to the developability piece, we’re able to screen to some degree in research, but we’re often working with a lot of molecules in small scale assays. And in development, they’re working with very few molecules and much larger quantities, much larger assay systems. And I think the more important part instead of the machine learning is what is that translational piece between research and development? What are the signals that we can send from research to say that we’re going to have a higher likelihood of success in a development environment? And I think, you know, ultimately what we’re trying to do is find what are those small-scale assays that are descriptive of development behavior.
And I think that the way to get there from the machine learning point of view is to start looking at models that will be able to describe the behavior of those small-scale systems, right? You know, this is probably where you’re going to see more in the language model space as opposed to the structural model space.
Nicola: Do you have already an idea of what those could be, this small scale that you were mentioning or is it something that you’re working on?
Steve Comeau: I think right now there are some levels of accuracy and predictive success that we can have with computational sciences. The things that we can predict reliably I would say are more along the lines of single body interactions at this point, single body properties and descriptors along the lines of aggregation propensity, surface patches. We can do sequence-based metrics to see, is there any deamidation or oxidation stress that we can have? We can, to some degree, predict expressibility or monomericity of proteins. But, you know, the systems that we have in development, they do a lot more with formulation screens and process sciences than what we do. So ultimately, you know, I’m thinking what we’re hoping for is a package that can show us that the risk profile is minimized within the development organization. And what are those assay screens that we can have that give us that confidence?
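To make the sequence-based metrics mentioned above concrete, here is a minimal Python sketch of a liability scan. The motif choices (asparagine followed by glycine or serine as a deamidation flag, methionine and tryptophan as oxidation flags), the function name `scan_liabilities`, and the toy sequence are illustrative assumptions, not something described in the interview; real developability panels use broader, context-aware rules.

```python
import re

# Illustrative motif set (an assumption, not a validated panel):
# - deamidation: Asn immediately followed by Gly or Ser
# - oxidation: any Met or Trp residue
LIABILITY_PATTERNS = {
    "deamidation": re.compile(r"N(?=[GS])"),
    "oxidation": re.compile(r"[MW]"),
}


def scan_liabilities(sequence):
    """Return {liability name: [1-based residue positions]} for each motif hit."""
    seq = sequence.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in LIABILITY_PATTERNS.items()
    }


if __name__ == "__main__":
    toy_cdr = "ARDNGMYWFDY"  # made-up sequence, for illustration only
    print(scan_liabilities(toy_cdr))
```

A scan like this only flags candidate positions in a single sequence; it says nothing about structural exposure or about how the molecule behaves at development scale, which is the translational gap discussed in this part of the conversation.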
Nicola: Do you think that identifying those assays is the true bottleneck to making an improvement there, or are there other aspects that you think we need to tackle to get there?
Steve Comeau: It’s a good question. I mean, as you progress from research towards the clinic, the systems which you’re operating in become far more complex. Within research, we’re working in small-scale with cells. With development, they’re doing a lot more on process development. But at the downstream end of this, we’re looking at the most complex system, people. And the signals that we’re trying to chase in each of those systems are very dependent on where we’re going. So, for instance, I’m using people as an example because we look at immunogenicity, for instance, and I’m of the belief that all protein-based therapeutics have some level of immunogenicity. It’s really a risk mitigation task that we’re doing, and it’s going to be really dependent on the system in which we’re putting the molecule into. And trying to model systems, I think, will be really interesting.
I would say I think it’s important in development organizations to be thinking about how they can describe the molecules that come into them and how they’ll behave in the processes and systems in which they operate.
And I think that development organizations will have a lot of data to train that, but again, not on a lot of molecules. And I think that as we start getting into molecules, you know, we’re seeing far more complex molecules come through. We’ve spent a lot of time thinking about this in terms of antibodies, but I also know that antibodies are becoming a far smaller part of most pipelines. We’re getting more and more complex therapeutic concepts and more challenging molecules to make.
Nicola: Actually, I had one question where I was really looking forward to hearing your opinion, because I think a lot of the molecules that are going towards the clinic indeed fail in those later stages, right? So, when there is a toxicity problem or things like that, right? And I know that you spent quite a deal of time thinking about binding, right? So where is the balance there? Do we need more binders? Or do we need to build models that fix more of the current binders? Or what’s your take on that spectrum?
Steve Comeau: That’s a great question. And I think it comes down to understanding the molecules and the pathways in which you’re trying to modulate. I don’t have a lot of experience in understanding outputs from the clinic and why molecules fail there. But I do think that getting earlier senses of where molecules could fail at the research stage and building that downstream success into your molecule before you get it into development is going to be critically important.
I think that’s going to be really important as we think of the complexity of our molecules. T-cell engagers, activatable drugs, protein degraders: there’s a whole new class of molecules coming out that aren’t just antibodies, and thinking of the downstream biology for the molecules that we deliver is, I think, a challenge within the field.
Nicola: What you’re saying made me think that there is still value, or at least perhaps a concrete value, in expanding our set of binders, because that also expands the pool that we can draw from. And indeed, understanding the complexity of toxicity, or those aspects that involve perhaps more complex pathways downstream, may be a bigger challenge than trying to find a larger set of binders from which we can actually try to hit the correct molecules.
Steve Comeau: I agree, from the design aspect of designing binders. Before, when we were talking about in vitro and in vivo systems, those systems were actually what was defining the epitope, I mean, unless you used creative ways to do phage selections or pannings or immunizations. Now that we have the in silico ability to kind of specifically tailor our binder to a given epitope, it gives us a lot more creative freedom to think about the exact mechanisms that we’re trying to drive. And hopefully, you know, we can leverage that into safer drugs in the future.
Nicola: And indeed, talking a bit about the future, and you said indeed, we are looking at more modalities. This is definitely one of the trends that we are seeing. Is there anything else that you’re seeing in the industry regarding the trend that we are entering?
Steve Comeau: You know, maybe we can rephrase the question a little bit. I think a trend towards new modalities is going to be part of every pipeline. The question I think that we have to get to is what do those molecules look like and how good are they ultimately going to be? You know, as we’re engineering complex molecules, I think there are a lot of open questions about how they behave in development. And as we’re advancing our quality-based models in research, those I think are often describing antibodies. I don’t think that there are very many models out there right now, if any, that could say here’s antibody A, here’s antibody B, can we merge these into a bispecific antibody with success and predict that it’s going to be developable and well behaved downstream? So, what I see coming in the next couple of years is probably efforts towards describing those quality-based methods on those complex therapeutics. And I wonder what the ultimate successes are going to be. They’re probably lacking a lot of data, right?
I think you would have to really do some data capture and some smart experimental design to really get into it. I think if you were to take a look at the bispecifics, for instance, and train on bispecific data, there’s probably an overabundance of anti-CD3 antibodies within that bispecific set. And they’re probably based off of a couple of the same architectures. So, you know, what can you really learn when your data set is skewed in that way?
Nicola: That I can totally see, and maybe that actually brings back another topic. Binding design, I think, was one of your original interests, where you started looking and thinking things through a lot. And maybe there you can tell us a bit: what are your thoughts about binding design, how we can expand those pools, and what’s actually valuable there?
Steve Comeau: As we are getting into a realm of novel binder design, I actually see a paradigm shift in the way that we think about library design. Back in the days of in vitro display, the goal was to have a well-behaved library that could bind a plethora of antigens. It was meant to be pan-target. And now what I think we’re seeing is the ability to generate libraries against a specific antigen.
And, you know, from a combinatorial standpoint, if I wanted to make a thousand antibodies and test all thousand of them, I could either do this and do it across 10 plates or 11 plates and do it directly in the lab. Or I can build a library of 10 to the 3 heavy chains and 10 to the 3 light chains. But within that space, you end up actually getting a library of 10 to the 6, so a million antibodies where only a thousand of them are the hypotheses that you wanted to test. I think that what we’ll see is a change in DNA synthesis.
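The arithmetic behind this comparison can be sketched in a few lines of Python. The 96-well plate size is an assumption (the conversation only mentions "10 plates or 11 plates"), and the function names are illustrative rather than taken from any particular tool.

```python
import math


def plates_needed(n_designs: int, wells_per_plate: int = 96) -> int:
    """Plates required to test every design individually.

    Assumes one design per well and no control wells (hypothetical layout).
    """
    return math.ceil(n_designs / wells_per_plate)


def combinatorial_library(n_heavy: int, n_light: int, n_intended: int) -> dict:
    """Size of a library in which every heavy chain can pair with every light chain."""
    total_pairs = n_heavy * n_light
    return {
        "total_pairs": total_pairs,
        "intended_pairs": n_intended,
        # Share of the library that corresponds to designed heavy/light pairings.
        "intended_fraction": n_intended / total_pairs,
    }


plates = plates_needed(1000)
library = combinatorial_library(n_heavy=10**3, n_light=10**3, n_intended=1000)

print(f"Direct testing: {plates} plates")
print(f"Combinatorial library: {library['total_pairs']:,} pairs")
print(f"Designed pairs: {library['intended_fraction']:.1%} of the library")
```

Under these assumptions, only about one in a thousand members of the combinatorial library is a pairing that was actually designed, which is the mismatch being described here.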
I think that library design and library synthesis are going to be a really important part of biopharma, at least as far as these technologies are able to generate binders at that scale. Some of the more robust technologies, I would say, are able to generate binders in plate-based formats. So, that actually, I think, is a real big time save, and I think that’s a big paradigm shift within the discovery process.
Chai, I think, last year was the first one to say that they could do antibody discovery at 24-well plate scale and had reasonable success with that. I think that’s amazing because if you’re in a position where you have to design a library against an antigen of interest, you have the time of the design, you have the time of the DNA synthesis, the time of the library build. And then you have to go and do your normal process of panning, screening, and all of the downstream piece. It’s actually not faster than your typical process if you have a good library. So, being able to get to that point where we’re balancing better, faster, cheaper, the technologies that are library based right now aren’t necessarily going to be faster per se. Maybe they can give you some benefit in opening up to a difficult-to-target antigen. But if it’s difficult to target, you may have difficulty producing that antigen. So, then it gets down to, you know, screening. Is it going to be a cell-based screening format? How do you actually pan that library? That, I think, is going to be an interesting challenge, as is how you ultimately effectuate the ML design of antibodies.
Nicola: Indeed, I think you raise a good point. There is a lot to be gained by combining intelligent library design with the whole panning process. We are still using all sorts of technologies that may be at least as good, so I’d like to ask two things. One, where do you see the largest advantage, now and perhaps in the future, of this technology applied to library design? And two, where do you see the largest bottleneck at the moment for this approach?
Steve Comeau: At this stage right now, I think the technologies that are available to us are ones that I would probably be screening in plate-based format. And the question is, what is the size of that screen? I’m not sure, you know, from a resourcing standpoint, if it makes sense to be building those libraries.
I’m really interested in taking these technologies and applying them to some of our early-stage programs where we’re looking to answer questions like “is the biology right?” and “can we generate some tool molecules that may not necessarily meet the quality-based standards that it takes to bring an antibody to development?” So, ultimately, there are questions on speed. Maybe the speed aspect isn’t as fast on the molecule piece, but maybe you can get to biological questions faster in the long run. And that right there is the real signal that I’m trying to chase.
We can talk about chasing signals in biophysics and residue-residue interactions and what binders are based on. But the metrics that really matter within pharma are going to be one of better, faster, cheaper. And how do we get some of these answers quicker?
Nicola: And I think this goes back to what you were saying before, which, by the way, I found very interesting: the mechanisms that are established in a software company, where before you ship your software to production, you want to have the opportunity to test it, improve it, and fine-tune it. I can imagine something similar happens, as you were mentioning, for protocols in pharma, right? You have a certain way to deliver drugs and you want to improve it, but you don’t want to start improving it in production with your current pipeline. You want first to establish it and then move it into production.
However, in pharma, I know that there are very tight and short deadlines. So, I was wondering indeed, whether you think it is possible to achieve that type of development in pharma, especially now that perhaps there is more to test with models and things like this?
Steve Comeau: I actually think that there’s significant opportunity for this. And I know that we’ve actually spent a lot of time thinking about molecules themselves and processes to generate new molecules. And we’ve talked a bit about quality of molecules, but there’s a whole other aspect that we didn’t really touch on with respect to development versus testing versus production environments. And that’s the actual process itself that you have within drug discovery. It’s the pre-work that you’re doing before experiment. And I think that there’s significant opportunity for ML and AI models to help with process as well, not just part of the molecule discovery piece.
So, as we think about bringing our models from a development type environment, where it’s a little bit more exploratory and gives you freedom to play around with things, into a production environment, which is your pipeline and where you have less tolerance for failure, I think there’s opportunity for that with the molecule itself, but also in taking a look at the processes and the work that scientists do. I mean, I made a large part of my career by being able to script and just automate tasks and generate data and process data. And I think there’s going to be real big opportunity in the ML and AI space for process-based improvements as well. It’s not all just molecule.
And being able to modify your process on the fly, right? You know what you’re doing, you know a little bit more concretely what it is that you’re trying to get to for your end state. And I think that that is going to be a lot easier to effectuate than actually having a model that understands a molecule more so than anything.
Nicola: Maybe this episode will go viral, but I knew that you went viral before already because you told me a very interesting story about it. So, I’m just curious if you can share with the people that are listening to us how you went viral.
Steve Comeau: Yeah, so back in 2014, 2015, when there was the big Ebola outbreak, I was working at Dyax and we wanted to see what we could do with our technology to see if we could help people afflicted with the disease.
So, I ended up finding myself at an Ebola conference in Paris in May of 2015. I spent the weekend there after the conference, went to the Musée d’Orsay where I saw this Renoir painting there called La Balançoire. And I was shocked when I saw it because there’s some guy behind the tree that looked like I was looking into a mirror. So, I took a selfie in front of it, put it up on my Facebook with some cheeky little comment about being timelessly creepy, and, you know, went home, went to bed, flew back the next day. And when I landed, my phone was, you know, I had a whole bunch of texts, and it turned out that somebody took the image from my Facebook, put it up on Imgur where it got, you know, two and a half million views and it was trending in Reddit’s top 10. So, the moral of the story is I went to an Ebola conference, and I came home viral.
Nicola: Fantastic. I really hope that this episode will get some of the same attention you got from that story.
I just wanted to give you some time to answer what I usually ask at the end of every episode, which is a bit of an open question: what is something that you believe to be true but that most people in the field wouldn’t necessarily agree with, or would push back on?
Steve Comeau: So, I wouldn’t say that it’s necessarily something that I would say is true. But going back to being a skeptical modeler, I never really understood the reason for wanting to get accurate heavy CDR3 predictions and why that seemed to be a gold standard for so long for single antibody modelling. Obviously, I think I see downstream implications of that for docking-based technologies where, you know, you’re not introducing a lot of structural flexibility. But realistically, I think that those loop conformations are somewhat dependent on the antigen, and if you don’t have the antigen there then how do you really know if that’s correct?
Nicola: I think that’s an interesting perspective and I think people would have different opinions on the topic. With that said, I’d like to thank you for participating in this episode of Models & Molecules. It’s been very nice to talk together and, well, I wish you a good continuation.
Steve Comeau: Thank you so much for having me. This was fantastic and I really enjoyed it.