MLOps: The unsung hero of AI-powered biologics discovery

Before researchers conducted so much of their research in silico, the lab notebook was the quintessential source for all of a scientist’s experimental conditions, variables, and results. Critically, the levels of organization and detail recorded before, during, and after an experiment could make or break an entire study.

Fast forward 20 years, and a great deal of preclinical and other biomedical research now occurs using complex, rapidly evolving machine learning (ML) models. These models are developed by diverse teams of scientists using enormous, ever-growing training datasets. AI is no longer a buzzword; it has become a critical driver of drug discovery and development.

Scaling AI in life sciences and pharma

Today, AI contributes to 16% of drug discovery efforts, a figure projected to grow by an astonishing 106% in the next three to five years.[1] This surge is driven by advancements in domain-specific large language models (LLMs), generative AI, and deep learning, which are revolutionizing biopharma R&D. By harnessing these technologies, researchers are accelerating timelines, cutting costs, and delivering transformative insights with unprecedented speed.

The numbers tell a compelling story: the global market for AI in drug discovery is forecast to grow from $1.5 billion today to $13 billion by 2032.[2]

However, as AI scales across the industry, it becomes harder to maintain the consistency and reproducibility that the scientific method demands of ML models.

This is where MLOps makes all the difference.

What is MLOps?

MLOps, short for Machine Learning Operations, is a marriage of machine learning and DevOps: the processes, practices, and tools used to continually revise and update software. More simply, MLOps provides a framework and interface for ML models, their developers, and their users over the entire ML-model life cycle (Figure 1).

Figure 1. MLOps phases integrate machine learning models and training with the operations required for continuous software development and deployment over the entire ML-model life cycle.[3]

MLOps: the basics

MLOps functions on four basic principles:

  • Reproducibility and versioning: Track changes in code, configuration files, and infrastructure (e.g. cloud services or resources) to guarantee the reproducibility of results
  • Monitoring: Monitor training and inference metrics such as accuracy, F1, predictions for a holdout set, GPU/disk use during training, initial and final model weights, and data or model drift
  • Testing: Validate input data quantity and quality and future schema (predicted values), data generated during model runs, and compliance with best practices
  • Automation: Automate the ML data, model, and code workflows
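
As a rough illustration of the reproducibility and testing principles above, the sketch below fingerprints a training run and checks input data before training. It is a minimal example under stated assumptions, not a prescribed implementation: the function names `run_fingerprint` and `validate_rows`, the schema, and the sample row are all hypothetical and invented for this example.

```python
import hashlib
import json


def run_fingerprint(code_version: str, config: dict, data_rows: list) -> str:
    """Return a SHA-256 digest that changes whenever code, config, or data change."""
    # sort_keys yields a canonical JSON string, so key order does not affect the hash.
    payload = json.dumps(
        {"code": code_version, "config": config, "data": data_rows},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def validate_rows(rows: list, schema: dict, min_rows: int = 1) -> list:
    """Check data quantity and column types; return a list of problems (empty if none)."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                problems.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(f"row {index}: '{column}' is not {expected_type.__name__}")
    return problems


if __name__ == "__main__":
    # Hypothetical schema and row, invented purely for illustration.
    schema = {"sequence": str, "tm_celsius": float}
    rows = [{"sequence": "EVQLVESGG", "tm_celsius": 71.5}]
    print(validate_rows(rows, schema))
    print(run_fingerprint("abc123", {"learning_rate": 0.01}, rows))
```

Storing the fingerprint alongside each result makes it possible to tell later whether two results came from the same code, configuration, and data. A production system would more likely hash files or dataset versions than in-memory rows.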

Effective MLOps strategies focus on managing the challenges of big data and of continuous integration and delivery at scale. Many MLOps strategies work best by separating ML model tracks into training and serving pipelines to ensure consistency and reproducibility across an organization. The level of MLOps automation used to mature an ML process can also vary based on the needs of the team, infrastructure or skills limitations, and the scale of deployment.
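
One way to picture the split between training and serving pipelines is the sketch below. Both pipelines share a single feature function, and the serving side refuses to use an artifact built with a different feature version. Everything here is an assumption made for illustration: the names `featurize`, `training_pipeline`, `serving_pipeline`, and `FEATURE_VERSION` are hypothetical, and the hydrophobic-fraction feature is a deliberately simplistic placeholder, not a validated predictor.

```python
import json

FEATURE_VERSION = "v1"  # hypothetical tag for the shared feature code
HYDROPHOBIC = set("AILMFWV")


def featurize(sequence: str) -> float:
    """Shared by both pipelines: fraction of hydrophobic residues in a sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    return sum(residue in HYDROPHOBIC for residue in residues) / len(residues)


def training_pipeline(records: list) -> str:
    """Fit y = slope * x + intercept by least squares; return a JSON artifact."""
    xs = [featurize(sequence) for sequence, _ in records]
    ys = [target for _, target in records]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    # The artifact records which feature code produced it.
    return json.dumps(
        {"feature_version": FEATURE_VERSION, "slope": slope, "intercept": intercept}
    )


def serving_pipeline(artifact_json: str, sequence: str) -> float:
    """Load an artifact and predict, rejecting artifacts from other feature versions."""
    artifact = json.loads(artifact_json)
    if artifact["feature_version"] != FEATURE_VERSION:
        raise RuntimeError("artifact was trained with a different feature version")
    return artifact["slope"] * featurize(sequence) + artifact["intercept"]
```

Because the serving pipeline only ever sees the serialized artifact, the two pipelines can run on different machines and schedules while still producing features the same way.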

The MLOps framework has eleven key components or functionalities that work in tandem to continuously automate and deliver ML models (Figure 2) [5]:

  1. Experimentation
  2. Data processing
  3. Model training
  4. Model evaluation
  5. Model serving
  6. Online experimentation
  7. Model monitoring
  8. ML pipeline
  9. Model registry
  10. Dataset and feature repository
  11. ML metadata and artifact tracking

While it is possible to combine individual processes to cover all of these functionalities, most teams use a single ML platform to implement MLOps components.

Figure 2. Eleven functionalities required for MLOps.[5]

Implementation challenges

Many of the most common hurdles and pitfalls of MLOps implementation should look familiar to teams in the ML space: much of ML is still quite experimental, which can cause infrastructure, computing, and other strains until systems are fully developed. In fact, 87% of data science projects never reach production,[6] and 77% of companies struggle to implement big data and AI projects.[7] In particular, teams can struggle with:

  • Data management: Managing enormous amounts of potentially sensitive data of varying quality
  • Model deployment: Optimizing models to address compatibility and scalability issues
  • Collaboration: Maintaining consistent and frequent communication within the MLOps team and between the MLOps team and ML model users
  • Intellectual and infrastructure drain: Attracting and retaining properly trained employees and investing in the infrastructure required for big data and ML operations
  • Model validation: Monitoring ML model performance, which can significantly drain resources

Many, if not all, of these challenges can be overcome by outsourcing MLOps to an ML platform that manages all MLOps functionalities and is tailored to drug discovery, rather than building an entire MLOps system from the ground up with individual processes. Few companies currently have the intellectual and infrastructure resources to build an MLOps system de novo.

Instead, most pharmaceutical companies leverage the interfaces and MLOps from AI companies that specialize in ML model development. This allows teams to integrate AI into their brand and expertise by outsourcing tools designed or optimized specifically for their use cases.

Why MLOps matters in biologics discovery

MLOps allows data scientists to create and optimize ML models through one interface while allowing users to run experiments and track variables and results using trained models in another connected interface. This removes uncertainty about which model version and training data were behind a given experiment. Connecting the two interfaces is particularly important because data scientists developing and training ML models aren’t always working in the same room, city, or country as the researchers using the models.

By standardizing the practices and interfaces used to create, optimize, deploy, and use an ML model, teams can overcome many of the hurdles researchers face when leveraging ML models at various scales. Based on the needs of a specific team or organization, MLOps can more efficiently train, package, validate, deploy, and monitor ML models to maintain consistency, regulatory compliance, reproducibility, and reusability.

MLOps provides value to pharmaceutical companies specifically in the following ways:

  • Testing ML models at scale: Models are continually published and made available, allowing users to quickly test, evaluate, discard, or improve models to keep a competitive edge.
  • Maintaining compliance: Tracking model version and AI provenance is critical to meeting future compliance demands and delivering drugs to the market.
  • Fostering collaboration and an AI-first culture: Data science and lab work merge seamlessly to foster innovation in both disciplines.

With MLOps, pharmaceutical companies can not only streamline operations but also unlock AI’s full potential to drive innovation, improve compliance, and accelerate breakthroughs.

Connecting lab and data scientists

ENPICOM provides data scientists with a private and integrated MLOps environment solution, complete with an intuitive UI to track experiments, store artifacts, register models, and compare results. How does it work? Let’s explore an example: imagine lab scientists in your company have extensive thermostability data.[8] The process unfolds in four simple steps:

  1. Retrieve: Gather the most current thermostability data and metadata
  2. Train: Train the model while tracking the artifacts, parameters, and metrics of newly created models
  3. Track: Compare model performance, parameters, runtime stats, and more with customized visualization tools
  4. Run: Register the best model with a few clicks to make it available to lab researchers
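
To make the four steps more concrete, here is a generic sketch of a retrieve, train, track, and register loop. It is not ENPICOM's API or any real platform's interface: the `Tracker` and `Run` classes, the function names, the nearest-neighbour model, and the data values are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


@dataclass
class Run:
    params: dict
    metrics: dict
    model: Callable[[float], float]


@dataclass
class Tracker:
    """In-memory stand-in for an experiment tracker plus model registry."""
    runs: list = field(default_factory=list)
    registry: dict = field(default_factory=dict)

    def log(self, run: Run) -> None:
        self.runs.append(run)

    def best(self, metric: str) -> Run:
        # Lower is better for error metrics such as mean absolute error.
        return min(self.runs, key=lambda run: run.metrics[metric])

    def register(self, name: str, run: Run) -> None:
        self.registry[name] = run.model


def retrieve():
    """Step 1: return (feature, melting temperature) pairs. Toy values, not real data."""
    train = [(0.0, 60.0), (0.5, 65.0), (1.0, 70.0)]
    holdout = [(0.1, 61.0), (0.9, 69.0)]
    return train, holdout


def train_knn(train, k):
    """Step 2: k-nearest-neighbour regressor over a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict


def mean_absolute_error(model, rows):
    return mean(abs(model(x) - y) for x, y in rows)


def run_workflow() -> Tracker:
    tracker = Tracker()
    train, holdout = retrieve()
    for k in (1, 2, 3):
        model = train_knn(train, k)
        # Record parameters and metrics next to the model they belong to.
        tracker.log(Run({"k": k}, {"mae": mean_absolute_error(model, holdout)}, model))
    # Steps 3 and 4: compare the runs, then register the best one under a fixed name.
    tracker.register("thermostability", tracker.best("mae"))
    return tracker
```

A lab-facing interface would then look up the registered name, for example `run_workflow().registry["thermostability"]`, without needing to know which parameters produced the model.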

Lab scientists can select candidates, pick the new thermostability model from a list, and instantly see predictions as newly created metadata. No configuration is needed—the platform automatically handles inputs and outputs.

By standardizing ML model deployment, ENPICOM enables data scientists to efficiently develop and refine models while empowering lab scientists to apply them effortlessly. It’s the perfect bridge between data science and the lab.

Contact us to learn more about how to integrate ML into your workflows and ENPICOM’s approach to MLOps in biologics discovery.

References:

  1. 2024 Global Life Sciences Sector Outlook. deloitte.com. Accessed November 14, 2024.
  2. AI in the pharmaceutical industry – statistics & facts. statista.com. Accessed November 14, 2024.
  3. MLOps principles. ML-ops.org. Accessed November 14, 2024.
  4. MLOps infrastructure stack. ML-ops.org. Accessed November 8, 2024.
  5. Salama K., et al. (May 2021.) Practitioners guide to MLOps: A framework for continuous delivery and automation of machine learning. Google Cloud. Accessed November 8, 2024.
  6. Why do 87% of data science projects never make it into production? VentureBeat. Accessed November 8, 2024.
  7. NewVantage Partners releases 2019 big data and AI executive summary. BusinessWire. Accessed November 8, 2024.
  8. Manage your ML lifecycle. ENPICOM. Accessed November 8, 2024.