26 Mar 2024

Reducing the barriers for immune repertoire research: An in-depth interview with Dr. Felix Breden

Interview conducted by Dr. Henk-Jan van den Ham

In the rapidly evolving field of immune repertoire research, ensuring data reproducibility and data sharing are crucial for advancing scientific knowledge and promoting collaborative breakthroughs. Reducing barriers for researchers to access highly curated data eliminates repetitive analysis steps and facilitates innovation and progress. We had the honor of delving into this subject with Dr. Felix Breden, Professor Emeritus at Simon Fraser University, and a distinguished member of the AIRR Community. The AIRR Community is a research-driven group that organizes and coordinates stakeholders in the utilization of next-generation sequencing technologies to investigate antibody and T-cell receptor repertoires. In this interview, we discuss the significance of data standardization, sharing, and reproducibility in immune repertoire research, the challenges and advancements in repertoire profiling, and future developments.

How and when did you get involved in the immune repertoire field?

I first started looking at this field in the early 2000s. My wife, Jamie Scott, was an HIV vaccine researcher. She had sequences from HIV patients and controls. At the time, a significant question was what broadly neutralizing anti-HIV antibodies looked like, and whether people who produced them were at risk of developing autoimmune diseases. This is because the few broadly neutralizing anti-HIV antibodies available at the time appeared to resemble autoimmune antibodies with long CDRH3s.

As an evolutionary geneticist, I worked mostly on non-model organisms like beetles, fish, and corn borers, but I had some experience in sequence analysis. Jamie approached me for help with sequence analysis, and together with a team of talented students, we went ahead to gather and compare the sequences of all broadly neutralizing anti-HIV antibodies, other anti-HIV antibodies, as well as other antibodies targeting infectious and autoimmune diseases.

I was shocked that we had to have students manually enter sequences into the database from published literature, as they were not available in any usable format. This highlighted the tremendous difficulty in obtaining and comparing these sequences, and the significant amount of work individual researchers had to put in.

Being an evolutionary geneticist, my interest also naturally gravitated towards genes. I began digging into what we knew about the germline genes responsible for coding these antibodies – the V-, D-, and J-genes, from the immunoglobulin heavy chain, and the Kappa and Lambda light chains. To my surprise, there was only one full sequence of the immunoglobulin heavy chain available at the time. It was put together from large insert libraries from three different individuals and was therefore a haploid representation of 6 chromosomes. It was clear that this locus had many insertions and deletions, and no one was dealing with them.

These two paths – trying to compare expressed antibodies and analyzing germline genes – made us realize how difficult it was to obtain these sequences and how little we knew about the germline genes.

Together with my student, Corey Watson, we started to analyze the germline genes properly, starting with the V, D, and J sequences. However, we discovered that while some researchers claimed to have full knowledge of these genes, they only possessed sequences of the V, D, and J genes without any insights into copy number or upstream regulatory regions. Additionally, they did not appreciate how difficult it was to reconstruct full haplotypes of complicated regions from short sequences, and consequently there were (and still are) some real problems with the germline gene databases. Despite these issues, some people believed that this variation did not matter because somatic hypermutation would allow everyone to produce the antibodies they needed, despite significant genetic differences between individuals. However, this is not necessarily true, and we finally convinced people that we didn’t know everything about these genes. Corey produced the first full-length sequences of human immunoglobulin loci from haploid tissue, which are now the reference loci for the human genome. He is now receiving funding to examine these regions properly using pull-down chips and long-range sequencing.

So that’s how we got involved in the field. Given our difficulty convincing people of the importance of germline gene polymorphisms, Jamie convinced me to start looking at expressed genes, and that’s how we got into the Adaptive Immune Receptor Repertoire community (AIRR Community) and expressed gene repertoires (AIRR-Seq data). As interest in expressed repertoires grew, researchers began to realize the significance of germline genes once again. That’s a short summary of what I’ve done in the field for the last 20 years.

Image source: https://www.antibodysociety.org/the-airr-community/

What do you think are the important recent developments in the field?

I think that one of the great developments is the AIRR Community. In 2013, Jamie organized a symposium at the Antibody Engineering and Therapeutics Meeting in San Diego on how people define clones and delimit clonal lineages. And we discovered that we all talked about these things differently and we had different databases. We realized we needed to have a common language for people dealing with these clonal lineages. Then in 2015 we formed the AIRR Community; that’s a group of immunologists, bioinformaticians, and legal experts, who are knowledgeable about legal and ethical factors involved in sharing data. We work together to create a common set of protocols and ways of curating data so that these can be shared in the AIRR Data Commons. At Simon Fraser University, we developed iReceptor which is a gateway to query these data based on specific sequences or combinations of metadata.

One of the other most notable recent advancements is, of course, single-cell sequencing. With this technology, it is possible to sequence for each immune cell both the alpha and beta chains (T cells), or heavy and light chains (B cells), and connect these receptor sequences to the full transcriptomic profile of each cell. With tools like 10x Genomics, you can also obtain a very detailed phenotype for each cell, by measuring the binding of different CD molecules to the cell. This is a significant development, and we can expect to see more people using single-cell sequencing. However, it is important to note that the number of cells analyzed is usually smaller due to the need to generate a large gene expression matrix for each individual cell. The AIRR Data Commons is the only public immune receptor database that can curate all aspects of single-cell immune profiling: receptors, gene expression, and cell phenotype. By combining these highly detailed single-cell data with bulk sequencing, we will be able to fully describe the expanded clones in each patient, connecting receptor sequences to the physiologic state of the immune cells producing these clones.

So those would probably be two of the most important recent developments. In terms of the future, several groups in the AIRR Community have received an NIH resource grant to integrate several databases in a manner similar to the AIRR Data Commons. This effort will start with databases curating germline gene polymorphisms (VDJbase and OGRDB), connecting these with the receptor repertoires in the AIRR Data Commons, through to the databases describing the binding characteristics of these receptors (e.g., IEDB). As an evolutionary geneticist, I’m very excited about this research because it aims to understand the link between genotype and phenotype, a fundamental question in evolution. However, understanding this connection should also lead to new diagnostics and therapeutics for cancer, autoimmune and infectious diseases.

You don’t have an immunology or bioinformatics background as you pointed out, but what do you see as the biggest hurdles in performing repertoire profiling?

For me it’s really about the availability and ease of working with the data. It appears that very often researchers have their students and bioinformaticians repeat analysis on the same data, which is a waste of time and resources. This hinders progress and makes it difficult to compare data. We are working with a Type 1 Diabetes Consortium, and we see scientists working with the database of public and private clones and analyzing a public clone present in all patients. This is an example for which you need to look beyond the Type 1 diabetes data, at other autoimmune diseases and control groups, in order to properly determine which clones are public and which are private. But it’s quite an impediment to not be able to quickly compare the data. The huge amount of data currently available is just a drop in the bucket compared to what is potentially available. This creates a significant obstacle to this type of research in my mind.

You mentioned having bioinformaticians redo the same analyses. Do you think it is always required to have a bioinformatics specialist for analyzing data that lab scientists could handle themselves? The skill sets required for computational analysis are somewhat different from those used in wet lab experiments.

By redoing the analyses I mean the initial bioinformatic processing (such as immune receptor reannotation) that must be done before statistical analysis of the data. And that’s what the AIRR Data Commons aims to do: to have these initial steps done once and made available in a public database, so researchers wouldn’t have to repeat them.

And the iReceptor Gateway provides a user-friendly interface for researchers to search metadata and find datasets relevant to their study. You can find data sets from patients with autoimmune diseases and controls. You can infer the germline genes that an individual carries, that is, a personalized genotype inferred from a huge number of expressed B-cell or T-cell receptor sequences. You can also combine data from multiple datasets and analyze them with advanced packages. We always use the term ‘reducing the barrier’ for researchers to access and analyze these data.

I believe the use of iReceptor for querying and analyzing such data has been referenced in the literature over 100 times now. This increasing attention and usage are promising. There is a huge potential for scientists to contribute their data to the AIRR Data Commons, and utilize resources such as the iReceptor Gateway and VDJServer from the University of Texas Southwestern Medical Center.

What is your experience with data reproducibility? Different versions of software or different approaches designed to get to the same goal will sometimes yield very different results. What is your experience there?

Reproducibility is a huge problem. The iReceptor group at Simon Fraser and the VDJServer team at UTSW have curated many data sets into the AIRR Data Commons, and in a few instances, the same data sets have been curated by the two groups. This was the case for the large number of COVID studies produced in a collaboration between several COVID research teams and the company Adaptive Biotechnologies. But this situation can be used to examine one aspect of reproducibility. The iReceptor approach aims to present data as close as possible to how the researcher published it, while other groups like VDJServer, and OAS at Charlotte Deane’s lab in Oxford, in most cases reanalyze data through a single pipeline. Comparing the results from the analysis of the same data set with two different approaches is one way to examine the robustness of this type of repertoire analysis. But of course, it is important to make sure the user is aware of that situation. The MiAIRR standard developed by the AIRR Community is the minimal set of metadata to accompany a data set; it allows researchers to understand how the data were processed at each step, starting with preparation of the sample, then sequencing protocols, and then bioinformatic analysis. This is one way that a researcher can understand how the data were processed overall, and decide whether they want to include sets of data in their final analysis.

Overall, the heterogeneity in data annotation and processing methods is a challenge to achieving reproducibility in the AIRR Data Commons. If two researchers take the raw data and come up with different answers because they use different pipelines – we need to know about that and be able to control for that if possible. The Software Working Group of the AIRR Community works on interoperability and creating a common format to ensure that data can be moved from one analysis platform to another without recoding.  This makes it much easier to analyze the same data with different pipelines.  This is an evolving process and there is still a huge amount of work to be done understanding how reproducible these data and analyses are, and what contributes to reproducibility.

You’re also involved in the International Union of Immunological Societies (IUIS). What is their role in repertoire research and why are they important to germline references in general?

My wife, Jamie Scott, and I have been working closely with the IUIS to develop a more open and transparent community-based approach to evaluating and naming T-cell receptor and Immunoglobulin germline genes, similar to what we have done with the AIRR Community. Jamie is heading the IUIS Nomenclature Committee, while I am currently leading the Immunoglobulin and T Cell Receptor Nomenclature Sub-Committee. With so much data coming down the line, we need to coordinate multiple germline data sets to provide good reference sets for the community. The IUIS has the strongest mandate for developing and validating these resources and helping researchers understand what’s available and how the data have been validated. For example, Corey Watson is producing huge data sets of these germline genes from multiple individuals, mostly working with humans, while other groups are producing similar large data sets working with the macaque. Data on the different inbred mouse strains are also exploding, so it’s going to be a real challenge. However, working with the IUIS behind us, we’ll be able to reach out to people and explain what’s available. We hope to make significant progress with the support of IUIS.

I think one important aspect of your question is related to the importance of germline data sets for ensuring reproducibility. The first step is to collect the sequences from the patient and annotate them against the germline data sets to understand their origin. Genes that came from VDJ recombination are unique to immunoglobulin and T cell receptor loci. Therefore, it is crucial to understand this process, and without a reliable germline reference, that would be impossible. Once you have this reference, you can talk about expanded clones, but you need to know the germline genes and the mutation patterns to determine if you have five clones or just one. These references are essential, but people are unaware of what is available and of the inadequacies of some of the databases. I believe that the IUIS is the group that can address this issue because they have the mandate, and people are using these references for critical medical reasons. I expect this issue to be resolved in the next year or so to the benefit of the immunogenetics community, and overall biomedical research.

What is the current state of the reference databases? How do you see these databases developing?

I believe that the International Union of Immunological Societies (IUIS) can serve as a platform for researchers to understand the advantages and disadvantages of the different data sets available for humans, chimps, mice, and other species as their germline repertoires are developed. While the new IUIS TR and IG Nomenclature Review Committee is still developing, I don’t think it’s possible for us to create a perfect data set that everyone will use. Instead, our role will be to coordinate the various groups that are focusing on these genes in multiple organisms. Essentially, we will serve as a clearinghouse for the data available and the resources accessible.

Based on a schema developed by the Germline Database Working Group of the AIRR Community, we can establish a process that assigns a permanent identifier to all germline sequences. This identifier would allow different groups to input their metadata and specify where they believe the sequence occurs on the locus, and even assign different names to the gene in some cases. Having a permanent identifier that all databases can reference would be a significant advantage to the field. Our vision for the IUIS and the societies we work with is to provide an avenue to disseminate information about germline databases and their applications in a fair and transparent manner.

And towards the future, do you see any big changes to how germlines are either being curated or how they are being used?

I believe that one of the main issues in dealing with complex regions in genetics is the need for caution. For example, when dealing with Variable (V) genes, there can be 40 to 50 functional V genes on a single chromosome in one individual. Attempting to assemble these genes in these complicated regions using only short reads (30-100 bp) is guaranteed to create problems. Additionally, primers designed for one gene can sometimes sequence other closely related genes or pseudo-genes, but the researcher may not be aware of this mis-priming. As for how the results of the validation process will be presented, I am unsure whether the IUIS will maintain a database or simply provide information about different databases and how researchers can use them. In terms of curation, we need to develop a clear plan for gene validation. I believe that having more strict validation and nomenclature policies, developed in collaboration with the broad immunogenetics community, will be one of the most significant changes at the level of our Sub-Committee.

Over the recent years, various companies have engaged in diverse analytical endeavors, particularly in biotechnical realms, such as enhancing off-the-shelf kits for immune profiling. This involvement not only supports the industry but also aligns with practical applications. Take organizations like the Antibody Society, for instance. How do you perceive collaborations between academia and industry in this context?

That is something I find very interesting. The Antibody Society is a trade association as it has been termed in the USA. It is a great organization, with hundreds of companies being part of it. All along, the AIRR Community has been very open to participation with industrial partners. In fact, a lot of the working groups have active industrial partners and even have industrial partners as co-leads in some cases. We welcome industrial partners with open arms! My impression is that these researchers in the industry are asking the exact same questions as their academic partners, such as: what are the expanded clones? What germline genes were used? What were the mutation patterns that produced the expressed protein?

The EU is promoting the Open Access data model, which is becoming mandatory in certain situations, such as when applying for research funding. Do you think this approach is effectively paving the way for a more open science?

The National Institutes of Health (NIH) in the USA recently implemented a new data sharing initiative and has required each grant to include a data sharing plan for the past 20 years. But it seems to me that there is often a lack of follow-up or enforcement; researchers either don’t put their data in public repositories, or they put the raw data in SRA or ENA, for instance, but often curated in such a way that it is very difficult for anyone to reuse the data. So, it takes more than rules to promote Open Access. There has to be a stick (enforcement), but more importantly, there have to be rewards for sharing that will establish a culture of sharing.

In my experience working with the AIRR Community and various industrial partners, it is not clear what the real barriers are to data sharing in academia or commercial enterprises.  Young researchers blame it on their PIs, PIs blame it on their institutions or the lawyers in the company that say they can’t share data.  I think it would be very informative to survey researchers, especially within companies, to understand these barriers to data sharing. How open is everyone to sharing data? For example, the iReceptor team has a collaboration with colleagues at Roche that involves a substantial COVID dataset that is now hosted in the AIRR Data Commons.

One mystery that intrigues me is who the primary producers of AIRR-seq data are. My hypothesis is that companies are likely to generate at least ten times more data than academics. There is probably a wealth of unused data sitting in abandoned storage from projects that didn’t progress. Accessing this data could be immensely beneficial for the entire community. The typical excuse we encounter is, “I’d love to share my data, but the company’s lawyers won’t allow it.” My perspective is that once a project is abandoned, or even if it’s highly successful, sharing the data could lead to the company being part of a collaboration that could lead to new findings and even prominent publications, and perhaps new commercial avenues they hadn’t imagined. Additionally, the more a company puts its data out there, the more it becomes known as a good corporate partner. So, there is a huge possibility of collaboration in this regard, but navigating data sharing, accessibility, and awareness of resources is challenging.

One way to promote a culture of data sharing is to promote engagement with the community: patient communities, other researchers, and the medical community. Too often large, well-funded initiatives have little impact beyond their limited specific aims, even though there is potential for reuse of these data beyond this limited scope. To enhance this type of community engagement, I propose a more intentional allocation of resources within grant structures to Community Managers, people whose job it would be to continually contact the communities that could benefit from the research being performed. Allocating funds towards this purpose would signify a commitment to fostering stronger connections with the community. In my efforts, I consistently advocate for directing attention and resources toward initiatives such as the AIRR Data Commons. Conceptualized as a curated art gallery for data, it provides an optimal platform for showcasing a researcher’s data in its most advantageous light. This approach not only increases visibility but also serves as an attraction point. It is crucial to recognize the value of such initiatives and incorporate them more prominently into our community outreach endeavors.

One way to make it advantageous to share data might be to implement a DOI for each database entry, such as an AIRR-seq repertoire. This would enable tracking of downloads and monitoring when the data undergoes processing with AIRR-compliant software. This would provide an immediate “dopamine rush” each time an individual’s data is examined.  This is opposed to the present system, where data reuse is only acknowledged when it is used in a publication, resulting in potential obscurity for an extended period, sometimes two or three years. This delay poses a particular challenge for individuals such as assistant professors or young professionals within a company who not only crave timely feedback but also need documentation of their impact for tenure committees or annual performance reviews. Overall, there is a need to intensify efforts to ensure more prompt recognition and rewards for the act of sharing data.

During the COVID period, the dynamics of data sharing were intriguing. Many individuals and organizations emerged out of nowhere and reached out to collaborate and integrate their data into our shared AIRR Data Commons, even at the preprint stage. This proved to be highly beneficial, particularly in streamlining collaboration with researchers during the preprint stage, where we could work with them to curate their annotations, rather than trying to replicate their methods and reconstruct their annotations. But the important question is, why did this sharing culture only apply to COVID research? Other critical areas, such as infectious diseases, autoimmune diseases, and cancer immunotherapy, all are equally in need of collaborative data sharing efforts. These areas, too, stand at the forefront, requiring data that can be compared rapidly, efficiently, and in a FAIR (findable, accessible, interoperable, and reusable) manner.  Hopefully, as a community, we can expand this culture of sharing that developed among COVID researchers to the broader immunology community.

Looking ahead, I think disease-specific consortia will be very important in promoting Open Access data sharing. We are involved in a Type 1 Diabetes Consortium, led by Monica Westley of The(sugar)science enterprise, which exemplifies the energy and commitment needed for successful data sharing, and shows that serious resources need to be dedicated to this part of the overall research enterprise. Community management is done by orchestrating regular meetings and fostering a collaborative spirit among participants. Disease-focused groups like this one demonstrate the immense power of sharing data and highlight the need for more initiatives that showcase the benefits of collaborative data efforts. Regarding the initial question, about Open Access initiatives, it seems that we went a bit off topic and transitioned into emphasizing the importance of collaborative efforts and disease-focused groups for effective data sharing in various domains.

As far as I understand, industry and academia tackle similar questions and often obtain similar answers. It is indeed interesting that the COVID period really proved the benefits of data sharing.

Initially, we had around 10 to 15 COVID-19 datasets, several of which were curated in collaboration with industrial partners who played an active role in the research process. These partnerships have provided us with access to a wealth of valuable data.

When we refer to industrial partners, it’s worth noting that multiple commercial groups have informed us that they are implementing our iReceptor turnkey solution and storing the data in an AIRR-compliant manner. By adopting AIRR community standards, they find it significantly easier to integrate data from various public resources within the AIRR Data Commons and compare their proprietary data with public datasets, while maintaining their data behind their own firewall. It also facilitates internal data sharing within their companies.

As you may know, many larger companies face challenges with data sharing among different research groups due to variations in storage methods. This issue existed five years ago and still persists. By adopting the AIRR Community standards, industrial partners can better manage their data and take advantage of the benefits of this initiative.

In some cases, these partners may find promising leads, further develop them, and express their desire to publish their findings. When they want to share their data in the Data Commons, they can easily do so with just a button push, since it will already be formatted accordingly. We encourage industrial partners to engage with us. All of our software is open source, and the data is in the public domain. They can set up these resources behind their firewall, collaborate with the community initiative, and even contribute back to the AIRR Data Commons at some point.

Looking ahead, what exciting developments do you see coming in the repertoire field?

As I mentioned earlier, several groups in the AIRR Community are embarking on a new NIH-funded initiative to create a comprehensive knowledge base. This project is led by Dr. Lindsay Cowell of the University of Texas Southwestern Medical Center. The goal is to seamlessly integrate information on T-cell receptor and immunoglobulin germline polymorphisms, curated in OGRDB and VDJbase, with immune repertoires in the AIRR Data Commons, and to link these data to immune receptor binding characteristics in IEDB and IRAD. This endeavor has us quite busy and excited!

In addition, through the IUIS, Jamie Scott initiated the Big Data in Immunology Sub-Committee, within the Quality Assessment & Standardization (QAS) Committee. We are trying to initiate a process similar to the one behind the AIRR Data Commons, but at a level of multiple types of immunology data.  One of the strengths of the AIRR Data Commons is the standardization of metadata across diseases, institutions, and laboratories for receptors and associated metadata. Achieving this standardization is a straightforward process—simply input the same metadata in an intuitively comprehensive manner, which ultimately ensures interoperability. To accomplish this, the AIRR Community collaborates with different working groups. Together, they formulate protocols, which are then published in papers to expose these standardized procedures. The entire group reviews and votes on these papers, maintaining community initiative and control over the protocols. Expanding on these efforts, we aim to apply this approach to a broader spectrum of immunological data through the IUIS Big Data initiative. The goal is to establish common metadata standards, facilitating the integration, sharing, and interoperability of these data as much as possible.

Our goal is to streamline data sharing in immunology by creating shared metadata for a broader dataset. We’re not the only ones working on this, but we’re committed to making data sharing more accessible and straightforward in the field.

Another area of research that I think has been overlooked, but will finally come to fruition, is examining the effect of germline polymorphisms in TCR and immunoglobulin genes on disease phenotype. This effort has been mostly hampered by our poor knowledge of the true variation among individuals and populations in the underlying germline polymorphisms. As we develop robust germline immunoglobulin and T-cell receptor databases, we will be able to examine the effect of genetic variation in these genes on naïve repertoires, antigen-stimulated repertoires, and finally disease phenotypes – a complete genotype to phenotype description. Such connections will lead to improved diagnostics and therapeutic interventions.

Another aspect that intrigues me is the wealth of information embedded in B cell and T cell repertoires, particularly in predicting reactions to novel cancer immunotherapies—discerning who might respond positively to therapies, or who might face adverse reactions. We’re in a bit of a Wild West phase concerning the signals within these AIRR-seq data, and that, I believe, holds immense promise. Understanding these signals could prove pivotal, and the prospect of unravelling them is genuinely exciting. However, to make significant strides, it’s imperative for researchers to openly share their data. It’s not about just observing a small cohort; it’s about pooling data, sharing findings in journals, and creating a foundation for further collaborative investigations. During previous interactions, we’ve observed confirmation that seemingly trivial polymorphisms can indeed have profound health effects. The intriguing prospect lies in scaling up these insights to a broader context, adding another layer of interest to the discussion.

Most certainly, another aspect is the analysis of population-specific polymorphisms. Given that the majority of available data stems from Northern European populations, the demand for insights into population-specific variations becomes paramount, especially in the context of vaccine development. Reflecting on our findings six years ago, a notable signal emerged while working with Corey Watson and Wayne Marasco. We discovered that broadly neutralizing anti-flu antibodies shared a common polymorphism in CDR-H2 of the immunoglobulin heavy chain gene IGHV1-69 [9]. The excitement surrounding this discovery was justified, as subsequent analysis revealed robust copy number variation and numerous alleles in this region. Notably, having the right allele correlated with a substantial increase in the production of highly effective anti-flu and broadly specific antibodies. This trend extended to different polymorphisms observed in African and non-African populations. The advent of these advanced tools now allows us to delve deeper into these intricacies, unveiling hidden aspects of genetic variation. As we navigate this terrain, the significance of population-specific considerations in vaccine development becomes increasingly apparent.

 

Biography
Dr. Felix Breden

Dr. Felix Breden is a professor emeritus at Simon Fraser University. He played a central role in the establishment of the AIRR Community in 2015 and has previously served as Chair of the AIRR Community Executive Sub-committee. Dr. Breden is actively involved in the Common Repository Working Group and the Diagnostics Working Group, and is co-chair of the IUIS TR and IG Nomenclature Sub-committee. Additionally, he continues to serve as the Scientific Manager of iReceptor Plus, a European-Canadian collaborative project that brings together 19 partners from Europe, Canada, and the United States. The primary objective of iReceptor Plus is to create a data commons for the AIRR Community by developing tools and techniques to store, share, and analyze Adaptive Immune Receptor Repertoire sequencing data.

Dr. Henk-Jan van den Ham

As a Research Team Lead at ENPICOM, Henk-Jan focuses on the latest developments in immunology and bioinformatics, particularly in the analysis of adaptive immune repertoires. He obtained his PhD in Theoretical Immunology & Bioinformatics at Utrecht University, and subsequently worked as a postdoc and staff scientist in the Department of Viroscience at Erasmus MC in Rotterdam. In his current role, Henk-Jan is responsible for leading academic and industry projects, collaborations, and grant applications at the intersection of immunology, bioinformatics, and software engineering.

References
  1. https://www.antibodysociety.org/the-airr-community/
  2. http://ireceptor.irmacs.sfu.ca/
  3. https://docs.airr-community.org/en/stable/miairr/introduction_miairr.html
  4. https://iuis.org
  5. https://vdjserver.readthedocs.io/en/latest/#
  6. https://www.iedb.org
  7. https://vdjbase.org/
  8. https://ogrdb.airr-community.org/
  9. Scott J. K. and F. Breden. 2020. The AIRR Community as a model for FAIR stewardship of big immunology data. Current Opinion in Systems Biology. 24:71-77. https://doi.org/10.1016/j.coisb.2020.10.001.