A day in the life of: a metagenomics bioinformatician in a biotech start-up
I still remember being drawn to all things cell biology and microbiology as early as high school, and I wonder what I would have become if my biology teacher at the time had been less enthusiastic and inspiring.
Having completed an undergraduate degree in Microbiology and Immunology, I spent my postgraduate years in Infectious Diseases research. At that time, the laboratory was my comfort zone, and during my PhD I only ever had to look at Sanger DNA sequencing data in a low-throughput manner. I feared computer coding and programming, and a career in bioinformatics and data science seemed inconceivable. It was not until my postdoc years, when I began working with high-throughput metagenomics data for microbiome analyses, that the idea of becoming a bioinformatician slowly took shape. I have often wondered what eased my fear of working with data. Experience over time? Sure, but perhaps a bigger factor was realising that bioinformatics, and technology in general, is well placed to play a central role in helping us understand the interconnection and interdependence between the microbial world, the environment, and humanity.
This is what I often find myself thinking about on my way to work. I am currently a metagenomics bioinformatician in the Data Science Team of Basecamp Research, a biotechnology start-up in central London that leverages artificial intelligence and machine learning technologies to change the world.
In essence, we focus on the diverse proteins made by microbes in natural environments, which can serve as starting points for our collaborators in pharma, therapeutics, and food & nutrition to design optimal proteins for their needs.
We have only begun to understand how our wealth of metagenomics data, drawn from ethically sourced, Nagoya Protocol-compliant environmental samples collected around the globe, can provide solutions for better drugs, better food, and better diagnostics. Because this dataset is proprietary, we are also well positioned to benefit from AI technology, which is otherwise often limited by a lack of training data. At Basecamp, my role is to analyse the protein sequences and other in silico-predicted data from our metagenomes for the needs of current or potential customers. Each analysis is tailored to the project I am working on, which keeps my day-to-day work exciting.
A typical day starts with a meeting with my teammates from the Data Science Team (which includes some amazing data engineers, deep learning scientists, and bioinformaticians) to discuss our progress. Our aim is to optimise and automate the way we process, analyse, and store metagenomics and protein data at scale, and each of us plays a crucial role in piecing the technology together.
We also talk about how to better understand relationships between environmental metadata and protein function using the world’s most comprehensive knowledge graph. My contribution to these meetings is insight into how our knowledge graph and other pipelines are being applied to solve customer needs. It is important for the Data Science Team to know how our pipelines are providing solutions, as we are constantly trying to develop better ways to align our capabilities with the needs of our customers.
I also work closely with our Commercial Team, who update me on their conversations with existing and potential customers and collaborators. In turn, I often walk them through how the analyses were performed and explain the reasoning behind each step in light of what a customer is looking for. These meetings help me see, in a tangible and direct way, how my work provides solutions for a customer.
I also communicate with our wonderful teammates from the Field Science Team, who collect metagenomics samples and the associated metadata and help extract DNA from the samples for sequencing. Given the wealth of metadata available for all our samples, we can predict whether proteins with particular functions (such as those desired by a customer) are enriched in samples sharing particular environmental attributes (such as pH, temperature, and salinity). With that information, the field scientists can focus their subsequent expeditions on collecting the samples most likely to contain proteins important to our customers, and in doing so steer our protein design processes in real time.
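As a rough illustration of the kind of enrichment question described above (not a description of Basecamp Research's proprietary pipelines), here is a minimal Python sketch that asks whether a protein function is over-represented in samples sharing an environmental attribute, using a one-sided Fisher's exact test. The sample records, the pH threshold of 8.0, and the labels `function_A` and `function_B` are all hypothetical, as are the helper names `enrichment_p_value` and `contingency_counts`.

```python
from math import comb


def enrichment_p_value(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation (hypergeometric upper tail).

    a: samples WITH the attribute where the protein function was detected
    b: samples WITH the attribute where it was not detected
    c: samples WITHOUT the attribute where it was detected
    d: samples WITHOUT the attribute where it was not detected
    Returns P(X >= a) if function and attribute were unrelated.
    """
    total = a + b + c + d
    with_function = a + c
    with_attribute = a + b
    # Sum hypergeometric probabilities for every table at least as extreme as observed.
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(
        comb(with_function, x) * comb(total - with_function, with_attribute - x)
        for x in range(a, min(with_function, with_attribute) + 1)
    )
    return tail / comb(total, with_attribute)


def contingency_counts(samples, has_attribute, function):
    """Tally the 2x2 table (a, b, c, d) for one function and one environmental attribute."""
    a = b = c = d = 0
    for sample in samples:
        attr = has_attribute(sample)
        found = function in sample["functions"]
        if attr and found:
            a += 1
        elif attr:
            b += 1
        elif found:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Toy, made-up samples: "function_A" and "function_B" are hypothetical labels.
samples = [
    {"pH": 9.1, "functions": {"function_A"}},
    {"pH": 8.7, "functions": {"function_A", "function_B"}},
    {"pH": 8.9, "functions": {"function_A"}},
    {"pH": 6.5, "functions": {"function_B"}},
    {"pH": 7.0, "functions": set()},
    {"pH": 5.8, "functions": {"function_B"}},
]

counts = contingency_counts(samples, lambda s: s["pH"] > 8.0, "function_A")
print(counts)                       # (3, 0, 0, 3)
print(enrichment_p_value(*counts))  # 0.05
```

Each call tests one function against one attribute, which keeps the 2x2 table simple; the same tally would be repeated for other attributes such as temperature or salinity. When many functions and attributes are tested together, the resulting p-values would also need a multiple-testing correction.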
Of course, data analysis takes up much of my typical day. I delve into our DNA and protein sequence data and use different proprietary pipelines to compile a shortlist of proteins that we think could address our customers’ requests. This is usually accompanied by in silico structural and ‘ligand-interaction validation’ analyses of the shortlisted proteins, which let me see how they compare against a reference protein that carries out the desired enzymatic function. Once the first set of exploratory analyses is complete, I prepare visualisations to help the Commercial Team present the findings. Clear documentation of the analytical steps is also important: we are tasked with different projects every day, and it is difficult to remember the specific steps taken. I therefore spend time at the end of each day documenting my work, which proves invaluable once a project's set of analyses has been completed. This documentation describes the background information, how and why a particular analysis was performed, and the accompanying visualisations.
I am grateful to be part of this amazing team of supportive and friendly individuals at Basecamp Research. As a long-time academic who has only recently moved into what once seemed the daunting world of industry, I find it inspiring to see the tangible part my work plays in building long-lasting, trusting relationships with customers and collaborators. More importantly, we are constantly learning from each other, and this motivates me to keep pushing my limits and reaching my potential.