Title and affiliation
Scientist at Arcadia Science
Years of bioinformatics experience
Fields of research
How do you use bioinformatics in your research or profession?
My entire scientific identity is founded on bioinformatics and computational biology – I spend almost all of my research time on computers. I write new methods to analyze sequencing data. I combine existing methods (often using workflow automation software like Snakemake or Nextflow) to answer new questions, often using tools in new ways or taking advantage of data that other researchers would discard (what's in the unassembled reads? Which reads didn't map to the reference?). I also write and deliver scientific computing trainings to help other scientists level up their skills. All of my training materials are openly available online so that others can work through them on their own time.
What attracted you to pursuing a career in bioinformatics?
I enjoyed the different scales of problem solving. When I'm programming, I love solving the little problems that crop up when you're in the weeds of a project. I also love how, over time, the code you write and the decisions you make along the way solve a bigger biological problem or provide clues to the answer to a biological question.
How does bioinformatics serve as a unifying element across different scientific fields?
There is a lot of overlap in the data structures and code that can be reused modularly across diverse computational questions. Think, for example, of count matrices: they can represent species counts in different ecosystems, gene counts in a single cell or a specific tissue, metabolite abundances in a sample… the list goes on. The fields that generate this type of data vary wildly, but some of the lessons we learn in one field about the best way to analyze or store that data can be shared across disciplines. A more tangible example is the nf-core community and its modules repository on GitHub. Nextflow has restructured its workflow automation software so that the modular tasks performed by individual bioinformatics commands (e.g., sourmash gather) have a general incantation and can be reused across different pipelines. I think that's a really beautiful ecosystem of shared code.
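To make the count-matrix point concrete, here is a minimal sketch in Python. It is an illustration added for this profile, not code from the interviewee. It assumes a simple dictionary layout ({sample: {feature: count}}) and uses total-sum scaling (relative abundance) as the example analysis; the helper name relative_abundance and all sample names and counts are hypothetical toy values. The same function runs unchanged on species counts from ecosystems and gene counts from single cells.

```python
from typing import Dict

# A count matrix stored as {sample: {feature: count}}.
# A "feature" can be a species, a gene, or a metabolite; the code does not care.
CountMatrix = Dict[str, Dict[str, int]]


def relative_abundance(matrix: CountMatrix) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: scale each sample's counts so they sum to 1."""
    result = {}
    for sample, counts in matrix.items():
        total = sum(counts.values())
        # Leave all-zero samples as zeros instead of dividing by zero.
        result[sample] = {
            feature: (count / total if total else 0.0)
            for feature, count in counts.items()
        }
    return result


# Toy species counts per ecosystem sample (made-up values).
species = {
    "soil": {"Bacillus": 30, "Pseudomonas": 10},
    "ocean": {"Prochlorococcus": 45, "Pelagibacter": 15},
}

# Toy gene counts per single cell (made-up values).
genes = {
    "cell_1": {"GAPDH": 8, "ACTB": 2},
    "cell_2": {"GAPDH": 0, "ACTB": 0},
}

print(relative_abundance(species)["soil"])   # {'Bacillus': 0.75, 'Pseudomonas': 0.25}
print(relative_abundance(genes)["cell_1"])   # {'GAPDH': 0.8, 'ACTB': 0.2}
```

Real analyses would typically use sparse matrix libraries and a normalization method suited to the data type; the point of the sketch is only that one function, written once, serves data from very different fields.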
In what ways does interdisciplinary collaboration in bioinformatics contribute to advancements in specific fields?
Bioinformatics is a massive field. A graph theorist, data scientist, computer scientist, or statistician might identify as a bioinformatician. Collaboration among bioinformaticians helps match difficult biological problems with the best combination of approaches for solving them. On the flip side, collaboration between bioinformaticians and non-bioinformaticians often unites domain-specific knowledge with data analysis at scale. I think the scientist who generates the data is often best positioned to make nuanced decisions about how to analyze it, so collaboration is one way to bring that expertise into the analysis.
How can we improve collaboration between disciplines where bioinformatics is applied?
Computational literacy and FAIR (Findable, Accessible, Interoperable, Reusable) data. Computational literacy doesn't mean that everyone needs to know how to program; even building skills in spreadsheet organization and data management can lower the barrier to sharing data across fields. Similarly, openly accessible data with well-documented metadata can go a long way toward fostering collaboration. I think that if the data are available, intrepid scientists from different domains are more likely to engage with and explore them. This can be a starting point for conversations between the people who generated the data and the people who pick it up to analyze it.
What excites you most about bioinformatics?
I really love thinking creatively about a new problem. What angle hasn't been considered before? How can we tweak existing methods so that they scale to all the data? What data are other people throwing away, and what secrets hide in there?
What advice would you give to someone interested in pursuing bioinformatics?
From every angle, there are a million tools or approaches that can be used for any given task in bioinformatics, and navigating that decision space can be daunting. When I'm choosing a new tool to learn (think programming language or workflow automation software), I default to the one that the people around me are using, so that I have a friend to ask when I run into problems. When I'm designing a method for a data analysis project, I consult the literature or GitHub to see how others have approached the problem. When that doesn't give me full clarity, I sit down and work through why I think my approach is reasonable, and I carefully document both my reasoning and the code I actually use for the analysis. That way, others can see why I made the decisions I did, and I can revisit my approach if I receive feedback from someone with more experience with the problem I'm working on.
How do you think bioinformatics could become more accessible in public discourse?
I would love it if everyone had the chance to learn how to program in K-12. I think programming skills can serve anyone, regardless of what career they pursue or how they choose to spend their time. If everyone knew some programming basics, explaining bioinformatics would be much easier. As a shorter-term solution, I think bioinformaticians could learn a lot about communication from physicians or media experts, who routinely have to explain complex topics to diverse audiences.
What is your current favorite bioinformatics program or software?
Favorite might not be the right word, but while I have loved the workflow automation tool Snakemake since 2017, I recently started writing workflows in Nextflow because the management platform Tower gives easy access to inexpensive cloud compute. It's been fun to compare the pros and cons of each tool.
What is your favorite snack?
Corn tortilla chips :)