Q&A with BigHat Biosciences

AMA-style interview with Eddie Abrams

Feb 14, 2023

We had the chance to interview Eddie Abrams, VP of Engineering at BigHat Biosciences. We discussed the differences between tech and biotech, software build processes, and much more. The interview can be found below!

Nicholas: Super excited to kick off our next Q&A with @eddie right now! As always, I'll get us started with some questions, but please join in with additional questions of your own.

Question 1:

Nicholas: You have a lot of experience in the tech world. How did you get interested in applying your engineering skills to biotech?

Eddie: Hi Nicholas! Thanks for having me -- this is pretty exciting! So that was actually completely by accident. I actually started off as a PhD graduate in philosophy, so my entire tech background was an accident too: self-taught and done just because I was jazzed about it. Mark DePristo, the CEO of BigHat, actually found me during a search for SynapDx, and that was my first healthcare and bio job, where I learned all the basics of genomics and running big pipelines on AWS! I had a little science background, again from my education, so I just dove into the stats/science stuff. I have found that core engineering work is surprisingly applicable to the problem space in general, especially if you have strong full-stack experience or an analytics background.

So I moved into this space because it was the most exciting job offer I had seen, and from there it was just diving into the learning process and getting great mentorship from top-notch stats and bio people.

Question 2:

Nicholas: What surprised you most when transitioning from tech to biotech?

Eddie: The biggest surprises I have found all stem from the fact that in biotech, very often software is not the product. It's a molecule. Or a prediction. Or an observation. So the dynamic of what a software organization is, what it's for, and how it operates is really interestingly different from the case of software for software's sake, or software built directly for customers.

Anonymous: What do you think are the biggest repercussions of these differences?

Eddie: To my mind, the biggest cultural touchpoint of engineering in a non-software-oriented organization is that engineering is an enabler, a service organization whose customers are our very coworkers. So I stress in every engineering deck, and in every team discussion, that our role is to help people get things done. This produces a very different way of thinking than you sometimes find in pure software organizations. It's a different kind of pragmatism: a different way of weighing technical debt, which is still very real and critical to manage correctly, against the immediate, highly cost-sensitive needs of, say, a running laboratory. So it means our hiring is very, very important. We need engineers who want to help people. We need engineers who think learning is rad and just want to absorb every conversation with a scientist. And it means that our successes are visibly and deliberately championed: because we don't have engineering-specific KPIs/OKRs/key goals at the corporate roadmap level, it's up to leadership to make clear how engineering efforts contribute to those very visible goals. It's quite refreshing, in my view.

Anonymous: Completely agree! Very refreshing to hear.

Eddie: Thank you :smile:

Question 3:

Nicholas: Tell us more about BigHat Bio – what are you working on there and where does software come into play?

Eddie: So our mission is to use machine learning/AI and synthetic biology to design better protein-based therapeutics and improve human life. Mark and Peyton started with a vision of software deeply integrated into every aspect of how our platform operates -- so rather than software being a silo, the processes, techniques and applications of what we do integrate very well with the entire design. If you consider, for example, how automation plays a role in an AI design loop, it becomes clearer. You want to automate as much as possible of every step between designing a molecule, testing its properties, running AI on the results and producing a new design to go through that cycle again.

Eddie: And given that it's AI-driven, and you want n to be as high as possible, you want that loop to be fast, efficient, tight and reliable, and to produce high-quality, visible results with every turn.
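The closed design loop described above can be sketched as a few steps chained in a cycle. This is a minimal, hypothetical illustration, not BigHat's platform: `run_assay` is a toy scoring function standing in for a lab measurement, and the "learn" step is plain greedy selection where a real loop would retrain a model.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def run_assay(sequence: str) -> float:
    # Hypothetical toy "assay": fraction of tryptophan residues.
    return sequence.count("W") / len(sequence)


def mutate(sequence: str, rng: random.Random) -> str:
    # Design step: a single random point mutation.
    i = rng.randrange(len(sequence))
    return sequence[:i] + rng.choice(ALPHABET) + sequence[i + 1:]


def design_loop(seed: str, rounds: int, batch: int,
                rng: random.Random) -> tuple[str, float, list[float]]:
    """Design -> test -> learn -> redesign, repeated for `rounds` turns."""
    best, best_score = seed, run_assay(seed)
    history = [best_score]
    for _ in range(rounds):
        # Design: propose a batch of variants around the current best.
        candidates = [mutate(best, rng) for _ in range(batch)]
        # Test: "measure" every candidate.
        results = [(run_assay(c), c) for c in candidates]
        # Learn (greedy stand-in): keep the best observed design.
        score, seq = max(results)
        if score > best_score:
            best, best_score = seq, score
        history.append(best_score)
    return best, best_score, history
```

Every turn is fully automated, so the number of turns (the "n" above) is limited only by how fast each step runs.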

So what I am most focused on generally is creating infrastructure, frameworks and data management tools that bring cloud-native design, event-driven architecture and highly custom, bespoke tooling directly into the hands of the lab, the data scientists, the AI/ML engineers and even the business side of the company, so that they can operate knowing they're up to date with all of our most recent best practices.

Eddie: It's pretty rad :slightly_smiling_face:
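One way to picture the event-driven architecture mentioned above is a small publish/subscribe registry: code that produces data emits events, and anyone can register a handler (a notification, a pipeline trigger) for an event type without touching the producer. This in-process `EventBus` and the event name `assay.completed` are hypothetical; in a cloud-native setup the same shape would typically sit on a managed queue or event service rather than a Python dict.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataEvent:
    kind: str  # e.g. "assay.completed" (hypothetical event name)
    payload: dict[str, Any] = field(default_factory=dict)


Handler = Callable[[DataEvent], None]


class EventBus:
    """Minimal in-process event bus: producers publish, subscribers react."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, kind: str) -> Callable[[Handler], Handler]:
        # Decorator so any team can add a reaction to an event type.
        def register(handler: Handler) -> Handler:
            self._handlers[kind].append(handler)
            return handler
        return register

    def publish(self, event: DataEvent) -> int:
        # Deliver to every handler registered for this kind; return the count.
        handlers = self._handlers.get(event.kind, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

The design choice is that the producer never knows who is listening, so new notifications are added by registering handlers rather than by editing the pipeline.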

Rani: Would love to know more about your mention of frameworks here! Can you expand? Are you creating frameworks in bio or software (or both)? Are these programming-esque frameworks or conceptual frameworks?

Eddie: Great questions. So, both, for sure. We have strictly software-based frameworks for how the code works and for how we mentor others through code so they can expand what we have. For example, "notifications" based on data events is not just a service we provide, but an entire library and infrastructure that lets people extend how notifications actually work. And on the biology side, we bake in knowledge of the biologics -- sequence information, metadata, reverse translations, codonizations, translation environments and organisms, etc. -- in order to place the biology in the "business layer" at the core of how the entire system works. I hope that helps!
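To make "reverse translations" and "codonizations" concrete: reverse translation maps a protein sequence back to a DNA sequence that encodes it. The sketch below uses a fixed, hypothetical one-codon-per-residue table (each codon is a valid standard-genetic-code codon for its amino acid); a real codonization step would pick codons based on the expression organism's codon usage.

```python
# One codon per amino acid, chosen for illustration only (hypothetical choice;
# real codonization would weight codons by the host organism's usage table).
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}
AMINO_FOR = {codon: aa for aa, codon in CODON_FOR.items()}


def reverse_translate(protein: str, add_stop: bool = True) -> str:
    """Protein sequence -> one possible coding DNA sequence."""
    protein = protein.upper()
    unknown = set(protein) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    dna = "".join(CODON_FOR[aa] for aa in protein)
    if add_stop and not protein.endswith("*"):
        dna += CODON_FOR["*"]
    return dna


def translate(dna: str) -> str:
    """Translate DNA using only the codons in this toy table (round-trip check)."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(AMINO_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

For example, `reverse_translate("MKW")` yields `ATGAAATGGTAA`, and translating that back gives `MKW*`.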

Question 4:

Nicholas: How do you think about bio-specific vs. industry-agnostic pieces of software?

Eddie: Actually, I find less of a distinction here than I thought I would. I think the usual impression is that the more industry-agnostic a piece of software is -- say, a CI/CD system -- the more generalizable it is. But I no longer believe this. In my view, if you design your company around the idea that engineering is central, there will be both biological and general tools that you'll want to integrate, but these will be the exception and not the rule. For the most part, you'll be implementing your own solutions from a fairly minimal set of tools, outside of cloud infrastructure, which I think is universally helpful.

Question 5:

Nicholas: What’s your recruiting pitch to software engineers?

Eddie: Ah, yes! I think of getting a job as having three major pillars: the mission, the technical stuff, and the offer itself. With BigHat, the mission and the technical stuff really "write themselves": it's hard to compete with the mission of making human life better, and the technical problems are a delight to any software developer, as they include scale, reliability, deep knowledge of jobs and work graphs, data management at scale, and so much more. As a bonus, you get to work with industry experts across the lab, science, DS and ML fields, which you don't get a chance to do in every job you'll look at. Then we just make the offers as strong as we possibly can! It seems to work!

Question 6:

Yohann: What do you think about off-the-shelf software for your work? How much do you need to create custom software or a custom data stack?

Eddie: Hi! Yes, I'm tackling this one in the same breath as buy vs. build. I've come to the view that you buy the undifferentiated heavy lifting and you build the rest. In particular, the "undifferentiated heavy lifting" turns out to be highly abstract work graph (Step Functions) features, queue (SQS) features, storage (S3, Lustre) features and compute (Lambda, ECS, Kubernetes, EC2) features. From there, you face a fork in the road. If you don't have your own fully staffed engineering department, you can buy great services from Benchling and others. Those are great for teams that really want little in-house engineering. Grid.ai too. Weights & Biases, etc. But if you have engineering integrated, as BigHat does and needs, then in my view you end up building out lots of this stuff yourself, for the simple reason that the overlap between your actual needs and what's already out there is low. It's like Microsoft Word: I only use 1/1000th of the features. That's OK because it's ubiquitous, but it's certainly not purpose-built for the notes I'm taking right now. It's way overkill. And then it's underkill for what I really need later.
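As a rough illustration of buying the work-graph layer rather than building it: an AWS Step Functions state machine is defined in Amazon States Language (JSON), and the service handles execution and state tracking. The helper below only generates such a definition for a linear pipeline of Lambda-backed Task states; the step names, region, account ID and function names are placeholders, and deploying it (for example through the Step Functions `CreateStateMachine` API) is out of scope.

```python
import json


def linear_state_machine(steps: list[tuple[str, str]], comment: str = "") -> dict:
    """Build an Amazon States Language definition that runs Task states in order.

    `steps` is a list of (state_name, lambda_arn) pairs.
    """
    if not steps:
        raise ValueError("need at least one step")
    states = {}
    for i, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # chain to the following step
        else:
            state["End"] = True              # last step terminates the execution
        states[name] = state
    definition = {"StartAt": steps[0][0], "States": states}
    if comment:
        definition["Comment"] = comment
    return definition


# Placeholder region, account and function names (hypothetical).
ARN = "arn:aws:lambda:us-east-1:123456789012:function:{}"
pipeline = linear_state_machine(
    [("Ingest", ARN.format("ingest-plate")),
     ("Normalize", ARN.format("normalize-assay")),
     ("Notify", ARN.format("notify-scientists"))],
    comment="Hypothetical assay-ingest work graph",
)
print(json.dumps(pipeline, indent=2))
```

The point of the sketch is the division of labor: the team writes the step logic and the graph shape, while sequencing and execution history come from the managed service.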

Eddie: I hope that helps. I can say more about this, but I don't want to ignore other questions too!

Yohann: Thank you. We all face similar questions in the industry :+1:

Question 7:

Anonymous: How do you balance enabling data scientists to move quickly and experiment with building scalable production software?

Eddie: This is an awesome question, and one we addressed at BigHat from day 1. My thought was precisely that the developer, DS and AI/ML experience should be no different locally than in production. If you pause there for a moment, it seems at once obvious but also a bit daunting. How do you achieve that? What we did was build out all of our APIs, libraries, CLI tools, and web interfaces so that the experience of any user using our system in any way was identical. If it was running locally, it ran just like, e.g., PyTorch Lightning. If it was on SageMaker, you just tweaked the very same command and it would run in our fully managed environment using exactly the same code and exactly the same parameters. How? Extensive work in CI/CD, on-demand container builds, a highly flexible and dynamic bootstrap system, and finally great Git, CLI and distributables management on the cloud side! Again, happy to say more if you're curious!
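The "same command, local or managed" idea can be sketched as a CLI where the backend is just one more flag and every other parameter is parsed once into the same object. Everything here is hypothetical: the `train` CLI, its parameters, and `submit_managed`, which is a stub standing in for the container build and managed-job submission (for example on SageMaker) that such a system would actually perform.

```python
import argparse
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainParams:
    dataset: str
    epochs: int
    lr: float


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="train")  # hypothetical CLI name
    p.add_argument("--dataset", required=True)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--lr", type=float, default=1e-3)
    # The only thing that changes between laptop and cloud:
    p.add_argument("--backend", choices=["local", "managed"], default="local")
    return p


def run_local(params: TrainParams) -> str:
    return f"ran locally with {asdict(params)}"


def submit_managed(params: TrainParams) -> str:
    # Stub: a real implementation would build or pull the container image and
    # submit a managed job with exactly these parameters.
    return f"submitted managed job with {asdict(params)}"


def main(argv: list[str]) -> tuple[TrainParams, str]:
    args = build_parser().parse_args(argv)
    params = TrainParams(dataset=args.dataset, epochs=args.epochs, lr=args.lr)
    runner = run_local if args.backend == "local" else submit_managed
    return params, runner(params)
```

Because both paths receive the identical `TrainParams`, a job that works on a laptop is submitted to the cloud with exactly the same configuration.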

Anonymous: Wow, that's impressive. Doesn't that mean, though, that any new systems take a long time to set up? Do they get impatient? I guess that's the benefit of doing it right the first time!

Eddie: The spirit behind it, because of the framework aspect mentioned in a nearby thread, is actually to remove engineering as the roadblock and enable the other teams to do some of that themselves. For example, we unlock DS and AI/ML to build out their systems using the skeleton of what we've created, and then we continue to develop tools that make that easier and easier over time. I don't want to paper over the challenges, though -- sometimes, yes, it takes a while, and we're constantly balancing priorities, money and time to make the equation fit what we need :slightly_smiling_face: So it's not easy, but you are actually hitting the nail on the head when you mention that doing it right from the start really helps: our SOC 2 journey, for example, has been trivial -- one person spending a few off-hours over several weeks. There are some places where having the right engineers, or the right leaders, will pay off in spades.

Question 8:

Jonathan: What is the BigHat Bio philosophy on open-source software development?

Eddie: We have an internal open-source model, and we contribute directly to open-source projects. My own hope is that at some point we'll be able to open-source more of our own stuff, but it is really quite bespoke and purpose-fit to what we do inside the company. So for a long time, our major contributions will likely be to the projects that we use internally, such as Superset, Zappa, react++ and others.

Question 9:

Nicholas: How do you manage cross-functional teams that have very different backgrounds (SWEs, ML, bix, cix, wet lab, etc.)?

Eddie: Well, that's an interesting question. We have a very all-in, matrix-style management approach. What I mean is that, for the moment, engineers report to me for mentorship, career and deep conversations, but anyone at BigHat can and does drive big cross-team projects. So this answer gets quite similar to something I said in another thread: if you want to work effectively with lots of people with fantastic, diverse talents, regardless of whether you're "managing" or IC-ing or whatever else, I would say learn to listen, keep a notebook handy where you jot down everything someone says that you don't understand, live on Wikipedia, ask for mentorship, ask for time and help, and just learn everything you can. If you had told me three years ago that I would be competent at designing antibodies from E. coli linear template backbones, I would have said: what? But here we are.

Question 10:

Nicholas: R or Python :p

Nicholas: Or Go??

Jonathan: Or Scheme :wink:

Anonymous: Glad someone did it. I've been waiting!

Eddie: Hah! I have a passion for Python, personally, and it's handy for interoperating with most of the systems I use (Google, AWS, etc.). R allows the same flexibility in the limit, but my perception is that its zone is narrower. Our DS team uses R, and eng supports that for analysis purposes. But it's difficult to use it as broadly as Python, especially in a serverless environment such as ours. Not impossible, by any means, just more challenging. Go... I have a love-hate relationship with Go. I was using it actively about four years ago, when it was still going through massive package-management growing pains. So I'm officially agnostic about language... though of course Python is the best :slightly_smiling_face:

S: Each language is a tool, and some tools work better than others. A sledgehammer will take down a tree, but an axe is better. An axe might take down a wall, but a sledgehammer is better. Python is a Swiss Army knife, great at a lot of things; R is for statistical modeling and is very good at it; Octave (MATLAB/Maple) is good for numerical experiments and computation. If I want the answer now, I'll use Python; if a client wants visual statistics, I'll use R. If a question needs a multi-part solution, I'll use Octave. If I need to generate a finite element model with billions of points, I'll use C++. Most of them could be used to get the majority of the desired answers. Sometimes a client requests a special format. A programmer isn't confined by the software.

Question 11:

Matthew: What’s the scale of data you’re able to generate right now?

Eddie: The very interesting premise of BigHat is that though we have big hats, we have relatively small data.

Eddie: So we typically process data on the order of a few gigs, and we're growing that as we grow our internal repertoire of projects, feature engineering, etc. But the bigness there is actually in how much we process it, not in how much data there is compared to, say, an analytics, logging, Splunk or Google workflow.

Because we're doing iterative machine learning, we learn a lot about molecule design from our failures.

So, in short, it won't sound impressive from a bytes point of view. It's the compute, and how we apply it, aggregate it, automate it and project quality metrics onto it, that makes the interesting and intensive story for us.

Matthew: Gotcha. Is there a large effort/research team working on lower-n algorithms? Would you say that's the key innovation here, or is it putting it all together that makes BigHat big?

Eddie: Peyton is the expert here, but what I know is this: new techniques such as active learning rely in part on the fact that for some learning problems, you can get a lot of learning from a single reasonably accurate starting model and then hone it with surprisingly little data in each iteration. So we focus a lot on this problem: getting our designs accurately evaluated quickly and getting that data into the next prediction.
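A common form of active learning is uncertainty sampling: fit a model (here, a bootstrap ensemble), ask which untested candidates it is least sure about, measure only those, and refit. The sketch below is a generic illustration on a one-dimensional toy problem, not BigHat's method; `oracle` is a hypothetical stand-in for an assay, and 1-nearest-neighbour regression is used only to keep it dependency-free.

```python
import math
import random
import statistics


def oracle(x: float) -> float:
    # Hypothetical stand-in for an expensive lab measurement.
    return math.sin(6 * x) + 0.5 * x


def nn_predict(train: list[tuple[float, float]], x: float) -> float:
    # 1-nearest-neighbour regression: value of the closest labelled point.
    return min(train, key=lambda p: abs(p[0] - x))[1]


def ensemble_std(train: list[tuple[float, float]], x: float,
                 rng: random.Random, n_models: int = 10) -> float:
    # Disagreement across bootstrap-resampled models = uncertainty estimate.
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]
        preds.append(nn_predict(boot, x))
    return statistics.pstdev(preds)


def active_learning(pool: list[float], n_initial: int, n_rounds: int,
                    seed: int = 0) -> list[tuple[float, float]]:
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    labeled = [(x, oracle(x)) for x in unlabeled[:n_initial]]
    unlabeled = unlabeled[n_initial:]
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # Query the candidate the ensemble disagrees on most, then measure it.
        scores = [ensemble_std(labeled, x, rng) for x in unlabeled]
        pick = unlabeled.pop(scores.index(max(scores)))
        labeled.append((pick, oracle(pick)))
    return labeled
```

Each round spends exactly one "measurement", which is the trade the text describes: little new data per iteration, chosen where it should teach the model the most.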

Question 12:

Isaac: How well is your product resolving the protein design/optimization problem? At this moment, which technical challenges have the most impact and most need to be resolved?

Eddie: Hi! That has the initial look of a deep biological question; if there's an angle on it that I might be able to tackle as an engineer and you could clarify that part, I'd be happy to take a stab! In general, everything we do targets multi-objective optimization of antibodies (proteins), but I am certainly not the biological expert on the details there.

We have recently made public our results on optimizing a partner's antibody, which is tremendously exciting to us.

The technical challenges we face as an engineering team fall into two big buckets: onboarding new experiments/assays and managing the (collectively) enormous data we have.

The smallness of the data *elements* should not be interpreted as the smallness of the data in *aggregate* :slightly_smiling_face:

So we are deep in the process of making everything related to data -- the DS pipelines, data tracking, auditing, etc. -- far, far easier and more accessible. And we're constantly onboarding new instruments, assays and experiment types, all of which is managed in our (soon to be trademarked!) custom software solution. So we focus a ton of attention on making these aspects of the system better.

The tighter we close the design loop, the faster the robots can get on with their business of searching protein space for discontinuously better antibodies :slightly_smiling_face:
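Multi-objective optimization, mentioned above, usually means there is no single "best" antibody, only trade-offs between properties. A standard way to summarize a batch of measured designs is its Pareto front: the designs that no other design beats on every objective at once. The scores below are made-up numbers for two hypothetical objectives, both treated as higher-is-better; this illustrates the concept, not how BigHat ranks candidates.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly
    better on at least one. All objectives are assumed to be maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(designs: dict[str, Sequence[float]]) -> list[str]:
    """Names of designs not dominated by any other design (O(n^2) comparisons)."""
    return [
        name for name, score in designs.items()
        if not any(dominates(other, score)
                   for other_name, other in designs.items() if other_name != name)
    ]


# Hypothetical (objective_1, objective_2) scores for four variants.
designs = {
    "v1": (0.9, 0.2),
    "v2": (0.5, 0.5),
    "v3": (0.4, 0.4),  # beaten by v2 on both objectives
    "v4": (0.1, 0.9),
}
front = pareto_front(designs)  # ["v1", "v2", "v4"]
```

The quadratic scan is fine for batch sizes in the hundreds; larger candidate sets would call for a sort-based algorithm.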

Isaac: Thanks, @eddie. I'm very interested in understanding whether companies working in fields like genome mining, protein design and optimization are actually getting results that other companies are interested in buying.

Isaac: Could you share the public results? It would be very good to take a look.

Eddie: bighatbio.com -- that is our site, with our news feed, links to our LinkedIn publications and posts, etc.!

Question 13:

Jonathan: What are the pros and cons of being VP of Eng in a company started by CS/ML folks?

Eddie: One of the big pros is that I get to learn a ton of new things. And related to that, it means our company is in this mode of *integrating engineering into some other major goal.* That is, it's not engineering for the sake of the software. We're not selling the software. We're not marketing the software. This is very liberating, and it's also becoming a core way to be a software engineer in a company with a competitive moat. I would say that these days it's more difficult to have a pure software product that can be wildly successful and keep a very defensible moat -- a big advantage over someone else who wants to enter the ring and compete for your customers. Not impossible, by any means, but harder. So I've come to appreciate the pro of being part of something that has such clear differentiation from competitors. And the way it looks to me right now, not everyone is integrating software into their non-software-oriented company the way I think will be normal in the future. So you can be ahead of the curve here. And as I mentioned in another thread, it's appropriately humbling to be a service organization -- we win when our customers win. I like that spirit very much, and I want to grow a big team that all champions it, as I think it makes for a fun time and a powerful, effective organization. So I think of those as big pros.

Cons? As an engineer, it's always more difficult working with people who might not intrinsically understand what you're up to or the trade-offs you have to deal with, who work in very different toolsets than you do, and who have everything but software on their minds from day to day. It can be frustrating, it demands a lot of patience, and it means you have to learn to be incredibly clear and on point in your messaging to make sure there's no confusion and to maximize impact. I don't think it's a big con, but it's certainly worth noting, as not every software engineer thrives in that environment.

Question 14:

Jonathan: I have a strong suspicion that your PhD in philosophy is correlated (though perhaps non-causally) with your success as an engineer operating in complex organizations. Do you buy it?

Eddie: It's an interesting question! I buy it! I think they have common causes :slightly_smiling_face:

It's kind of funny. I think academia hones certain aspects of character, and I also think that being self-taught (that is, not going to school for engineering) gives you a very interesting way of looking at engineering. I've often attributed both to the overall attitude towards things that really launched me into both academia and engineering. It's a great question, and it could clearly go beyond the scope of any particular time window!

Jonathan: Yeah. I co-founded a software team having never taken a formal course in computer science, though I did go take CS 61 at Harvard after a year, defusing assembly bombs and such. I also rebuilt MIT's intro stats having never taken a course in applied math. It's a bit cart-before-the-horse, but it has its advantages too.

Eddie: Awesome stuff -- I think doing it that way gives you some crazy-quick perspective on things. Knowing that you'll be in a situation where your knowledge will be gappy can give you a lot of practice in thoughtfulness and reasoning from first principles. I feel like I do that a lot!

Jonathan: Yup! Though your philosophy was my theoretical math. Same idea.

Eddie: Exactly! There may be some aspect of the abstract thinking here that hooks up with the result, but I suspect it's less important than approaching two very different challenges with a lifelong-learning mindset.

Question 15:

Nicholas: Just want to give a big thank you to @eddie for the Q&A! Great to hear his perspective and more about what's happening at BigHat! Non-US friends, feel free to keep asking questions when you wake up.

Eddie: Thanks, all, for the really fun Q&A!

Thank you for reading Bits in Bio. This post is public so feel free to share it.