This is reposted from a Q&A we did in October 2023. Check out our Slack for more.
We had the chance to chat with Kevin Flyangolts, founder of Aclid! Aclid is a security and compliance automation platform for gene and oligo synthesis.
Read more to learn about:
Safety, compliance, and biosecurity in biotech
Bioinformatics and LLM solutions for biosecurity
How to improve global biosecurity
and more!
Interview by Nicholas Larus-Stone, founder of Bits In Bio
Question #1:
Nicholas: Welcome @kevin! super excited to chat about aclid and biosecurity. i’ll get us started with a few questions and hopefully the community will jump in as well!
let’s start with a bit of background. how did you get interested in biosecurity?
Kevin: I was the first engineering hire at an early stage startup that got acquired in 2019. i transitioned to a product role for 6 months and then decided i wanted to start something of my own. i went through a few different iterations exploring biotech. first was a cell line development data platform (kind of like benchsci), then tried cell-free synthesis as a service (kind of like tierra biosciences), and eventually landed on biosecurity when i started working with professor harris wang
Kevin: It started as an idea that security was going to be important as synbio became more accessible, and with some digging, i found our first use case and major problem in helping dna synthesis providers with screening
Question #2:
Nicholas: Tell us about aclid — what do you do and why is it important
Kevin: Aclid is a security and compliance automation platform for gene and oligo synthesis. we help providers to screen orders, verify customers, and perform compliance checks
Kevin: For gene synthesis providers this can become an expensive problem, especially as the cost of dna keeps dropping fast. the fewer orders they have to review manually and the more we can automate verification of legitimate orders, the faster and cheaper they can deliver genes
Kevin: For the synbio community as a whole, i believe the only way we get great products is by building trust and security upfront. there's a lot we can do with the tools we have today. we're making sure they're used responsibly
Question #3:
Nicholas: Biosecurity is a scary term, what do you think is hype in the popular imagination and what do you think are realistic threats that are underappreciated?
Kevin: I think biosecurity is a big term. sometimes it refers to the security of biotech ip, sometimes it's referring to protecting the environment, health, and animals, and sometimes it's about making sure dual-use technology isn't used for harm (accidental or nefarious)
Kevin: We're squarely focused on the latter two: protecting the environment, health, and animals, and making sure biotech is used responsibly
Kevin: Despite how easy it is to get access to dna from harmful organisms, it's still pretty hard to actually make a virus or a bioweapon. but it's getting a lot easier. i think the urgency to build now is right. in 3-5 years, we want to have some infrastructure in place that would make today's somewhat exaggerated threats much harder when they're more of a reality
Kevin: A lot of the attention in biosecurity is placed on world-ending pandemics but i think there are bad things you can do that don't go so far. i can imagine a world where actors that don't like some industry use a biologically enabled technology to disrupt its economy (e.g., a competitive farmer that ruins their neighbor's crop yield one year)
Question #4:
Nicholas: How much of biosecurity can be accomplished in a purely computational platform vs regulatory or scientific controls?
Kevin: I think the first line of defense, namely detection and prevention, is mostly a computational task
Kevin: You can both detect and prevent the synthesis or spread of a pathogen with the right computational tools, assuming regulatory and scientific controls incentivize it. we can't computationally stop someone from synthesizing something, but we can create the tools to measure it, and those measurements inform our regulatory and scientific controls
Question #5:
Nicholas: You mentioned compliance — how much of this is driven by government regulation vs companies stepping up and deciding this is important?
Kevin: It might be counterintuitive, but the industry has as much incentive to regulate as government. any incident could heavily disrupt the bioeconomy. if a pandemic were to be caused by a synthesis company, it might mean that we much more heavily regulate synthetic biology and impede a lot of research happening today. that might mean the billion dollar companies that exist today are no longer quite as big and the ambitions for synthetic biology are forgotten
Kevin: To have effective regulation, you probably need both parties in the room
Question #6:
Nicholas: Since this is bits in bio, can you elaborate on some of the computational work you do? i imagine there’s some known pathogen databases that you can screen against, but what other types of tooling do you provide?
Kevin: We build infrastructure around bioinformatics that enables you to get compliance information via an api
Kevin: This includes:
• traditional sequence alignment (e.g., building sequence databases, using bioinformatics sequence similarity, structure similarity, and alignment tools)
• traditional data processing tools (e.g., pandas, dask, polars) to determine which compliance or regulatory controls apply
• a bunch of traditional software infrastructure to get this to work in real-time (e.g., kubernetes)
Kevin: And then there's the frontend and integrations we build into crms and erps to better work with our customers' existing systems
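Editor's sketch: the list above describes alignment results being fed through data processing to work out which controls apply. A minimal illustration of that second step might look like the following. It is not Aclid's implementation; the `Hit` fields, the `CONTROL_LISTS` table, the taxon labels, and the thresholds are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical mapping from a taxon label to the control lists that name it.
# The labels and list names are placeholders, not real regulatory data.
CONTROL_LISTS = {
    "taxon_A": {"list_1", "list_2"},
    "taxon_B": {"list_2"},
}


@dataclass(frozen=True)
class Hit:
    """One alignment hit for a query sequence (illustrative fields only)."""
    taxon: str
    percent_identity: float
    aligned_length: int


def applicable_controls(hits, min_identity=80.0, min_length=50):
    """Return {control list: [taxa]} for hits that pass the thresholds.

    The thresholds are arbitrary values chosen for the example.
    """
    result = {}
    for hit in hits:
        if hit.percent_identity < min_identity or hit.aligned_length < min_length:
            continue  # weak hit, ignore it
        for control in CONTROL_LISTS.get(hit.taxon, ()):
            taxa = result.setdefault(control, [])
            if hit.taxon not in taxa:
                taxa.append(hit.taxon)
    return result


hits = [
    Hit("taxon_A", 97.5, 300),
    Hit("taxon_B", 60.0, 300),  # below the identity threshold
    Hit("taxon_C", 99.0, 500),  # strong hit, but not on any list
]
report = applicable_controls(hits)
print(report)
```

In a real pipeline the hits would come from an alignment tool and the join would more likely be done with a dataframe library; plain dictionaries keep the sketch self-contained.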
Nicholas: You mention structure similarity — there’s been a bunch of recent advances (e.g. foldseek) and the release of the alphafolddb/esmatlas. how much time do you spend thinking about new ways to use these tools vs using traditional bioinformatics
Kevin: A lot. traditional bioinformatics is slow. one of the great things about machine learning and particularly llms is that it takes a problem that needs a ton of data and compresses it. if we can confidently move away from relying on traditional bioinformatics, it means faster results, less time spent curating data, and fewer errors from incorrect annotations or nondeterministic alignment tools
Question #7:
Nicholas: Relatedly — is the compliance or the security the harder problem here? is most of your code/complexity around the compliance workflows or verifying that a sequence is safe
Kevin: A lot of the work is in automating the compliance at the moment. things like the us select agent program, eu dual-use export controls, and us commerce control, are very explicit about what they allow and don't allow
Kevin: There's a list of about 50 organisms that are controlled across all these lists. the challenge is in determining whether a sequence derives from one of these organisms and whether the gene encoded is pathogenic. determining origin is still not solved, but not too difficult. determining pathogenicity is still very hard but we have a lot of tools to do it and can err on the side of caution when we're not sure. the problem is all the follow up and verification you have to do when you determine pathogenicity or when you're not sure. we're making that a lot faster and easier
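Editor's sketch: the triage Kevin describes has two questions (does the sequence come from a controlled organism, and is the encoded gene pathogenic) and a rule to err on the side of caution when the tools cannot decide. One way to express that is with three-valued inputs, as below. The function name, the `Decision` labels, and the exact policy are assumptions made for illustration, not Aclid's actual logic.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    CLEAR = "clear"
    REVIEW = "manual review"
    FLAG = "flag for follow-up"


def triage(from_controlled_organism: Optional[bool],
           pathogenic: Optional[bool]) -> Decision:
    """Combine two uncertain answers; None means 'could not determine'.

    Simplified list-based policy (an assumption): a sequence is a concern
    only if it is both from a controlled organism and pathogenic.
    """
    if from_controlled_organism is False or pathogenic is False:
        return Decision.CLEAR   # one question is a definite "no"
    if from_controlled_organism and pathogenic:
        return Decision.FLAG    # both questions are a definite "yes"
    return Decision.REVIEW      # something is unknown: be cautious


for origin, patho in [(True, True), (True, None), (None, None), (False, True)]:
    print(origin, patho, "->", triage(origin, patho).value)
```

Under a harm-based policy, a definite "no" on origin would no longer be enough to clear a sequence, so the first branch is the part that would change.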
Kevin: There's a push to move beyond a list of organisms and instead rely on harm (is a sequence pathogenic or toxic). that would shift the complexity more toward security
Question #8:
Nicholas: Do you have multiple definitions of “safe”? could something be safe for use in one organization but not in another?
Kevin: Yes. as a simple example, a company making proteins from bacillus anthracis using in vitro synthesis and assembly is less dangerous than a company doing so in vivo. if you're expressing anthrax toxins in vivo, there are a lot more consequences than just having the dna for an anthrax toxin. although both are questionable overall
Nicholas: Sounds like there’s a lot of context that’s important here. is this something you do yourself or expect the synthesis provider to provide this?
Kevin: Every company has a different process and different things they care about. there are guidelines and frameworks that we can modularly include to help them automate additional steps. so, if you're using yeast to assemble dna, we can add additional rules to our data processing so you are alerted to certain sequences that other providers might allow
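Editor's sketch: "modularly include" rules per provider can be pictured as a shared base rule set plus provider-specific additions. The example below is a hypothetical illustration; the tag names, messages, and the `make_tag_rule` and `screen` helpers are invented for this sketch and do not describe Aclid's API.

```python
from typing import Callable, Dict, List

Order = Dict[str, object]
Rule = Callable[[Order], List[str]]


def make_tag_rule(tag: str, message: str) -> Rule:
    """Build a rule that raises `message` when an order carries `tag`."""
    def rule(order: Order) -> List[str]:
        return [message] if tag in order["tags"] else []
    return rule


def screen(order: Order, rules: List[Rule]) -> List[str]:
    """Run every rule and collect the alerts in rule order."""
    alerts: List[str] = []
    for rule in rules:
        alerts.extend(rule(order))
    return alerts


# Rules every provider gets (hypothetical tag produced by upstream screening).
base_rules = [make_tag_rule("control_list_match", "matches a controlled organism")]

# A provider that assembles dna in yeast opts into one extra rule.
yeast_provider_rules = base_rules + [
    make_tag_rule("yeast_sensitive", "needs review for yeast assembly"),
]

order = {"id": "order-1", "tags": {"yeast_sensitive"}}
print(screen(order, base_rules))
print(screen(order, yeast_provider_rules))
```

The same order passes silently for a provider on the base rules and raises an alert for the yeast provider, which is the behaviour described above.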
Kevin: There's also the question of use. some genes are only dangerous in certain contexts (e.g., in a specific chassis). that's much harder for us to solve for right now
Question #9:
Nicholas: One of my favorite topics: how can we use llms to accelerate science! you mentioned `one of the great things about machine learning and particularly llms is that it takes a problem that needs a ton of data and compresses it.` — can you elaborate a bit more on how you are using llms?
Kevin: We're mostly using the data coming out of them. we're looking at ways to incorporate novel protein designs and protein structure predictions into our dataset to expand the types of sequences we'll catch
Kevin: In this case, the compression isn't quite as good as what you'd typically get, but being able to only look at proteins for instance instead of dna can reduce the computational complexity a lot. lots of different dna sequences can have the same structure. if we're able to use a single structure to represent them all, that's a win
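Editor's sketch: the point that many dna sequences collapse to one representation can be shown with plain translation, since synonymous codons encode the same amino acid. The example below uses the standard genetic code and three-codon toy sequences; it only demonstrates the dna-to-protein step, not structure prediction, and the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate dna in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def group_by_protein(sequences):
    """Map each distinct translation to the dna sequences that produce it."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[translate(seq)].append(seq)
    return dict(groups)


toy = ["ATGGCTAAA", "ATGGCCAAG", "ATGGCAAAA", "ATGTGGAAA"]
groups = group_by_protein(toy)
for protein, dna_seqs in groups.items():
    print(protein, len(dna_seqs))
```

Three of the four toy dna sequences translate to the same peptide, so a protein-level database needs one entry where a dna-level one needs three. Grouping by structure, as described above, would collapse things further.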
Nicholas: Ah when you say llms are you talking about protein language models?
Kevin: Yes
Nicholas: Have you looked into using general llms to automate compliance workflows at all?
Kevin: Not too much. there might be ways that it would reduce our code complexity, but not knowing how it works would be a problem for compliance. we want to make sure we know exactly why something is in violation of a control and when
Kevin: We've used llms for building our codebase (e.g., copilot)
Question #10:
Nicholas: You mentioned you integrate into erps and crms — would you ever integrate into a lims?
Kevin: Absolutely! most of our customers use crms like hubspot and salesforce and erps like netsuite to track orders from intake to manufacturing
Kevin: I'm sure there are lims used somewhere in that workflow, but we haven't been asked to integrate there yet. i think providing additional layers into tools like benchling could be a really cool way to build security and safety into the earliest parts of the workflow
Question #11:
Nicholas: Zooming out a bit, have you seen a rise in the number of synbio companies/gene synthesis providers over the past few years?
Kevin: 100%
Kevin: This is more anecdotal but from attending synbiobeta over the last couple of years, we've seen a lot more providers that just got started last year than the year before
Kevin: And now, there are companies building more complex products on top. as dna got easier to make, more companies provide viral vectors as a service, proteins as a service, etc.
Question #12:
Nicholas: If you could wave a magic wand and change one thing about the way people did biology to improve global biosecurity, what would it be?
Kevin: :thinking_face:
Kevin: One thing that would make things a lot easier is standardization. there are some standards already (e.g., ontologies, taxonomy ids, etc.) but the way we annotate data in databases is not uniform, which makes it very difficult to determine pathogenicity
Kevin: As a simple example, there are handfuls of egfp proteins that have their taxonomy id marked as a controlled virus (e.g., ebola or chikungunya). this can lead to false-positives. on the flip side, there are plasmids or wgs that are labelled as coming from some harmless prokaryote but have contaminants from a controlled virus or a gene that's actually pathogenic
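Editor's sketch: both failure modes in this example come from a database label disagreeing with what the sequence itself looks like. A simple cross-check can surface that disagreement in either direction, as sketched below. The record fields and category names are hypothetical, and how the "content" verdict is obtained (for example, from an independent similarity search) is outside the sketch.

```python
from collections import Counter


def check_annotation(label_controlled: bool, content_controlled: bool) -> str:
    """Compare a database label against an independent content-based call."""
    if label_controlled and content_controlled:
        return "consistent: controlled"
    if not label_controlled and not content_controlled:
        return "consistent: not controlled"
    if label_controlled:
        # e.g., a reporter protein whose taxonomy id points at a controlled virus
        return "possible false positive"
    # e.g., a record labelled harmless that carries controlled material
    return "possible missed hit"


# Toy records; the ids and booleans are invented for the example.
records = [
    {"id": "rec-1", "label_controlled": True, "content_controlled": False},
    {"id": "rec-2", "label_controlled": False, "content_controlled": True},
    {"id": "rec-3", "label_controlled": False, "content_controlled": False},
]
summary = Counter(
    check_annotation(r["label_controlled"], r["content_controlled"])
    for r in records
)
print(dict(summary))
```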
Nicholas: Thanks so much @kevin for the in-depth, insightful answers!! biosecurity is a really important topic and it’s awesome to learn more about aclid.
if people have more questions, feel free to add them to the channel and kevin may answer if he has time
Kevin: Thanks so much for having me! happy to answer any questions now or feel free to ping me any time. love talking about this and always learning from this community