(This Q&A is from February 2022 and has been lightly edited)
We had the chance to interview Timothy Stiles, creator of the open-source Go package poly. Below we discuss the advantages of using poly, open-source software in biotech, and much more!
Nicholas: Welcome to our second Q&A! Thanks to @timothy, the creator of poly, for agreeing to do this. I’ll kick it off with a few questions, but others should please jump in!
Question 1:
Nicholas: How did you get into building software? What about synthetic biology?
Timothy: Taught myself to code out of high school via Harvard CS50 and a bunch of torrented textbooks. The path to synthetic biology is long, winding, and super non-linear, but essentially I was a mech-e major, then a design major, then a CS major, then a huge computer vision nerd, and then the NSF paid me to work on bio stuff.
Question 2:
Nicholas: Tell us more about poly!
What inspired its creation?
What are comparable existing tools?
Why did you choose to build poly in Go?
Do you think the sw/bio field should move beyond Python + R?
Timothy: Well, I originally started writing it for myself and Keoni Gandall. He was working on FreeGenes and needed to convert some JSON to GenBank, and Biopython kept crashing.
Pretty sure every GenBank file FreeGenes ships now has a line in it saying it was generated with poly :joy:
Anonymous: As someone who is pretty ignorant about syn bio - what are the general important computational problems that poly helps solve?
Anonymous: Is there any way for novice/beginner syn bio and Go programmers to start contributing to the codebase?
Timothy: The biggest ones at the moment that no one else does well are probably circular sequence hashing, codon optimization, DNA synthesis checking, Golden Gate cloning, and DNA barcoding. There’s a lot in there.
Anonymous: Haha, at the risk of being exposed as a total imposter, can you expand a bit on what these things are?
Timothy: Oh man, I’d need a whole course to explain all this. Should I start a crowdfunding campaign? Our docs and comments are pretty good already, and we always try to explain why we wrote a thing and what it solves from a biological context.
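(Editor's aside: to give a flavor of one item on that list, here is a minimal sketch of the idea behind circular sequence hashing. A circular sequence such as a plasmid can be written down starting from any base, so the same molecule can appear as many different strings. One way to get a stable identifier is to rotate the sequence to a canonical form before hashing. This is not poly's implementation or API. The function names are made up for illustration, the rotation search is a naive scan chosen for readability, and reverse complements are ignored.)

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// leastRotation returns the lexicographically smallest rotation of seq.
// It is a simple O(n^2) scan, chosen for clarity over speed.
func leastRotation(seq string) string {
	if seq == "" {
		return seq
	}
	doubled := seq + seq // every rotation is a window of length len(seq) in here
	best := seq
	for i := 1; i < len(seq); i++ {
		if rot := doubled[i : i+len(seq)]; rot < best {
			best = rot
		}
	}
	return best
}

// circularHash (hypothetical name) hashes a circular sequence so that
// every rotation of the same sequence produces the same digest.
func circularHash(seq string) string {
	canonical := leastRotation(strings.ToUpper(seq))
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := circularHash("ATGCGT")
	b := circularHash("CGTATG") // the same circle, written from a different start
	fmt.Println(a == b)         // prints true
}
```

A production version would likely use a linear-time least-rotation algorithm and decide how to treat the opposite strand, but the canonicalize-then-hash shape stays the same.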
Juan-carlos: Are there specific bioinformatics workflows that poly is well-suited to be integrated into? (I’m also a biotech newbie) - I’m trying to contextualize the value prop of these tools. Also targeting specific workflows might help with poly adoption.
Timothy: Synbio and molecular biology focused tasks. We’re the best at it.
For a talk the other month I wrote a demo that generates primers to clone out every coding region of a GenBank genome. It took all of 5 minutes to write a function that generates and checks thousands of primers.
Timothy: Should say no one else does well in the open source world. Can’t really know what closed source peeps are up to.
Timothy: I simply couldn’t find any synbio software that could run on my computer! The closest were Biopython and the Edinburgh Genome Foundry, and I know the devs behind both a little and they’re excited about poly.
Question 3:
Kevin: Has poly been integrated into any hardware assembly yet? How should one think about that integration?
Timothy: I’m not quite sure I understand the question, but Go has an alternative third-party compiler for embedded systems called TinyGo. I don’t like messing with it since it relies on cgo, which makes it hard to stably deploy stuff. cgo is my least favorite thing about Go when it comes to the language’s implementation.
There’s also support for compiling to Wasm. I’d say that poly can run almost anywhere, but since Go wasn’t really meant for embedded stuff or Wasm stuff you’ll probably have to wrestle with it in the same way you’d have to wrestle with MicroPython.
If I wasn’t already familiar with Go I might lean towards using Zig or Rust for embedded stuff, but I’d have to really look into the tradeoffs before deciding.
Timothy: You should 100% be able to build poly for a microcontroller, but with its current features I’m not sure there are any good use cases yet.
Anonymous: Who do you expect the end users of poly to be? Developers of bioinformatics workflows, or computational biologists trying to answer a biological question using poly?
Question 4:
Timothy:
Re: why did you choose to build poly in Go?
Stability, stability, stability, stability. I have a blog draft about this that I should post. I took great care in choosing the language given what was needed, and Python and R were so far away from meeting the requirements. It essentially came down to Go and Rust, and Go won out on how easy it is to learn and all the amazing dev tools in its ecosystem.
Question 5:
Re: do you think the sw/bio field should move beyond Python + R?
*Absolutely.* No doubt in my mind. Yes.
Nicholas: Can you expand on this? I’d be very curious to know where you thought Python and R fell short (I have my own opinions, but curious to hear what sounds like a systematic process)
Anonymous: What about Python makes it fail to meet the stability requirement? (I'm a data scientist who has spent 95-99% of my working life in Python and don't really know better)
Anonymous: Lol Nick beat me to it
Kevin: Any other interesting biology + Go software out there?
Timothy: @nicholas I could go on for hours about this, but essentially Python and R were never meant to run on other people’s computers. Getting started with and deploying with Go is way faster. We don’t even ship Docker containers because Docker is written in Go anyways, so what we do is use GoReleaser to autogenerate binaries for 14 OS/architecture combos and leave the 1% that don’t use those to compile it themselves.
Since essentially if you’re not using an operating system we’re explicitly supporting, you probably know what you’re doing and can compile it yourself anyways.
Looks like I should really release that blog post because I’ll honestly never go back to Python and there are far too many reasons.
Nicholas: Release the post!
Juan-carlos: Yeah. I'm currently in Python hell and I'll go ahead and state the obvious: Python is overly permissive, and the lack of strong types contributes to several classes of runtime errors that are simply designed away by a strongly typed language. Also, the flexibility of a language like Python for scripting is far less relevant in a world of cheap compute and modern IDEs with code completion.
That's in addition to the dependency/packaging issues.
It's not just stability though - the lack of typing makes code illegible. Types facilitate intuition by contributing to an application's domain, i.e. its semantic vocabulary, i.e. its "ubiquitous language", etc. You can include type hints in Python and can require them via tools like mypy, but the lack of a strong typing system is a fundamental flaw imo.
Timothy: @juan-carlos you definitely understand. I’ve never had to help someone compile or install poly because it just works…
Juan-carlos: Also performance. I'm currently trying to debug why SQLAlchemy takes 3 minutes to marshal Python objects for a query that takes 30 ms to run ¯\_(ツ)_/¯
As if lack of types, poor dependency management, and weak performance weren't enough, Python also has no access control modifiers, so you can't enforce that modules/methods/classes/etc. stay private or internal to specific parts of the code. This reduces the modularity of a codebase and generally leads to unnecessary coupling and "spaghetti code".
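(Editor's aside: a small sketch of the two points above as they play out in Go, assuming a made-up DNA type that is not part of poly. A distinct type means a function asking for DNA cannot be handed an unchecked string, and a lowercase field name is invisible outside its package, so other packages have to go through the validating constructor. In this single-file example everything lives in package main, so the privacy rule is described in comments rather than demonstrated.)

```go
package main

import (
	"fmt"
	"strings"
)

// DNA is a hypothetical type for illustration. Its only field is
// unexported (lowercase), so code in other packages cannot set it
// directly and must call NewDNA instead.
type DNA struct {
	bases string
}

// NewDNA validates the input once, at the boundary, and returns an
// error instead of letting a bad sequence travel through the program.
func NewDNA(s string) (DNA, error) {
	s = strings.ToUpper(s)
	for _, b := range s {
		if !strings.ContainsRune("ACGT", b) {
			return DNA{}, fmt.Errorf("invalid base %q", b)
		}
	}
	return DNA{bases: s}, nil
}

// GCContent returns the fraction of bases that are G or C.
func (d DNA) GCContent() float64 {
	if len(d.bases) == 0 {
		return 0
	}
	gc := strings.Count(d.bases, "G") + strings.Count(d.bases, "C")
	return float64(gc) / float64(len(d.bases))
}

func main() {
	seq, err := NewDNA("atgc")
	if err != nil {
		panic(err)
	}
	fmt.Println(seq.GCContent()) // prints 0.5

	// A typo in the input is caught here, not deep inside an analysis.
	if _, err := NewDNA("ATGX"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```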
Question 6:
Juan-carlos: What biotech organizations do you perceive as having more mature software engineering capabilities that might choose to contribute to or support poly development?
@timothy i.e. which companies are most likely to pay their engineers to contribute to poly
@timothy or if you've thought of alternative funding models (e.g. DAOs or VC) I'd be curious to hear about that
Timothy: @juan-carlos unfortunately there are not many :grimacing:
Companies are far more likely to pay me to use poly to develop some process they need than pay their employees to do it. Right now I make my living as a consultant where Poly serves as not only the base of my work but my advertising and proof that I know what I’m doing.
I’ve talked to VCs and DAOs, but right now I’m more interested in a company bringing me in full time to work on poly and apply it to stuff they care about. Interviewing with a couple of groups right now who are interested, so fingers crossed.
Juan-carlos: @timothy please share updates with me :bow:
@timothy sometimes companies favor open source when it gives them a way to "catch up" with the competition - might be an angle to play
@timothy - as in I'd love to know what companies are down to support an FTE working on poly.
Timothy: Biggest issue is a lot of companies talk to me in the hopes that I’ll close source poly for them but they have no intention of paying me to own it :face_with_rolling_eyes:
Juan-carlos: Lol. :face_palm:
Timothy: But yeah if full time works out you’ll definitely hear about it because I won’t shut up about it for at least a month.
Question 7:
Nicholas: Do you feel that biotech’s open source ecosystem is less developed than other industry verticals? If so, what can we do to change that?
Elizabeth: Follow-on question - there's a lot of siloed open source tooling, but most biotechs are vertically integrated (so they'd need to leverage multiple silos)... Do you think there's a way to start to pull more of this together? Make it easier for the end user to see the slate of what's out there?
Timothy: @nicholas 100% yes in every way. In scale, funding, quality, everything. Had one techie reach out to me after searching “synbio” on GitHub saying that poly was the only thing that looked like it was written this century.
It’s a major issue for the field in total and there’s a ton we can do. Biggest thing is money. Second is understanding the benefits of open source (dev cost savings, advertising to customers/employees, reducing cost of training new employees)
So many biotech companies raise millions then blow a quarter of it writing the same parsers and generic tools that should have already been available as open source.
Maybe I should just start telling their investors…
@elizabeth NCBI GenBank should be required to ship a JSON format. A good portion of poly is just parsing data from these legacy formats into JSON so other open source ecosystems can work with it. It’s embarrassing how tech decided on a common format 15 years ago and biotech orgs in academia, industry, and government still haven’t widely adopted it.
Also better hiring practices. So many biotech companies have absolutely no clue how to hire and retain tech talent, so DevOps, SRE, SecOps, and generalists steer clear. Makes it hard for orgs to have people who understand the open source landscape and can stitch things together.
Juan-carlos: @timothy honestly you might be able to get funding through a vc firm like a16z
Timothy: @juan-carlos want to be CEO and I’ll be CTO/board chairman? Talking to VCs is exhausting lmao
Juan-carlos: @timothy lol, I'll start by intro'ing myself on the Discord. But I'm happy to volunteer some time pitching poly if it increases the likelihood I can work on it in the future.
Question 8:
Howard: What does poly do today and what would you like it to do within the next 5 years?
Timothy: Circular sequence hashing, codon optimization, DNA synthesis checking and optimization, Golden Gate cloning, DNA barcoding, parsing for GenBank, multi-GenBank, GFF, FASTA, UniProt, and a bunch more parsers and features.
Timothy: In the next 5 years I’d like to do way more with metabolic and protein engineering which we’ve already started on. I’m also planning on building out a physical lab and doing some custom automation stuff.
Howard: What about Primer3? It’s a widely used open-source primer design software, developed in … 2000 (or earlier)
Timothy: Last time I checked it was pretty hard to use for most people, and I even watched Sebastian Cocioba fix a bug live in a recent release.
We considered integrating it via cgo but we couldn’t guarantee stability. They definitely have more options and parameters than us but 95% of the time they aren’t needed.
Question 9:
Boris: Do you have any plans to build out a nice UI for wet lab biologists who are intimidated by the mere sight of a GitHub URL? Poly sounds powerful and useful, but I think most wet lab folks would rather continue using inferior tools than figure out how to set something up that requires using the command line.
Timothy: No current plans. Unfortunately, good user interfaces are really expensive and would take a lot more resources than what I have on hand.
I’m open to raising money for a company that ships features as a GUI because it’s just that capital intensive to do it right.
Odds and Ends:
Timothy: I kind of wince at terms like workflow or pipelines. End use really is for engineering organisms/DNA, and that’s what most users end up doing with it.
Kenny: @timothy why do the terms workflows/pipelines make you wince?
Timothy: I could write a whole diatribe about this but it’s mostly just aesthetic.
Timothy: @kevin absolutely yes! The Go ecosystem is absolutely incredible. Plenty more genomics-y stuff there. Rust and Go have way more tools than you’d think if you’ve only used Python and R.
I can’t remember project names off the top of my head, but just search for Go and bioinformatics on GitHub and you’re sure to find some cool stuff. Also Docker, Kubernetes, Gitea, and so many other great things are written in Go.