This is reposted from a Q&A we did in June 2022. Check out our Slack for more.
We had the chance to interview Uri Laserson, Co-Founder and CTO of Patch Biosciences. Below we discuss the company’s business model, drug discovery as a service, the NYC biotech scene, and much more!
Nicholas: Welcome to our q&a with @uri! uri is the co-founder and cto of patch bio, a startup at the intersection of ml and gene therapy. i’ll get us kicked off with a few questions, but as always, the hope is that the community will jump in and ask questions!
Uri: Thanks @nicholas for inviting me, happy to answer questions!
Question #1:
Nicholas: Tell us about patch! what is your business model? what problems are you focused on solving?
Uri: We're working to design dna/rna regulatory elements to make genetic medicines better. think better promoters/enhancers, utrs, etc. and we're making use of massively parallel assays combined with machine learning!
We partner with companies that are developing genetic payloads and work with an upfronts/milestones/royalties model.
Anonymous: Do you have plans / ever envision bringing a drug to market yourself?
Uri: we don't have plans to develop our own drugs at the moment, but i wouldn't rule it out in the future. however, we mainly want to focus on the technology and working on as many drug programs as possible. if we developed our own therapies, we'd likely have to place just a few bets at first, which makes me sad.
Nicholas: Do you have wet lab space to test out the hypotheses you create or are you purely computational?
Uri: we are about half wet lab and half dry lab at the moment. we generate all our own mpra (massively parallel reporter assay) data.
Question #2:
Nicholas: How do you think about applying ml to bio? what are the biggest problems in bio that you see ml being useful?
Uri: Clearly i'm biased by what we're working on, but sequence design!
Image analysis has obviously been a pretty successful domain for a while now, so companies like recursion have really taken advantage of that
Protein modeling seems to have made some good progress recently as well.
At patch bio we're focused on regulatory elements, which still seem qualitatively different in terms of the geometry of their fitness landscape
Nicholas: What about the sequence design problem lends itself well to ml? i can see there’s been a huge amount of progress in the general nlp space — does this translate into the bio realm as well? are you mostly focused on unsupervised algorithms?
Uri: I'm mainly thinking of it from a feature extraction perspective. lots of papers from ~15 years ago put a lot of work into manually crafting features from biological sequences, which is a huge hassle
More contemporary methods are well suited for picking out useful features and simple grammars. (check out the work of our advisor, anshul kundaje, for example)
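To make the contrast concrete, here's a minimal sketch (toy PWM and illustrative names, not Patch's actual code) of what a learned sequence feature looks like: a single convolutional filter over a one-hot encoded DNA sequence is exactly the kind of motif scanner that used to be hand-crafted.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    idx = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        out[pos, idx[base]] = 1.0
    return out

def scan(seq: str, pwm: np.ndarray) -> np.ndarray:
    """Slide a position weight matrix (one conv filter) across the
    one-hot sequence, scoring every window for the motif."""
    x = one_hot(seq)
    w = pwm.shape[0]
    return np.array([np.sum(x[i:i + w] * pwm) for i in range(len(seq) - w + 1)])

# Toy filter favouring the motif "TATA" (columns are A, C, G, T).
tata = np.array([[0, 0, 0, 1],
                 [1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 0]], dtype=float)

scores = scan("GGTATAGG", tata)
best = int(np.argmax(scores))  # strongest hit where "TATA" starts
```

In a trained network, the filter weights are learned from data rather than written down; stacking such layers is how "simple grammars" of motifs get picked up without manual feature engineering.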
Question #3:
Nicholas: You spent some time at cloudera as a data scientist. what made you want to work in a more “pure tech” company? how has that shaped your work since then?
Uri: In 2011 i was wrapping up my phd in george church's lab, which was developing lots of high throughput methods. around that time "big data" was the big buzzword and the open source hadoop ecosystem was really exciting. i talked with @u02rzpxklg7 when i was exploring postdoc opportunities and he convinced me i should spend my "postdoc" at cloudera. it was a great decision!
One thing it did was cement my appreciation for strong open source communities when picking new technologies
Nicholas: ^any thoughts on how to generate more sustainable open source communities in the bio/software space? it feels like a lot of open software comes out of labs, but isn’t especially well maintained.
Uri: This likely is a result of there being poor funding mechanisms to support this type of work.
Chan zuckerberg had a really nice rfp a few years ago looking to pump money into a bunch of open source projects used in science.
Afaict, there also isn't a huge culture of industry sponsorship of open source bioinformatics tools
Question #4:
Nicholas: Before patch, you were a professor. why did you decide to leave academia to start a company?
Uri: I realized i was not loving the pi life. personally, i found being a pi to be very isolating: you're basically like the ceo of a small startup, but you're the 100% equity owner, and all your employees are temps. everyone needs a 1st-author publication, so even if your lab thematically works on a single topic, your employees end up somewhat partitioned into their own papers
Question #5:
Josh: Has it been hard to raise money for patch? sometimes it feels like these companies sit between biotech and tech, and thus it can be hard to find the right investors.
Uri: I think there are lots of funders nowadays that are pretty interested in the intersection of bio and ml.
Such interdisciplinary approaches are no longer so "exotic" afaict
Question #6:
Anonymous: Do you do experiments to compare the ml-derived sequences to human-derived sequences? what metrics do you use to quantify how good the ml is?
Uri: Yes, one of the things that's especially fun is that we also own all our data generation. we've implemented a "design-build-test" cycle in our platform, so we keep track of the predictions our models make on a set of sequences and compare them to the downstream experiments.
It's fun because it feels stronger than holding out a test set. we actually perform prospective validation.
Figuring out metrics is always a hard problem. one aspect is simply looking at measured activities against a set of benchmark sequences as we progress through our cycles.
But we also want to see how efficient we are at proposing active sequences compared with, say, a non-ml approach. (for example, limiting yourself purely to natural sequences.)
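As a rough illustration of that efficiency metric (all numbers and names here are hypothetical, not Patch's data), one could compute the hit rate of each design strategy in a round of prospective validation and report the enrichment of the ml designs over the natural-sequence baseline:

```python
def hit_rate(measured_activity, threshold):
    """Fraction of proposed sequences whose measured activity clears a threshold."""
    hits = sum(1 for a in measured_activity if a >= threshold)
    return hits / len(measured_activity)

# Hypothetical measured activities from one design-build-test round.
ml_designs = [2.1, 0.4, 3.0, 1.8, 0.2]
natural_baseline = [0.5, 1.2, 0.3, 0.9, 0.1]

ml_rate = hit_rate(ml_designs, threshold=1.0)              # 3 of 5 active
baseline_rate = hit_rate(natural_baseline, threshold=1.0)  # 1 of 5 active
enrichment = ml_rate / baseline_rate                       # 3x more efficient
```

Because the sequences are scored before the experiment and measured after, this is prospective validation rather than a held-out test set.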
Anonymous (quoting Uri's answer above):
this is super cool. i worked with jb at benevolent and i bet this makes him super happy haha
Uri: Haha, indeed, implementing "cycles" and feedback was, and still is, one of the core selling points for him!
Question #7:
Anonymous: Related - how human augmented is the ml for sequence design?
Uri: Can you elaborate, i'm not sure i understand what you mean.
Anonymous: Yea totally - i think of it on a spectrum from a human writing down the whole sequence manually to directly outputting the results of your ml into the lab.
i guess what i am asking is: do you have human experts looking at the sequences and changing a few nucleotides here and there post hoc, or manually changing ml parameters for specific use cases (where you might care about something more or less than in another use case)? do you ever look at an output sequence and go "oh no, this won't work" before even testing it? or is that not really possible for humans to do?
Uri: Oh yeah, humans are definitely part of the process. it's a way we can incorporate prior biological knowledge
Early on we trained a model on an experiment in which the model learned what a tata box is, and started spitting out sequences like `tatatatatatatatataatatata`...everyone had a good laugh
Question #8:
Yohann: How much custom software do you have to build vs. patching together existing open source & commercial software?
Uri: Open source software is *amazing* now, and we try to use it as much as we can. we tend to buy software more on the wet lab side, as it's harder to find good open source tools there.
(think inventory/supply management, lims, notebooks, etc)
On the computational side we're mainly on the python stack, and use nextflow for some of our etl pipelines
We try to be good citizens and report back any issues/bugs. a few of our team members have spent lots of time working in the open source world.
Question #9:
Nicholas: What software are you using to do data analysis? is it usually done by a bioinformatician in a notebook or do you have a dedicated analysis application?
Uri: Mainly jupyter notebooks or r, depending on the person/analysis. (though we tend to gravitate towards python)
Elizabeth: What drove this choice? prior familiarity? or other factors?
Uri: Mainly personal taste? i don't want to start a religious war, but imo the python ecosystem is better suited to a wider set of engineering problems. and also all the popular ml frameworks are python
Nicholas: What about using notebooks vs using a tool with gui for data analysis?
Uri: What types of tools are you thinking about?
This obviously depends a lot on who you want doing the analysis. lots of computational biologists feel most comfortable at a repl, so it's not too much of an issue. since we're organized mainly as cross-functional teams, there are always computational people working on the data
Nicholas: Excel/prism/spotfire/whatever is in your lims (maybe others i haven't heard about?!)
Uri: That said, our experimentalists do want to look at the data and we're trying to figure out the best way to do that
Nicholas: It also depends on whether you're doing the same analysis over and over again
Uri: Those tools you mention aren't typically as amenable to the high throughput experiments we perform
Nicholas: Agreed! i'll follow up offline about this — i have some thoughts…
Uri: We are thinking about ways to expose our internal data in a centralized olap-style warehouse where you could attach tools like tableau. but those are pretty complex too tbh
Ines: We develop dashboards and r/shiny applications for biologists to explore their data
Dashboards and interactive documentation are quite powerful. we've found that building customizable frameworks works best
Question #10:
Yohann: The diversity of dna/rna gene therapy strategies (+ indications + modalities) makes it challenging to have ml pipelines you can reuse and scale. how do you prioritize your focus?
Uri: at least within a modality or within a tissue type, our data can be transferred from problem to problem to an extent, but this is mainly driven by bd considerations.
Question #11:
Justin: What has your experience been like within the nyc biotech ecosystem compared to other cities like boston or sf?
Uri: Nyc feels tiny compared to the other places. when i left mt sinai, i spent over a year looking for local biotechs and it felt like slim pickins. (i'm tied to the city for personal reasons.)
for years people have been saying that nyc is poised to be a biotech powerhouse but it seemed to never materialize. my personal hypothesis is that there aren't any flagship engineering schools in ny in the same way that there are in boston or the bay.
However, it does seem like there's a lot of biotech space coming online locally (e.g., through alexandria and others), so things may be finally changing
Question #12:
Jonathan: Have you, or would you like to, contribute back to the oss that you rely on?
Uri: Absolutely! a lot of our computational folks have worked in oss for a while and strongly believe in it. at the moment that mainly means usefully reporting bugs/issues in upstream projects, and in some cases contributing prs as well. if we end up finding that some of our internal tools would be broadly useful, we'd like to publicize as much of it as possible.
Given that we're relatively early stage, though, i can't say we've articulated some kind of policy/philosophy on when/how we open source our own tools
Jonathan: If you'd like to, happy to discuss the roadmap i helped develop at cellarity for such things and am now hoping to expand into more of flagship.
Uri: Yes, that would be great!
In the spirit, maybe you could release it publicly as well :smile:
Jonathan: Very much working on that. :slightly_smiling_face:
Question #13:
Nicholas: How do you feel about lims? in particular, do they work for the type of experiments (mpras and others) that you run?
Uri: Everyone hates lims. it just turns out that a system expressive enough for the huge diversity of companies is basically a blank slate, so you end up having to customize it like crazy
Question #14:
Nicholas: Can you give us a sense of the types of companies you work with? are they mostly large pharma or are you working with smaller startups as well?
Uri: Focused mainly on larger biotech/pharma
Question #15:
Nicholas: Ml can often be difficult to sell as a service, since you often need to be in close touch with the problem you're trying to solve and understand all of the biological context. how do you envision scaling without becoming a consulting company?
Uri: That's true...i don't see us as selling ml as a service though. i see us as selling drugmakers components of their drug that deliver better functionality. (e.g., selling a capsid to a gene therapy company)
Elizabeth: Do you typically have exclusivity agreements with partners? i.e. is there some limit to whom you can work with?
Uri: Those types of agreements are pretty commonplace in biotech. can be exclusivity by organ, disease, modality, etc...and they all get factored in to the deal terms
Anonymous: How much do you let them see how the sauce is made?
Nicholas: Do your agreements involve clauses that allow you to train models on the data generated by that partnership?
Elizabeth: And they're definitely commonplace, but they limit shots on goal, so bespoke engagement in any one project becomes even more important.
Uri: Yes, we definitely want to narrow the scopes as much as possible, unless we know we're going to get as many programs as possible
Question #16:
Vega: Do you deal with regulatory requirements like gxp compliance yet? how do you plan to build around that?
Uri: Nope, and hopefully i won't have to think about that for a long time or ever :stuck_out_tongue: we're focused upstream at the discovery stage.
Question #17:
Howard: This is a great q&a. my q: when developing an ml model that you hope will generalize across different conditions (e.g. dependence on cell cycle, cell type, etc), where do you think is the best boundary/mixture between “empiricism” (e.g. using 1-hot encoded sequence) vs. “mechanistic” (e.g. using position score matrices to identify tf sites)? what combination of “fat/dumb” layers (e.g. many conv or fully connected dense layers) or “thin/smart” layers (e.g. representing transcriptional regulation via curated functions) do you think will work best? this is an open ended question so feel free to just riff on it. :slightly_smiling_face:
Uri: Ha, i wish i knew the answer to this! as you're probably implying, our goal is not to show that we can have a fully general model learn things we already know about biology, but rather define models that efficiently help us search through sequence space. as part of our solution we definitely take a "meta-empirical" approach and perform architecture searches to find the best solutions for a given problem.
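A bare-bones sketch of what such an architecture search can look like, using random search over a toy hyperparameter space (the space, scoring function, and names here are all illustrative assumptions, not Patch's setup):

```python
import random

# Toy search space; a real search would also cover layer types, pooling,
# residual connections, etc.
SPACE = {
    "n_conv_layers": [1, 2, 3, 4],
    "filters": [32, 64, 128],
    "kernel_size": [7, 11, 15],
}

def sample(rng):
    """Draw one random architecture configuration from the space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def search(score_fn, n_trials=20, seed=0):
    """Random search: sample configs, evaluate each, keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample(rng)
        s = score_fn(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Stand-in for validation performance; in practice score_fn would train a
# model with this config and evaluate it on held-out experimental data.
toy_score = lambda c: c["n_conv_layers"] * c["filters"] / c["kernel_size"]

best_cfg, best_score = search(toy_score)
```

Random search is just the simplest instance; the same loop structure underlies fancier search strategies (Bayesian optimization, evolutionary methods) by replacing how `sample` proposes the next config.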
Anonymous: Big thank you to @uri and patch for the super fun q&a! please feel free to keep asking questions especially for the non-us folks who are asleep right now!
Uri: Thanks @anonymous, and thanks for inviting me @nicholas! (i had to go afk briefly to deal with a childcare situation but i'll be around if there are more questions.)