This is reposted from a Q&A we did in March 2022. Check out our Slack for more.
Today we are interviewing Rani Powers, Founder & CEO of Pluto Biosciences. Below we discuss how Pluto Biosciences deals with experimental and data diversity, prioritizes features, and much more!
Nicholas: Very excited to have @rani join us for a slack q&a today! rani is the founder and ceo of pluto bio (https://pluto.bio/), an experimental workflows platform.
Question #1:
Nicholas: Let’s get started with pluto: i know i’m not doing it justice — can you explain in a bit more depth what pluto does and where the motivation behind it came from?
Rani: Pluto is a collaborative platform intended to make experimental results (data + visualizations) interactive, searchable, and securely shareable. the set of problems we're trying to solve was inspired by many fantastic colleagues i've worked with over the years, and my own evolution from software engineer → wet lab biologist (zebrafish, cell lines, neurodegenerative diseases, cancer, trisomy 21) → computational biologist, and seeing the gaps in how results were generated and managed in the longer term.
Rani: Some of the gaps we're trying to address are between
• experimental data generation → initial results generation
• initial results generation (e.g. "out of the box" bioinformatics) → advanced results generation (e.g. custom algorithms and scripts)
• results generated over the past year → summarizing progress on company milestones today
^ put more simply, we want pluto to be a hub where wet lab biologists can upload their data and run the first-pass bioinformatics analyses to generate interactive plots; their computational collaborators can fetch data programmatically out of pluto (in a consistent format) to use in their advanced scripts or in-house apps; and executive-level stakeholders can query across years of results to show the impact the company has made.
Question #2:
Nicholas: Where do you see the biggest barrier to adoption for more modern software tooling in this space? What sorts of abstractions have you built internally to help solve a wide range of biological problems?
Rani: This is such an awesome and big question (and shoutout to @jesse for writing some of my favorite content recently in this area :woman-bowing:)
i'm a biologist, but the more important members of our team building & selling the product don't come from a scientific background, so we had to start from day 1 with a common vocabulary to catalyze the building & selling. as an example, this started with the experiment model i mentioned here - every life sciences organization has a different definition for "experiment" or "assay", so we decided to make ours as abstract as possible. an experiment in pluto is a container object with 1) quantitative measurements, 2) sample metadata, 3) results (whether tabular or visual).
Each of those 3 parts contained in an experiment is also abstracted. so the experiment model itself doesn't necessarily care whether it's an rna-seq experiment, or a toxicity study, or immunofluorescence; it just knows that it needs 1+2 in order to generate 3, and when it has 1+2+3 it can be considered "complete".
Obviously a lot more to unpack there, but that touches briefly on both the data model abstraction and the interdisciplinary-people/vocabulary abstraction, which are both super critical. using a word like "container", for example, allows our science team members to understand how to create an experiment, our backend engineer to implement a database model for the experiment, our frontend engineer to create the interactive page that's going to contain all the analysis goodness, and our sales rep to sell customers on the flexibility that an experiment in pluto has.
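[Editor's note] The abstract "container" experiment described above could be sketched roughly like this. All names here are illustrative, not Pluto's actual schema — just the 1+2 → 3 logic from the answer:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Experiment:
    """Hypothetical sketch of the abstract experiment container."""
    name: str
    assay_type: Optional[str] = None           # e.g. "rna-seq", "toxicity" — not required by the container
    measurements: Optional[list] = None        # (1) quantitative measurements
    sample_metadata: Optional[list] = None     # (2) sample metadata rows
    results: list = field(default_factory=list)  # (3) tabular or visual results

    def can_generate_results(self) -> bool:
        # needs (1) + (2) before any results can be produced
        return self.measurements is not None and self.sample_metadata is not None

    def is_complete(self) -> bool:
        # "complete" once it holds (1) + (2) + (3)
        return self.can_generate_results() and len(self.results) > 0
```

The point of keeping `assay_type` optional is exactly what's described above: the model doesn't care what kind of experiment it is, only whether the three parts are present.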
Question #3/4:
Nicholas: How do you respond to scientists who tell you “your software won’t work for my problem — my problem is special”?
Nathan: :bar_chart: what plotting library do you use and how do you prioritize different chart types/features to offer?
Rani: D3! i believe the only bio-specific library we use currently is for igv plots (that would have been a bit time-consuming to reinvent :sweat_smile:)
Rani: Re: prioritizing, it's related to nicholas' question about scientists thinking their problems are special i suppose. it's so far been surprisingly straightforward to prioritize chart types and plot-related customizations because the requests from different customers have been very consistent. this is probably due to the pluto product still being at a relatively early stage (so there are more "natural" next product steps), and also the fact that a lot of experiment types have a pretty standard set of plots that one would expect to see. for example, one of our first plot types after the standard barplots/boxplots/etc was the volcano plot. after introducing it, everyone started asking us for the ability to input a list of features to be labeled on the plot. so that was the next step. off the top of my head, i can't think of a request we've gotten so far that's been so customer-specific that it required a lot of consideration.
Question #5:
Jesse: How do you capture sample metadata such as cell line, treatments, and treatment period that aren't captured in the instrument data?
do you integrate with elns?
do you have a workflow for bench scientists to enter it as they're preparing the samples?
Rani: users upload that kind of sample metadata in a tabular format. we have "known" aka common fields defined, including the ones you mentioned, but we don't restrict the user from entering new columns, and we allow fuzzy mapping when querying across experiments. i could imagine a world where we move to super-extensible user-defined schemas, but we wanted to start by seeing how the user-entered data would converge. we've also started exploring collaborations with ncats, the monarch initiative, and larger-scale ontology efforts on metadata organization. not our main focus, but certainly interesting.
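[Editor's note] A minimal sketch of the fuzzy mapping idea above — matching user-entered column names onto a set of "known" fields while keeping novel columns as-is. The field list and cutoff are assumptions for illustration, not Pluto's implementation:

```python
from difflib import get_close_matches

# Hypothetical set of "known" aka common metadata fields
KNOWN_FIELDS = ["cell_line", "treatment", "treatment_period", "timepoint"]

def map_column(user_column: str, known=KNOWN_FIELDS, cutoff=0.6) -> str:
    """Map a user-entered column name to the closest known field,
    or return it unchanged if nothing is close enough."""
    normalized = user_column.strip().lower().replace(" ", "_")
    matches = get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else user_column  # new columns are allowed through
```

So `"Cell Line"` maps to `cell_line`, while a novel column like `passage_number` survives untouched for cross-experiment querying.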
short answer, yes! the pluto app is api-driven, so we can import sample and treatment-related data from benchling, sapio, cdd vault (haven't tried dotmatics yet), and other elns and lims. and using a pluto api token, third-party apps can pull pluto data back into their application, which includes processed assay data (e.g. a counts matrix for rna-seq) and what i'm referring to as "results" (e.g. output of an analysis pipeline, like a differentially expressed gene table)
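[Editor's note] To make the token-based pull concrete, here is a hedged sketch of how a third-party app might construct such a request. The base URL, path, and header scheme are assumptions for illustration — consult Pluto's actual API documentation for the real endpoints:

```python
def build_results_request(experiment_id: str, api_token: str,
                          base_url: str = "https://api.pluto.bio"):
    """Return (url, headers) for fetching an experiment's results table.
    URL structure and header name are hypothetical."""
    url = f"{base_url}/experiments/{experiment_id}/results"
    headers = {"Authorization": f"Bearer {api_token}"}
    return url, headers

# e.g. with the `requests` library:
#   url, headers = build_results_request("EXP123", token)
#   deg_table = requests.get(url, headers=headers).json()
```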
this is a great question because, interestingly, no. if you had talked to the pluto team a year ago, we would have told you that this was a must-have. we did a lot of design and engineering exploration around an in-app editable table flow but we decided to launch without it and see what happened. it turned out that it hasn't been a high-priority user need. i expected this to be much more of an obstacle for users, but so far, the feature requests we get are almost always related to downstream things once the data is in pluto, like different kinds of plots or multi-experiment comparison functionality.
Jesse: Oh, that's interesting. i guess many users are used to capturing the data in a spreadsheet as they go and registering it in an eln at the very end. so this would be a similar workflow.
Vega: Let's chat about dotmatics. happy to help/support with api documentation + info
Rani: Haha was going to tag you on the dotmatics comment @vega
Jesse: Do you know what proportion of users use an api integration vs uploading spreadsheets?
Rani: majority spreadsheets, including public pharma cos (maybe that isn't a surprise to folks in this community but it was certainly a surprise to me)
Question #6:
Yohann: With the diversity of data type and analysis used today, how do you prioritize features for pluto?
Rani: I'll give a bit of a roundabout answer to this one because i'm curious to get this group's reactions :grin: most of my previous work has used -omics data, so we were familiar with the data types and saw an opportunity to help with the managing-large-data-files challenges there to start off with. but at the risk of sounding strange, we intentionally built the product with surprisingly little thought about the types of analyses we'd support. our first focus was on collaboration and how to make it happen faster. so we built an abstract model for an experiment (which would presumably later have a "type", which would determine the format of data and sorts of analyses you'd run on it). from there, we built out role-based permissions allowing someone to view, edit, even create "draft" experiments, and assign experiments to other team members, with some special attention given to the idea of external collaborators (e.g. cros who may be in possession of those large data files your company wants to analyze).
Rani: By streamlining the collaboration aspects first and showing that to customers, we naturally found our way to the customers who had the types of data and analysis that would benefit most from that plumbing.
Not surprisingly, for a lot of customers that most often meant sequencing-based data, where the hardest part can be at the front - getting the large fastq files into a usable state for analysis. so we thought that if we could get that data stored, processed, qc'd, and accessible as quickly as possible, we could leave it up to the users to drive the kinds of analyses that they want to do next.
(which is one thing that i personally love about pluto #shamelessplug - once someone uploads data, instead of being shown an "rna-seq report" that's the same for every experiment, they're presented with a blank canvas and given the chance to answer the question, what scientific hypotheses do you want to test with this data and what analyses would help you do that?)
Nicholas: Sometimes a blank canvas can be overwhelming. when do you think it makes sense for pluto to be opinionated about an analysis?
Rani: From a long-term product perspective, it's a good question and i don't know yet. i'm sure the answer will change every year. from a practical perspective at this moment, our customers have all been pretty clear about communicating if and when they want opinions. some do want opinions (those customers have templates in pluto that get applied whenever they run a particular assay), but others are domain experts on the specific assay or analysis already, so they prefer having the freedom to run with it on their own.
Question #7:
Vega: Does pluto’s platform capture an audit history? and are you currently or in the future planning to support data management that is gxp compliant?
Rani: Yeah! this is an important one. each experiment has a granular change log, so you can see a timestamped (and exportable) record of who uploaded data, changed an analysis parameter, edited the title of one of the plots on the page, etc.
however, unlike elns, pluto is not 21 cfr part 11 compliant (nor do i think the entire platform should be, given the interactive nature of the results in pluto... we've had some very interesting product strategy discussions around this, so if anyone is interested in weighing in, please dm me! i'd really love to hear your thoughts)
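[Editor's note] The granular, timestamped, exportable change log described above could be modeled along these lines. Field names and the export format are hypothetical, not Pluto's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChangeLogEntry:
    """One attributed, timestamped event on an experiment (illustrative)."""
    timestamp: datetime
    user: str
    action: str   # e.g. "uploaded_data", "changed_parameter", "edited_plot_title"
    detail: str

def export_log(entries) -> str:
    """Render entries as exportable tab-separated lines, oldest first."""
    return "\n".join(
        f"{e.timestamp.isoformat()}\t{e.user}\t{e.action}\t{e.detail}"
        for e in sorted(entries, key=lambda e: e.timestamp)
    )
```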
Rani: We're planning to support gxp compliant data management as early as this fall for pharma partners who have requested/required it
Question #8:
Nicholas: Do you integrate with automation or lab machines directly? why or why not?
Rani: We don't have any customers with this need yet so we haven't spent any cycles on it, but i've been super energized by the innovation (esp from groups in this slack org) in the automation space. i was previously at helix, a genomics company and illumina spin-out with a nice big lab in san diego, so lab automation and robots have been fascinating to me since then. (if you're working in this space, message me so we can nerd out over virtual coffee and potential pluto integrations plz)
funny story though: the one question we've gotten during a discovery call with a lab incubator about integrating with a shared lab machine directly seemed like an interesting use-case but, upon deeper questioning, turned out to be "can we run your software locally on the microscope computer because our building wifi sucks" rather than a true integration need, so that was a bummer but can't solve em all i guess :sweat_smile:
Question #9:
Nicholas: How do you think about data pipelines and integrating with bioinformatics workflows?
Rani: We integrate with public and private repos and run bioinformatics workflows in nextflow and flyte. the pipelines and the analysis results they produce are versioned for reproducibility/auditability, which can be a helpful improvement for, say, a customer's proprietary pipeline that they were previously running in-house, but there's nothing particularly novel about our approach there.
probably the main difference in how we think about workflows in pluto compared to platforms in more of the "bioinformatics infrastructure" category is that in our product, the workflows themselves are a secondary focus after collaboration and visualization. by that, i don't mean at all that they are less important, but rather that, for our current customer segment anyway, the bioinformatics pipelines play a known/stable/expected role in how the business is going to meet its goals. so rather than needing a platform for running more pipelines, it's the saas stuff outside of the bioinformatics (the upstream experimental planning / workflow management, and the flexible visualization-building downstream) that was the differentiator for them. they chose pluto to help hit their kpis by increasing operational efficiency and decreasing time to biologically-meaningful results. oftentimes that involves running bioinformatics workflows in between, but it's not required.
Nicholas: Thanks so much to @rani for the q&a! this has been extremely fun to learn more about pluto. please keep asking questions and i'm sure rani would be happy to answer them :)
Rani: Anytime! appreciate you organizing, nicholas. hope to make it back out to the bay area for another happy hr soon :relaxed: