This week in Bits in Bio 2024/04/01
This week, the noose tightened around WuXi, researchers published many interesting foundation models (for CT scans, histopathology, and image annotation), and some pretty cool datasets were released (ALS, breast cancer, and a cell atlas).
We are also thrilled to announce the start of the BiB Buddy program for which you can register here!
Pharma and Biotech
2024/03/28 STAT+: COVID-19 had the silver lining of providing a natural experiment in which a completely new virus entered a population. The collection of biosamples during that time helped reveal that the time needed to build memory B cells, the period needed to lose protection, and the effect of imprinting were more heterogeneous than previously suspected. This data will likely help us understand how humans develop immunity.
2024/03/28 STAT: The UK published a report listing all clinical trials conducted within its borders and the sponsors that failed to comply with their reporting duties. This effort is hailed by Cochrane, known for its meta-analysis work, and is in line with the UK’s culture of good data practices with regard to medicine (such as the work of openprescribing.org, which is used to track changes in prescriptions or whether doctors follow certain best practices). Previous work in that direction was done by the TrialTracker project.
2024/03/28 Reuters: An intelligence report claims that WuXi AppTec may have shared patient information with China. This news comes right as an industry trade group severed ties with them, as they are targeted by the new Biosecure Act which would forbid them from having any business with US companies. The fact that Chinese companies tend to have ties with the CCP is rather old news, and it is doubtful that this practice just started. To a cynic like me, the more likely explanation for this leak and the US efforts to block WuXi would be that it is a cheap way to enact industrial policy by removing a foreign competitor and favoring national companies. This also got some nice coverage in “In the Pipeline”.
2024/03/27 STAT: A recent study in Science Translational Medicine found that Multiple Sclerosis (MS) may be divided into three categories. Each category involves different immune cell behaviors, and patients may respond differently to therapies. We may hope that this work leads to targeted therapies for MS in the same way that the original paper on the molecular classification of breast cancer did for cancer patients.
2024/03/27 STAT+: The Weissman group published a study in Nature which shows promising results for reversing the effect of aging in the immune system. All blood cells, including immune cells, come from Hematopoietic Stem Cells (HSC) that can divide and differentiate into specialized cell types. There are two kinds of HSC, bal-HSC and my-HSC: both produce a mix of myeloid and lymphoid cells, but my-HSC mainly produce myeloid cells. As people age, the my-HSC population tends to represent a higher fraction of HSC, and the body becomes depleted in lymphoid cells, reducing its ability to fight novel pathogens. The study found an antibody that targets these my-HSC in order to increase the proportion of bal-HSC and hopefully boost the adaptive immune system. This hypothesis was supported in mice (the treated group was more resistant to a deadly retrovirus infection). While the idea of changing the ratio of bal-HSC to my-HSC sounds extremely promising, killing HSCs seems risky and may prove too dangerous in humans. However, other approaches to correct that balance via cell reprogramming or targeted transcription factors may also be explored in the future.
2024/03/27 PacBio: As part of the European 1 million genomes project, the Estonian biobank is partnering with PacBio to sequence 10,000 genomes using its long-read sequencing technology. To the best of my knowledge, it would be the largest biobank of long-read sequencing data and could be extremely useful for studying the role of transposable elements (involved in e.g. loss of tail in humans), which are extremely hard to investigate using short-read technologies.
2024/03/27 SurgeCare: SurgeCare, a company focused on leveraging AI techniques for identifying biomarkers, gets a second funding round of $7.5m.
2024/03/26 AP: The US Supreme Court heard the oral arguments in the mifepristone case. Its decision could have a vast impact if it were to uphold the lower court decision (basically that courts can override the FDA), as that would create huge regulatory uncertainty for the pharma industry. After the hearing, the consensus seems to be that the court will rule for the FDA against the plaintiffs (Christian doctors claiming that they would be forced to treat the side effects of mifepristone, and that it would be against their conscience) on the basis that their case lacks standing.
2024/02/25 Fierce Biotech: The first patient implanted with a Neuralink device went from being able to move a mouse cursor to being able to play chess and Civilization V. While that is impressive, such feats have been possible for years with simple non-invasive protocols like electroencephalography (EEG) (this one in 2012) or fMRI (like this tool showing your “thoughts”).
2024/03/25 STAT: A short opinion piece comparing the recent hype about GenAI tools for medicine (e.g. ChatGPT) with the previous hype about IBM Watson. An interesting point it raises is that the current enthusiasm about these technologies will have to be followed by serious deployment. An analogy would be how we are all motivated to get in shape on Jan 1st (use modern AI tools), but end up back as couch potatoes (using fax) after a few weeks, when the hard work takes its toll.
Papers and Science
2024/03/29 bioRxiv and bioRxiv: Algorithms for a Commons Cell Atlas & A human commons cell atlas reveals cell type specificity for OAS1 isoforms — The authors introduce the Commons Cell Atlas, which integrates data from 525 different datasets and processes them in exactly the same fashion. The reason for this work is that the Human Cell Atlas and CellxGene projects are actually “just” lists of datasets and not a unified dataset containing all this data. Here the authors processed the data uniformly and computed expression at the isoform level instead of the gene level (the accuracy of that computation is not evaluated, though). Lior Pachter wrote a very insightful tweetorial thread on the project.
2024/03/28 Nature Genetics: A single-cell atlas enables mapping of homeostatic cellular shifts in the adult human breast — A second data publication of a single-cell breast atlas; it would be nice to see how well it integrates with the first one.
2024/03/26 Nature Methods: The multimodality cell segmentation challenge: toward universal solutions — Summary and analysis of the results of the NeurIPS 2022 cell segmentation challenge. This paper also provides an interesting benchmark dataset for cell segmentation tasks. Bo Wang wrote a nice tweetorial on the paper.
2024/03/26 ArXiv: Tutorial on Diffusion Models for Imaging and Vision — Monograph on Diffusion models.
2024/03/25 Nature Methods: Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis — The authors used ChatGPT to annotate cell types in scRNA-seq data. Their method simply finds the differentially expressed genes for each cell group, and then asks ChatGPT what kind of cells have such marker genes. The “model” is literally just one prompt to ChatGPT, as can be seen in their code. While it is interesting that ChatGPT reaches decent accuracy there, I hope that no one actually uses this tool; it would mean that scientific discovery of cell types and function would be done by an auto-complete, which would be incredibly silly.
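To make the "one prompt" approach in the GPT-4 annotation item concrete, here is a minimal sketch of the two steps it describes: pick the top marker genes per cell group, then format them into a single prompt. This is not the authors' code. The function names, the prompt wording, the tissue label, and the toy differential expression table are all assumptions made for illustration, and the call to a language model is deliberately left out.

```python
from collections import defaultdict


def top_markers(de_results, n_genes=10):
    """Pick the n_genes with the highest log fold change per cell group.

    de_results: iterable of (cluster, gene, log_fold_change) tuples,
    a hypothetical flat export of a differential expression test.
    """
    by_cluster = defaultdict(list)
    for cluster, gene, lfc in de_results:
        by_cluster[cluster].append((lfc, gene))
    return {
        cluster: [gene for _, gene in sorted(pairs, reverse=True)[:n_genes]]
        for cluster, pairs in by_cluster.items()
    }


def build_prompt(markers, tissue="human PBMC"):
    """Format the marker genes as one text prompt (wording is illustrative)."""
    lines = [
        f"Identify the cell types of {tissue} cells using the following marker genes.",
        "Each row is one cell group. Give only the cell type name for each row.",
    ]
    for cluster in sorted(markers):
        lines.append(f"{cluster}: {', '.join(markers[cluster])}")
    return "\n".join(lines)


# Toy input: made-up fold changes for two cell groups.
toy_de = [
    ("c0", "CD3E", 2.5), ("c0", "CD3D", 2.1), ("c0", "ACTB", 0.1),
    ("c1", "MS4A1", 3.0), ("c1", "CD79A", 2.8),
]
prompt = build_prompt(top_markers(toy_de, n_genes=2))
print(prompt)
```

The resulting string would then be sent to a chat model and the reply parsed row by row, which is the part where the quality of the annotation depends entirely on the model.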
2024/03/24 Cell: Single-cell dissection of the human motor and prefrontal cortices in ALS and FTLD — The study produced a large dataset of both ALS and FTLD brain cells, and found that at the RNA expression level, the diseases have striking similarities. It is accompanied by an overview in MIT Tech Review.
2024/03/22 Nature Machine Intelligence: Generative AI for designing and validating easily synthesizable and structurally novel antibiotics — The team behind the original AI-designed antibiotic introduces a new tool, SyntheMol, that helped them design even more antibiotics with in vitro efficacy.
2024/03/21 Nature Biomedical Engineering: Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models — A tool for generating whole-slide image tiles from RNA-seq. The goal is to use this data for data augmentation when training image classifiers, such as UNI.
2024/03/19 Nature Medicine: A visual-language foundation model for computational pathology — The authors introduce CONCH, a foundation model trained on histopathology data with accompanying annotation. They use a method similar to CLIP for learning how to annotate images, or generate images from an annotation. Similar work has been done in BLEEP, Quilt-1M, and others, so this research area is a bit crowded. Interestingly, the authors benchmark their model on a lot of different tasks, including segmentation and classification, on which it performs well, instead of just evaluating it on annotating the image slides.
2024/03/19 Nature Medicine: Towards a general-purpose foundation model for computational pathology — The authors introduce UNI (code and weights available here), a large pretrained image model for digital pathology, evaluated for whole slide image analysis. Having a pretrained model available should be extremely valuable for groups that do not have the expertise to build one from scratch; it could act as a nice foundation to build custom models for medicine.
2024/03/15 Nature Machine Intelligence: Foundation model for cancer imaging biomarkers — The authors build a foundation model for analyzing CT scans, using a ResNet50 architecture. The model is available here.
2024/02/13 Nature Nanotechnology: Full-length single-molecule protein fingerprinting — While single-molecule sequencing technologies currently work for DNA and RNA, they are still in their infancy for proteins. In this paper, the authors use a fingerprinting technology that allows them to measure the presence of just a few amino acids, which can be sufficient to recover the protein identities when paired with a reference database. This paper also has a nice 2-page overview published in the same issue.
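The idea behind the protein fingerprinting item, recovering a protein's identity from only a few amino acids plus a reference database, can be sketched as a simple lookup. This is a toy illustration and not the paper's method: the choice of cysteine (C) and lysine (K) as the detected residues, the exact-match rule, and the three made-up sequences are all assumptions, and real measurements would be noisy rather than exact.

```python
def fingerprint(sequence, labeled=("C", "K")):
    """Reduce a protein sequence to the positions of a few residue types."""
    return tuple((aa, i) for i, aa in enumerate(sequence) if aa in labeled)


def identify(measured, reference, labeled=("C", "K")):
    """Return the names of all reference proteins whose fingerprint matches."""
    return [
        name
        for name, seq in reference.items()
        if fingerprint(seq, labeled) == measured
    ]


# Hypothetical reference database with made-up sequences.
reference = {
    "protA": "MKTCAAK",
    "protB": "MCTKAAK",
    "protC": "MKTCGGK",
}

# protB has a unique pattern of C and K positions.
print(identify(fingerprint("MCTKAAK"), reference))
# protA and protC differ only at unlabeled positions, so they collide.
print(identify(fingerprint("MKTCAAK"), reference))
```

The second lookup returns two candidates, which shows the trade-off: the fewer residue types are measured, the more proteins in the database share a fingerprint.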
Community events
Augmenting Hospitals: Infrastructure to Care Delivery
Paris | 2024/04/02 | 6:30 PM
The Future of Data Science in Biotech: AI and Beyond sponsored by Ontologic
Boston | 2024/04/03 | 6 PM
Bits in Bio Shanghai Happy Hour
Shanghai | 2024/04/11 | 7 PM
Ginkgo Bioworks Presents: Ferment 2024
Boston, Virtual | 2024/04/11 | 8 AM – 7 PM
Bridging the gap: Creation to communication with AI in pharma
Virtual | 2024/04/11 | 10 AM EDT
Job announcements
(Sr.) Informatics Developer at Gate Bioscience
Biological Data Scientist at Talus Bio
Co-Founder in Residence (AI/ML for Crop Improvement) at Deep Science Ventures
Computational Biologist at Prox Biosciences
Fermentation Engineer at twig
Machine Learning Internship at Dyno Therapeutics
Senior Product Manager at Ark Biotech
Software Engineering Intern at Broad Institute
See the full list of jobs on the community job board.
We very likely missed quite a few announcements; don’t hesitate to DM @gama_search if you see anything missing or in need of correction.