Cannabis has played a critical role in human societies for over ten millennia. Although contemporary attention often centers on its psychoactive component, THC (tetrahydrocannabinol), historically this plant served as an essential source of seed oil, textiles, and nourishment. Despite its longstanding significance, cannabis remains relatively underexplored and underexploited in modern agriculture. Recent legislative changes in the United States—most notably the 2014 and 2018 Farm Bills—have spurred renewed interest in cultivating cannabis for medicinal, grain, and fiber purposes.
A team at the Salk Institute has now produced the most exhaustive and precise genetic atlas of cannabis to date. By examining 193 distinct cannabis genomes, researchers uncovered remarkable genetic diversity and complexity within this foundational crop. This collaborative effort involved Oregon CBD, Oregon State University, and the HudsonAlpha Institute for Biotechnology, and culminated in findings published in Nature on May 28, 2025. These discoveries promise to catalyze significant advances in cannabis-based agriculture, medicine, and industry.
“Cannabis is one of the most extraordinary plants on Earth. Despite its global importance as a source of medicine, food, seed oil, and fiber for at least the last 10,000 years, it remains one of the least developed major crops of modern times, largely due to a century of legal restrictions,” says Todd Michael, senior author of the study and research professor at Salk. “Our team constructed the most complete genetic map, or pangenome, of the plant to date by analyzing nearly 200 diverse cannabis genomes, showing that we are just starting to see the full potential of this amazing plant. Those same legal restrictions spurred an underground breeding revolution, revealing cannabis’s power as a chemical factory. With this new genomic blueprint, we can now apply modern breeding to unlock novel compounds and traits across agriculture, medicine, and biotechnology.”
Background: Cannabis as a Versatile Crop
Cannabis sativa, often referred to as hemp, is a flowering species indigenous to Asia. Its versatility stems from unique characteristics, such as fiber strength suitable for textiles and the capacity to produce high concentrations of cannabinoids, that have made it indispensable through much of human history. Today’s researchers envision cannabis oil eventually rivaling conventional seed oils like canola or soybean, and anticipate that cannabis-derived compounds might even serve as sustainable alternatives to fossil fuels.
Often described as a “chemical powerhouse,” cannabis can allocate over 30 percent of its dry biomass to terpenes and cannabinoids. These small molecules play defensive roles against pests in the field, yet humans have repurposed them for their aromatic and therapeutic properties. Terpenes are responsible for the distinctive fragrances of fruits and flowers, while cannabinoids interact with the endocannabinoid system in humans, delivering a range of effects. The strain “Charlotte’s Web,” rich in cannabidiol (CBD), brought widespread attention to cannabis’s medicinal potential when it alleviated epileptic seizures in children. Beyond CBD and THC, more than one hundred lesser-known cannabinoids are under investigation for treating conditions such as chronic pain, arthritis, nausea, asthma, depression, and anxiety.
Historically, breeding efforts—often conducted clandestinely because of legal restrictions—have dramatically shaped cannabis’s chemical profile. However, until now, the impact of this selective breeding on the plant’s overall genetic diversity remained poorly understood. Two primary challenges have hampered comprehensive genetic insight: first, cannabis is among the minority of flowering plants that exhibit dioecy (separate male and female plants); second, its genome is replete with transposable elements—repetitive DNA segments capable of relocating within the genome—which complicate sequencing and assembly efforts.
Key Discoveries: Unprecedented Genetic Diversity
To map cannabis’s genetic architecture, the Salk team employed both short-read and long-read DNA sequencing technologies. Traditional short-read methods fragment DNA into small stretches (hundreds of base pairs), making it difficult to reconstruct complex or repetitive regions. In contrast, long-read sequencing captures thousands of base pairs in a single read, allowing for more accurate assembly of difficult genomic regions.
“There are limits to what you can discover with short-read sequencing technologies, since those short genetic excerpts are impossible to stitch together in any meaningful way when looking at complex regions of the genome, especially repetitive DNA sequences,” explains co–first author Lillian Padgitt-Cobb, a postdoctoral researcher in Michael’s lab. “We’re among the first to harness this long-read technology at scale in the pangenome context, and with that comes all these insights into structural variation and gene ordering that can inform end-game decisions about breeding favorable traits into cannabis plants.”
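The limitation described above can be illustrated with a toy sketch. It is not the team's pipeline: the sequences, the repeat, and the function name `reads` are invented for illustration, and the reads are idealized (error-free, one starting at every position). Two made-up genomes carry the same repeat three times but differ in the order of the unique segments between the copies. Reads that cannot span a full repeat plus a flanking base on each side look identical for both genomes, while slightly longer reads tell them apart.

```python
from collections import Counter

def reads(genome, length):
    """Count every error-free read (substring) of a fixed length."""
    return Counter(genome[i:i + length] for i in range(len(genome) - length + 1))

# Hypothetical 8-base repeat standing in for a transposable element.
REPEAT = "ACGTTGCA"

# Same building blocks, but the unique segments "GG" and "CC" are swapped.
genome_1 = "TTT" + REPEAT + "GG" + REPEAT + "CC" + REPEAT + "AAA"
genome_2 = "TTT" + REPEAT + "CC" + REPEAT + "GG" + REPEAT + "AAA"

short_len = len(REPEAT) + 1  # cannot cover the repeat plus both flanking bases
long_len = len(REPEAT) + 2   # covers the repeat and one base on each side

# Short reads: both genomes produce exactly the same collection of reads.
print(reads(genome_1, short_len) == reads(genome_2, short_len))  # True

# Longer reads: the collections differ, so the segment order is recoverable.
print(reads(genome_1, long_len) == reads(genome_2, long_len))    # False
```

Real genomes add sequencing errors, uneven coverage, and repeats thousands of bases long, but the underlying ambiguity is the same one this sketch shows.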
This investigation builds upon earlier work: in 2018, Todd Michael generated the first chromosome-level cannabis genome using long-read sequencing, elucidating the loci responsible for cannabinoid biosynthesis and clarifying the breeding origins of the “Charlotte’s Web” variety. The current study, however, extends far beyond a single reference assembly. By assembling 193 genomes drawn from 144 individual plants worldwide, with haplotype resolution for a subset of them, researchers achieved a far more comprehensive “pangenome.” Because cannabis is diploid (possessing two chromosome sets, one maternal and one paternal), a haplotype-resolved plant contributes two separate genome assemblies rather than one. That is how 144 plants can yield 193 genomes; had every plant been resolved into both haplotypes, the total would have been 288.
Resolving both haplotypes within an individual plant revealed an extraordinary degree of variation: in some regions, cannabis exhibits up to twenty times more genetic diversity than humans. This level of resolution allowed the team to trace which chromosomal segments were inherited from each parent, providing direct insight into breeding histories and genetic backgrounds.
Gene Presence and Variation
Analysis of the pangenome revealed that:
23 percent of genes are found in every genome assembly (core genes).
55 percent of genes appear in 95 to 99 percent of genomes (near-universal genes).
21 percent fall within a group present in 5 to 94 percent of genomes (variable genes).
Fewer than 1 percent are unique to a single genome.
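As a rough sketch of how such categories can be assigned, the function below classifies a gene by the number of genome assemblies in which it is present. The thresholds follow the percentages listed above, but the function name `classify_gene`, the exact handling of boundaries, and the "rare" label (for genes found in more than one genome but in fewer than 5 percent of them, a range the list does not name) are assumptions made for illustration, not the study's method.

```python
def classify_gene(n_present, n_genomes=193):
    """Assign a pangenome category from how many genomes carry a gene."""
    if not 1 <= n_present <= n_genomes:
        raise ValueError("n_present must be between 1 and n_genomes")
    if n_present == n_genomes:
        return "core"            # found in every assembly
    if n_present == 1:
        return "unique"          # found in a single assembly
    fraction = n_present / n_genomes
    if fraction >= 0.95:
        return "near-universal"  # 95 percent or more, but not all
    if fraction >= 0.05:
        return "variable"        # between 5 and 95 percent
    return "rare"                # assumed label; not named in the text

# Example: presence counts for a few hypothetical genes.
for count in (193, 190, 100, 5, 1):
    print(count, classify_gene(count))
```

With 193 assemblies, a gene present in 190 of them would fall in the near-universal group, and one present in 100 would count as variable.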
Interestingly, genes involved in cannabinoid biosynthesis tended to be highly conserved across varieties, reflecting intense human-driven selection for THC and CBD content. Conversely, genes linked to fatty acid metabolism, growth, and defense exhibited substantial variability—presenting opportunities for breeders to develop hemp with improved field performance or enhanced nutritional profiles in hemp oil. Notably, the study identified structural variants in fatty acid biosynthesis genes that influence production of tetrahydrocannabivarin (THCV), a less common cannabinoid noted for its non‑psychoactive, energizing properties.
Further investigation showed that both THC acid synthase (THCAS) and CBD acid synthase (CBDAS) genes are under strong selective pressure in cultivated varieties, explaining the uniformity of cannabinoid content in many commercial lines. These synthase genes, and other cannabinoid-related loci, often reside within transposable elements—mobile genetic sequences that can jump to new genomic locations. Breeding programs targeting genes embedded in these transposons have unintentionally generated a wealth of structural variation, contributing to the remarkable phenotypic diversity seen in modern cannabis.
Sex Chromosome Insights
Because cannabis is dioecious, understanding its sex chromosomes is critical. Previous reference genomes lacked full data on both X and Y chromosomes, leaving male-specific genes unexplored. By including sex chromosomes in their assemblies, the Salk team produced the first comprehensive view of the cannabis Y chromosome. Among their findings, they discovered male-only genes that offer pathways to breed offspring with enhanced agronomic traits. Until now, many marijuana breeders have practiced “feminization” (inducing female plants to produce pollen), thereby bypassing the Y chromosome entirely and potentially overlooking valuable genetic diversity unique to male plants.
Looking Ahead: Breeding for Health, Industry, and Sustainability
Armed with this expansive pangenome, researchers and breeders can pursue a variety of objectives:
Identification of Wild Relatives: Comparisons of European and Asian hemp genomes lead the team to suspect that an ancient, as-yet-undiscovered wild relative exists in Asia. Such a wild lineage could harbor unique adaptations (drought tolerance, disease resistance, or distinctive metabolic profiles) that breeders could harness to develop more resilient cultivars.
Reintegration of Male Genetics: Given the newly characterized Y chromosome, breeding strategies that include true male plants could unlock untapped trait potential. Male‑specific genes may contribute to higher yields, enhanced fiber quality, or unique cannabinoid profiles. Including male genetics could augment breeding programs that have traditionally focused solely on female lines.
Improved Nutritional Profiles: Variable genes affecting fatty acid metabolism offer a promising route to optimize hemp seed oil composition. By selectively breeding favorable alleles, hemp oil could become competitive with established seed oils—such as canola or soybean—or even outperform them nutritionally.
Expanded Medicinal Uses: Insights into cannabinoid synthase clusters and their genomic context may enable breeders to develop novel chemotypes, that is, plants producing tailored ratios of cannabinoids and terpenes for specific therapeutic applications. For example, varieties bred to overproduce THCV might address metabolic disorders, while lines optimized for minor cannabinoids might target neuroinflammatory or psychiatric conditions.
Industrial Hemp Optimization: Variable genes associated with growth and defense can be selected to produce hemp varieties suited to diverse climates and soil types. Improved fiber strength, pathogen resistance, or biomass yield could expand hemp’s role in textiles, bioplastics, and construction materials.
“Over the last 10 years, breeders have already done a decent job of getting yields up and making cannabis an economically viable crop,” says co–first author Ryan Lynch, a postdoctoral researcher in Michael’s lab. “Once there’s market interest there, paired with these new insights into cannabis genomes that can guide breeding efforts, I can see hemp and hemp oils really booming in both human health and industry applications.”
In the near term, the pangenome will serve as an open-access resource for scientists worldwide, guiding crop‑improvement strategies and accelerating discovery. By tapping into previously hidden genetic variation, researchers can unlock cannabis’s full potential—as a high‑yield fiber crop, a nutrient‑rich oilseed, and a source of novel medicinal compounds.
Acknowledgments and Funding
Additional authors on the paper include Nolan Hartwick, Nicholas Allsing, Anthony Aylward, Allen Mamerto, Justine Kitony, Kelly Colt, Emily Murray, Tiffany Duong, and Heidi Chen (Salk Institute); Andrea Garfinkel, Aaron Trippe, and Seth Crawford (Oregon CBD); Brian Knaus and Kelly Vining (Oregon State University); and Philip Bentz, Sarah Carey, and Alex Harkess (HudsonAlpha Institute for Biotechnology).
Funding for this work was provided by the Tang Genomics Fund; the National Science Foundation (NSF‑IOS PRFB 2209290, IOS‑PGRP CAREER 2236530); the Bill & Melinda Gates Foundation (INV‑040541); and the U.S. Department of Agriculture (USDA NIFA 2022‑67012‑38987, USDA NIFA 2023‑67013‑39620).
About the Salk Institute for Biological Studies
The Salk Institute is committed to uncovering life’s fundamental mechanisms. Founded by Jonas Salk—developer of the first safe and effective polio vaccine—the Institute is a nonprofit, independent research organization dedicated to breakthroughs in neuroscience, cancer, aging, immunobiology, plant biology, and computational biology. Small by choice and intimate by design, Salk’s world‑class scientists pursue fearless research aimed at improving human health and well‑being. For more information, visit www.salk.edu. Journal: Nature.