Cannabis Phylotree
Hovering over a strain of interest highlights that strain along with the other strains in the tree, which are color coded by their genetic distance to the selected strain as follows:
- Red: Very Closely Related
- Yellow: Closely Related
- Green: Moderately Related
- Blue: More Distantly Related
- Purple: Very Distantly Related
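The color legend above amounts to binning a distance into five bands. A minimal sketch of that idea follows. The page does not state the actual cut-offs, so the band edges (`BAND_EDGES`) and the function name `color_for_distance` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical band edges, expressed as fractions of the largest distance
# in the tree. The real cut-offs used by the phylotree are not published here.
BAND_EDGES = [0.2, 0.4, 0.6, 0.8]
BAND_COLORS = ["red", "yellow", "green", "blue", "purple"]


def color_for_distance(distance, max_distance):
    """Map a genetic distance to one of the five legend colors."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # Clamp to [0, 1] so out-of-range inputs still land in a valid band.
    fraction = min(max(distance / max_distance, 0.0), 1.0)
    # bisect_right counts how many band edges are <= fraction.
    return BAND_COLORS[bisect_right(BAND_EDGES, fraction)]
```

With these assumed edges, a strain at 5% of the maximum distance from the selected strain would be drawn in red, and one at the maximum distance in purple.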
Cannabis & Hemp Phylogenetic Tree
The cannabis phylogenetic tree, or evolutionary tree, shown below is a branching diagram showing the inferred evolutionary relationships (the phylogeny) among various cannabis strains, based upon similarities and differences in their genetic characteristics. Here the genetic characteristics are the identity and frequency of Single Nucleotide Polymorphisms (SNPs) in the captured regions of each cannabis strain's genome (see the description in each tab under the tree for details). Genetic distance is a summary measure of the genetic divergence between cannabis strains. The genetic distance between any two strains is approximately proportional to the total length of the radial segments along the path connecting the two “leaves” that represent those strains. SNPs of each strain are identified by DNA sequencing (StrainSEEK® Panel or Whole Genome) or SNP genotyping (CannSNP90 SNP chip) and bioinformatics analysis from MGC’s Genomics Services.
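Reading a distance off the tree means summing the branch lengths along the path between two leaves (often called the patristic distance). The sketch below illustrates this on a toy tree. The tree encoding (a dict mapping each node to its parent and branch length), the strain names, and the branch lengths are all hypothetical, and this is not presented as the method used to build the phylotree.

```python
def path_to_root(tree, node):
    """Return {ancestor: cumulative branch length from `node`}, including node itself."""
    lengths = {node: 0.0}
    total = 0.0
    while node in tree:  # the root has no entry in `tree`
        parent, branch = tree[node]
        total += branch
        lengths[parent] = total
        node = parent
    return lengths


def patristic_distance(tree, leaf_a, leaf_b):
    """Sum the branch lengths on the path connecting two leaves."""
    up_a = path_to_root(tree, leaf_a)
    node, total = leaf_b, 0.0
    # Climb from leaf_b until we reach a node that is also above leaf_a.
    while node not in up_a:
        if node not in tree:
            raise ValueError("leaves are not in the same tree")
        parent, branch = tree[node]
        total += branch
        node = parent
    return up_a[node] + total


# Toy tree with made-up strain names: child -> (parent, branch length).
toy_tree = {
    "StrainA": ("n1", 0.02),
    "StrainB": ("n1", 0.03),
    "n1": ("root", 0.10),
    "StrainC": ("root", 0.25),
}
```

In this toy tree, `StrainA` and `StrainB` share a recent ancestor, so `patristic_distance(toy_tree, "StrainA", "StrainB")` is small, while the path from `StrainA` to `StrainC` passes through the root and is much longer.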
This phylotree is derived using the intersecting high-quality SNPs from samples analyzed with StrainSEEK® 3Mb (V2) and 10Mb (V3), Whole Genome Sequencing, and the CannSNP90 genotyping chip.
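Taking the "intersecting" SNPs can be pictured as keeping only the sites that every platform reports. The following is a rough sketch under stated assumptions: each SNP site is identified by a (contig, position) pair, quality filtering has already been applied, and the platform labels, contig names, and positions are made up for illustration.

```python
def shared_snp_sites(calls_by_platform):
    """Return the SNP sites present in every platform's call set, sorted."""
    site_sets = [set(sites) for sites in calls_by_platform.values()]
    if not site_sets:
        return []
    return sorted(set.intersection(*site_sets))


# Hypothetical call sets: (contig, position) pairs per platform.
example_calls = {
    "StrainSEEK_V2": [("contig1", 100), ("contig1", 250), ("contig2", 40)],
    "StrainSEEK_V3": [("contig1", 100), ("contig2", 40), ("contig3", 7)],
    "WGS": [("contig1", 100), ("contig1", 250), ("contig2", 40), ("contig3", 7)],
    "CannSNP90": [("contig1", 100), ("contig2", 40)],
}

common_sites = shared_snp_sites(example_calls)
```

Restricting the comparison to shared sites means that two samples are never scored as different simply because one platform did not assay a position.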
StrainSEEK®: Over 10 million bases are sequenced to 10x coverage in each plant using a targeted enrichment approach (Agilent SureSelect with Illumina NGS). This targeted approach provides full coverage of key genes in the cannabinoid and terpene synthase pathways and significant coverage of other important gene categories, including flowering and pathogen and disease resistance, while also covering hundreds of thousands of randomly distributed SNPs from StrainSEEK® V1, StrainSEEK® 3Mb (V2), Sawler, Lynch and the Phylos Galaxy. As a result, samples sequenced with this method can be cross-compared to all data that was public as of 2017. This is over 100x more sequence than other tests on the market, and as a result it is the most comprehensive sequencing tool for discerning clones from siblings and identifying the uniqueness of a given strain. The method delivers 300,000 to 500,000 SNPs across the genome with a concentrated contribution from chemotype-related genes. The higher SNP density enables Marker Assisted Selection for breeding. StrainSEEK® 3Mb (V2) included approximately 3 million bases sequenced to 10x coverage, using the same targeted enrichment approach, resulting in full coverage of the key cannabinoid synthase genes and 30,000 to 50,000 randomly distributed SNPs.
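As a back-of-the-envelope illustration of what "10x coverage" implies, mean depth is roughly the total sequenced bases divided by the target size. The sketch below turns that into a read-count estimate. The 150-base read length and the `on_target_fraction` parameter are assumptions for illustration and are not figures from StrainSEEK®.

```python
import math


def reads_needed(target_size_bp, mean_depth, read_length_bp, on_target_fraction=1.0):
    """Estimate how many reads give `mean_depth` over a target of `target_size_bp`."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    # Total bases that must be sequenced, inflated for off-target reads.
    total_bases = target_size_bp * mean_depth / on_target_fraction
    return math.ceil(total_bases / read_length_bp)


# Assumed 150-base reads; panel sizes and depth taken from the text above.
reads_10mb = reads_needed(10_000_000, 10, 150)
reads_3mb = reads_needed(3_000_000, 10, 150)
```

Under these assumptions, the 10Mb panel at 10x needs about 100 million sequenced bases, and the 3Mb panel about 30 million, before accounting for off-target reads or duplicates.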
Whole Genome Sequencing: Using a shotgun approach, the entire 876 million bases of the Cannabis genome are sequenced using Illumina next-generation sequencing, resulting in the widest coverage of genes as well as non-coding regions of the genome. Samples sequenced using WGS have the added benefit of being “future proof”: as new genes or genomic regions of interest are found, existing WGS data can be reanalyzed without needing to be sequenced again.
CannSNP90: A comprehensive SNP chip with over 89K designed markers, developed by MGC and Eurofins. The chip includes trait-specific markers for cannabinoid genes, terpene genes, plant sex, disease resistance, and chemotypes, as well as randomly distributed SNPs across the genome and approximately 6.5K SNPs that overlap with StrainSEEK®.