Managing and using genomic datasets is an essential part of plant breeding programs. As these datasets grow larger, however, breeders need tools to merge and store both new and existing genome sequence data. Database tools can help manage large datasets and reduce the amount of new data that plant breeders need to generate for their breeding programs.
To meet this growing need for database tools that assist breeders, a team of Buckler Lab members, led by graduate student Sarah Jensen and postdoc Guillaume Ramstein, developed a genomic database for sorghum using a breeding tool called the Practical Haplotype Graph (PHG). The PHG codebase was developed by programmers Zack Miller, Lynn Johnson, Terry Casstevens, and Peter Bradbury. Project manager Cinta Romay and former Buckler Lab member Ramu Punna also contributed to the project.
The group tested whether the sorghum PHG can merge sequence data from many individuals without losing important information about differences between varieties. They also compared the PHG with existing tools to determine how well it predicts genotypes for new individuals, and found that the PHG outperforms state-of-the-art methods at this task while requiring less new input data.
The team found that the sorghum PHG could accurately predict genotypes from sequencing data covering only 1% of the genome, which can be produced for less than $10 per individual. The results show that the PHG is a useful research and breeding tool because it maintains information from a diverse group of individuals, stores genome sequence data in an accessible format, unifies genotypes derived from different genotyping methods, and provides a cost-effective option for genomic selection in any species.
You can read more about the development of the sorghum PHG here.
The PHG paper is now published in The Plant Genome and can be found here: https://doi.org/10.1002/tpg2.20009