Creation of the Bovine Pangenome Consortium

The construction of a single reference genomic sequence for each species has made a major contribution to the analysis of the genome, its diversity and its functioning. But it gives an incomplete representation of the diversity of genomes within a species. By making it possible to produce longer sequence fragments, new sequencing technologies will enable us to go even further in our knowledge of the genomes of each species. INRAE researchers are taking part in the "Bovine Pangenome Consortium", a large-scale international collaboration whose aim is to produce a pangenome, i.e. a large set of high-quality, interconnected genomes, making it possible to describe the full genomic diversity of the species. The consortium's full objectives are described in an article published in the journal Genome Biology.

Since the 2000s, animal genomics work in most species has been based on a "reference" genome assembly, i.e. the best possible representation of the genome of a particular individual of the species studied. Major investment goes into maximizing the quality of this reference and annotating it, i.e. decoding the information contained in the DNA sequences and giving it biological meaning. The reference genome is a fully public resource, readily available in sequence databases such as Ensembl or NCBI. By comparing sequence data from other individuals with this reference, genetic variants can be identified. This highly efficient approach has made it possible to characterize tens of millions of variants within each species, mainly small ones: single-base substitutions (SNPs) and small insertions-deletions of a few bases at certain points in the genome. Its applications are well known: development of genotyping tools; detection of genome regions (QTL) or specific variants involved in the variability of traits of interest; genomic selection.

However, this approach has its limitations.
The search for structural variants, which are associated with more extensive modifications of the genomic sequence, is complex to conduct and of limited effectiveness. In addition, many of the sequences produced cannot be mapped onto the reference and are discarded. This is because the genome of the reference individual does not necessarily contain all the sequences present in the species and, more generally, because the genome of the individual studied differs so much from the reference that their differences cannot be characterized simply. Indeed, due to the multiple genome rearrangements that occur during the evolution of a species (exchange of sequence fragments between chromosomes, duplication or inversion of certain portions of the genome), genome collinearity is not complete within a species, and even less so between species, even closely related ones.

To resolve this difficulty, we need to consider several assemblies, so as to represent all the complex combinations of genomes present in a species. The "pangenome" is the collection of all the genomes, i.e. both the sequences common to all individuals (the core genome) and all the sequences specific to each individual. These assemblies are not considered independently of each other, but organized in the form of a graph that lends itself well to mathematical and computer modeling. In Figure 1, which shows a graph of a portion of 7 genomes from 5 closely related species, each genome is described by a path of a particular color.
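As a purely illustrative sketch of this graph representation (it is not the consortium's tooling, and the genome and segment names below are invented toy data), each genome can be modeled as a path through shared sequence segments. The core genome is then the set of segments visited by every path, and the remaining segments are those present in only some individuals.

```python
from collections import Counter

# Hypothetical toy data: each genome is a path through shared sequence segments.
# Segment and genome names are invented for illustration only.
paths = {
    "genome_A": ["s1", "s2", "s3", "s5"],
    "genome_B": ["s1", "s3", "s4", "s5"],
    "genome_C": ["s1", "s2", "s3", "s4", "s5"],
}


def build_edges(paths):
    """Collect the graph edges: every pair of segments adjacent in some path."""
    edges = set()
    for nodes in paths.values():
        edges.update(zip(nodes, nodes[1:]))
    return edges


def core_and_non_core(paths):
    """Split segments into those shared by all genomes and all the others."""
    # Count each segment once per genome, even if a path visits it repeatedly.
    counts = Counter(node for nodes in paths.values() for node in set(nodes))
    n_genomes = len(paths)
    core = {node for node, count in counts.items() if count == n_genomes}
    non_core = set(counts) - core
    return core, non_core


edges = build_edges(paths)
core, non_core = core_and_non_core(paths)
print("edges:", sorted(edges))
print("core segments:", sorted(core))
print("non-core segments:", sorted(non_core))
```

In this toy example, segments s2 and s4 are absent from one genome each, so a single linear reference built from genome_A alone would lack s4 entirely. That is the situation the graph is meant to avoid.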

INRAE researchers are participating in the Bovine Pangenome Consortium, a large-scale international collaboration that aims to build a pangenome bringing together the genomes of a large number of taurine and zebu cattle breeds, and related species.

An article published in Genome Biology details the consortium's objectives. The pangenome produced will be entirely public and accessible thanks to standardized tools, also proposed by the consortium. It is intended to replace current breed-specific reference assemblies in the medium term. The consortium is open and invites partners who share its vision to help produce these assemblies and a representation of the pangenome that the community recognizes as a public resource. By enabling further progress in reconstructing the history of bovine populations, and in understanding the links between genomic information and trait expression, this resource will promote the development of more sustainable cattle breeding.

Reference:

Smith T.P.L., Bickhart D.M., Boichard D., Chamberlain A.J., Djikeng A., Jiang Y., Low W.Y., Pausch H., Demyda-Peyrás S., Prendergast J., Schnabel R.D., The Bovine Pangenome Consortium, Rosen B.D. 2023. The Bovine Pangenome Consortium: democratizing production and accessibility of genome assemblies for global cattle breeds and other bovine species. Genome Biology, 24, 139.

Modification date: 17 November 2023 | Publication date: 10 August 2023 | Redactor: INRAE - Edition P. Huan