Sequence capture

Identification of causal mutation using sequence capture

Current mapping tools and methods can localize the genes responsible for genetic variation to within about 1 cM, which corresponds to roughly one million base pairs.

An interval of this size contains several thousand polymorphisms but, in most cases, only one of them is the causal variant. Identifying such causal mutations is one of the greatest challenges in genetics, now and in the future, both to use them in selection programs and to understand how phenotypes are built.

A method has been developed, based on the following main steps:

  • Capturing the target chromosomal region from different individuals with different genotypes
  • Sequencing this region with the latest high-throughput sequencing techniques
  • Analysing the sequences with bioinformatics tools to detect the polymorphisms and to select the most probable candidates
  • Validating the putative causal mutation

Compared with whole-genome sequencing, this approach makes it possible to sequence and analyse more individuals at one specific locus, which increases the probability of finding the causal mutation among a large number of polymorphisms.
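The effect of adding individuals can be illustrated with a rough calculation. The Python sketch below assumes, purely for illustration, that each non-causal polymorphism is compatible with any one animal's status with a fixed probability, independently across animals. The function name and the numbers used (1000 polymorphisms, a probability of 0.5) are hypothetical and are not taken from the programs described here. In a real interval of about 1 cM, linkage disequilibrium makes neighbouring polymorphisms correlated, so elimination is slower than this simple model suggests.

```python
def expected_chance_survivors(n_polymorphisms: int,
                              p_compatible: float,
                              n_individuals: int) -> float:
    """Expected number of non-causal polymorphisms still compatible with
    the status of every sequenced animal, under the (strong) assumption
    that animals are independent and share the same probability."""
    if not 0.0 <= p_compatible <= 1.0:
        raise ValueError("p_compatible must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    # Each additional animal multiplies the surviving fraction by p_compatible.
    return n_polymorphisms * p_compatible ** n_individuals


if __name__ == "__main__":
    # Hypothetical values: 1000 polymorphisms in the interval, p = 0.5.
    for n in (1, 5, 10, 20):
        print(n, expected_chance_survivors(1000, 0.5, n))
```

Under these assumptions, the number of polymorphisms that remain compatible by chance falls geometrically with the number of animals, which is why sequencing many individuals at one locus can be more informative than sequencing the whole genome of a few.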

The CRB GADIE has developed this expertise within the framework of the CapSeqAn program, which involves three genomic regions of interest (the polled gene in cattle, the Major Histocompatibility Complex in chicken, and a melanoma susceptibility gene in pig). This work has been carried out in partnership with three of GABI's research teams, G2B (Bovine Genetics and Genomics), PSGen (Populations, Statistics and Genome) and GIS (Genetics, Immunity, Health), and with the National Genotyping Center. GABI has also led other programs on sequence capture, for example studies on the Generalised Caprine-like Hypoplasia syndrome (GCHS) and on the identification of QTL for osteochondrosis susceptibility.

The sequence capture step is done using NimbleGen arrays, specifically designed for each study. Oligonucleotides covering the whole region to be studied are placed on the arrays, after excluding repeated sequences in order to avoid non-specific capture. The sequences captured from a given number of individuals are then sequenced with a high-throughput sequencing method. The reads are generally assembled by alignment to an existing reference genome, and all polymorphisms are recorded.

Two main approaches are then used to identify the causal mutation or, as a first step, at least to eliminate polymorphisms that cannot be causal. The first approach consists of excluding all polymorphisms that are not compatible with the animals' status. For example, it is possible to eliminate SNPs that are homozygous in animals that are heterozygous for the trait, or vice versa. The second approach is an in silico search for the effects of the genetic variants, based on the different annotations of the analysed sequence: exons, introns, promoters, regulatory regions, transcription sites, etc.

The aim is to reduce the number of candidate mutations enough to limit the genotyping work needed for validation in a large population and then, after a further round of elimination, the work needed for functional validation.
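The first elimination approach, excluding polymorphisms that are not compatible with the animals' status, can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the programs described here: the data layout, the function names and the status labels ("het" and "hom") are assumptions made for the example, and the genotypes are invented. Only zygosity is compared, as in the example given in the text.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # the two alleles called at one polymorphic site


def is_heterozygous(genotype: Genotype) -> bool:
    """True when the two alleles differ."""
    return genotype[0] != genotype[1]


def compatible_with_status(genotypes: Dict[str, Genotype],
                           trait_status: Dict[str, str]) -> bool:
    """A polymorphism is kept only if, for every animal with a call, its
    zygosity matches the animal's zygosity for the trait
    ("het" or "hom", hypothetical labels)."""
    for animal, status in trait_status.items():
        genotype = genotypes.get(animal)
        if genotype is None:
            continue  # no call for this animal: it cannot exclude the site
        if is_heterozygous(genotype) != (status == "het"):
            return False
    return True


def filter_candidates(polymorphisms: Dict[int, Dict[str, Genotype]],
                      trait_status: Dict[str, str]) -> List[int]:
    """Return the positions of polymorphisms that survive the filter."""
    return [position for position, genotypes in polymorphisms.items()
            if compatible_with_status(genotypes, trait_status)]


# Invented example: three animals and three SNP positions.
trait_status = {"animal_A": "het", "animal_B": "hom", "animal_C": "hom"}
polymorphisms = {
    1000: {"animal_A": ("A", "G"), "animal_B": ("A", "A"), "animal_C": ("G", "G")},
    2000: {"animal_A": ("C", "C"), "animal_B": ("C", "C"), "animal_C": ("T", "T")},
    3000: {"animal_A": ("T", "C"), "animal_B": ("T", "C"), "animal_C": ("T", "T")},
}

candidates = filter_candidates(polymorphisms, trait_status)
print(candidates)  # -> [1000]
```

Position 2000 is excluded because animal_A is homozygous there while heterozygous for the trait, and position 3000 because animal_B is heterozygous there while homozygous for the trait. A stricter filter could also require that animals homozygous for opposite trait alleles carry opposite homozygous genotypes; that refinement is left out of this sketch.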