WCRC researchers contribute to new methods for detecting technical errors in DNA sequencing data
Information about the sequence of nucleotides in DNA is the key to understanding genetic predisposition to disease. Genomic data for large research cohorts are typically obtained by DNA sequencing or microarray genotyping. Each of these methods has many technological subtypes; for sequencing, these include whole genome sequencing and exome sequencing. Whole genome sequencing is an expensive method that provides information about all 3 billion nucleotides in DNA, whereas exome sequencing only provides information about the DNA regions that code for proteins.
DNA variation – differences from the reference DNA sequence – is at the root of innate disease risk. Searching for DNA variation is a difficult statistical task because true DNA variants are hidden in the noise that inevitably occurs during the sequencing process.
In a paper published in Genome Research, Alexander Loboda of the Population Genetics Research Laboratory at the WCRC for Personalized Medicine, together with scientists from the world's leading genomic institutes, analysed allelic frequencies for the same DNA variants obtained by different DNA sequencing methods. The paper shows that allelic frequency estimates differ systematically between sequencing platforms. However, using data from gnomAD, the world's largest genomic resource, and the American biobank All of Us, the researchers were able to identify specific regions of DNA that have low sequencing reliability. They also developed an algorithm that predicts the quality and reliability of the resulting data, which can be used to improve data quality filtering and to avoid false-positive associations in genomic studies.
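To illustrate what comparing allelic frequencies for the same variant across platforms can look like in practice, here is a minimal sketch in Python. It is not the algorithm from the paper, which this article does not describe in detail. It assumes that, for each variant, an alternate allele count and a total allele number are available from two platforms, and it applies a generic two-proportion z-test. The function names, the variant labels, the counts, and the significance threshold are all hypothetical.

```python
import math


def allele_frequency_discordance(ac_a, an_a, ac_b, an_b):
    """Compare one variant's allele frequency between two platforms.

    ac_* is the alternate allele count and an_* the total number of
    alleles observed on each platform (hypothetical inputs).
    Returns (af_a, af_b, p_value) from a two-sided two-proportion z-test.
    """
    if an_a <= 0 or an_b <= 0:
        raise ValueError("allele numbers must be positive")
    af_a = ac_a / an_a
    af_b = ac_b / an_b
    # Pooled frequency under the null hypothesis of no platform effect.
    pooled = (ac_a + ac_b) / (an_a + an_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / an_a + 1 / an_b))
    if se == 0:
        # Variant absent (or fixed) on both platforms: nothing to compare.
        return af_a, af_b, 1.0
    z = (af_a - af_b) / se
    # Two-sided p-value of a standard normal statistic.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return af_a, af_b, p_value


def flag_discordant(variants, alpha=1e-6):
    """Return the names of variants whose frequencies differ between platforms.

    alpha is an illustrative threshold, not a value taken from the paper.
    """
    flagged = []
    for name, ac_a, an_a, ac_b, an_b in variants:
        _, _, p_value = allele_frequency_discordance(ac_a, an_a, ac_b, an_b)
        if p_value < alpha:
            flagged.append(name)
    return flagged


# Made-up counts for two variants: (name, AC exome, AN exome, AC genome, AN genome).
example_variants = [
    ("variant_1", 500, 10000, 510, 10000),  # similar frequency on both platforms
    ("variant_2", 500, 10000, 900, 10000),  # large difference between platforms
]
discordant = flag_discordant(example_variants)
print(discordant)
```

A real analysis would also have to account for differences in ancestry composition between cohorts and for multiple testing across millions of variants, neither of which this sketch handles.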
To learn more, please read the article here