When it comes to public access, the tree of life has holes.
A new study co-authored by University of Florida researchers shows about 70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.
Scientists are using the genetic data to construct the largest open-access tree of life as part of the National Science Foundation’s $5.6-million Assembling, Visualizing and Analyzing the Tree of Life project. Understanding organismal relationships is increasingly valuable for tracking the origin and spread of emerging diseases, creating agricultural and pharmaceutical products, studying climate change, controlling invasive species and establishing plans for conservation and ecosystem restoration.
The study appearing today in PLoS Biology describes a significant challenge for the project, which is expected to produce an initial draft tree by the end of the year. It highlights the need for developing more effective methods for storing data for long-term use and urges journals to adopt more stringent data-sharing policies.
“I think what we need is a major change in our mindset about just how important it is to deposit your data – this has to be a standard part of what we do,” said co-author Doug Soltis, a distinguished professor at the Florida Museum of Natural History on the UF campus and in UF’s biology department. “Because if it’s not there, it’s lost forever. These are really, really important for long-term use, as we’re seeing now in our efforts to build a tree.”
Estimates of the amount of missing data were based on 7,539 peer-reviewed studies about animals, fungi, seed plants, bacteria and various microscopic organisms. Soltis said the missing genetic data has required project collaborators either to contact hundreds of researchers to request information or to attempt to reproduce the sequence alignments and analyses, which is extremely labor-intensive.
“There are ambiguities with the alignments, you have to make certain judgment calls, and so an alignment that I do is not going to be the same as an alignment that somebody else does,” said lead author Bryan Drew, a postdoctoral researcher in UF’s biology department. “It’s hard to assess a publication’s validity in a lot of cases if you don’t have access to the alignments. To me, that’s the biggest problem with all of this.”
Challenges include complicated mechanisms for uploading data and inconsistencies between journals – some require or strongly recommend data be stored in an online database and others do not, Drew said. The most widely used, publicly accessible databases include GenBank, TreeBASE and Dryad. Most journals require DNA sequences be deposited in GenBank, but comparatively few require the sequence alignments to be publicly archived. When study co-authors emailed researchers to obtain missing information, a majority did not respond, and the co-authors were rarely successful in retrieving the data.
“A lot of the authors I contacted said their data was in TreeBASE, but they were unaware of the next step needed after acceptance by the journal – the researchers didn’t know they had to go back into TreeBASE and actually make the data available to the public,” Drew said.
Elizabeth Kellogg, a professor in the department of biology at the University of Missouri-St. Louis who was not involved with the study, said she is not surprised about the large amount of missing information.