When it comes to public access, the tree of life has holes.
A new study co-authored by University of Florida researchers shows about 70 percent of published genetic sequence comparisons are not publicly accessible, leaving researchers worldwide unable to get to critical data they may need to tackle a host of problems ranging from climate change to disease control.
Scientists are using the genetic data to construct the largest open-access tree of life as part of the National Science Foundation’s $5.6-million Assembling, Visualizing and Analyzing the Tree of Life project. Understanding organismal relationships is increasingly valuable for tracking the origin and spread of emerging diseases, creating agricultural and pharmaceutical products, studying climate change, controlling invasive species and establishing plans for conservation and ecosystem restoration.
The study appearing today in PLoS Biology describes a significant challenge for the project, which is expected to produce an initial draft tree by the end of the year. It highlights the need for developing more effective methods for storing data for long-term use and urges journals to adopt more stringent data-sharing policies.
“I think what we need is a major change in our mindset about just how important it is to deposit your data – this has to be a standard part of what we do,” said co-author Doug Soltis, a distinguished professor at the Florida Museum of Natural History on the UF campus and in UF’s biology department. “Because if it’s not there, it’s lost forever. These are really, really important for long-term use, as we’re seeing now in our efforts to build a tree.”
Estimates of the amount of missing data were based on 7,539 peer-reviewed studies about animals, fungi, seed plants, bacteria and various microscopic organisms. Soltis said the missing genetic data has forced project collaborators to contact hundreds of researchers to request information or to attempt to reproduce the sequence alignments and analyses themselves, which is extremely labor-intensive.
“There are ambiguities with the alignments, you have to make certain judgment calls, and so an alignment that I do is not going to be the same as an alignment that somebody else does,” said lead author Bryan Drew, a postdoctoral researcher in UF’s biology department. “It’s hard to assess a publication’s validity in a lot of cases if you don’t have access to the alignments. To me, that’s the biggest problem with all of this.”
Challenges include complicated mechanisms for uploading data and inconsistencies between journals – some require or strongly recommend data be stored in an online database and others do not, Drew said. The most widely used, publicly accessible databases include GenBank, TreeBASE and Dryad. Most journals require DNA sequences be deposited in GenBank, but comparatively few require the sequence alignments to be publicly archived. When study co-authors emailed researchers to obtain missing information, a majority did not respond, and the co-authors were rarely successful in retrieving the data.
“A lot of the authors I contacted said their data was in TreeBASE, but they were unaware of the next step needed after acceptance by the journal – the researchers didn’t know they had to go back into TreeBASE and actually make the data available to the public,” Drew said.
Elizabeth Kellogg, a professor in the department of biology at the University of Missouri-St. Louis who was not involved with the study, said she is not surprised about the large amount of missing information.