Scientist recovers deleted data on the SARS-CoV-2 genome

Some of the early records have disappeared from the coronavirus genome database.

Some early SARS-CoV-2 genome data was removed from a shared database used by scientists around the world, an American researcher has found. He managed to recover records of the genetic sequences of early coronavirus samples obtained in Wuhan, and these samples, it turned out, differ from the variants that spread later. The sequences themselves say nothing about the origin of the virus, the researcher notes, whether it emerged naturally or in a laboratory. But it does mean that scientists studying the origin of the virus have so far been working with an incomplete data set, which could have affected their results.

Genetic sequences of coronavirus samples can help establish how SARS-CoV-2 passed to humans from animals, most likely bats. Sequences obtained in the early stages of the pandemic are the most valuable, because they bring us as close as possible to the event that started the spread of the virus.

Studying the data published by various research groups, Dr. Jesse Bloom from the Fred Hutchinson Cancer Center came across a study published in March 2020, which mentioned 241 samples of SARS-CoV-2 obtained by scientists from Wuhan. The study said that the genetic sequences of the samples were uploaded to the online database Sequence Read Archive, managed by the US National Library of Medicine.

However, when Bloom wanted to look at these sequences, he did not find any of them in the database.

Intrigued by their disappearance, he found another paper that mentioned the missing sequences. After discovering that many sequences were also held in Google Cloud storage, and that the files followed a consistent format, Bloom was able to recover 13 of the missing records. He described the process in more detail in an article published on the bioRxiv preprint server.

“It seems likely that the sequences were deleted to hide their existence,” Bloom says.

By combining these 13 records with the already known ones, Bloom found that these samples were evolutionarily older than those obtained in 2019 at the Wuhan market: they lacked the mutations characteristic of that lineage. Similar sequences have been encountered before. This points to the existence of another, earlier lineage of the coronavirus that was not linked to the market.

Overall, the recovered sequences were more similar to the bat coronavirus than the samples from the market were.
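The reasoning behind this comparison can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and simply counts the positions at which each sample differs from an outgroup (standing in here for the bat coronavirus). The sequences and names below are hypothetical toy data, not real genomes, and this is not the method used in Bloom's preprint; real analyses rely on whole-genome alignment and phylogenetic tools.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy data, NOT real SARS-CoV-2 or bat coronavirus sequences.
outgroup = "ACGTACGTAC"  # stands in for the bat coronavirus
samples = {
    "recovered_sample": "ACGTACGTAT",
    "market_sample": "ACGAACGTTT",
}

# Number of differences between each sample and the outgroup.
distances = {name: hamming(seq, outgroup) for name, seq in samples.items()}

# The sample with the fewest differences is the one closest to the outgroup.
closest = min(distances, key=distances.get)

for name, dist in distances.items():
    print(f"{name}: {dist} differences from outgroup")
print(f"closest to outgroup: {closest}")
```

A sample with fewer differences from the outgroup is a candidate for being closer to the ancestral form, although a simple count like this cannot by itself establish which lineage came first.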

“Perhaps our understanding of the situation in Wuhan at the early stages is somewhat incorrect,” he says.

The US National Institutes of Health confirmed that the data was deleted in June 2020 at the request of the researcher who originally submitted it. The institution noted that this is standard practice: geneticists from all over the world have been sharing information in such databases since the very beginning of the pandemic and periodically make changes to their own records.

In light of the controversy over the origin of SARS-CoV-2, this raises a question: did the author delete the records to hide something? However, Bloom notes that the recovered sequences do not favor either hypothesis.

“This study does not provide any additional strong evidence to confirm the zoonotic or laboratory origin of the virus,” he says. “Rather, it shows that there are additional sequences from the early period of the pandemic that were not known until now. And there are mutations in some samples that suggest that these samples are evolutionarily older than the virus from the seafood market in Wuhan.”

“This is certainly a substantial piece of work, and it contributes a great deal to understanding the origin of SARS-CoV-2,” says evolutionary biologist Michael Worobey.

Other scientists were more skeptical of Bloom’s conclusions.

“If these sequences were removed to hide them, then the attempt failed,” says Robert Garry, a professor of immunology at Tulane University. “These data do not provide any new knowledge about the genetic diversity of SARS-CoV-2 at the beginning of the pandemic. The reality is that small mistakes constantly accompany the exchange of scientific data.”

“The language of the paper is unusual: it contains a lot of assumptions and guesses, and quotes from blog posts,” adds Andrew Preston, a specialist in microbial pathogenesis at the University of Bath in the UK. “It seems that the author wanted to point to deliberate concealment of early sequence data from Wuhan by the Chinese authorities. However, this is a completely subjective assessment of the situation, which will be very difficult to confirm or refute.”

In general, the work confirms that various variants of the coronavirus could have been circulating in Wuhan even before the first known outbreaks of infection associated with the seafood market. Bloom and other scientists hope that the researchers who deleted the sequences from the database will explain why they did it.

