henjin on Steve Kirsch's newsletter

12 Comments

Oct 1, 2022Edited

Did you run my BLAST search yet?

In an earlier comment thread, I pointed out that one method of nanopore sequencing used to sequence SARS 2 has an average read length of about 400 base pairs (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493558/). And there's 4^400 or about 10^241 possible sequences of 400 base pairs, and you're not going to find a 400-bp sequence of the genome of SARS 2 in some random contaminants like cow blood.

Try to google for "sars 2 reference genome", open the first result, and copy some sequence of 400 nucleotides of the genome, like for example the first 400 nucleotides (https://www.ncbi.nlm.nih.gov/nuccore/1798174254): "tataccttcc caggtaacaa accaaccaac tttcgatctc ttgtagatct gttctctaaa cgaactttaa aatctgtgtg gctgtcactc ggctgcatgc ttagtgcact cacgcagtat aattaataac taattactgt cgttgacagg acacgagtaa ctcgtctatc ttctgcaggc tgcttacggt ttcgtccgtg ttgcagccga tcatcagcac atctaggttt cgtccgggtg tgaccgaaag gtaagatgga gagccttgtc cctggtttca acgagaaaac acacgtccaa ctcagtttgc ctgttttaca ggttcgcgac gtgctcgtac gtggctttgg agactccgtg gaggaggtct tatcagaggc acgtcaacat".

Then go to nucleotide BLAST, enter "SARS-CoV-2 (taxid:2697049)" in the "Organism" field and click the "exclude" checkbox next to it, and press the "BLAST" button: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn. (BLAST only returns the 100 first results, so if you don't exclude SARS-CoV-2, then all of the 100 results are genomes of SARS 2.) But anyway, when SARS 2 is excluded, then the only results with a 100% identical match are things like synthetic clones of SARS 2. The next closest results include for example "Select seq MZ937003.2 Bat coronavirus isolate BANAL-20-236/Laos/2020, complete genome", which only has 98.7% similarity.

I can next write a tutorial on how you can download the data for the raw reads and align it yourself using a pipeline of tools like BWA-MEM and BCFtools, but the procedure is described in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8285700/. And you can download files for the raw reads from the "Data availability" section at the end of this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409862/.

Expand full comment

Reply (1)

Christine Massey FOIs

Oct 3, 2022

Irrelevant. No so-called "viral" sequences have ever been shown to come from any specific particle. All the "genomes" are assembled, none have been discovered intact. I don't care how long any of the sequences are. None have ever been shown to have anything to do with a tiny replicating, transmissible disease bomb. You are so caught up in fancy technology that you've lost touch with reality.

Expand full comment

Like (3)

Reply (1)

henjin

Oct 3, 2022Edited

Has the no virus crowd explained why genetic sequencing ends up producing different variants of viruses over time, so that at first a mutation emerges in one part of the world and then it spreads to other parts of the world?

There are different pipelines for taking the raw reads for the genome of SARS 2 and assembling them into a whole-genome sequence, but one of them is called HaVoC: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8285700/. For aligning the raw reads, HaVoC utilizes the Wuhan-Hu-1 reference genome which is supposed to have been first collected in December 2019: "In addition to increasing the quality of the reads, this step reduces the time of the following alignment process in which the reads are then aligned to a reference genome of SARS-CoV-2 isolate Wuhan-Hu-1 (Genbank accession code: NC_045512.2) provided in the ref.fa file with BWA-MEM [23] or Bowtie 2 [24]." However even though the raw reads are aligned using an old variant of SARS 2 as the reference, the pipeline still ends up producing different variants of the virus over time. So how does the software pipeline know that it should produce the genome a delta variant in late 2021, an omicron variant in early 2022, and so on? The utilities used by the pipeline are open source software that are also used to assemble human genomes, so they don't have any special code that looks up data about COVID variants from an online database or anything. For example HaVoC uses BWA for the alignment, and you can see its source code from here: https://github.com/lh3/bwa. (BWA is also used to align the raw reads of a human genome, and you can align the reads by just using the same reference genome for all human samples regardless of what race they are, and the racial composition of the reference genome doesn't affect the race of the final genome sequence produced by the pipeline. One standard human reference genome called "hg19" is about 1/3 Sub-Saharan African and 2/3 Eurasian, and you can see examples where it's used with BWA by googling for `bwa hg19`.)

However I guess you might argue that the hardware that produces the raw reads for SARS 2 is tweaked in a way that it systematically introduces new mutations into the raw reads over time so that it simulates the evolution and geographic spread of different variants. But there would need to be a vast conspiracy where different manufacturers of sequencing hardware would need to employ the same scheme, and the scheme would need to be retroactively introduced to old sequencing hardware which was manufactured before the SARS 2 epidemic. And in case you maintain that no viruses exist, then the sequencing hardware would also need to have similar logic for other viruses so that it introduces new mutations to the raw reads over time. For example on NextStrain, you can see a family tree for variants of measles where you can see that different variants are common in different parts of the world: https://nextstrain.org/measles.

Or if the appearance of new mutations in viruses is not generated at the level of the sequencing hardware which produces the raw reads, then where? If it would be generated on the level of the open-source software which converts raw reads to whole-genome sequences, then it would be easy for people to spot it by looking at the source code.

Another thing you can try is go to the end of this paper I linked earlier: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409862/. Click on "SRR12109250" under "Data availability". Click on "SRR12109250" again, click on "Reads", and copy the nucleotide sequence of the first raw read: "GTTGTACTTC GTTCAGTTAC GTATTGCTAA GGTTAAGACT ACTCTGCCTT TGAACAGCAC CTTCATCAGA TTCAGCTTGC ATGGCATTGT TAGTAGCCTT ATTTAAGGCT CACCTCAGCT TACCTCCTCA TGTTTAAGGT AAACGATGGC TGCATTAACC ACTGTTGGTT TTACCTTTTT AGCTTCTTCC ACAATGTCTG CATTTTTAAT GTATGCATTG TCATTAGTTT TAATAACCAC CACTAAAACT ATTCACTTTA ATGAAT". Then paste the sequence to the "Enter Query Sequence" field in nucleotide BLAST: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn. Then when you click on the BLAST button, you'll see that the best match for the sequence is a genome of SARS 2 (with a score of 261 out of 261). So if the sequence of the raw read would actually come from some other organism like cow blood, and the sequence was only manipulated to match the SARS 2 genome through the process of alignment, then why is SARS 2 the top match for the raw read before any alignment has been performed?

Expand full comment

Reply (1)

Christine Massey FOIs

Oct 3, 2022

There is no onus on the no-virus crowd explain anything, and there is no sense in discussing meaningless, made-up, strictly-in-silico sequences that are fraudulently called "genomes" other than to point out that they are meaningless, made-up, and strictly-in-silico. The variance in these meaningless, made-up sequences simply represents the inability of virologists to replicate their anti-scientific "experiments".

Expand full comment

The same kind of sequencing hardware that is used to produce raw reads for the genomes of viruses is also used to produce raw reads for human genomes. And I guess you accept that when the hardware produces a raw read for a segment of human DNA, then the read corresponds to some physical genetic sequence that is actually present within the human genome. So then what physical substance is the source of the raw reads that are generally thought to represent the genomes of viruses? If you're saying that the source is some other organism that is not a virus, then why was it the case that in the BLAST search in my previous post, the closest match for the raw read I searched was a genome of SARS 2? In the case of both human and viral genomes, the same FASTQ file format is used to store the raw reads, and the same command line utilities like BWA are used to align the raw reads.

Another thing you can try is to go again to the "Data availability" section at the end of this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7409862/. Then click on "SRR12109250", click on "SRR12109250" again, click on "FASTA/FASTQ download", and press the "FASTQ" button under "Download". Then open this online utility for assembling the raw reads: https://ngdc.cncb.ac.cn/ncov/online/tool/variation?lang=en. Press the "Choose File" button next to "Upload Single-end Sequencing file", select the file you just downloaded, and press the "Run" button. Then after a few minutes you'll get the assembled genome, and it also shows you all of the bases where the assembled genome differs from the reference genome. The default reference genome used by the online utility is NC_045512 / Wuhan-Hu-1, which is supposed to have been collected in December 2019 and submitted in January 2020 (https://www.ncbi.nlm.nih.gov/nuccore/1798174254). But yet the online utility detects that the raw reads you uploaded have the D614G mutation at position 23403, which which was first detected in March 2020 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7310631/).

If you would repeat the same assembly procedure using standard commandline utilities like BWA, they would still detect the D614G mutation. So how do the utilities know that they need to introduce that precise mutation even though the mutation is not part of the reference genome? BWA uses the same procedure to align viral genomes and human genomes, and it doesn't need to look up any data about the genomes from an online database, but apart from the raw reads, the only external data it needs is the reference genome. So if mutations like D614G are not part of the raw reads, then where does BWA find the data for which mutations it needs to introduce? To get an idea of how the alignment and further parts of the assembly pipeline work, you can search Google Images for `bwa alignment procedure`: https://www.google.com/search?q=bwa+alignment+procedure&tbm=isch.

Expand full comment

Reply (1)

Christine Massey FOIs

Oct 4, 2022

You want people to rely on nothing but sequences, a database that they are supposed to trust like a bible, and the wild, idiotic assumptions and leaps of illogic that are rampant in virology. Meanwhile, no "virus", including "SARS-COV", has ever been shown to exist IN THE PHYSICAL REALM. Not interested in your delusional technocracy. Clearly you aren't able to cite any scientific proof of a virus... like everyone else on the planet.

Expand full comment

Did you run either of my BLAST searches or did you try to use the online interface for assembling the raw reads?

Which one of the following statements do you disagree with?

1. The same kind of sequencing hardware that is used to produce raw reads of human genetic sequences is used to produce the raw reads that are alleged to represent parts of the genomes of viruses.

2. The same kind of software pipelines that are used to assemble the raw reads of human genetic sequences to a complete human genome are also used to assemble what are alleged to be the genomes of viruses.

3. The methodology that is used to sequence and assemble a human genome is valid and reproduces the actual physical genome of a human.

4. The alleged raw reads of viruses contain data for mutations like D614G which are not present in the reference genomes used by the assembly pipeline.

5. The mutations like D614G cannot be introduced to the reference genome by the command line utilities in the assembly pipeline because they do not employ any special database to look up information about alleged mutations of alleged viruses.

So then if you agree with the first three statements, then why is the procedure of sequencing and assembly not valid in the case of viruses even though it is valid in the case of humans? And if you also agree with the last two statements, then where do the mutations like D614G in the raw reads come from?

And you also didn't answer that when I took one raw read that is alleged to represent a part of the genome of SARS 2 and I searched for the sequence on BLAST, then why was a genome of SARS 2 the best match for the sequence? If the raw read actually came from some other organism, then why was that organism not the best match?

In my previous post, I told you to download the raw reads for one genome of SARS 2 that was sequenced in Morocco in 2020, but here's a direct link to the page for the raw reads: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR12109250&display=metadata. You can see that the platform field is listed as "OXFORD NANOPORE". In order to find raw reads of other organisms which use the same sequencing platform, you can search for "oxford nanopore" at NCBI's sequence read archive: https://www.ncbi.nlm.nih.gov/sra. When I ran the search, it returned a total of 25,010 matches, out of which the oldest result was called "GSM4194746: NANOPORE-seq Nuclear; Homo sapiens; RNA-Seq": https://www.ncbi.nlm.nih.gov/sra/SRX7223897[accn]. So now try to open the page of the oldest result, click on "SRR10540175", click on "Reads", copy the sequence of the first read, and search for the sequence on nucleotide BLAST (google for "nucleotide blast", paste the sequence in the biggest text field, and press the "BLAST" button). You'll see that the best match for the sequence is a clone of a human chromosome 5. And next you can also repeat the same procedure with other raw reads from the sequence read archive. So if the sequencing hardware by Oxford Nanopore Technologies produces valid raw reads in the case of other organisms, then why are the raw reads of viruses not valid? (And we're only talking about raw reads here, so alignment and variant calling do not yet come into play.)

Expand full comment

Reply (1)

Christine Massey FOIs

Oct 4, 2022

zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Expand full comment

Like (1)

Reply (1)

henjin

Oct 5, 2022Edited

Can you link to articles or videos where the no-virus folks explain how the genetic sequencing of what are alleged to be viruses works? Do they offer an explanation for what is the source of novel mutations which are not part of the reference genome, or why new mutations appear to spread from one geographical region to another?

Tom Cowan's knowledge of genetics is so deep that when the authors of a scientific paper wrote that they "designed 37 pairs of nested PCRs spanning the genome on the basis of the coronavirus reference sequence", Cowan thought that "they actually looked at 37 out of the approximately 30,000 of the base pairs that are claimed to be the genome of the intact virus. They then took these 37 segments and put them into a computer program, which filled in the rest of the base pairs." (https://drtomcowan.com/blogs/blog/only-poisoned-monkey-kidney-cells-grew-the-virus, https://www.integralworld.net/visser193.html)

Jon Rappoport seems to assume that coronaviruses are real but that SARS 2 is not and that the purported genome of SARS 2 was stitched together from pieces of the genomes of other coronaviruses or other organisms (https://blog.nomorefakenews.com/2020/10/22/the-virus-that-isnt-there-genetic-sequencing-and-the-magic-trick/). However if he also supposes that other viruses are the source of the raw reads that are alleged to be derived from SARS 2, then why are the other viruses not the closest matches for the raw reads on BLAST?

Stefan Lanka says that because a made-up reference genome of SARS 2 is used as the reference during the alignment step, then raw reads which actually come from other organisms end up being aligned so that they reproduce the made-up reference genome (http://wissenschafftplus.de/uploads/article/the-end-of-corona-a-chance-for-everybody.pdf). But in that case why are the other organisms not the closest matches for the raw reads on BLAST? And why do new mutations which are not part of the reference genome arise over time, so that the same mutation is detected by various laboratories over the world which use different sequencing hardware and different software?

Lanka also says that PCR amplification introduces random base changes within the segments that are amplified as part of the sequencing, where he's correct. And one disadvantage of the long read sequencers sold by Oxford Nanopore Technologies is that they have a fairly high error rate of around 5-10%, or at least they used to have according to two sources I found from 2019 and 2021, even though I think the error rate has gone down in some of the newest sequencers (https://www.biostars.org/p/380759/, https://www.nature.com/articles/s41467-020-20340-8). But anyway, if you just sequence the same segment of DNA multiple times, then you can guess which variant at each position is likely to be correct by picking the most common variant. (Because in the same way if you took a sequence of 400 characters from a book, you made 100 copies of the sequence where in each copy you would change a random 10% of the characters to some other character, and you then lined up the sequences, then you could eliminate the errors by just picking the most common character at each position.) And the problem of random errors in the raw reads also exists when sequencing human DNA, which however doesn't mean that human DNA cannot be sequenced accurately. And even if you sequence the genome of SARS 2 so that you have a 400-nt raw read which has a 10% error rate, and you search for the raw read on BLAST, then the closest match for the raw read will still be a genome of SARS 2 with around 90% shared bases. (Because in the same way if you pick a random book from Google Books, and you pick some sequence of 400 characters within the book, and you change 40 characters to some other character, then it's still possible to devise an algorithm which goes through all books on Google Books and finds the book which was the source of the altered sequence.)

Expand full comment

Reply (1)

Christine Massey FOIs

Oct 5, 2022

Wrong. Jon Rappoport is 100% clear that no virus is real, and again you are trying to put the onus on me to prove something when it's your job to prove you claim of a virus.

No one has ever shown that the sequences used to make up the meaningless in silico "genomes" have anything whatsoever to do with any "virus" or that they come from any particular particle. This has been discussed over and over and over again.

And here is what Tom Cowan actually wrote:

"First, in the section titled “Whole Genome Sequencing,” we find that rather than having isolated the virus and sequencing the genome from end to end, that the CDC “designed 37 pairs of nested PCRs spanning the genome on the basis of the coronavirus reference sequence (GenBank accession no. NC045512).”

To me, this computer-generation step constitutes scientific fraud. Here is an equivalency: A group of researchers claim to have found a unicorn because they found a piece of a hoof, a hair from a tail, and a snippet of a horn. They then add that information into a computer and program it to re-create the unicorn, and they then claim this computer re-creation is the real unicorn. Of course, they had never actually seen a unicorn so could not possibly have examined its genetic makeup to compare their samples with the actual unicorn’s hair, hooves and horn.

The researchers claim they decided which is the real genome of SARS-CoV-2 by “consensus,” sort of like a vote. Again, different computer programs will come up with different versions of the imaginary “unicorn,” so they come together as a group and decide which is the real imaginary unicorn."

Expand full comment

Ok, apparently Cowan edited his article later, because in an old version of his article, he wrote: "First, in the section titled 'Whole Genome Sequencing,' we find that rather than having isolated the virus and sequencing the genome from end to end, they found 37 base pairs from unpurified samples using PCR probes This means they actually looked at 37 out of the approximately 30,000 of the base pairs that are claimed to be the genome of the intact virus. They then took these 37 segments and put them into a computer program, which filled in the rest of the base pairs." (http://web.archive.org/web/20201015211840/https://drtomcowan.com/only-poisoned-monkey-kidney-cells-grew-the-virus/) So I give him credit for correcting his mistake. But should you really trust someone who makes such a basic error?

In an article by Rappoport where he commented on the article by Cowan, he quoted the same text I quoted above but he didn't point out any error within the text, which is a good indication of how much he knows about genetics (https://blog.nomorefakenews.com/2020/10/19/dr-tom-cowan-explores-the-covid-virus-invented-out-of-sheer-nonsense/).

But anyway, in a variant of nanopore sequencing known as direct RNA nanopore sequencing, almost the entire length of a coronavirus can be sequenced as a single long piece so that the step of amplification by RT-PCR is not needed (https://genome.cshlp.org/content/29/9/1545.full.pdf): "Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. By using DRS, we were able to map the longest (∼26-kb) contiguous read to the viral reference genome. By combining Illumina and Oxford Nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV)-229E genome (27.3 kb). Furthermore, by using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs."

However the method of direct RNA nanopore sequencing is not commonly used in practice to sequence SARS 2. Oxford Nanopore Technologies provides two different protocols for sequencing SARS 2, where both methods use RT-PCR to amplify segments of the viral genome, and the average amplicon length is around 1200 bp in one protocol and around 400 bp in the other protocol: https://nanoporetech.com/covid-19. The website of Oxford Nanopore Technologies says that in the protocol where the amplicon length is around 400 bp, the "shorter length may help improve coverage for RNA samples that are likely to be degraded - for example, due to freeze-thaw cycles or storage at temperatures above -80°C." So there's some ways in which using a shorter read length produces more accurate results than using a longer read length.

Expand full comment

Reply (1)