Saturday, January 24, 2015

Antitoxin/comN Sample Mixup



I was looking at the BAM files I produced from the sequencing to verify that all the strains had the genotype expected of them. Most of them did, but there was an issue with the supposed “cyaA” strain, and I also noticed a problem with the new antitoxin knockout strains (antx). It looks like the antitoxin gene (HI0659) is expressed in the supposed knockouts, specifically in the samples denoted antx_*_C and antx_*_E. I have taken screenshots that provide evidence of this, but I do not have access to them at the moment. If I remember to, I will edit this part and add them.

At this point, I figured that perhaps something else is knocked out in these strains, and I tried a bunch of things to find out. I looked at the unmapped reads to see if I could find some sort of antibiotic resistance cassette, I tried assembling a genome from the sequencing data and looking for an insert (I learned that assembling a genome from raw RNA seq data is not straightforward), and I looked through coverage graphs to see if I could detect a deletion. In the end, I ran a differential expression analysis on the problematic samples and noticed that HI0938 (comN) was hugely downregulated. Looking at the “antx” files in IGV demonstrates what I saw:




Here, you can see that the two KW20 samples (first two tracks) and the old antitoxin sample “A” (third track) have expression of comN. But in antx samples “C” and “E” (bottom two tracks), there is an obvious deletion. The same holds for every “C” and “E” sample of the supposed antitoxin knockout.
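The drop in coverage can also be checked numerically rather than by eye. Here is a minimal Python sketch, assuming per-base depth has been exported as tab-separated text in the three-column layout that `samtools depth` writes (reference name, 1-based position, depth). The function name, the reference name "chr", the coordinates, and the depth values are all made up for illustration and are not from my data.

```python
def mean_depth(depth_lines, chrom, start, end):
    """Mean read depth over chrom:start-end (1-based, inclusive).

    Positions missing from the input are counted as zero depth, because
    `samtools depth` skips zero-coverage positions unless run with -a.
    """
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, pos, depth = fields[0], int(fields[1]), int(fields[2])
        if name == chrom and start <= pos <= end:
            total += depth
    return total / (end - start + 1)


# Hypothetical depth lines for a three-base window in two samples.
wild_type = ["chr\t10\t40\n", "chr\t11\t42\n", "chr\t12\t38\n"]
knockout = ["chr\t10\t0\n", "chr\t12\t1\n"]  # position 11 absent: zero depth

wt_mean = mean_depth(wild_type, "chr", 10, 12)
ko_mean = mean_depth(knockout, "chr", 10, 12)
```

Comparing the mean depth over the gene between a wild-type sample and a suspected knockout gives a single number per sample, which makes it easy to scan every sample instead of opening each one in IGV.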

Looking at the fastq files for these comN knockout samples, I've discovered reads for a spectinomycin cassette similar to this one:
GGAAAATTGGAGCGTTTTATGATTCCGGGGATCCGTCGACCTGCAGTTCGAAGTTCCTATTCTCTAGAAAGTATAGGAACTTCAGAGCGCTTTTGAAGCTC
(SPEC CASSETTE)
BLASTing the first 21 letters gives a perfect match to KW20 at positions 997240 to 997260, which correspond to the beginning of comN.
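This kind of search through the fastq files can be sketched in a few lines of Python. It is only an illustration under assumptions: it expects plain four-line FASTQ records (no wrapped sequence lines), the function names are my own, and the query is a 20-letter stretch taken from just after the first 21 letters of the read above, which is an arbitrary choice. The three reads in the example are synthetic.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reads_containing(fastq_lines, query):
    """Yield (read_name, sequence) for reads containing query on either strand.

    Assumes four lines per record: @name, sequence, +, quality.
    """
    query = query.upper()
    rc_query = reverse_complement(query)
    name = None
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            name = line.strip()[1:]  # drop the leading '@'
        elif i % 4 == 1:
            seq = line.strip().upper()
            if query in seq or rc_query in seq:
                yield name, seq


# The read quoted above; its first 21 letters are the part that matched KW20.
READ = ("GGAAAATTGGAGCGTTTTATGATTCCGGGGATCCGTCGACCTGCAGTTCGAAGTTCCTATTC"
        "TCTAGAAAGTATAGGAACTTCAGAGCGCTTTTGAAGCTC")
genome_part = READ[:21]
query = READ[21:41]  # arbitrary 20-mer from the rest of the read

# Synthetic FASTQ: the read, its reverse complement, and an unrelated read.
unrelated = "ACGT" * 7
fastq = [
    "@r1\n", READ + "\n", "+\n", "I" * len(READ) + "\n",
    "@r2\n", reverse_complement(READ) + "\n", "+\n", "I" * len(READ) + "\n",
    "@r3\n", unrelated + "\n", "+\n", "I" * len(unrelated) + "\n",
]
hits = [name for name, _ in reads_containing(fastq, query)]
```

For a real file, the same function can be fed an open file handle (or `gzip.open(path, "rt")` for compressed fastq), since it only iterates over lines.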

In summary, there must have been a mix-up at some point between the antitoxin knockout strain and the comN knockout strain. This is what I know:
- old RNA seq data for antitoxin knockout is valid (i.e. antx_*_A)
- new RNA seq data for antitoxin knockout has an issue (i.e. antx_*_C, antx_*_E): the data actually represents a comN deletion mutant where there is a spec cassette in place of the comN gene

1 comment:

Hailey said...

I've now done PCR to amplify the region surrounding the antitoxin in the strain, RR3112, which was thought to be used for the RNA seq. It doesn't seem to have the expected spec-cassette replacement of the antitoxin gene, so the mix-up probably happened when freezing the strain itself, not when preparing samples for RNAseq.