<div style="text-align:center">Call cutadapt binary file to strip strands</div>
%% Cell type:markdown id: tags:
| Option | Effect |
| -: | :- |
| -a ADAPTER | Adapter sequence to trim from the 3' end of each read |
| -o OUTPUT | Output file |
| --quiet | Suppress the long report |
| INPUT | Input file |
%% Cell type:code id: tags:
``` python
import datetime

# Store current time
before = datetime.datetime.now()
```
%% Cell type:code id: tags:
``` python
%%bash
source ./source
# Export the adapter we want to cut
export ADAPTER=CTGTAGGCACCATCAATAGATCGGAA
# Run binary
cutadapt \
    -a $ADAPTER \
    -o 1-Cutadapted/$FILENAME.fastq.gz \
    --quiet \
    $FILENAME.fastq.gz
```
%% Output
This is cutadapt 1.9.1 with Python 3.5.1
Command line parameters: -a CTGTAGGCACCATCAATAGATCGGAA -o 1-Cutadapted/flowcell362_lane4_pair1_Undetermined.fastq.gz --quiet flowcell362_lane4_pair1_Undetermined.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
%% Cell type:code id: tags:
``` python
# Store current time
after = datetime.datetime.now()
# Difference
delta = after - before
print("Cutadapt run time : {0}".format(delta))
```
%% Output
Cutadapt run time : 0:03:24.043250
%% Cell type:code id: tags:
``` python
%%bash
source ./source
rm $FILENAME.fastq.gz
```
%% Cell type:markdown id: tags:
## Unzip resulting compressed fastq
%% Cell type:code id: tags:
``` python
# Store current time
before = datetime.datetime.now()
```
%% Cell type:code id: tags:
``` python
%%bash
source ./source
# Unzip compressed file and output it in mid-process directory
```
%% Output
# reads with at least one reported alignment: 512682 (16.47%)
# reads that failed to align: 2599991 (83.53%)
Reported 512682 alignments to 1 output stream(s)
Time searching: 00:00:34
Overall time: 00:00:34
%% Cell type:markdown id: tags:
## Remove non-coding RNA
The original script skips 416 lines (0-415), which also strips the first non-header entry. Is that right? Here I strip exactly 415.
This block creates a generator that filters FASTQ records according to the corresponding SAM file, keeping those with '4' in the field used by the original script. It bets that the first non-header line of the SAM file matches the first record of the FASTQ file. This is ugly. Don't do this. It needs refactoring to parse the SAM file properly.
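One possible shape for that refactoring is to match reads by name instead of by position. This is only a sketch, not the original script's method. The helper names `unaligned_read_names` and `filter_fastq` are hypothetical, and the code assumes that SAM header lines start with `@`, that every FASTQ record spans exactly four lines, and that the SAM read name equals the FASTQ header up to the first whitespace. It keeps the reads whose flag has bit 4 set (no reported alignment, as explained below).

```python
import itertools


def unaligned_read_names(sam_lines):
    """Collect the names of reads whose FLAG has bit 4 set (no reported alignment)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            # Header line: skipped whatever the number of header lines is
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        if int(fields[1]) & 4:
            names.add(fields[0])
    return names


def filter_fastq(fastq_lines, keep_names):
    """Yield 4-line FASTQ records whose read name is in keep_names."""
    lines = iter(fastq_lines)
    while True:
        record = list(itertools.islice(lines, 4))
        if len(record) < 4:
            break
        # Assumption: read name is the header without '@', up to the first whitespace
        parts = record[0][1:].split()
        name = parts[0] if parts else ""
        if name in keep_names:
            yield record


# Tiny in-memory example (made-up reads, for illustration only)
sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "@SQ\tSN:ref1\tLN:100\n",
    "read1\t0\tref1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
]
fastq = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2 extra\n", "TTTT\n", "+\n", "IIII\n",
]
kept = list(filter_fastq(fastq, unaligned_read_names(sam)))
print([record[0].strip() for record in kept])
```

With real files, both functions accept open file handles, since they only iterate over lines. The set of names is held in memory, which should be acceptable for a few million reads but is worth checking on larger runs.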
A '4' in the flag field of the SAM file means that the read has no reported alignment. Here, every aligned read is a match against the non-coding tRNA index. So we keep only the unaligned reads, since they are the reads that don't match non-coding RNA, in other words the reads we are interested in.
> Sum of all applicable flags. Flags relevant to Bowtie are:
> * 1 The read is one of a pair
> * 2 The alignment is one end of a proper paired-end alignment
> * 4 The read has no reported alignments
> * 8 The read is one of a pair and has no reported alignments
> * 16 The alignment is to the reverse reference strand
> * 32 The other mate in the paired-end alignment is aligned to the reverse reference strand
> * 64 The read is the first (#1) mate in a pair
> * 128 The read is the second (#2) mate in a pair
> Thus, an unpaired read that aligns to the reverse reference strand will have flag 16.
> A paired-end read that aligns and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1).
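Because the flag is a sum of powers of two, each property can be tested with a bitwise AND. The following is a small sketch of that decoding. The names `SAM_FLAG_BITS`, `decode_flag` and `is_unaligned` are hypothetical helpers written for this example, and the descriptions are shortened from the excerpt above.

```python
# Flag bits as listed in the Bowtie manual excerpt above
SAM_FLAG_BITS = {
    1: "read is one of a pair",
    2: "alignment is one end of a proper paired-end alignment",
    4: "read has no reported alignments",
    8: "read is one of a pair and has no reported alignments",
    16: "alignment is to the reverse reference strand",
    32: "other mate is aligned to the reverse reference strand",
    64: "read is the first mate in a pair",
    128: "read is the second mate in a pair",
}


def decode_flag(flag):
    """Return the flag bits set in a SAM FLAG value, lowest bit first."""
    return [bit for bit in sorted(SAM_FLAG_BITS) if flag & bit]


def is_unaligned(flag):
    """True when bit 4 is set, i.e. the read has no reported alignment."""
    return bool(flag & 4)


for flag in (4, 16, 83):
    print(flag, decode_flag(flag), is_unaligned(flag))
```

Testing `flag & 4` rather than comparing the flag to the string '4' also catches unaligned reads that carry other bits, which may matter if the pipeline is ever run on paired-end data.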