{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Notebook refact of Process.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#B0E0E6; color:#000000;padding:5px\">\n",
    "This Notebook is used to process fastq files, turning them into exploitable wig files.\n",
    "<br><br>\n",
    "Only Python3 is available here.\n",
    "<br><br>\n",
    "An automatic back up it set up every 10 minutes, using crontab and git.\n",
    "<br><br>\n",
    "A \"source\" file is written to pass environnement variable to each bash magic cells.\n",
    "<br><br>\n",
    "If you want to make any changes, please comment the corresponding line, or convert the cell in RAW format using the menu.\n",
    "<br><br>\n",
    "You can also duplicate the Notebook if you need heavy changes. \n",
    "<br><br>\n",
    "Please don't touch the 0-Raw/ directory files.\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#FFAAAA; color:#000000;padding:5px\">\n",
    "Copy the content of this template cell to make your comments more obvious and clear in the notebook. \n",
    "</p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Version of Cutadapt \t : 1.9.1\n",
      "Version of Bowtie \t : 1.1.2\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "\n",
    "# Binaries are installed with Anaconda's package manager (Continuum Analytics)\n",
    "# Print version of binaries\n",
    "echo\n",
    "echo -e \"Version of Cutadapt \\t : $(cutadapt --version)\"\n",
    "echo -e \"Version of Bowtie \\t : $(bowtie --version | head -n 1 | tail -c 6)\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## User manual\n",
    "\n",
    "* Use ctrl+enter to run the focused cell\n",
    "* Double click a cell to change it's content\n",
    "* Use cell->run all to run all cells\n",
    "* Use the \"+\" button to add a cell\n",
    "* If you need to comment a cell, use drop menu to convert it to Raw NB Convert cell\n",
    "* Use the scissor button to delete a cell \n",
    "* Don't forget to save your work\n",
    "* Use this notebook by simply cut/paste the filename of the file you need to process into the fname variable in the cell below\n",
    "* Availables files are listed under the first Python cell\n",
    "* Depending on the size, the notebook will take a long time to process it\n",
    "* Use the magic \"%%bash\" and newline to start a bash interpreted cell\n",
    "* If you get an \"notebook has changed\" message, use the RELOAD button, I'm probably working on it !"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Available files :\n",
      "\n",
      "Undetermined_lane7_pair1\n",
      "flowcell261_lane8_pair1_CAGATC \t\t Processsed\n",
      "flowcell261_lane8_pair1_TGACCA \t\t Processsed\n",
      "flowcell362_lane4_pair1_ACAGTG \t\t Processsed\n",
      "flowcell362_lane4_pair1_ACTTGA \t\t Processsed\n",
      "flowcell362_lane4_pair1_CAGATC \t\t Processsed\n",
      "flowcell362_lane4_pair1_TGACCA \t\t Processsed\n",
      "flowcell362_lane4_pair1_Undetermined\n",
      "flowcell384_lane7_pair1_ACAGTG \t\t Processsed\n",
      "flowcell384_lane7_pair1_ACTTGA \t\t Processsed\n",
      "flowcell384_lane7_pair1_CAGATC \t\t Processsed\n",
      "flowcell384_lane7_pair1_GATCAG \t\t Processsed\n",
      "flowcell384_lane7_pair1_TGACCA\n",
      "testing\n"
     ]
    }
   ],
   "source": [
    "# This lib is needed to parse fastq easily\n",
    "from Bio import SeqIO\n",
    "\n",
    "# This lib is used to avoid using a bash magic cell to copy a file\n",
    "from shutil import copyfile\n",
    "\n",
    "# Used to... list directory\n",
    "from os import listdir, path\n",
    "\n",
    "# Check the time taken by a function to run\n",
    "import datetime\n",
    "\n",
    "# Enter/Uncomment here the name of the file you want to process\n",
    "\n",
    "# fname = \"Undetermined_lane7_pair1\"\n",
    "# fname = \"flowcell261_lane8_pair1_CAGATC\"\n",
    "# fname = \"flowcell261_lane8_pair1_TGACCA\"\n",
    "# fname = \"flowcell362_lane4_pair1_ACAGTG\"\n",
    "# fname = \"flowcell362_lane4_pair1_ACTTGA\"\n",
    "# fname = \"flowcell362_lane4_pair1_CAGATC\"\n",
    "# fname = \"flowcell362_lane4_pair1_TGACCA\"\n",
    "# fname = \"flowcell362_lane4_pair1_Undetermined\"\n",
    "# fname = \"flowcell384_lane7_pair1_ACAGTG\"\n",
    "# fname = \"flowcell384_lane7_pair1_ACTTGA\"\n",
    "# fname = \"flowcell384_lane7_pair1_CAGATC\"\n",
    "# fname = \"flowcell384_lane7_pair1_GATCAG\"\n",
    "fname = \"flowcell384_lane7_pair1_TGACCA\"\n",
    "\n",
    "# Print available files in 0-Raws/ directory\n",
    "\n",
    "print(\"\\nAvailable files :\\n\")\n",
    "\n",
    "fn = {n.split('.')[0] for n in listdir(\"0-Raws/\")} # Basename\n",
    "wn = {n               for n in listdir(\"7-WIGs/\")} # P and M files\n",
    "\n",
    "lst = (n+\" \\t\\t Processsed\"                 \\\n",
    "       if (n+\".P.wig\" and n+\".M.wig\") in wn \\\n",
    "       else n                               \\\n",
    "       for n in sorted(fn))\n",
    "\n",
    "for l in lst:\n",
    "    print(l)\n",
    "            "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build working tree\n",
    "\n",
    "Necessary step if you run this notebook on a local laptop or for the first time. If so, please convert cell in code."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "%%bash\n",
    "mkdir -p 0-Raws\n",
    "mkdir -p 1-Cutadapted\n",
    "mkdir -p 2-Gunzipped\n",
    "mkdir -p 3-Filtered\n",
    "mkdir -p 4-Bowtied\n",
    "mkdir -p 5-ncRNA-Removed\n",
    "mkdir -p 6-Genome-Aligned\n",
    "mkdir -p 7-WIGs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up environnement\n",
    "\n",
    "* Create a source file to share the FILENAME bash environnement variable (bash cells don't keep scope)\n",
    "* Copy the file we're working on to root of Bioinf dir to work safer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Build export string used by bash in the \"source\" file\n",
    "export_line = \"export FILENAME=\"+fname\n",
    "\n",
    "# Write a local file (\"source\") to share env. variable between cells\n",
    "with open(\"source\",\"w\") as f:\n",
    "    f.write(export_line)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Copy to root dir"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'flowcell384_lane7_pair1_TGACCA.fastq.gz'"
      ]
     },
     "execution_count": 167,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Copy file from 0-Raws to root dir of notebook to work safer\n",
    "copyfile(\"0-Raws/\"+fname+\".fastq.gz\", fname+\".fastq.gz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check\n",
    "\n",
    "* File exists\n",
    "* Is not empty\n",
    "* Env. var. is set up\n",
    "* Print first entry of the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "flowcell384_lane7_pair1_TGACCA\n",
      "\n",
      "@SN279:498:C88PKACXX:7:1109:1699:2220 1:N:0:TGACCA\n",
      "NGAGGTGCACAATCGACCGATCCTGCTGTAGGCACCATCAATAGATCGGAA\n",
      "+\n",
      "#1=D;DDDHHFHHIIIIIIIIIIIIIIIIHIIIIFHIIIIIHHIIIIIIII\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "echo\n",
    "echo $FILENAME\n",
    "echo\n",
    "zmore $FILENAME.fastq.gz | head -n 4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimate Cutadapt processing time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing will take approximatively 0 hour(s) 44 minute(s) and 38 second(s)\n"
     ]
    }
   ],
   "source": [
    "# Get file size in bytes\n",
    "fsize = path.getsize(fname+\".fastq.gz\")\n",
    "\n",
    "# Takes more or less 1 hour to process 2.45Gib.\n",
    "bytes_per_second = (2.45*pow(10,9))/3600\n",
    "\n",
    "# How many seconds for this file\n",
    "seconds = fsize/bytes_per_second\n",
    "\n",
    "# Estimated time\n",
    "h,r = divmod(seconds,3600)\n",
    "m,s = divmod(r,60)\n",
    "\n",
    "print(\"Processing will take approximatively {0:.0f} hour(s) {1:.0f} minute(s) and {2:.0f} second(s)\"\\\n",
    "      .format(h,m,s))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "<div style=\"text-align:center;font-size:300%\">Cutadapt</div>  \n",
    "<div style=\"text-align:center\">Call cutadapt binary file to strip strands</div>  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Option    | Effect                 |\n",
    "|    -:     |   :-                   |\n",
    "|-a ADAPTER | Enter adapter sequence |\n",
    "|-o OUTPUT  | Indicate output file   |\n",
    "|--quiet    | No long report         |\n",
    "|INPUT      | Enter input file       |\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Store current time\n",
    "before = datetime.datetime.now()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This is cutadapt 1.9.1 with Python 3.5.1\n",
      "Command line parameters: -a CTGTAGGCACCATCAATAGATCGGAA -o 1-Cutadapted/flowcell384_lane7_pair1_TGACCA.fastq.gz --quiet flowcell384_lane7_pair1_TGACCA.fastq.gz\n",
      "Trimming 1 adapter with at most 10.0% errors in single-end mode ...\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "\n",
    "# Export adapter we want to cut\n",
    "export ADAPTER=CTGTAGGCACCATCAATAGATCGGAA\n",
    "\n",
    "# Run binary\n",
    "cutadapt                               \\\n",
    "    -a $ADAPTER                        \\\n",
    "    -o 1-Cutadapted/$FILENAME.fastq.gz \\\n",
    "    --quiet                            \\\n",
    "    $FILENAME.fastq.gz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cutadapt run time : 0:38:47.798753\n"
     ]
    }
   ],
   "source": [
    "# Store current time\n",
    "after = datetime.datetime.now()\n",
    "\n",
    "# Difference\n",
    "delta = after-before\n",
    "\n",
    "print(\"Cutadapt run time : {0}\".format(delta))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "\n",
    "rm $FILENAME.fastq.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unzip resulting compressed fastq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Store current time\n",
    "before = datetime.datetime.now()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "\n",
    "# Unzip compressed file and output it in mid-process directory\n",
    "zcat 1-Cutadapted/$FILENAME.fastq.gz > 2-Gunzipped/$FILENAME.fastq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Zcat run time : 0:00:40.103272\n"
     ]
    }
   ],
   "source": [
    "# Store current time\n",
    "after = datetime.datetime.now()\n",
    "\n",
    "# Difference\n",
    "delta = after-before\n",
    "\n",
    "print(\"Zcat run time : {0}\".format(delta))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Remove sequences with less than needed nucleotides\n",
    "\n",
    "Bowtie drops them anyway...\n",
    "\n",
    "This step takes an eternity to filter for big files..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Store current time\n",
    "before = datetime.datetime.now()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 178,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Enter the minimum needed sequence length\n",
    "min_seq_length = 4\n",
    "\n",
    "with open(\"2-Gunzipped/\"+fname+\".fastq\", \"r\") as handle, \\\n",
    "     open(\"3-Filtered/\" +fname+\".fastq\",\"w\")  as output:\n",
    "\n",
    "    # Iterate through each records in fastq file\n",
    "    seq_iter = SeqIO.parse(handle,\"fastq\")\n",
    "\n",
    "    # Generator over sequences's length greater than min_seq_length\n",
    "    filt_iter = (rec for rec in seq_iter \\\n",
    "                 if (len(rec.seq) > min_seq_length))\n",
    "\n",
    "    # Write back filtered fastq\n",
    "    SeqIO.write(filt_iter,output,\"fastq\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Filtering run time : 0:35:14.961783\n"
     ]
    }
   ],
   "source": [
    "# Store current time\n",
    "after = datetime.datetime.now()\n",
    "\n",
    "# Difference\n",
    "delta = after-before\n",
    "\n",
    "print(\"Filtering run time : {0}\".format(delta))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "<div style=\"text-align:center;font-size:300%\">Bowtie</div>  \n",
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Option    | Effect                 |\n",
    "|    -:     |   :-                   |\n",
    "| -S        | Output as a SAM file   |\n",
    "| -v 3      | report end-to-end hits w/ <=v mismatches; ignore qualities |\n",
    "| -p 4      | Use 4 processors       |\n",
    "| --best    | hits guaranteed best stratum; ties broken by quality |\n",
    "| INPUT     | Input file             |\n",
    "| OUTPUT    | Output file            |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "\n",
    "bowtie                                                   \\\n",
    "    -S                                                   \\\n",
    "    -v 3                                                 \\\n",
    "    -p 8                                                 \\\n",
    "    --time                                               \\\n",
    "    --best                                               \\\n",
    "    ref/2-Indexes/Yeast-Noncoding/Yeast-Noncoding        \\\n",
    "    3-Filtered/$FILENAME.fastq                           \\\n",
    "    4-Bowtied/$FILENAME.sam\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Remove non-codant RNA\n",
    "\n",
    "Original script burns 416 lines (0-415). Doing so strip the first non-header entry. Is it right ? Here I strip 415 exactly.\n",
    "\n",
    "This block creates a generator that filters records according to the corresponding sam file with '4' in the field used in the original script. This method takes the bet that the first non-header line in the sam file matches the first line of the fastq. This is ugly. Don't do this. Need refact with browsing the sam file correctly.\n",
    "\n",
    "The '4' in the flag field of the SAM file means that the read has no reported alignment. In this case, every aligned read means a match with non-codant tRNA index. So we keep only \"mismatches\" as they represent all reads that don't match with non-codant, in other words the reads we are interested in.\n",
    "\n",
    "> Sum of all applicable flags. Flags relevant to Bowtie are:\n",
    "\n",
    "> * 1 The read is one of a pair\n",
    "> * 2 The alignment is one end of a proper paired-end alignment\n",
    "> * 4 The read has no reported alignments\n",
    "> * 8 The read is one of a pair and has no reported alignments\n",
    "> * 16 The alignment is to the reverse reference strand\n",
    "> * 32 The other mate in the paired-end alignment is aligned to the reverse reference strand\n",
    "> * 64 The read is the first (#1) mate in a pair\n",
    "> * 128 The read is the second (#2) mate in a pair\n",
    "\n",
    "> Thus, an unpaired read that aligns to the reverse reference strand will have flag 16. \n",
    "\n",
    "> A paired-end read that aligns and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1).\n",
    "\n",
    "(From Bowtie manual, http://bowtie-bio.sourceforge.net/manual.shtml#sam-bowtie-output)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Store current time\n",
    "before = datetime.datetime.now()\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "with open(\"3-Filtered/\"     +fname+\".fastq\",\"r\") as filtered, \\\n",
    "     open(\"4-Bowtied/\"      +fname+\".sam\",\"r\")   as matches,  \\\n",
    "     open(\"5-ncRNA-Removed/\"+fname+\".fastq\",\"w\") as substracted: \n",
    "    \n",
    "    # Iterator over fastq file\n",
    "    filt_iter = SeqIO.parse(filtered,\"fastq\")\n",
    "    \n",
    "    # Strip header (as in original script)\n",
    "    for _ in range(415-3): matches.readline()\n",
    "\n",
    "    # Check last lines\n",
    "    print(matches.readline())\n",
    "    print(matches.readline())\n",
    "    print(matches.readline())\n",
    "        \n",
    "    # Generator over fastq where the corresponding sam field is 4,\n",
    "    # meaning no reported alignment\n",
    "    sub_iter = (rec for rec in filt_iter \\\n",
    "                if matches.readline().split('\\t')[1] == '4')\n",
    "\n",
    "    # Write back fastq\n",
    "    SeqIO.write(sub_iter,substracted,\"fastq\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Store current time\n",
    "after = datetime.datetime.now()\n",
    "\n",
    "# Difference\n",
    "delta = after-before\n",
    "\n",
    "print(\"Filtering non-codant tRNA run time : {0}\".format(delta))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Map Genome\n",
    "\n",
    "Invoke bowtie to map entire Cerevisiae genome. Same options as first run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "source ./source\n",
    "\n",
    "bowtie                                                   \\\n",
    "    -S                                                   \\\n",
    "    -v 2                                                 \\\n",
    "    -p 8                                                 \\\n",
    "    --time                                               \\\n",
    "    --best                                               \\\n",
    "    ref/2-Indexes/SCerevisiae/s_cerevisiae               \\\n",
    "    5-ncRNA-Removed/$FILENAME.fastq                      \\\n",
    "    6-Genome-Aligned/$FILENAME.sam"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Lengths Dictionnary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Those are Chromosome lengths according to original script\n",
    "\n",
    "LengthDict = {\n",
    "    'chrI'   :230218,  'chrII'  :813184, 'chrIII':316620,  \\\n",
    "    'chrIV'  :1531933, 'chrV'   :576874, 'chrVI':270161,   \\\n",
    "    'chrVII' :1090940, 'chrVIII':562643, 'chrIX':439888,   \\\n",
    "    'chrX'   :745751,  'chrXI'  :666816, 'chrXII':1078177, \\\n",
    "    'chrXIII':924431,  'chrXIV' :784333, 'chrXV':1091291,  \\\n",
    "    'chrXVI' :948066,  'chrM'   :85779                     \\\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Write WIGs\n",
    "\n",
    "* This an ugly copy/paste of original script.\n",
    "* Script build huge lists... Bêêêh.\n",
    "* Definitely need refact with generators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "PlusDict   = {}\n",
    "MinusDict  = {}\n",
    "\n",
    "Count = 0\n",
    "\n",
    "for IdxC in LengthDict:\n",
    "    PlusDict[IdxC]  = [0]*LengthDict[IdxC]\n",
    "    MinusDict[IdxC] = [0]*LengthDict[IdxC]\n",
    "    \n",
    "with open(\"6-Genome-Aligned/\"+fname+\".sam\",\"r\") as file:\n",
    "    for line in file:\n",
    "        if line[0] != '@':\n",
    "            Split = line.split('\\t')\n",
    "\n",
    "            if Split[1] == '16':\n",
    "                MinusDict[Split[2]][int(Split[3])] += 1\n",
    "                Count += 1\n",
    "            elif Split[1] == '0':\n",
    "                PlusDict[Split[2]][int(Split[3])] += 1\n",
    "                Count += 1\n",
    "            else:\n",
    "                pass\n",
    "            \n",
    "            "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Next step makes the WIGs\n",
    "\n",
    "* This an ugly copy/paste of original script.\n",
    "* Same problem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "with open(\"7-WIGs/\"+fname+\".P.wig\",\"w\") as P, \\\n",
    "     open(\"7-WIGs/\"+fname+\".M.wig\",\"w\") as M:\n",
    "    \n",
    "    P.write('track name=tracklabel totalCounts=' \\\n",
    "            + str(Count)                         \\\n",
    "            + ' viewLimits=-5:5 color=79,159,36\\n')\n",
    "    \n",
    "    M.write('track name=tracklabel totalCounts=' \\\n",
    "            + str(Count)                         \\\n",
    "            + ' viewLimits=-5:5 color=79,159,36\\n')\n",
    "\n",
    "    for IdxC in PlusDict:\n",
    "\n",
    "        P.write('fixedStep  chrom=' + IdxC + '  start=1  step=1\\n')\n",
    "        M.write('fixedStep  chrom=' + IdxC + '  start=1  step=1\\n')\n",
    "\n",
    "        for IdxN in range(0, len(PlusDict[IdxC])):\n",
    "            P.write(str(PlusDict[IdxC][IdxN])  + '\\n')\n",
    "            M.write(str(MinusDict[IdxC][IdxN]) + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# WIGs files are now available under 7-WIGs/ directory"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}