Bowtie2 on Perceval

Following up on my previous post about using HISAT/StringTie on Perceval, I thought it would be useful to write up a simple Bowtie2 alignment, such as one for ChIP-seq data.

Modules required

Perceval requires that specific software modules be loaded in the user’s environment prior to submitting jobs to the cluster.  For this pipeline, you’ll need bowtie2, samtools, and java.

module load bowtie2
module load samtools
module load java

(or)

module load bowtie2 samtools java
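
Module names and default versions can vary from cluster to cluster, so it's worth a quick sanity check that the modules actually loaded. These are standard module commands, not anything Perceval-specific:

module list             #show the modules currently loaded in your session
module avail bowtie2    #list the bowtie2 versions the cluster provides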

SLURM Arrays

When running many parallel alignments, it’s convenient to take advantage of SLURM’s array capability.  Before starting, make sure your input fastq files are named identically except for an index number running from 1 to the number of samples.  In this case, all my files are named AE1.fastq.gz through AE6.fastq.gz.
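
To make that naming convention concrete, here’s a minimal sketch of how the array index gets substituted into a filename inside each task (this just echoes the path; the real alignment command in step 2 below uses the same pattern):

#inside a job submitted with --array=1-6, SLURM sets SLURM_ARRAY_TASK_ID to a
#different value (1 through 6) in each task, so one command covers every sample
FQ=/home/user/data/fastq/AE"${SLURM_ARRAY_TASK_ID}".fastq.gz
echo "task ${SLURM_ARRAY_TASK_ID} will align ${FQ}"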

Steps

  1. Prepare the input fastq files.  I like to put them all in one subdirectory named “fastq”.  If multiple lanes were used, you can simply concatenate the gzipped files (concatenated gzip files are still valid gzip), like this for one sample (see the loop sketch after these steps for doing all samples at once):
    cat AE1_AACCAG.R1.L001.fastq.gz AE1_AACCAG.R1.L002.fastq.gz > AE1.fastq.gz
  2. Write a script to run the bowtie2 command.  Save it into a file named something like “arrayBowtie2.sh”.  Here’s my example:
    #!/bin/bash
    #SBATCH -J bt2 #jobname (optional)
    #SBATCH -n 1 #number of tasks
    #SBATCH -c 8 #cpus per task (bowtie2 threads)
    #SBATCH -t 3:59:00 #estimated run time
    #SBATCH -p main #partition name
    #SBATCH --export=ALL
    #SBATCH -o /home/user/data/output/bt2-%A-%a.out #where to put output if any
    #SBATCH -e /home/user/data/output/bt2-%A-%a.err #error output
    #SBATCH --array=1-6 #I have 6 samples
    #SBATCH --mail-type=END,FAIL  #send an email on end or if failure
    #SBATCH --mail-user=me@a.com #where to send email
    
    #align using default parameters with unpaired reads
    bowtie2 -p 8 -x /home/user/genomes/bowtie2/mm10 \
    -U /home/user/data/fastq/AE"${SLURM_ARRAY_TASK_ID}".fastq.gz \
    -S /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sam
    
    #note - I split the command with line-continuation backslashes ("\") for
    #readability here, but I normally don't do this in my scripts
  3. Submit the script with sbatch like this (see the note after these steps on chaining all three scripts with job dependencies):
    sbatch arrayBowtie2.sh
  4. You can check progress with:
    squeue -u user
  5. Next you’ll want to run samtools to convert sam to bam, then sort and index the files.  Create a script file named something like “arraySamtools.sh” and submit it with sbatch as before.  Here’s an example:
    #!/bin/bash
    #SBATCH -J samtools
    #SBATCH -n 1
    #SBATCH -t 3:59:00
    #SBATCH -p main
    #SBATCH --export=ALL
    #SBATCH -o /home/user/data/output/samtools-%A-%a.out
    #SBATCH -e /home/user/data/output/samtools-%A-%a.err
    #SBATCH --array=1-6 
    #SBATCH --mail-type=END,FAIL  
    #SBATCH --mail-user=me@a.com 
    
    #convert sam to bam
    samtools view -b -S /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sam > \
    /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".bam
    
    #remove unneeded sam
    rm /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sam
    
    #sort using scratch disk space (faster) for work files
    samtools sort -T /scratch/user/xxx"${SLURM_ARRAY_TASK_ID}" \
    -o /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sorted.bam \
    /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".bam
    
    #remove unneeded pre-sort bam files
    rm /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".bam
    
    #index the new sorted files
    samtools index /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sorted.bam
  6. Now you’ll have sorted and indexed bam files.  For most applications, it’s helpful to remove PCR duplicates.  Fire up your editor and create a script file for running Picard.  Submit using sbatch as before.  Here’s an example:
    #!/bin/bash
    #SBATCH -J picard
    #SBATCH -n 1
    #SBATCH -t 3:59:00
    #SBATCH -p main
    #SBATCH --export=ALL
    #SBATCH -o /home/user/data/output/picard-%A-%a.out
    #SBATCH -e /home/user/data/output/picard-%A-%a.err
    #SBATCH --array=1-6 
    #SBATCH --mail-type=END,FAIL  
    #SBATCH --mail-user=me@a.com 
    
    #use java to run Picard's MarkDuplicates function
    java -Xmx4g -jar /home/user/bin/picard.jar MarkDuplicates \
    INPUT=/home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".sorted.bam \
    OUTPUT=/home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".uniq.bam \
    METRICS_FILE=/home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".metrics \
    REMOVE_DUPLICATES=true ASSUME_SORTED=true
    
    #index
    samtools index /home/user/data/results/AE"${SLURM_ARRAY_TASK_ID}".uniq.bam
  7. At this point you’ll have deduplicated (“uniq”) bam files and their .bai indexes.  I usually retain the original sorted bams as well as the deduplicated ones, just in case.  Now you’re ready to start peak finding.
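
A couple of optional extras. First, as mentioned in step 1, if every sample has several lane files, a small loop saves retyping the cat command. This is just a sketch that assumes the lane files follow the AE<n>_<barcode>.R1.L00<lane>.fastq.gz pattern from my example; adjust the glob to your own names:

cd /home/user/data/fastq
for i in 1 2 3 4 5 6; do
    #concatenated gzip files are still valid gzip, so no need to decompress first
    cat AE${i}_*.R1.L00*.fastq.gz > AE${i}.fastq.gz
done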
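
Second, as mentioned in step 3, you don’t have to wait for each array to finish before submitting the next script by hand; you can chain the submissions with SLURM job dependencies. This is a sketch, not something the steps above require; --parsable and --dependency=afterok are standard sbatch options, and “arrayPicard.sh” stands in for whatever you named the Picard script in step 6:

#capture each array's job ID and make the next script start only if it finished cleanly
bt2_id=$(sbatch --parsable arrayBowtie2.sh)
sam_id=$(sbatch --parsable --dependency=afterok:${bt2_id} arraySamtools.sh)
sbatch --dependency=afterok:${sam_id} arrayPicard.sh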
