This lesson is still being designed and assembled (Pre-Alpha version)

Introduction to Workflows with Common Workflow Language: Setup

Software Setup

These lessons assume that you are using the freely available Visual Studio Code (VS Code) editor with the Benten extension, together with the CWL reference runner, cwltool.

This tutorial requires three pieces of software to run and visualize the workflows: Docker, cwltool, and graphviz.

Follow the instructions below for your operating system (Windows, Linux, or macOS).

Windows

  1. Download and install VS Code.

  2. Open Benten in the marketplace and click the Install button or follow the directions.

  3. Install and configure Windows Subsystem for Linux 2 (WSL 2) and Docker Desktop
    1. Confirm that you are running Windows 10, version 2004 or higher (Build 19041 and higher) or Windows 11.

      Check your Windows version

      To check your Windows version and build number, press the Windows logo key + R, type winver, select OK. You can update to the latest Windows version by selecting “Start” > “Settings” > “Windows Update” > “Check for updates”.

    2. Open PowerShell as Administrator (“Start menu” > “PowerShell” > right-click > “Run as Administrator”), paste the following command, and press Enter to install WSL 2:

      wsl --install

    3. Reboot your computer. Ubuntu will set itself up after the reboot. Wait for Ubuntu to ask for a UNIX username and password. Once you have entered them and the command prompt appears, you can close the Ubuntu window.
    4. Download Docker Desktop and run the installer.
      1. Reboot after installing Docker Desktop.
      2. Run Docker Desktop
      3. Accept the terms and conditions, if prompted
      4. Wait for Docker Desktop to finish starting
      5. Skip the tutorial, if prompted
      6. From the top menu choose “Settings” > “Resources” > “WSL Integration”
      7. Under “Enable integration with additional distros” select “Ubuntu”
      8. Close the Docker Desktop window
  4. Configure VS Code
    1. Install the “Remote - WSL” extension for VS Code from the Visual Studio Marketplace by clicking the Install button, or by following the directions there.
    2. After installation, in VS Code choose “Open a Remote - WSL Window” and then “New WSL Window”.

      If you don’t see those options, press Ctrl+Shift+P and type “WSL”; the options should then appear at the top of the screen.

    3. There should now be a second VS Code window that has “WSL: Ubuntu” in green at the lower left corner. You can close the original VS Code window.
    4. To enable the Benten CWL extension in this “WSL: Ubuntu” window:
      • Press Ctrl+Shift+X to open the “Extensions” pane.
      • Look for “CWL (Rabix/Benten)” and click the blue “Install in WSL: Ubuntu” button.
  5. Open a terminal and install tutorial prerequisites
    • Choose “Terminal” > “New Terminal” from the menu in the “WSL: Ubuntu” VS Code window.
    • Copy the following command:
      sudo apt-get update && sudo apt-get install -y python3-venv wget graphviz
    • Paste it into the terminal window.
    • Press Enter to run it. When prompted, type the UNIX password you set earlier.

    What is the “terminal”?

    All references to a “terminal” in the rest of this tutorial mean this terminal window inside the “WSL: Ubuntu” VS Code window, not PowerShell, the Windows Command Prompt, or the “Ubuntu” app.

  6. Install the latest version of cwltool.
    1. First we will make a Python virtual environment by running the following commands in the terminal.
      python3 -m venv env       # Create a virtual environment named 'env' in the current directory
      source env/bin/activate   # Activate the 'env' environment
      

      You will know that this worked because the terminal prompt now begins with (env).

      Reactivating the Python virtual environment

      Every time you launch VS Code or open a new terminal, you must run source env/bin/activate from the directory where you created env to re-enable this Python virtual environment.

    2. Next, install cwltool by running the following in the terminal:
      pip install cwltool
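Before moving on, it can help to confirm from the VS Code terminal that you are really inside WSL and that the env virtual environment is active. The following is a minimal sketch; in_wsl and venv_active are hypothetical helper names, and it assumes that WSL kernels report "microsoft" in /proc/version. The venv check relies on source env/bin/activate exporting the VIRTUAL_ENV variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if this shell appears to run inside WSL.
# Assumption: WSL kernels mention "microsoft" in /proc/version.
in_wsl() {
  grep -qi microsoft /proc/version 2>/dev/null
}

# Hypothetical helper: succeed if a Python virtual environment is active.
# `source env/bin/activate` exports VIRTUAL_ENV with the environment's path.
venv_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_wsl; then
  echo "OK: running inside WSL"
else
  echo "WARNING: not inside WSL; use the 'WSL: Ubuntu' VS Code terminal"
fi

if venv_active; then
  echo "OK: virtual environment active at $VIRTUAL_ENV"
else
  echo "WARNING: run 'source env/bin/activate' first"
fi
```

If either warning appears, fix it before installing anything else, so that cwltool ends up inside env rather than in the system Python.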
      
Linux

  1. Download and install VS Code.
  2. Open Benten in the marketplace and click the Install button or follow the directions.
  3. Install Docker
  4. Enable Docker usage as a non-root user
  5. Install the latest version of cwltool.
    1. First we will make a Python virtual environment by running the following commands in the terminal.
      python3 -m venv env       # Create a virtual environment named 'env' in the current directory
      source env/bin/activate   # Activate the 'env' environment
      

      You will know that this worked because the terminal prompt now begins with (env).

      Reactivating the Python virtual environment

      Every time you launch VS Code or open a new terminal, you must run source env/bin/activate from the directory where you created env to re-enable this Python virtual environment.

    2. Next, install cwltool by running the following in the terminal:
      pip install cwltool
      
  6. Later we will make visualisations of our workflows, which requires Graphviz. On Debian-based Linux systems (such as Ubuntu), install it with:
    sudo apt-get install -y graphviz
    

    For other Linux systems, check https://graphviz.org/download/#linux
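Step 4 above is stated only briefly. Docker's post-installation guide for Linux describes adding your user to the docker group (sudo groupadd docker, then sudo usermod -aG docker $USER, then logging out and back in or running newgrp docker). The sketch below only checks whether the current session already belongs to that group and, if not, prints those commands instead of running them; in_docker_group is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the current session belongs to the
# "docker" group, which lets you run docker without sudo.
# A newly added group only shows up here after logging in again.
in_docker_group() {
  case " $(id -nG) " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_docker_group; then
  echo "OK: $(id -un) can use docker without sudo"
else
  # Print, but do not run, the steps from Docker's Linux post-install guide.
  echo "Run these commands, then log out and back in (or run 'newgrp docker'):"
  echo "  sudo groupadd docker"
  echo "  sudo usermod -aG docker \$USER"
fi
```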

macOS

  1. Download and install VS Code.
  2. Open Benten in the marketplace and click the Install button or follow the directions.
  3. Install Docker
  4. Install Miniconda
  5. Tell conda which channels (package sources) to use:
     conda config --add channels bioconda
     conda config --add channels conda-forge
    
  6. Create a virtual environment using conda
     conda create --name cwltutorial
    
  7. Activate the virtual environment
     conda activate cwltutorial
    
  8. Install cwltool and graphviz using conda
     conda install -c bioconda cwltool
     conda install -c anaconda graphviz
    

Reactivating the conda environment

You must activate the environment with conda activate cwltutorial every time you start a new terminal.
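If cwltool or dot suddenly report "command not found", the environment is often just not active. conda activate sets the CONDA_DEFAULT_ENV variable to the active environment's name, so a quick check can be built on it. This is a minimal sketch; conda_env_active is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named conda environment is active.
# `conda activate NAME` sets CONDA_DEFAULT_ENV to NAME.
conda_env_active() {
  local wanted="${1:-cwltutorial}"
  [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]
}

if conda_env_active cwltutorial; then
  echo "OK: cwltutorial environment is active"
else
  echo "Run 'conda activate cwltutorial' before continuing"
fi
```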

Confirm the software is installed correctly

To confirm Docker is installed, run the following command to display the version number:

docker version

You should see something similar to the output shown below.

Client: Docker Engine - Community
 Version:           20.10.13
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 10 14:08:15 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.13
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       906f57f
  Built:            Thu Mar 10 14:06:05 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.10
  GitCommit:        2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

To confirm cwltool is installed, run the following command to display the version number:

cwltool --version

You should see something similar to the output shown below.

/home/learner/env/bin/cwltool 3.1.20220224085855

To confirm Graphviz is installed, run the following command to display the version number:

dot -V

You should see something similar to the output shown below.

dot - graphviz version 2.43.0 (0)
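The three checks above can also be run in one go. The sketch below reports any of docker, cwltool and dot that is missing from PATH. It then uses docker info, which fails when the client cannot reach the Docker daemon (for example, when Docker Desktop is not running). check_tools is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command in the argument list that is
# not on PATH; succeed only if all of them are found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# docker, cwltool and dot (Graphviz) are the programs this lesson needs.
check_tools docker cwltool dot || echo "Install the missing tools before continuing"

# `docker info` fails if the client cannot reach the Docker daemon.
if docker info >/dev/null 2>&1; then
  echo "Docker daemon is reachable"
else
  echo "Docker daemon is not reachable; start Docker and try again"
fi
```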

Files

You will need to download some example files for this lesson. This tutorial uses RNA sequencing (RNA-seq) data.

Setting up a practice repository

To build the workflow, this tutorial uses existing CWL tool descriptions, which we will import from GitHub. First, create an empty git repository to hold all of our files:

git init novice-tutorial-exercises

Next, move into the new repository:

cd novice-tutorial-exercises

Then add the bio-cwl-tools repository as a git submodule:

git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
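If you or a collaborator later clone novice-tutorial-exercises, the bio-cwl-tools directory stays empty until git fetches the submodule with git submodule update --init. The sketch below checks for one tool description used later in this setup, bio-cwl-tools/STAR/STAR-Index.cwl, and prints that command if the file is missing. submodule_ready is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the bio-cwl-tools submodule in the given
# repository directory (default: current directory) is checked out.
submodule_ready() {
  local repo="${1:-.}"
  [ -f "$repo/bio-cwl-tools/STAR/STAR-Index.cwl" ]
}

if submodule_ready .; then
  echo "OK: bio-cwl-tools is checked out"
else
  echo "bio-cwl-tools is missing or empty; run: git submodule update --init"
fi
```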

Downloading sample and reference data

Create a new directory inside the novice-tutorial-exercises directory and download the data:

mkdir rnaseq
cd rnaseq
wget https://zenodo.org/record/4541751/files/GSM461177_1_subsampled.fastqsanger
wget https://zenodo.org/record/4541751/files/GSM461177_2_subsampled.fastqsanger
wget https://zenodo.org/record/4541751/files/GSM461180_1_subsampled.fastqsanger
wget https://zenodo.org/record/4541751/files/GSM461180_2_subsampled.fastqsanger
wget https://zenodo.org/record/4541751/files/Drosophila_melanogaster.BDGP6.87.gtf
wget https://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz
gunzip dm6.fa.gz  # STAR index requires an uncompressed reference genome
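A failed or interrupted wget can leave a file missing or empty. While still inside rnaseq, you can check that every expected input is present and non-empty. After gunzip, the genome is dm6.fa rather than dm6.fa.gz. This is a minimal sketch; check_rnaseq_data is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list any expected input file that is missing or
# empty in the given directory (default: current directory).
check_rnaseq_data() {
  local dir="${1:-.}" missing=0 f
  for f in \
    GSM461177_1_subsampled.fastqsanger \
    GSM461177_2_subsampled.fastqsanger \
    GSM461180_1_subsampled.fastqsanger \
    GSM461180_2_subsampled.fastqsanger \
    Drosophila_melanogaster.BDGP6.87.gtf \
    dm6.fa; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Run from inside the rnaseq directory, right after the downloads.
if check_rnaseq_data .; then
  echo "All input files present"
fi
```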

Generating STAR index

To run the STAR aligner, you need index files generated from the reference genome and annotation.

The index is a large directory (3.3 GB). You can either download it from https://drive.google.com/drive/folders/1twx9m5KZ96WvBoXUaeR0X3FVpuRqJ37_?usp=sharing or generate it yourself, as follows.

Create dm6-star-index.yaml in the novice-tutorial-exercises directory:

InputFiles:
  - class: File
    location: rnaseq/dm6.fa
    format: http://edamontology.org/format_1929  # FASTA
IndexName: 'dm6-STAR-index'
Overhang: 36
Gtf:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
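If you prefer to stay in the terminal, you can create the file with a shell here-document. This sketch writes exactly the YAML shown above. Run it from the novice-tutorial-exercises directory (use cd .. first if you are still inside rnaseq), because the rnaseq/ paths in the file are relative.

```shell
#!/usr/bin/env bash
# Write the STAR index job file into the current directory.
# Quoting 'EOF' stops the shell from expanding anything in the text.
cat > dm6-star-index.yaml <<'EOF'
InputFiles:
  - class: File
    location: rnaseq/dm6.fa
    format: http://edamontology.org/format_1929  # FASTA
IndexName: 'dm6-STAR-index'
Overhang: 36
Gtf:
  class: File
  location: rnaseq/Drosophila_melanogaster.BDGP6.87.gtf
EOF

echo "Wrote $(pwd)/dm6-star-index.yaml"
```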

From the novice-tutorial-exercises directory (run cd .. first if you are still inside rnaseq), generate the index files with cwltool:

cwltool --outdir rnaseq/ bio-cwl-tools/STAR/STAR-Index.cwl dm6-star-index.yaml
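When cwltool finishes, it prints a JSON description of the outputs. As a quick sanity check, you can confirm that the index directory contains the core files STAR writes when it generates a genome index: Genome, SA and SAindex. This sketch assumes that the index directory is named after IndexName (dm6-STAR-index) and lands in rnaseq/ because of --outdir. star_index_ready is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory looks like a finished
# STAR genome index. Assumption: STAR writes Genome, SA and SAindex there.
star_index_ready() {
  local dir="${1:-rnaseq/dm6-STAR-index}" f
  [ -d "$dir" ] || { echo "no such directory: $dir"; return 1; }
  for f in Genome SA SAindex; do
    [ -s "$dir/$f" ] || { echo "missing or empty: $dir/$f"; return 1; }
  done
  echo "STAR index looks complete: $dir ($(du -sh "$dir" | cut -f1))"
}

if ! star_index_ready rnaseq/dm6-STAR-index; then
  echo "Re-run cwltool, or download the prebuilt index instead"
fi
```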