Fly-CURE: Shell Genomics, Data Wrangling, and SNP Analyses: Glossary

Key Points

Module 1 | Lesson 1 | Getting Started
  • The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.

  • Everything you need for class is housed in our JupyterLab app.

  • Log in to CyVerse and launch our class app every day before class.

Module 1 | Lesson 2 | Introducing the Shell
  • The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.

  • Useful commands for navigating your file system include: ls, pwd, and cd.

  • Most commands take options (flags) which begin with a -.

  • Tab completion can reduce errors from mistyping and make work more efficient in the shell.
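
As a minimal sketch of the navigation commands above: the block below works inside a temporary directory, and the names data and notes.txt are made up for illustration.

```shell
# Work in a throwaway directory so none of your real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# pwd prints the directory you are currently in.
pwd

# Create two example entries to list (hypothetical names).
mkdir data
touch notes.txt

# ls lists the contents; the -F flag marks directories with a trailing /.
listing=$(ls -F)
echo "$listing"

# cd moves into a directory.
cd data
pwd
```

Typing cd da and then pressing Tab would complete the name to data/, which is the tab completion mentioned above.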

Module 1 | Lesson 3 | Navigating Files and Directories
  • The /, ~, and .. characters represent important navigational shortcuts.

  • Hidden files and directories start with . and can be viewed using ls -a.

  • Relative paths specify a location starting from the current location, while absolute paths specify a location from the root of the file system.
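
The shortcuts and path types above can be tried out with a small sketch like the following. It assumes a temporary directory; the names project, raw, and .hidden_config are hypothetical.

```shell
# Build a small example tree inside a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/raw"
touch "$workdir/project/.hidden_config"

# Absolute path: the value of $workdir starts at the root of the file system, /.
cd "$workdir/project/raw"

# Relative path: .. means "one level up from where I am now".
cd ..
here=$(basename "$PWD")
echo "$here"

# Plain ls skips names that start with a dot; ls -a includes them.
visible=$(ls)
all=$(ls -a)
echo "$all"

# ~ is shorthand for your home directory.
echo ~
```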

Module 1 | Lesson 4 | Working with Files and Directories
  • You can view file contents using less, cat, head or tail.

  • The commands cp, mv, and mkdir are useful for manipulating existing files and creating new directories.

  • You can view file permissions using ls -l and change permissions using chmod.

  • The history command and the up arrow on your keyboard can be used to repeat recently used commands.
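
A short sketch of the file commands above, run in a temporary directory. The file and directory names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Make a small five-line file to look at.
printf 'line1\nline2\nline3\nline4\nline5\n' > example.txt

first=$(head -n 2 example.txt)   # first two lines
last=$(tail -n 1 example.txt)    # last line only
cat example.txt                  # the whole file

# mkdir creates a directory, cp copies a file, mv moves or renames it.
mkdir backup
cp example.txt backup/example_copy.txt
mv backup/example_copy.txt backup/renamed.txt

# chmod a-w removes write permission for everyone; ls -l shows the result.
chmod a-w backup/renamed.txt
locked=$(ls -l backup/renamed.txt)
echo "$locked"

# chmod u+w gives write permission back to the file's owner.
chmod u+w backup/renamed.txt
unlocked=$(ls -l backup/renamed.txt)
echo "$unlocked"
```

less is left out here because it opens an interactive viewer; press q to leave it when you try it yourself.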

Module 1 | Lesson 5 | Redirection
  • grep is a powerful search tool with many options for customization.

  • >, >>, and | are different ways of redirecting output.

  • command > file redirects a command’s output to a file.

  • command >> file redirects a command’s output to a file without overwriting the existing contents of the file.

  • command_1 | command_2 sends the output of the first command to the second command as its input.

  • for loops are used for iteration.

  • basename strips the directory path, and optionally a suffix such as a file extension, from a file name.
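
The redirection operators, a for loop, and basename can be combined as in the sketch below. The sample names and file contents are invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A small tab-separated table of made-up samples.
printf 'sampleA\tmutant\nsampleB\tcontrol\nsampleC\tmutant\n' > samples.txt

# > writes output to a file, replacing anything already there.
grep mutant samples.txt > mutants.txt

# >> appends to the end of the file instead of overwriting it.
grep control samples.txt >> mutants.txt

# | passes the output of one command to the next as its input.
# tr -d ' ' removes the padding that some versions of wc add.
n_mutants=$(grep mutant samples.txt | wc -l | tr -d ' ')
echo "$n_mutants"

# Loop over files; basename removes the .fastq suffix from each name.
touch sampleA.fastq sampleB.fastq
for f in *.fastq
do
    name=$(basename "$f" .fastq)
    echo "$name" >> names.txt
done
cat names.txt
```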

Module 1 | Lesson 6 | Writing Scripts and Working with Data
  • Scripts are a collection of commands executed together.

  • Files can be transferred between virtual (remote) and local computers.

Module 1 | Lesson 7 | Project Organization
  • Spend the time to organize your file system when you start a new project. Your future self will thank you!

  • Always save a write-protected copy of your raw data.
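
One possible way to set up a project and protect its raw data is sketched below. The directory layout and file names are illustrative assumptions, not a required structure.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# One possible layout; these directory names are only an example.
mkdir -p my_project/data/raw my_project/results my_project/docs

# Stand-in for a real raw data file.
echo "raw reads would go here" > my_project/data/raw/sample.fastq

# Remove write permission so the raw data cannot be edited by accident.
chmod a-w my_project/data/raw/sample.fastq
protected=$(ls -l my_project/data/raw/sample.fastq)
echo "$protected"
```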

Module 2 | Lesson 1 | Next-Generation Sequencing Methods
  • These lessons help you understand next-generation sequencing (NGS) methods and how .fastq files are generated.

Module 2 | Lesson 2 | Background and Metadata
  • It’s important to record and understand your experiment’s metadata.

Module 2 | Lesson 3 | Assessing Read Quality
  • Quality encodings vary across sequencing platforms.

  • for loops let you perform the same set of operations on multiple files with a single command.
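
To make the idea of a quality encoding concrete, the sketch below decodes a few quality characters. It assumes the Phred+33 encoding, in which the score is the character's ASCII code minus 33; other encodings (for example Phred+64) use a different offset, so check which one your data use. The function name phred33_score is made up for this example.

```shell
# Assumption: Phred+33 encoding (score = ASCII code - 33).
phred33_score() {
    # printf '%d' with a leading quote prints the character's numeric code.
    ascii=$(printf '%d' "'$1")
    echo $((ascii - 33))
}

# Loop over a few example quality characters and print their scores.
for q in '!' '5' 'I'
do
    echo "$q -> $(phred33_score "$q")"
done
```

Under this assumption, ! is the lowest possible score (0) and I corresponds to a score of 40.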

Module 2 | Lesson 4 | Trimming and Filtering
  • The options you set for the command-line tools you use are important!

  • Data cleaning is an essential step in a genomics workflow.

Module 2 | Lesson 5 | Variant Calling Workflow
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most powerful bioinformatic tools, you’ll need to use the command line.

  • There are many different file formats for storing genomics data. It’s important to understand what type of information is contained in each file, and how it was derived.

Module 2 | Lesson 6 | Automating a Variant Calling Workflow
  • We can combine multiple commands into a shell script to automate a workflow.

  • Use echo statements within your scripts to get an automated progress update.
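
A minimal sketch of a script with progress messages follows. The script name run_workflow.sh and the sample files are hypothetical, and wc -c is only a placeholder for the real analysis steps.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch sample1.fastq sample2.fastq

# Write a small script to a file. The quoted 'EOF' keeps the variables
# from being expanded until the script itself runs.
cat > run_workflow.sh << 'EOF'
#!/bin/sh
set -e   # stop at the first command that fails
for fq in *.fastq
do
    base=$(basename "$fq" .fastq)
    echo "Processing sample $base"
    # Placeholder step: a real workflow would call its analysis tools here.
    wc -c "$fq" > "$base.summary.txt"
done
echo "Workflow finished"
EOF

# Run the script and capture its progress messages.
progress=$(sh run_workflow.sh)
echo "$progress"
```

Because each echo runs just before its step, the messages show how far the script has got if something fails partway through.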

Module 3 | Lesson 1 | Fly-CURE - Project Overview
  • This is an authentic hypothesis-driven research project spanning multiple semesters and institutions.

Module 3 | Lesson 2 | Fly-CURE - Assessing Read Quality
  • for loops let you perform the same set of operations on multiple files with a single command.

  • FastQC lets us check that the sequencing data are of sufficient quality to keep using in the analysis.

Module 3 | Lesson 3 | Fly-CURE - Trimming and Filtering
  • The options you set for the command-line tools you use are important!

  • Data cleaning is an essential step in a genomics workflow.

Module 3 | Lesson 4 | Fly-CURE - Alignment
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most powerful bioinformatic tools, you’ll need to use the command line.

  • There are many different file formats for storing genomics data. It’s important to understand what type of information is contained in each file, and how it was derived.

Module 3 | Lesson 5 | Fly-CURE - Converting, Sorting, and Indexing BAM files
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most powerful bioinformatic tools, you’ll need to use the command line.

  • There are many different file formats for storing genomics data. It’s important to understand what type of information is contained in each file, and how it was derived.

Module 3 | Lesson 6 | Fly-CURE - bcftools
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most powerful bioinformatic tools, you’ll need to use the command line.

  • There are many different file formats for storing genomics data. It’s important to understand what type of information is contained in each file, and how it was derived.

Module 3 | Lesson 7 | Fly-CURE - SnpEff and SnpSift
  • Bioinformatic command line tools are collections of commands that can be used to carry out bioinformatic analyses.

  • To use most powerful bioinformatic tools, you’ll need to use the command line.

Module 3 | Lesson 8 | Fly-CURE - Final Identification of SNPs
  • An R script is used to generate a usable list of unique SNPs for each mutant.

  • This step marks the end of the bioinformatics pipeline.
