Module 1 | Lesson 2 | Introducing the Shell
Overview
Teaching: 30 min
Exercises: 30 min
Questions
What is a command shell and why would I use one?
How can I move around on my computer?
How can I see what files and directories I have?
How can I specify the location of a file or directory on my computer?
Objectives
Describe key reasons for learning shell.
Navigate your file system using the command line.
Access and read help files for bash programs and use help files to identify useful command options.
Demonstrate the use of tab completion, and explain its advantages.
What is a shell and why should I care?
A shell is a computer program that presents a command line interface which allows you to control your computer using commands entered with a keyboard instead of controlling graphical user interfaces (GUIs) with a mouse/keyboard combination.
There are many reasons to learn about the shell:
- Many bioinformatics tools can only be used through a command line interface, or have extra capabilities in the command line version that are not available in the GUI. This is true, for example, of BLAST, which offers many advanced functions only accessible to users who know how to use a shell.
- The shell makes your work less boring. In bioinformatics you often need to do the same set of tasks with a large number of files. Learning the shell will allow you to automate those repetitive tasks and leave you free to do more exciting things.
- The shell makes your work less error-prone. When humans do the same thing a hundred different times (or even ten times), they’re likely to make a mistake. Your computer can do the same thing a thousand times with no mistakes.
- The shell makes your work more reproducible. When you carry out your work in the command line (rather than a GUI), your computer keeps a record of every step that you’ve carried out, which you can use to re-do your work when you need to. It also gives you a way to communicate unambiguously what you’ve done, so that others can check your work or apply your process to new data.
- Many bioinformatic tasks require large amounts of computing power and can’t realistically be run on your own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed through a shell.
In this lesson you will learn how to use the command line interface to move around in your file system.
How to access the shell
To begin, we want to launch the FlyCURE app, but this time we will change the path where the data is located to your_username. This will be the same path we use the rest of the semester. To launch the app from your username you will follow the same directions as you did previously but change the path where the data set is located. If you have already launched the app once from your username, you may simply re-launch the app with fewer steps.
You will launch the app this way for the entire course.
Once the app is launched, open the terminal. You are now ready to start using the command shell!
Navigating your file system
The part of the operating system responsible for managing files and directories is called the file system. It organizes our data into files, which hold information, and directories (also called “folders”), which hold files or other directories.
Several commands are frequently used to create, inspect, rename, and delete files and directories.
Preparation Magic
If you type the command:
PS1='$ '
into your shell, followed by pressing the Enter key, your window should look like our example in this lesson.
This isn’t necessary to follow along (in fact, your prompt may have other helpful information you want to know about). This is up to you! Close your terminal and re-open to restore the terminal back to the original settings.
$
The dollar sign is a prompt, which shows us that the shell is waiting for input; your shell may use a different character as a prompt and may add information before the prompt. When typing commands, either from these lessons or from other sources, do not type the prompt, only the commands that follow it.
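If you would rather not close the terminal to get your original prompt back, one possible approach is to save the prompt before changing it. The lines below are a minimal sketch: `old_prompt` is a made-up variable name, and the prompt character is left out so the lines can be pasted as they are.

```shell
# Remember the current prompt (it may be empty if the shell is not interactive).
old_prompt=${PS1-}

# Switch to the minimal prompt used in this lesson.
PS1='$ '

# ... work through the lesson ...

# Put the original prompt back.
PS1=$old_prompt
```

The saved value only lasts for the current terminal session, so closing and re-opening the terminal still works as a reset.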
Let’s find out where we are by running a command called pwd (which stands for “print working directory”). At any moment, our current working directory is our current default directory, i.e., the directory that the computer assumes we want to run commands in, unless we explicitly specify something else. Here, the computer’s response is /home/gea_user, which is the top level directory within our cloud system:
$ pwd
/home/gea_user
Let’s look at how our file system is organized. We can see what files and subdirectories are in this directory by running ls, which stands for “listing” or “list stuff”:
$ ls
data home
ls prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns.
The command to change locations in our file system is cd, followed by a directory name to change our working directory. cd stands for “change directory”.
Let’s say we want to navigate to the home directory, where we will find our subdirectories. We previously moved our data directory to home. We may use ~/data later in the semester, but for now all of our work will be done in ~/home.
$ cd home
$ ls
kbieser shared
$ cd your_username    # replace your_username with your CyVerse username
$ ls
data
$ cd data
$ ls
shell_data
We’ll be working within the shell_data subdirectory, and creating new subdirectories, throughout this class.
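If you want to practice these steps without touching the course data, the sketch below builds a throwaway copy of the same directory layout and walks into it, running pwd along the way to confirm the current location. The temporary directory and the variable names `scratch` and `contents` are assumptions made for this illustration, and the prompt is left out so the lines can be pasted as they are.

```shell
# Build a disposable copy of the lesson's directory layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/home/your_username/data/shell_data"

# Start at the top of the practice area and move down one directory at a time.
cd "$scratch"
cd home
pwd                  # shows a path ending in /home
cd your_username
cd data
pwd                  # shows a path ending in /home/your_username/data

# List what is in the directory we ended up in.
contents=$(ls)
echo "$contents"     # prints: shell_data
```

Running pwd after a cd is a quick way to check that you ended up where you meant to go.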
Anatomy of a Command
Figure 1: Anatomy of a command. Between the command and an argument we may add options, depending on the command being used.
Navigating your file system continued
For now, let’s navigate to the shell_data directory we saw above. We can use the following command to get there:
$ cd shell_data
Let’s look at what is in this directory:
$ ls
sra_metadata untrimmed_fastq
We can make the ls output more comprehensible by using the flag or option -F, which tells ls to add a trailing / to the names of directories:
$ ls -F
sra_metadata/ untrimmed_fastq/
Anything with a “/” after it is a directory. Names with a * after them are files that are marked as executable, and names with no trailing symbol are ordinary files.
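To see the markers that ls -F adds, it can help to try it on a small test case. The sketch below is an illustration only: it creates a temporary directory containing one directory, one ordinary file, and one file marked as executable (all names are made up), then lists them with -F.

```shell
# Work in a disposable directory so nothing in the course data is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir adir             # a directory
touch plain.txt        # an ordinary file
touch script.sh        # another file ...
chmod +x script.sh     # ... now marked as executable

# -F adds "/" after directories and "*" after executable files.
listing=$(ls -F)
echo "$listing"
```

In the output, adir is followed by /, script.sh is followed by *, and plain.txt has no marker.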
Consider how you would navigate to find an assignment you have saved on your computer (Figure 1). In this example from my computer, “Dropbox(NSC)” is the home directory with a subdirectory called “My_Repositories”. Within “My_Repositories” there is another subdirectory called “shell_genomics”. At this point, you can see there are more subdirectories, but I have also reached files which would be analogous to your assignment written in Word.
ls has lots of other options. To find out what they are, we can type:
$ man ls
Some manual files are very long. You can scroll through the file using your keyboard’s down arrow or use the Space key to go forward one page and the b key to go backwards one page. When you are done reading, hit q to quit.
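Many programs can also print a shorter summary of their options directly in the terminal. The sketch below assumes the GNU version of ls, which is typical on Linux systems and accepts a --help option. Other systems may offer only the manual page. The variable name `help_text` is made up for this example.

```shell
# Capture the built-in option summary (assumes GNU ls, as found on most Linux systems).
help_text=$(ls --help 2>&1)

# Show only the first few lines rather than the whole summary.
printf '%s\n' "$help_text" | head -n 5
```

Unlike man, this prints straight to the terminal, so there is nothing to quit with q afterwards.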
Exercise
Use the -l option for the ls command to display more information for each item in the directory. What is one piece of additional information this long format gives you that you don’t see with the bare ls command?
Solution
$ ls -l
total 0
drwx------ 1 gea_user gea_user 0 Aug 4 2021 sra_metadata
drwx------ 1 gea_user gea_user 0 Aug 4 2021 untrimmed_fastq
The additional information given includes the name of the owner of the file, when the file was last modified, and whether the current user has permission to read and write to the file.
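The first column of the long listing is the part that encodes permissions. As a rough illustration, the sketch below creates a directory that only its owner can use and pulls out the first ten characters of its ls -l line. The directory and variable names are made up, and the `cut` step is just one way to isolate that column.

```shell
# Work in a disposable directory.
scratch=$(mktemp -d)
cd "$scratch"

# Create a directory and allow only its owner to read, write, and enter it.
mkdir private_dir
chmod 700 private_dir

# -d lists the directory itself instead of its contents.
long_listing=$(ls -ld private_dir)
echo "$long_listing"

# Keep only the first ten characters: the type and permission flags.
permissions=$(printf '%s\n' "$long_listing" | cut -c1-10)
echo "$permissions"    # expected: drwx------
```

Reading left to right: d marks a directory, the first rwx says the owner may read, write, and enter it, and the two groups of --- say that the owner's group and all other users have no access.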
No one can possibly learn all of these arguments; that’s what the manual page is for. You can (and should) refer to the manual page or other help files as needed.
Let’s go into the untrimmed_fastq directory and see what is in there.
$ cd untrimmed_fastq
$ ls -F
SRR097977.fastq* SRR098026.fastq*
This directory contains two files with .fastq extensions. The trailing * shows that these are files (marked as executable on this system) and not directories. FASTQ is a format for storing information about sequencing reads and their quality.
We will be learning more about FASTQ files in a later lesson.
Tip
If you like to use hot-key combinations you might be interested to know that clearing the console can be achieved by pressing Ctrl+L. Feel free to try it and see for yourself. Alternatively, you can type the command clear and it will also clear your console.
Shortcut: Tab Completion
Typing out file or directory names can waste a lot of time and it’s easy to make typing mistakes. Instead we can use tab complete as a shortcut. When you start typing out the name of a directory or file, then hit the Tab key, the shell will try to fill in the rest of the directory or file name. Practice using the Tab key anytime you are navigating in the command line.
Return to your home directory:
$ cd
then enter:
$ cd ho<tab>
The shell will fill in the rest of the directory name for home.
Let’s use tab again to get to our shell_data directory. In order to get there, you will have to navigate through the your_username and data directories. You can navigate to each directory in turn, or string the directories together in one command.
$ cd your_username/da<tab>/she<tab>
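Moving one directory at a time and stringing the directory names together lead to the same place. The sketch below checks this on a throwaway copy of the directory layout. The temporary directory and the variable names are assumptions made for this illustration, and the names are typed in full because Tab cannot be pressed inside a script.

```shell
# Build a disposable copy of the lesson's directory layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/home/your_username/data/shell_data"

# Option 1: one directory at a time.
cd "$scratch/home"
cd your_username
cd data
cd shell_data
step_by_step=$(pwd)

# Option 2: the same destination in one command, with the names joined by "/".
cd "$scratch/home"
cd your_username/data/shell_data
one_command=$(pwd)

# Both lines printed here show the same path.
printf '%s\n%s\n' "$step_by_step" "$one_command"
```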
Now change directories to untrimmed_fastq in shell_data:
$ cd untrimmed_fastq
Using tab complete can be very helpful. However, it will only autocomplete a file or directory name if you’ve typed enough characters to provide a unique identifier for the file or directory you are trying to access.
Tip
If you are hitting tab and the file or directory name is not auto-filling, you may be in the wrong directory. Tab completion will only work if you are in the correct location. Using the tab key is often useful in helping to troubleshoot and understand where you are located.
Let’s navigate back to our untrimmed_fastq directory and try to access one of our sample files:
$ cd
$ cd home
$ cd your_username
$ cd data
$ cd shell_data
$ cd untrimmed_fastq
$ ls SR<tab>
The shell auto-completes your command to SRR09, because all file names in the directory begin with this prefix. When you hit Tab again, the shell will list the possible choices.
$ ls SRR09<tab><tab>
SRR097977.fastq SRR098026.fastq
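Pressing Tab cannot be shown in a script, but bash has a built-in command, compgen, that prints the candidate names for a given prefix. The sketch below assumes bash and uses two empty stand-in files with the same names as the sample data. It shows why SRR09 is ambiguous while a slightly longer prefix is not.

```shell
# Create two empty stand-in files in a disposable directory.
scratch=$(mktemp -d)
cd "$scratch"
touch SRR097977.fastq SRR098026.fastq

# Two names start with SRR09, so Tab can only fill in the shared prefix.
ambiguous=$(compgen -f SRR09 | sort)
echo "$ambiguous"

# Only one name starts with SRR097, so Tab can complete the whole name.
unique=$(compgen -f SRR097)
echo "$unique"
```

Typing one more character, as in SRR097, is enough here to let Tab finish the file name on its own.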
Tab completion can also fill in the names of programs, which can be useful if you remember the beginning of a program name.
$ pw<tab><tab>
pwck pwconv pwd pwdx pwunconv
This displays the name of every program that starts with pw.
Summary
We now know how to move around our file system using the command line. This gives us an advantage over interacting with the file system through a GUI: it allows us to work on a remote server, carry out the same set of operations on a large number of files quickly, and use bioinformatic software that is only available in command line versions.
In the next few classes, we’ll be expanding on these skills and seeing how using the command line shell enables us to make our workflow more efficient and reproducible.
Key Points
The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.
Useful commands for navigating your file system include: ls, pwd, and cd.
Most commands take options (flags) which begin with a -.
Tab completion can reduce errors from mistyping and make work more efficient in the shell.