Module 1 | Lesson 1 | Getting Started
Overview
Teaching: 13 min
Exercises: 13 min
Questions
What is a command shell and why would I use one?
What programs will I be using in class?
How can I find the terminal and what is it?
Objectives
Access the CyVerse Discovery Environment, JupyterLab, and the terminal.
What is a shell and why should I care?
A shell is a computer program that presents a command-line interface, which lets you control your computer with commands typed on a keyboard instead of operating graphical user interfaces (GUIs) with a mouse/keyboard combination.
There are many reasons to learn about the shell:
- Many bioinformatics tools can only be used through a command line interface, or have extra capabilities in the command line version that are not available in the GUI. This is true, for example, of BLAST, which offers many advanced functions only accessible to users who know how to use a shell.
- The shell makes your work less boring. In bioinformatics you often need to do the same set of tasks with a large number of files. Learning the shell will allow you to automate those repetitive tasks and leave you free to do more exciting things.
- The shell makes your work less error-prone. When humans do the same thing a hundred different times (or even ten times), they’re likely to make a mistake. Your computer can do the same thing a thousand times with no mistakes.
- The shell makes your work more reproducible. When you carry out your work on the command line (rather than in a GUI), your computer keeps a record of every step that you’ve carried out, which you can use to redo your work when you need to. It also gives you a way to communicate unambiguously what you’ve done, so that others can check your work or apply your process to new data.
- Many bioinformatic tasks require large amounts of computing power and can’t realistically be run on your own machine. These tasks are best performed using remote computers or cloud computing, which are typically accessed through a shell.
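As a small illustration of the automation point above, the sketch below runs the same command on every file in a folder. The folder and file names (demo_reads, sample_1.fastq, and so on) are made up for this example and are not part of the course data. You do not need to understand every line yet; loops are covered in more detail in standard shell tutorials.

```shell
# Create a scratch folder with three small example files (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/demo_reads"
for n in 1 2 3; do
    printf 'line one\nline two\n' > "$workdir/demo_reads/sample_$n.fastq"
done

# Run the same task (counting lines) on every file,
# without typing the command once per file.
for file in "$workdir"/demo_reads/*.fastq; do
    # tr removes the padding spaces that some versions of wc add
    lines=$(wc -l < "$file" | tr -d ' ')
    echo "$(basename "$file"): $lines lines"
done
```

The loop prints one line per file. The same pattern works whether the folder holds three files or three thousand.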
In this lesson you will learn how to gain access to the shell and tools we will be utilizing this semester.
How to access the shell
On a Mac or Linux machine, you can access a shell through a program called Terminal, which is already available on your computer. If you’re using Windows, you’ll need to download a separate program to access the shell or use Windows PowerShell. The problem with these methods is that they require you to access a remote server, which often costs money paid by you, the institution, or a granting agency. Instead, this course has been developed using the CyVerse Discovery Environment, which will allow us free access to a terminal and server. Functionally, each method works the same.
We will spend most of our time learning about the basics of the shell by manipulating some experimental data. Some of the data we’re going to be working with is quite large, and we’re also going to be using several bioinformatic packages in later lessons to work with this data. To avoid having to spend time downloading the data and downloading and installing all of the software, we’re going to be working with data on a remote server already stored for you in our Discovery Environment app.
First, you will need to create an account on CyVerse by clicking this link.
Please create your account using an official institutional email address (e.g., first.last@students.nevadastate.edu). After creating your account and logging in, you will see the CyVerse User Portal screen, where you will want to select Workshops. Enroll in the course provided to you by your instructor using the username you just created.
Once you are enrolled in the workshop, under Services, you should see a link for the Discovery Environment. This is where you will launch our app for every lesson. Click the link, and click login. This will log you into the Discovery Environment (you may have to re-enter the same login credentials you created for your CyVerse account).
Once logged in, type the name of our app FlyCURE into the search bar. Select Apps, and click on the Name FlyCURE. This will begin the process of launching the app where we will access the terminal and server.
The next screen will walk you through 4 steps to launch the app. The very first time you launch the app, we will leave the default settings for everything. On every subsequent launch, you will change the Parameters Data set as shown in step #2 of the following screenshots. Follow the series of images to launch the app. (You may return to these directions any time during the semester if you forget.)
This next step will open a new tab called “JupyterLab”, which contains the app and data we will use. Be patient as it loads. When the app has loaded, you should see this page, which confirms that you have successfully launched the app.
How to transfer data to your username
The data is currently stored in kbieser, but we want to transfer the data to your username for all analyses. To do this, open the terminal.
Copy and paste the following command into the terminal. Hit enter.
$ cd data/input/
Copy and paste the following command into the terminal. Hit enter. Anytime you see “your_username”, you must replace this with your CyVerse account username.
$ cp -r data/ ~/home/your_username/
To verify that the data transferred, copy and paste the following commands. Hit enter after each command.
$ cd ~/home/your_username
$ ls
If the data transferred, you should now see a folder named “data”.
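The copy-and-verify pattern above can be practiced safely in a scratch folder before you run it on the course data. In this sketch, the folder names (source_area, my_area) and the file notes.txt are placeholders invented for the example. They stand in for the shared input folder and your own folder, and nothing here touches your real data.

```shell
# Build a throwaway sandbox so nothing real is modified.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source_area/data" "$sandbox/my_area"
echo "example" > "$sandbox/source_area/data/notes.txt"

# Move into the folder that holds "data", then copy it recursively (-r).
# The trailing slash on "data" is left off here because some versions of cp
# (for example on macOS) treat "data/" as "the contents of data".
cd "$sandbox/source_area"
cp -r data "$sandbox/my_area/"

# Verify the copy: check that the folder exists, then list the destination.
if [ -d "$sandbox/my_area/data" ]; then
    echo "data folder copied"
else
    echo "data folder missing"
fi
ls "$sandbox/my_area"
```

The `[ -d ... ]` test is an optional extra. Running `ls` and looking for the folder name, as described above, confirms the same thing.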
Tip
If you are ever uncertain whether your data is saving, return to the Discovery Environment and check your data folder (the cloud icon on the left panel). Anything new that you did that day and saved should appear there. There won’t be anything we need to save until later in our lessons, but this is a good habit to get into right away.
Best Practices: There are two ways to reduce the number of CPU hours that you consume. 1) Be sure to terminate an analysis as soon as it’s complete. Leaving an analysis running while it’s not doing anything is one of the quickest ways to use up your CPU quota. 2) Request fewer CPUs when submitting the analysis. Selecting 0 will automatically select the default (which is currently four). Selecting 1 will only request 1 CPU. For multi-threaded apps, it’s a good idea to select more than one CPU, but the specific number will depend on the app itself. You can see the status of both your storage and CPU quotas in the DE dashboard when you first log in. Exceeding your storage quota will mean you cannot upload any data, including analysis results/outputs. Exceeding your CPU quota will mean you cannot launch a new analysis. To help avoid these scenarios, please monitor your dashboard quotas.
Key Points
The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.
Everything you need to utilize in class is housed in our JupyterLab app.
Log in to CyVerse and launch our class app every day before class.