STAT 29000: Project 3 — Fall 2020
Motivation: The ability to navigate a shell, like bash, and use some of its powerful tools, is very useful. The number of disciplines utilizing data in new ways is ever-growing, and as such, it is very likely that many of you will eventually encounter a scenario where knowing your way around a terminal will be useful. We want to expose you to some of the most useful bash tools, help you navigate a filesystem, and even run bash tools from within an RMarkdown file in RStudio.
Context: At this point in time, you will each have varying levels of familiarity with Scholar. In this project we will learn how to use the terminal to navigate a UNIX-like system, experiment with various useful commands, and learn how to execute bash commands from within RStudio in an RMarkdown file.
Scope: bash, RStudio
There are a variety of ways to connect to Scholar. In this class, we will primarily connect to RStudio Server by opening a browser and navigating to rstudio.scholar.rcac.purdue.edu/, entering credentials, and using the excellent RStudio interface.
Here is a video to remind you about some of the basic tools you can use in UNIX/Linux:
This is the easiest book for learning this stuff; it is short and gets right to the point:
You just log in and you can see it all; we suggest Chapters 1, 3, 4, 5, 7 (you can basically skip chapters 2 and 6 the first time through).
It is a very short read (maybe, say, 2 or 3 hours altogether?), just a thin book that gets right to the details.
Questions
Question 1
Navigate to rstudio.scholar.rcac.purdue.edu/ and log in. Take some time to click around and explore this tool. We will be writing and running Python, R, SQL, and bash all from within this interface. Navigate to Tools > Global Options…, explore this interface, and make at least 2 modifications. List what you changed.
Here are some changes Kevin likes:
-
Uncheck "Restore .Rdata into workspace at startup".
-
Change tab width to 4.
-
Check "Soft-wrap R source files".
-
Check "Highlight selected line".
-
Check "Strip trailing horizontal whitespace when saving".
-
Uncheck "Show margin".
(Dr. Ward does not like to customize his own environment, but he does use the emacs key bindings: Tools > Global Options > Code > Keybindings. This is only recommended if you already know emacs.)
-
List of modifications you made to your Global Options.
Question 2
There are four primary panes, each with various tabs. In one of the panes there will be a tab labeled "Terminal". Click on that tab. This terminal by default will run a bash shell right within Scholar, the same as if you connected to Scholar using ThinLinc, and opened a terminal. Very convenient!
What is the default directory of your bash shell?
|
Start by reading the section on |
# read the manual for the `man` command
# use "k" or the up arrow to scroll up, "j" or the down arrow to scroll down
man man
-
The full filepath of your default directory (home directory). Ex: Kevin’s is: /home/kamstut
-
The bash code used to show your home directory or current directory (also known as the working directory) when the bash shell is first launched.
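One way to inspect where a shell currently is, and how that relates to the home directory, is to compare the working directory with the HOME environment variable. The following is a minimal sketch, not the required answer; the variable name current_dir is only illustrative, and it assumes HOME is set, which is typical on a login system but not guaranteed.

```shell
# Capture the directory the shell is currently in (the working directory).
current_dir="$(pwd)"

echo "Working directory: $current_dir"
echo "Home directory:    $HOME"

# A freshly opened terminal usually starts in the home directory,
# but a script may be launched from anywhere, so check rather than assume.
if [ "$current_dir" = "$HOME" ]; then
    echo "The shell is currently in the home directory."
else
    echo "The shell is somewhere other than the home directory."
fi
```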
Question 3
Learning to navigate away from our home directory to other folders, and back again, is vital. Perform the following actions, in order:
-
Write a single command to navigate to the folder containing our full datasets: /class/datamine/data.
-
Write a command to confirm you are in the correct folder.
-
Write a command to list the files and directories within the data directory. (You do not need to recursively list subdirectories and files contained therein.) What are the names of the files and directories?
-
Write another command to return back to your home directory.
-
Write a command to confirm you are in the correct folder.
Note: / is commonly referred to as the root directory in a Linux/UNIX filesystem. Think of it as a folder that contains every other folder in the computer. /home is a folder within the root directory, and inside /home is another folder named kamstut, which is Kevin’s home directory. So /home/kamstut is the full filepath of Kevin’s home directory.
-
Command used to navigate to the data directory.
-
Command used to confirm you are in the data directory.
-
Command used to list files and folders.
-
List of files and folders in the data directory.
-
Command used to navigate back to the home directory.
-
Command used to confirm you are in the home directory.
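The navigate, confirm, list, and return pattern from this question can be practiced safely on a throwaway directory tree before trying it on the real data directory. This is a sketch under assumptions: practice_dir, start_dir, and the file names inside are hypothetical stand-ins, not the contents of /class/datamine/data, and the script returns to its starting directory rather than to the home directory.

```shell
# Remember where we started so we can come back later.
start_dir="$(pwd)"

# Build a small temporary tree (hypothetical names) to practice on.
practice_dir="$(mktemp -d)"
mkdir -p "$practice_dir/data/flights"
touch "$practice_dir/data/readme.txt"

# Navigate with an absolute path, then confirm the location.
cd "$practice_dir/data"
where="$(pwd)"
echo "Now in: $where"

# List the files and directories here (not recursive).
listing="$(ls)"
echo "$listing"

# Go back to where we started and confirm again.
cd "$start_dir"
pwd
```

The temporary tree is left in place so that it can be explored further; it lives under the system temporary directory and can be deleted at any time.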
Question 4
Let’s learn about two more important concepts. . refers to the current working directory, or the directory displayed when you run pwd. Unlike pwd you can use this when navigating the filesystem! So, for example, if you wanted to see the contents of a file called my_file.txt that lives in /home/kamstut (so, a full path of /home/kamstut/my_file.txt), and you are currently in /home/kamstut, you could run: cat ./my_file.txt.
.. represents the parent folder, or the folder in which your current folder is contained. So let’s say you were in /home/kamstut/projects/ and you wanted to get the contents of the file /home/kamstut/my_file.txt. You could run: cat ../my_file.txt.
When you navigate a directory tree using . and .. you create paths that are called relative paths because they are relative to your current directory. Alternatively, a full path (or absolute path) is the path starting from the root directory. So /home/kamstut/my_file.txt is the absolute path for my_file.txt and ../my_file.txt is a relative path. Perform the following actions, in order:
-
Write a single command to navigate to the data directory.
-
Write a single command to navigate back to your home directory using a relative path. Do not use ~ or the cd command without a path argument.
-
Command used to navigate to the data directory.
-
Command used to navigate back to your home directory that uses a relative path.
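The behavior of . and .. described in this question can be sketched with a temporary directory that mirrors the my_file.txt example. The names sandbox, from_dot, and from_dotdot are illustrative assumptions, and the temporary directory stands in for /home/kamstut.

```shell
# Create a temporary directory that plays the role of a home directory.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/projects"
echo "hello from my_file" > "$sandbox/my_file.txt"

# From inside the directory, "." refers to the directory itself.
cd "$sandbox"
from_dot="$(cat ./my_file.txt)"
echo "$from_dot"

# From inside projects/, ".." refers to the parent directory.
cd projects
from_dotdot="$(cat ../my_file.txt)"
echo "$from_dotdot"

# ".." also works with cd: this moves one level up, back to the sandbox.
cd ..
back_in="$(basename "$(pwd)")"
echo "Back in: $back_in"
```

Both cat commands read the same file; only the starting directory, and therefore the relative path, differs. Several .. components can be chained (for example ../..) to climb more than one level.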
Question 5
On Scholar, when you want to deal with really large amounts of data, you want to access scratch (you can read more here). Your scratch directory on Scholar is located here: /scratch/scholar/$USER. $USER is an environment variable containing your username. Test it out: echo /scratch/scholar/$USER. Perform the following actions:
-
Navigate to your scratch directory.
-
Confirm you are in the correct location.
-
Execute myquota.
-
Find the location of the myquota bash script.
-
Output the first 5 and last 5 lines of the bash script.
-
Count the number of lines in the bash script.
-
How many kilobytes is the script?
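To see how the shell expands an environment variable such as $USER inside a path, the sketch below builds the scratch path as a string without requiring that the directory exist. The names user_name, scratch_path, and literal are illustrative; the fallback to id -un is an assumption added for minimal environments where USER is not set.

```shell
# Use $USER if it is set; otherwise ask the system for the current user name.
user_name="${USER:-$(id -un)}"

# Inside double quotes, the variable is replaced by its value.
scratch_path="/scratch/scholar/$user_name"
echo "$scratch_path"

# Inside single quotes, no expansion happens: the text is kept literally.
literal='/scratch/scholar/$USER'
echo "$literal"
```

The difference between the two echo lines is the quoting: double quotes allow expansion, single quotes prevent it.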
You could use each of the commands in the relevant topics once.
|
When you type |
|
Commands often have options. Options are features of the program that you can trigger specifically. You can see the options of a command in the |
# using the default wc command. "/class/datamine/data/flights/1987.csv" is the first "argument" given to the command.
wc /class/datamine/data/flights/1987.csv
# to count the lines, use the -l option
wc -l /class/datamine/data/flights/1987.csv
# to count the words, use the -w option
wc -w /class/datamine/data/flights/1987.csv
# you can combine options as well
wc -w -l /class/datamine/data/flights/1987.csv
# some people like to use a single tack `-`
wc -wl /class/datamine/data/flights/1987.csv
# order doesn't matter
wc -lw /class/datamine/data/flights/1987.csv
|
The |
-
Command used to navigate to your scratch directory.
-
Command used to confirm your location.
-
Output of myquota.
-
Command used to find the location of the myquota script.
-
Absolute path of the myquota script.
-
Command used to output the first 5 lines of the myquota script.
-
Command used to output the last 5 lines of the myquota script.
-
Command used to find the number of lines in the myquota script.
-
Number of lines in the script.
-
Command used to find out how many kilobytes the script is.
-
Number of kilobytes that the script takes up.
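The kinds of inspection asked for in this question (locating a command, viewing the start and end of a file, counting lines, and checking size) can be tried on a small generated file first. This is a sketch under assumptions: sample_file is a hypothetical 20-line file created with seq, and ls stands in for myquota as the command being located. Note that du reports disk usage in 1024-byte blocks, which can differ from the exact byte count given by wc -c.

```shell
# Generate a small sample file containing the numbers 1 through 20, one per line.
sample_file="$(mktemp)"
seq 1 20 > "$sample_file"

# Locate an executable on the PATH (ls is used here as a stand-in).
ls_location="$(command -v ls)"
echo "ls lives at: $ls_location"

# First 5 and last 5 lines of the file.
first_lines="$(head -n 5 "$sample_file")"
last_lines="$(tail -n 5 "$sample_file")"
echo "$first_lines"
echo "$last_lines"

# Number of lines, and exact size in bytes.
line_count="$(wc -l < "$sample_file")"
byte_count="$(wc -c < "$sample_file")"
echo "Lines: $line_count  Bytes: $byte_count"

# Disk usage in kilobytes (first tab-separated field of du's output).
size_kb="$(du -k "$sample_file" | cut -f1)"
echo "Disk usage (KB): $size_kb"
```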
Question 6
Perform the following operations:
-
Navigate to your scratch directory.
-
Copy the file /class/datamine/data/flights/1987.csv to your current directory (scratch).
-
Create a new directory called my_test_dir in your scratch folder.
-
Move the file you copied to your scratch directory into your new folder.
-
Use touch to create an empty file named im_empty.txt in your scratch folder.
-
Remove the directory my_test_dir and the contents of the directory.
-
Remove the im_empty.txt file.
-
Command used to navigate to your scratch directory.
-
Command used to copy the file /class/datamine/data/flights/1987.csv to your current directory (scratch).
-
Command used to create a new directory called my_test_dir in your scratch folder.
-
Command used to move the file you copied earlier, 1987.csv, into your new my_test_dir folder.
-
Command used to create an empty file named im_empty.txt in your scratch folder.
-
Command used to remove the directory my_test_dir and the contents of the directory.
-
Command used to remove the im_empty.txt file.
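The copy, create, move, and remove sequence in this question can be rehearsed in a temporary directory, where a mistake with rm costs nothing. In this sketch, workspace stands in for the scratch directory and source_file (renamed to flights_sample.csv) stands in for 1987.csv; both are hypothetical placeholders, not the real data.

```shell
# A temporary directory that plays the role of scratch.
workspace="$(mktemp -d)"

# A tiny stand-in for the CSV file to be copied.
source_file="$(mktemp)"
echo "Year,Month" > "$source_file"

cd "$workspace"

# Copy the file into the current directory under a new name.
cp "$source_file" ./flights_sample.csv

# Create a directory and move the copied file into it.
mkdir my_test_dir
mv flights_sample.csv my_test_dir/
moved_ok="no"
[ -f my_test_dir/flights_sample.csv ] && moved_ok="yes"

# Create an empty file, then show the whole tree.
touch im_empty.txt
ls -R

# Remove the directory together with its contents, then the empty file.
rm -r my_test_dir
rm im_empty.txt

# Nothing should be left in the workspace.
remaining="$(ls -A)"
echo "Remaining entries: '$remaining'"
```

rm -r deletes recursively and does not ask for confirmation, so it is worth confirming the working directory with pwd before running it on real data.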
Question 7
Please include a statement in Project 3 that says, "I acknowledge that the STAT 19000/29000/39000 1-credit Data Mine seminar will be recorded and posted on Piazza, for participants in this course." If you disagree with this statement, please consult with us at datamine@purdue.edu for an alternative plan.