Computer Basics for Scientists

Before you can write reproducible analysis code, you need a clear mental model of how a computer stores and finds your files. A little fluency with directories and the command line will save you hours of frustration and make your work far easier to reproduce.

How Files Are Stored

Think of your computer’s storage as a tree of nested folders (also called directories). Each folder can contain files and more folders. A file is just a named container of bytes living somewhere in that tree.

Every file has a path describing where it lives.

Absolute vs. Relative Paths

# Absolute path (Mac/Linux): unambiguous, works from anywhere
/home/alice/projects/flu-study/data/cases.csv

# Relative path: depends on your current working directory
data/cases.csv

# Special shortcuts
.        # the current directory
..       # the parent directory (one level up)
~        # your home directory

Do use relative paths inside a project (data/cases.csv) so the project still works when moved or shared. Don’t hard-code absolute paths like /home/alice/... into scripts you plan to share; they will break on anyone else’s machine.
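To make the distinction concrete, here is a minimal Python sketch using the standard pathlib module. The project location is the hypothetical one from the example above, and /home/bob/work is an invented second location; PurePosixPath only manipulates path text, so nothing needs to exist on disk.

```python
from pathlib import PurePosixPath

# Hypothetical project location, taken from the example above.
project_root = PurePosixPath("/home/alice/projects/flu-study")

relative = PurePosixPath("data/cases.csv")
absolute = project_root / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(absolute)                # /home/alice/projects/flu-study/data/cases.csv

# Going up two levels from the file (like "../..") lands on the project root.
print(absolute.parent.parent == project_root)  # True

# Moving the project changes every absolute path, but the relative one stays valid.
moved_root = PurePosixPath("/home/bob/work/flu-study")  # invented new location
print(moved_root / relative)   # /home/bob/work/flu-study/data/cases.csv
```

The relative path is the only part a shared script needs to know; each machine supplies its own project root.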

The Working Directory

The working directory is the folder a program treats as “here.” When a script says read.csv("data/cases.csv"), it looks relative to the working directory. Always know where you are before running code.

pwd          # print working directory: shows where you currently are
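The same idea can be checked from inside a script. The sketch below, in Python, shows that one relative path points to different places depending on the working directory. The helper name resolve_from_cwd is made up for illustration, and the temporary folder stands in for any other directory you might cd into.

```python
import os
import tempfile
from pathlib import Path


def resolve_from_cwd(relative_path):
    """Return the absolute location a relative path refers to right now."""
    return Path.cwd() / relative_path


original_cwd = Path.cwd()            # same information as `pwd`
before = resolve_from_cwd("data/cases.csv")
print(before)

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                # same effect as `cd`
    try:
        inside = resolve_from_cwd("data/cases.csv")
        print(inside)                # a different location for the same relative path
    finally:
        os.chdir(original_cwd)       # always return to where we started
```

Printing Path.cwd() at the top of a script is a cheap way to confirm where it is running from.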

Common File Extensions

The extension is a hint about a file’s format. It does not change the contents by itself.

Extension   Contents
.csv        comma-separated values, tabular data (plain text)
.json       structured key/value data (plain text)
.txt        unformatted plain text
.R          R script
.py         Python script
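A short Python sketch illustrates that the extension is only a label. It writes a small CSV file, renames it to .txt, and shows that the bytes are unchanged and still parse as CSV. The column names and values are made up for the demonstration.

```python
import csv
import tempfile
from pathlib import Path

rows = [["region", "cases"], ["north", "12"], ["south", "7"]]  # made-up values

with tempfile.TemporaryDirectory() as scratch:
    csv_path = Path(scratch) / "cases.csv"
    with csv_path.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    original_bytes = csv_path.read_bytes()

    # Rename to .txt: only the name changes, not the contents.
    txt_path = csv_path.with_suffix(".txt")
    csv_path.rename(txt_path)
    renamed_bytes = txt_path.read_bytes()

    # The file still parses as CSV despite its new extension.
    with txt_path.open(newline="") as handle:
        parsed = list(csv.reader(handle))

print(txt_path.suffix)                  # .txt
print(original_bytes == renamed_bytes)  # True
print(parsed == rows)                   # True
```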

Plain Text vs. Binary

Do prefer plain-text formats for data and code: they are transparent, durable, and version-control-friendly. Don’t store your primary dataset only inside a proprietary binary format.
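As a small illustration of the difference, the Python sketch below stores one number twice: once as plain text and once as raw binary using the standard struct module. The value is arbitrary, and the 8-byte little-endian double is just one possible binary layout, chosen here as an assumption for the example.

```python
import struct

value = 1234.5  # arbitrary example number

# Plain text: the characters a person would read in any editor.
as_text = str(value).encode("utf-8")
print(as_text)                       # b'1234.5'

# Binary: 8 bytes in IEEE 754 double format, little-endian ("<d").
as_binary = struct.pack("<d", value)
print(as_binary)                     # opaque bytes, not human-readable
print(len(as_text), len(as_binary))  # 6 8

# Text can be read back by anything that understands numbers written as text.
print(float(as_text.decode("utf-8")) == value)      # True

# Binary can only be read back by code that knows the exact layout.
print(struct.unpack("<d", as_binary)[0] == value)   # True
```

Both forms hold the same number, but only the text form is self-explanatory without the software that wrote it.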

Software vs. Hardware

When a program is “slow,” it may be waiting on the CPU (computation), running out of RAM (memory), or reading a large file from disk. Knowing which helps you fix it.
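One rough way to find out which part is slow is to time each step separately. The Python sketch below is only a starting point: the helper names timed, compute, and read_all are invented for illustration, the workload sizes are arbitrary, and wall-clock timing shows where time goes but does not by itself reveal memory pressure.

```python
import tempfile
import time
from pathlib import Path


def timed(label, func, *args):
    """Run func(*args), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed


def compute(n):
    """CPU-bound step: pure arithmetic, no file access."""
    return sum(i * i for i in range(n))


def read_all(path):
    """Disk-bound step: read a whole file and report its size in bytes."""
    with open(path, "rb") as handle:
        return len(handle.read())


with tempfile.TemporaryDirectory() as scratch:
    big_file = Path(scratch) / "big.bin"
    big_file.write_bytes(b"\0" * 1_000_000)  # 1 MB of filler data

    total, compute_seconds = timed("compute", compute, 100_000)
    size, read_seconds = timed("read file", read_all, big_file)
```

If one step dominates the total, that is where to look first.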

GUI vs. Command Line

A GUI (graphical user interface) is point-and-click: menus, buttons, windows. The command line (a shell or terminal) is where you type text commands.

Why Scripts Beat Point-and-Click for Reproducibility

Clicking through menus leaves no record of what you did. A script is an exact, re-runnable recipe.
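A minimal sketch of such a recipe in Python might look like the following. The column names region and cases, the values, and the function name total_cases are assumptions made for this example; a temporary file stands in for a real data/cases.csv.

```python
import csv
import tempfile
from pathlib import Path


def total_cases(csv_path):
    """Sum the 'cases' column of a CSV file (column name is an assumption)."""
    with open(csv_path, newline="") as handle:
        return sum(int(row["cases"]) for row in csv.DictReader(handle))


# Demonstration on a throwaway file with made-up values.
with tempfile.TemporaryDirectory() as scratch:
    demo_file = Path(scratch) / "cases.csv"
    demo_file.write_text("region,cases\nnorth,12\nsouth,7\n")
    demo_total = total_cases(demo_file)

print(demo_total)  # 19
```

Every step is written down, so running the script again on the same file gives the same answer.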

Do write your analysis as scripts. Don’t rely on a sequence of manual clicks you will not remember in six months.

A Starter Set of Shell Commands

These run in a Unix-style shell (Mac Terminal, Linux, or Git Bash / WSL on Windows).

pwd                      # print the current working directory
ls                       # list files in the current directory
ls -la                   # list all files (incl. hidden) with details
cd projects/flu-study    # change directory into a folder
cd ..                    # move up one directory
mkdir data               # make a new directory called "data"
mv old.csv data/         # move (or rename) a file
cp cases.csv backup.csv  # copy a file
rm scratch.txt           # remove (delete) a file -- no undo, be careful
cat notes.txt            # print a file's contents to the screen
less bigfile.log         # scroll through a large file (press q to quit)
chmod +x run.sh          # make a script executable
man ls                   # show the manual page for a command
grep "error" log.txt     # search for lines containing "error"
find . -name "*.csv"     # find all .csv files under the current directory

Do use man <command> (or <command> --help) whenever you forget how something works. Don’t run rm on paths you are unsure about; deletion is usually permanent from the shell.
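Several of these commands have direct counterparts inside a script. The Python sketch below mirrors a few of them using the standard pathlib and shutil modules. It runs entirely in a temporary folder so nothing of yours is touched, and the file names and contents are made up.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)

    data = root / "data"
    data.mkdir()                                          # mkdir data

    (root / "old.csv").write_text("id\n1\n")
    shutil.move(str(root / "old.csv"), str(data / "old.csv"))  # mv old.csv data/

    shutil.copy(data / "old.csv", root / "backup.csv")    # cp (copy a file)

    (root / "log.txt").write_text("ok\nerror: disk full\nok\n")
    log_lines = (root / "log.txt").read_text().splitlines()
    matches = [line for line in log_lines if "error" in line]  # grep "error" log.txt

    # find . -name "*.csv"
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

    (root / "backup.csv").unlink()                        # rm backup.csv (no undo)

    remaining = sorted(p.name for p in root.iterdir())    # ls

print(matches)    # ['error: disk full']
print(found)      # ['backup.csv', 'data/old.csv']
print(remaining)  # ['data', 'log.txt']
```

As with rm, unlink deletes immediately, so the same caution applies in scripts.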

Operating System Differences