Programming & Computing
There are some great general resources for learning how to use plain text environments for scientific computing.
Topics
Bite-sized guides to good programming practice and the everyday tools of reproducible scientific computing, with examples in R, Python, and Julia.
- Good Programming Practices — naming, small functions, and clean code
- Project Workflow — organizing an analytic project
- Software Design & Packaging — modularity, interfaces, and turning analysis into a package
- Reproducibility — seeds, environments, and literate code
- Debugging and Troubleshooting — a calm process and the reprex
- A Simulation Toolkit — building fake data and simulation studies
- Randomness & Random Number Generation — seeds, sampling, and the parallel-RNG trap
- Big-O Notation & Computational Complexity — how work grows with data, and the O(n²) trap
- Data Structures & Choosing the Right Container — arrays, hash maps, and sets, and the list→set fix
- Data Representation & File Formats — encodings, CSV pitfalls, tidy/relational data, SQL, and FASTA/VCF
- Regular Expressions & Finite-State Machines — parsing sequences, logs, and messy field data
- Data Ingestion & APIs — pulling from GenBank, GBIF, and GISAID programmatically
- Recursion & Dynamic Programming — memoization, sequence alignment, HMMs, and tree likelihoods
- Graph & Network Algorithms — BFS/DFS, shortest paths, and connected components on biological networks
- Floating-Point Arithmetic & Numerical Stability — log space, the log-sum-exp trick, and why likelihoods hit zero
- Numerical Methods for Dynamical Systems — integrating ODEs, Euler vs RK4, stiffness, and solvers
- Testing & Verification for Scientific Code — unit tests, invariants, and testing stochastic code
- Vectorization, Memory & Profiling — constant-factor speed, the memory hierarchy, and profiling
- Parallelism & Concurrency — cores vs threads, race conditions, and thread oversubscription
- Manipulating Data Frames — dplyr, data.table, pandas, Polars, DataFrames.jl
- Computer Basics for Scientists — files, paths, and the command line
- Running Jobs on an HPC Cluster (SLURM) — the DEAC & DEMON clusters, modules, SSH, and job submission
- Handling Secrets and API Keys — keys, environment variables, and .gitignore
- Version Control with Git & GitHub
- Building a Personal Website
- LaTeX and Technical Documents
- Note-Taking with Org Mode
Resources
Integrated Development Environments
These programs help you write plain text documents and code, and generate output from them.
- VSCode
- RStudio / Posit
- NeoVim
- Emacs
Computing Environments
- R
- Julia
- Python
- Matlab ($)