This course extends reproducible research craft into applied data science for infectious disease: ingesting messy surveillance data, wrangling and validating it, and communicating results with honest visualizations. It sequences the site’s Programming and Computing library around real infectious-disease datasets.
The course syllabus is shown below.
Draft syllabus. This is a scaffold for the concentration. Course number, credit hours, dates, and specific assignments are placeholders and will be finalized before the course is offered.
Course title and instructors
Title: Data Science for Infectious Disease
Course Number: BIO 3xx (proposed; confirm with the Department of Biology)
Semester: TBD
Credit Hours: 3 (proposed)
Meeting Time: TBD
Course Director: Michael E. DeWitt, MS
Email: medewitt@wakehealth.edu or dewime23@wfu.edu
Course description
Working with real infectious-disease data means dealing with files in awkward formats, records that are missing or duplicated, and results that must be communicated without overstating what the data support. This course teaches the applied data-handling side of that work. Students ingest and validate data from files and APIs; handle formats, missingness, and secrets safely; structure a reproducible project under version control with tested, documented code; build clear, honest visualizations that show uncertainty; and reason about performance and numerical stability as analyses scale. The material comes from the site’s Programming and Computing library, sequenced around real datasets.
This course overlaps with Research Tools and Methods, which covers reproducible research craft. The intended split is clean: Research Tools and Methods teaches the craft and reproducibility habits, while this course focuses on applied data handling and visualization with infectious-disease data. If either course is renumbered or absorbed into the other, the overlap should be resolved so that the two courses do not duplicate content.
Learning outcomes
Upon successful completion of this course, students will be able to:
- Ingest and validate infectious-disease data from files and APIs, handling formats, missingness, and secrets safely
- Structure a reproducible project with version control and tested, documented code
- Build clear, honest visualizations of epidemiologic data with uncertainty
- Reason about performance and numerical stability when analyses scale
- Move data between formats and reshape it into an analysis-ready structure
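To make the first outcome concrete, here is a minimal sketch of validating a small surveillance extract. It is illustrative only: the column names (`report_date`, `county`, `cases`), the sample values, and the rule that a repeated date and county pair counts as a duplicate are all assumptions, not part of any course dataset or assignment.

```python
import csv
import io
from datetime import date

# Hypothetical extract: column names and values are invented for illustration.
RAW = """report_date,county,cases
2024-01-01,Forsyth,12
2024-01-01,Forsyth,12
2024-01-02,Forsyth,
2024-01-02,Guilford,-3
2024-01-03,Guilford,7
"""


def validate(text):
    """Split rows into clean records and (line number, reason) problems."""
    clean, problems, seen = [], [], set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Assumption: one report per date and county, so a repeat is a duplicate.
        key = (row["report_date"], row["county"])
        if key in seen:
            problems.append((line_no, "duplicate"))
            continue
        seen.add(key)
        # Record missing counts explicitly instead of silently treating them as zero.
        if row["cases"] == "":
            problems.append((line_no, "missing cases"))
            continue
        cases = int(row["cases"])  # a non-numeric value would raise ValueError here
        if cases < 0:
            problems.append((line_no, "negative cases"))
            continue
        clean.append(
            {
                "report_date": date.fromisoformat(row["report_date"]),
                "county": row["county"],
                "cases": cases,
            }
        )
    return clean, problems


clean, problems = validate(RAW)
for line_no, reason in problems:
    print(f"line {line_no}: {reason}")
```

Returning the problem rows alongside the clean ones, rather than dropping them, keeps the validation step auditable: a reader can see what was excluded and why.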
Textbook and other resources
There is no single required textbook. Recommended references include:
- Wickham H, Çetinkaya-Rundel M, Grolemund G. R for Data Science. O’Reilly.
- Wilke CO. Fundamentals of Data Visualization. O’Reilly.
- Selected primary literature and public surveillance datasets
Additional readings will be assigned throughout the course.
Site resources
This course draws on IDEEEP content pages as assigned readings and lab material:
- Programming and Computing
- Data representation and formats
- Data ingestion and APIs
- Project workflow
- Version control with Git
- Testing scientific code
- Debugging and troubleshooting
- Reproducibility
- Graphing data
- Manipulating data
- HPC clusters and Slurm
- Research Tools and Methods
New concept pages on data-visualization principles and on tidy and relational data are planned and will be linked here once published.
Course structure and schedule
This course meets over 15 weeks and combines lecture with computer labs on real infectious-disease datasets. The schedule below is a draft outline of topics.
| Week | Topic |
|---|---|
| 1 | Introduction: the infectious-disease data pipeline |
| 2 | Data representation and formats |
| 3 | Tidy and relational data |
| 4 | Data ingestion from files |
| 5 | Ingestion from APIs and handling secrets safely |
| 6 | Validation and missingness |
| 7 | Reshaping and manipulating data |
| 8 | Project workflow and structure |
| 9 | Version control with Git |
| 10 | Testing and debugging scientific code |
| 11 | Reproducibility |
| 12 | Principles of honest data visualization |
| 13 | Visualizing uncertainty |
| 14 | Performance, numerical stability, and scaling to HPC |
| 15 | Project presentations and wrap-up |
Note: Specific dates will be provided at the beginning of the semester. Topics may be adjusted based on class progress and student interests.
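As a taste of the numerical-stability topic listed for Week 14, the sketch below shows one common case: combining very small likelihoods that are stored as logs. The values are invented, and the log-sum-exp rearrangement shown is a standard technique chosen for illustration, not necessarily the example the course will use.

```python
import math


def naive_log_sum_exp(log_values):
    """Exponentiate, sum, and take the log. Fails when every term underflows to 0."""
    return math.log(sum(math.exp(v) for v in log_values))


def stable_log_sum_exp(log_values):
    """Subtract the maximum first so the largest term becomes exp(0) = 1."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Invented log-likelihoods, small enough that exp() underflows to 0.0.
log_liks = [-1000.0, -1001.0]

try:
    naive = naive_log_sum_exp(log_liks)
except ValueError:
    # math.log(0.0) raises ValueError because both terms underflowed.
    naive = None

stable = stable_log_sum_exp(log_liks)
print(naive, stable)
```

Both functions compute the same quantity on paper. Only the second survives floating-point underflow, which is the kind of issue that tends to appear as analyses scale.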
Grades and assignments
| Activity | Weight |
|---|---|
| Participation and lab discussion | 20% |
| Computer labs and assignments | 30% |
| Exam(s) | 20% |
| Final project | 30% |
Final project: Students will build a reproducible analysis of a real infectious-disease dataset from ingestion through validation to visualization, with tested code under version control and honest communication of uncertainty.
Course policies
Attendance: Regular attendance is expected, particularly for discussion sessions. Please alert the instructor if you are unable to attend for any reason.
Late/Makeup work: Assignments are due on the dates provided. We recognize that extenuating circumstances arise, so assignments may be submitted up to 2 days late without penalty. If you need a longer extension, contact the instructor as soon as possible and before the due date.
Artificial intelligence: Artificial intelligence tools and large language models such as ChatGPT, Claude, and Gemini are now part of the academic and professional landscape, and we encourage you to find ways to use them to enhance your learning. If you use these tools, however, you must cite them and describe in detail how you used them to complete the assignment. These tools cannot take the place of your own work and understanding of the material; they should supplement your learning, not replace it. You are ultimately responsible for your work, including its content and the validity of its citations and references. Using these tools without proper attribution is plagiarism and will be treated as such.
Department/School/University policies
Academic Integrity: Wake Forest University is committed to a culture of academic integrity. As a part of this community, you share the responsibility for creating a place of honesty, intellectual curiosity, and individual accountability. As you committed to with your honor pledge signature, you agree “not to deceive any member of the community; not to steal, cheat, or plagiarize on academic work; and not to engage in any other form of academic misconduct.” If you have questions about documenting your work, working with external sources, or working with peers on assigned work, consult with me as soon as possible. Instances of academic dishonesty will be referred to the Honor and Ethics Council.
Accessibility: Wake Forest University provides reasonable accommodations to students with disabilities. If you are in need of an accommodation, please contact me privately as early in the term as possible. Retroactive accommodations will not be provided. Students requiring accommodations must also consult the Center for Learning, Access, and Student Success (118 Reynolda Hall, 336-758-5929, class.wfu.edu).
Accommodations for Religious or Spiritual Practices: Wake Forest University benefits from the multitude of faiths and spiritual identities held by members of our learning community. Should you need accommodations this semester, email me as soon as possible to ensure we have time to develop equitable alternatives.
Class recordings: If class recordings are provided, they are reserved for students in this class, for educational purposes only, and are protected under FERPA. Recordings should not be shared outside the class in any form.
Syllabus change notice
This syllabus and the dates herein are subject to change.