Harvard School of Public Health
Two 2-hour sessions each week. Two 1-hour labs each week. Provides intensive instruction in the use of SAS to prepare data for statistical analysis. The focus is on database management and programming problems. Basic issues in each of these areas are discussed in the context of teaching the specific skills required to use SAS effectively.
Covers basic statistical techniques that are important for analyzing data arising from epidemiology, environmental health and biomedical and other public health-related research. Major topics include descriptive statistics, elements of probability, introduction to estimation and hypothesis testing, nonparametric methods, techniques for categorical data, regression analysis, analysis of variance, and elements of study design. Applications are stressed. Designed as an alternate to BIO200, for students desiring more emphasis on theoretical developments. Background in algebra and calculus strongly recommended.
Emphasizes concepts and methods for analysis of data which are categorical, rate-of-occurrence (e.g., incidence rate), and time-to-event (survival duration). Stresses applications in epidemiology, clinical trials, and other public health research. Topics include measures of association, 2x2 tables, stratification, matched pairs, logistic regression, model building, analysis of rates, and survival data analysis using proportional hazards models.
Covers research design, sample selection, questionnaire construction, interviewing techniques, the reduction and interpretation of data, and related facets of population survey investigations. Focuses primarily on the application of survey methods to problems of health program planning and evaluation. Treatment of methodology is sufficiently broad to be suitable for students who are concerned with epidemiological, nutritional, or other types of survey research. Formerly BIO212
This course will introduce students involved with clinical research to the practical application of multiple regression analysis. Linear regression, logistic regression and proportional hazards survival models will be covered, as well as general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using SAS and a classroom discussion of the results. The course will introduce, but will not attempt to develop, the underlying likelihood theory. SAS programming ability required.
Designed for individuals interested in the scientific, policy, and management aspects of clinical trials. Topics include types of clinical research, study design, treatment allocation, randomization and stratification, quality control, sample size requirements, patient consent, and interpretation of results. Students design a clinical investigation in their own field of interest, write a proposal for it, and critique recently published medical literature. Course Prerequisites: BIO201 or ID200 or ID201 or ID207 or BIO202&203 or BIO206&207 or BIO206&208 or BIO206&209 Formerly BIO214
This course is intended for students who are already very comfortable with fundamental techniques in statistics. The course will cover methods for building and interpreting linear regression models, including statistical assumptions and diagnostics, estimation and testing, and model building techniques. These models will be extended to handle data arising from longitudinal studies employing repeated measurement of subjects over time. Summer/Residential Course Note (Section 1): Lectures will be accompanied by computing exercises using the SAS statistical package. Online Course Note (Section 2): Lectures will be accompanied by computing exercises using the Stata statistical package. Course Prerequisites: EPI522 or BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208 Formerly BIO501
The goal of this course is to enable scientists and public health professionals who already have an introductory background in biostatistics and clinical trials to acquire the competencies in quantitative skills and systems thinking required to understand and participate in drug development and regulatory review processes. The course illustrates how statistical and quantitative methods are used to transform information into evidence demonstrating the safety, efficacy and effectiveness of drugs and devices over the course of the product's life cycle from a regulatory perspective. Content is delivered using a blended-learning approach involving lectures, web-based media and selected case study examples derived from actual FDA decision-making and regulatory assessments to highlight and describe each phase of the regulatory drug approval process. Case studies will illustrate regulatory science in action and practice and will include content publicly available from the FDA's website that can be used in conjunction with FDA science-based guidance and decision precedents. Course Prerequisites: ID538 or [(BIO200 or ID200 or BIO201 or BIO202&203 or BIO206&207/8/9) and (EPI200 or EPI201 or EPI208 or EPI505)] Formerly BIO523
This course will provide a basic yet thorough introduction to the probability theory and mathematical statistics that underlie many of the commonly used techniques in public health research. Topics to be covered include probability distributions (normal, binomial, Poisson), means, variances and expected values, finite sampling distributions, parameter estimation (method of moments, maximum likelihood), confidence intervals, and hypothesis testing (likelihood ratio, Wald and score tests). All theoretical material will be motivated with problems from epidemiology, biostatistics, environmental health and other public health areas. This course is aimed towards second year doctoral students in fields other than Biostatistics. Background in algebra and calculus required. Course Prerequisites: BST210 or BST213 Formerly BIO222
Topics will include types of censoring, hazard, survivor, and cumulative hazard functions, Kaplan-Meier and actuarial estimation of the survival distribution, comparison of survival using log rank and other tests, regression models including the Cox proportional hazards model and the accelerated failure time model, adjustment for time-varying covariates, and the use of parametric distributions (exponential, Weibull) in survival analysis. Methods for recurrent survival outcomes and competing risks will also be discussed, as well as design of studies with survival outcomes. Class material will include presentation of statistical methods for estimation and testing along with current software (SAS, Stata) for implementing analyses of survival data. Applications to real data will be emphasized. Course Prerequisite(s): BST210 or BST213 or BST 230, or permission of instructor required. BST 213 may be taken concurrently. Formerly BIO223
This course covers modern methods for the analysis of repeated measures, correlated outcomes and longitudinal data, including the unbalanced and incomplete data sets characteristic of biomedical research. Topics include an introduction to the analysis of correlated data, analysis of response profiles, fitting parametric curves, covariance pattern models, random effects and growth curve models, and generalized linear models for longitudinal data, including generalized estimating equations (GEE) and generalized linear mixed effects models (GLMMs). Course Activities: Homework assignments will focus on data analysis in SAS using PROC GLM, PROC MIXED, PROC GENMOD, and PROC GLIMMIX. Course Note: Lab or section times will be announced at first meeting. Course Prerequisite(s): BIO210 or BIO211 or BIO213 or BIO232 Formerly BIO226
This course introduces students to the diverse statistical methods used throughout the process of statistical genetics, from familial aggregation and segregation studies to linkage scans and association studies. Topics covered include basic principles from population genetics, multipoint and model-free linkage analysis, family-based and population-based association testing, and Genome Wide Association analysis. Instructors use ongoing research into the genetics of respiratory disease, psychiatric disorders and cancer to illustrate basic principles. Weekly homework supplements reading, course lectures, discussion and section. Relevant concepts in genetics and molecular genetics will be reviewed in lectures and labs. The emphasis of the course is fundamental principles and concepts. Course Prerequisites: BST210 (concurrent enrollment allowed) Course Note: There will be a weekly lab section; the time will be scheduled at first meeting. Formerly BIO227
This course is a practical introduction to the Bayesian analysis of biomedical data. It is an intermediate Master's level course in the philosophy, analytic strategies, implementation, and interpretation of Bayesian data analysis. Specific topics that will be covered include: the Bayesian paradigm; Bayesian analysis of basic models; Bayesian computing: Markov Chain Monte Carlo; STAN R software package for Bayesian data analysis; linear regression; hierarchical regression models; generalized linear models; meta-analysis; models for missing data. Programming and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST210 and BST222, or permission of the instructor
Axiomatic foundations of probability, independence, conditional probability, joint distributions, transformations, moment generating functions, characteristic functions, moment inequalities, sampling distributions, modes of convergence and their interrelationships, laws of large numbers, central limit theorem, and stochastic processes.
A fundamental course in statistical inference. Discusses general principles of data reduction: exponential families, sufficiency, ancillarity and completeness. Describes general methods of point and interval parameter estimation and the small and large sample properties of estimators: method of moments, maximum likelihood, unbiased estimation, Rao-Blackwell and Lehmann-Scheffe theorems, information inequality, asymptotic relative efficiency of estimators. Describes general methods of hypothesis testing and optimality properties of tests: Neyman-Pearson theory, likelihood ratio tests, score and Wald tests, uniformly and locally most powerful tests, asymptotic relative efficiency of tests. Course Note: Lab or section time to be announced at first meeting; cross-listed: HSPH students must register for HSPH course. Course Prerequisite(s): BIO230 (concurrent enrollment allowed) Formerly BIO231
Introduction to the data structures and computer algorithms that are relevant to statistical computing. The implementation of data structures and algorithms for data management and numerical computations are discussed. Course Prerequisite(s): Instructor's Permission Formerly BIO514
An advanced course in linear models, including both classical theory and methods for high dimensional data. Topics include theory of estimation and hypothesis testing, multiple testing problems and false discovery rates, cross validation and model selection, regularization and the LASSO, principal components and dimensional reduction, and classification methods. Background in matrix algebra and linear regression required. Prerequisite: BST 231 and BST 233, or permission of instructor required. Formerly BIO235
A foundational course in measure theoretic probability. Topics include measure theory, Lebesgue integration, product measure and Fubini's Theorem, Radon-Nikodym derivatives, conditional probability, conditional expectation, limit theorems on sequences of random stochastic processes, and weak convergence. Course Prerequisites: BST231 or permission from the instructor required. Formerly BIO250
Sequel to BIO 231. Considers several advanced topics in statistical inference. Topics include limit theorems, multivariate delta method, properties of maximum likelihood estimators, saddle point approximations, asymptotic relative efficiency, robust and rank-based procedures, resampling methods, and nonparametric curve estimation. Course Note: Cross-listed, HSPH students must register for HSPH course. Course Prerequisites: BIO231 and BIO250, or permission of instructor required. Formerly BIO251
Presents classical and modern approaches to the analysis of multivariate observations, repeated measures, and longitudinal data. Topics include the multivariate normal distribution, Hotelling's T², MANOVA, the multivariate linear model, random effects and growth curve models, generalized estimating equations, statistical analysis of multivariate categorical outcomes, and estimation with missing data. Discusses computational issues for both traditional and new methodologies. Course Note: Cross-listed, HSPH students must register for HSPH course. Course Prerequisite: BIO231 and BIO235, or permission of the instructor is required. Formerly BIO245
BST247 is a seminar-style course with readings selected from the literature in areas of expertise of the participating faculty. Content may vary from year to year. The specific objectives are (1) to train students to critically read foundational papers and current journal articles in Statistical Genetics, (2) to train students to present sophisticated ideas to an audience of peers, and (3) to prepare students to engage in doctoral level research in the area. After the course, students are expected to have an in-depth and broad understanding of important topics in statistical genetics research. Course Prerequisite(s): BIO227 and (BIO231 or EPI511). BIO231 may be taken concurrently. Formerly BIO257
General principles of the Bayesian approach, prior distributions, hierarchical models and modeling techniques, approximate inference, Markov chain Monte Carlo methods, model assessment and comparison. Bayesian approaches to GLMMs, multiple testing, nonparametrics, clinical trials, survival analysis.
This course is the second course in the foundational sequence of the School’s newly approved Master’s Degree in Health Data Science. The course will build upon our existing course, BST260 Introduction to Data Science, in presenting a set of tools for modeling and understanding complex datasets. Specifically, the course will provide practical regression and tree-based techniques for big data. Specific topics that will be covered include: linear model selection and regularization: LASSO and regularization; principal component regression and partial least squares; tree-based methods: decision trees; bagging, random forests, and boosting; unsupervised learning: principal components analysis, cluster analysis. Programming (Python and R) and case studies will be used throughout the course to provide hands-on training in these concepts. Prerequisites: BST260 or permission of instructor
Many systems of scientific and societal interest consist of a large number of interacting components. The structure of these systems can be represented as networks where network nodes represent the components and network edges the interactions between the components. Network analysis can be used to study how pathogens, behaviors and information spread in social networks, having important implications for our understanding of epidemics and the planning of effective interventions. In a biological context, at a molecular level, network analysis can be applied to gene regulation networks, signal transduction networks, protein interaction networks, and more. This introductory course covers some basic network measures, models, and processes that unfold on networks. The covered material applies to a wide range of networks, but we will focus on social and biological networks. To analyze and model networks, we will learn the basics of the Python programming language and its NetworkX module. The course contains a number of hands-on computer lab sessions. There are five homework assignments and four reading assignments that will be discussed in class. In addition, each student will complete a final project that applies network analysis techniques to study a public health problem. Course Prerequisites: BST201 or ID200 or ID201 or ID207 or BST202&203 or BST206&207 or BST206&208 Formerly BIO521
This course is an introduction to modern statistical computing techniques used to characterize and interpret cancer genome sequencing datasets. This Master's level course will begin with a basic introduction to DNA, genes, and genomes for students with no biology background. It will then introduce cancer as an evolutionary process and review landmarks in the history of cancer genetics, and discuss the basics of sequencing technology and modern Next Generation Sequencing. The course will cover the main steps involved in turning billions of short sequencing reads into a representation of the somatic genetic alterations characterizing an individual patient’s cancer, and will build on this foundation to study topics related to identifying mutations under positive selection from multiple tumors sampled in a population. By the end of the course, students will be able to apply state-of-the-art analysis to cancer genome datasets and to critically evaluate papers employing cancer genome data.
Epigenetics is a fast-growing field, with increasing applicability in environmental and epidemiologic studies, focusing on the alterations in chromatin structure that can stably and heritably influence gene expression. Epigenetic changes can be as profound as those exerted by mutation, but, unlike mutations, are reversible and responsive to environmental influences. The course will focus on epigenetic mechanisms and laboratory methods for DNA methylation, histone modifications, small non-coding RNAs, and epigenomics. Ongoing experimental and epidemiologic studies (cohort, case-control, cross-sectional and repeated measurement studies) will be presented to introduce the students to the epigenetic effects in prenatal/early and adult life of environmental factors, including air pollution, metals, pesticides, benzene, PCBs, persistent organic pollutants, and diet. The course will enable them to understand and apply epigenetic methods in multiple areas, including cardiovascular and respiratory disease, aging, reproductive health, inflammation/immunity, and cancer.
EPI201 introduces the principles and methods used in epidemiologic research. The course discusses the conceptual and practical issues encountered in the design and analysis of epidemiologic studies for description and causal inference. EPI201 is the first course in the series of methods courses designed for students majoring in Epidemiology, Biostatistics and related fields, and those interested in a detailed introduction to the design and conduct of epidemiologic studies. Students who take EPI201 are expected to take EPI202 (Methods II). Course Note: Thursday or Friday lab required.
This course will present an introduction to the methods of data mining and predictive modeling, with applications to both genetic and clinical data. Basic concepts and philosophy of supervised and unsupervised data mining as well as appropriate applications will be discussed. Topics covered will include multiple comparisons adjustment, cluster analysis, principal component analysis, and predictive model building through logistic regression, classification and regression trees (CART), multivariate adaptive splines (MARS), neural networks, random forests, and bagging and boosting. Course Activities: Computer labs. Course Note: Students should be familiar with logistic regression.
This is an introductory level class on the analysis of mortality, fertility and population change. It is required for all master's and doctoral students in the department of Global Health and Population. Students are introduced to the core literature in this field through lectures and assigned readings selected from peer-reviewed journals and textbooks. Together, these provide a graduate-level introduction to the principal sources and characteristics of population data and to the essential methods used for the analysis of population problems. The emphasis throughout is on understanding the key processes, models and assumptions used primarily for the analysis of demographic components. Practical training will be given through a required weekly laboratory session, assignments, and a final examination. Examples presented in class and used in assignments are drawn from several countries, combining both developed and developing world realities.
Designed to bring students to an intermediate-level understanding of microeconomic theory. Emphasizes the uses and limitations of the economic approach, with applications to public health.
This course is designed to introduce the student to the methods and growing range of applications of decision analysis and cost-effectiveness analysis in health technology assessment, medical and public health decision making, and health resource allocation. The objectives of the course are: (1) to provide a basic technical understanding of the methods used, (2) to give the student an appreciation of the practical problems in applying these methods to the evaluation of clinical interventions and public health policies, and (3) to give the student an appreciation of the uses and limitations of these methods in decision making at the individual, organizational, and policy level both in developed and developing countries.
Harvard Business School
This course is intended for students who have a career interest in leading or investing in companies in the health care sector, and are interested in how information technology can improve quality, efficiency, and access to healthcare. Students will be exposed to pure healthcare IT (HCIT) businesses, healthcare service businesses that rely heavily on IT to achieve their goals, and pure healthcare service businesses. Though the vast majority of these enterprises will be for-profit, the course will also examine the innovative role social ventures play in healthcare.
Students are required to prepare a business plan, which employs the framework of this course, to explore an entrepreneurial opportunity in health care, and to evaluate their classmates' plans.
Moving from simple (two-party, one-shot, price deals) to complex (multiple parties and issues, internal divisions, long time-frames, cross-border deals), the course integrates three complementary perspectives: analytic, behavioral, and contextual. While we will analyze a number of traditional case studies, the heart of the course is a series of interactive negotiation exercises. These exercises will give you hands-on negotiating experience. You will learn first by actually negotiating, and then by stepping back to compare your approach and results with others. You will be able to test your analytic ability and tactical skill, and to experiment with new approaches.
The course is a laboratory in which you will be both experimenter and subject. Sometimes the most important learning comes from apparent "failure," and so the course is designed to let you fail in the safe setting of a classroom, and thus help you avoid costly real mistakes.
Harvard Medical School
This course will provide a firm foundation for understanding the relationship between molecular biology, developmental biology, genetics, genomics, bioinformatics, and medicine. The goal is to develop explicit connections between basic research, medical understanding, and the perspective of patients. During the course the principles of human genetics will be reviewed. Students will become familiar with the translation of clinical understanding into analysis at the level of the gene, chromosome and molecule, the concepts and techniques of molecular biology and genomics, and the strategies and methods of genetic analysis, including an introduction to bioinformatics. The course will extend beyond basic principles to current research activity in human genetics.
*Must be taken with HT923, the lab component.
This course teaches the student how information technologies shape and redefine the health care marketplace. Students learn how information technology enhances medical care through 1) improved economies of scale; 2) greater technical efficiencies in the delivery of care to patients; 3) advanced tools for patient education and self-care; 4) network-integrated decision support tools for clinicians; and 5) e-health applications and commerce. Students ordinarily take this course in conjunction with HST 923, the tutorial and practicum portion of the course, to work in interdisciplinary teams to design an innovative solution to a current or future health care problem. Students taking this course alone will fulfill course requirements by doing a 20-page term paper on the above topic, by prior arrangement with the course director.
*Must be taken with HT921, the classroom component.
This course teaches the student how information technologies shape and redefine the health care marketplace. Students learn how information technology enhances medical care through: 1) improved economies of scale; 2) greater technical efficiencies in the delivery of care to patients; 3) advanced tools for patient education and self-care; 4) network integrated decision support tools for clinicians; and 5) e-health applications and commerce. Students ordinarily take this course in conjunction with HST 921, the lecture portion of the course, to work in interdisciplinary teams to design an innovative solution to a current or future health care problem. Students who wish to take this course alone must have permission of the course director.
Harvard Faculty of Arts and Sciences
In-depth study of genomics: models of evolution and population genetics; comparative genomics: analysis and comparison; structural genomics: protein structure, evolution and interactions; functional genomics, gene expression, structure and dynamics of regulatory networks.
Usability and design as keys to successful technology. Covers user observation techniques, needs assessment, low and high fidelity prototyping, usability testing methods, as well as theory of human perception and performance, and design best practices. Focuses on understanding and applying the lessons of human interaction to the design of usable systems; will also look at lessons to be learned from less usable systems. The course includes several small and one large project.
Data Science 1 is the first half of a one-year introduction to data science. The course will focus on the analysis of messy, real life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis - generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication - summarizing results through visualization, stories, and interpretable summaries. Recommended: Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).
Data Science 2 is the second half of a one-year introduction to data science. Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.
*40-week course spanning Fall to Spring
The Harvard Catalyst Postgraduate Education Program in Clinical & Translational Science provides training to clinical investigators through a range of educational offerings. This course is part of the advanced curriculum and is designed for independent researchers.
This course offers a comprehensive introduction to biostatistics in medical research. The course includes a review of the most common techniques in the field, as well as the manner in which these techniques are applied in standard statistical software. At the conclusion of the course, participants will be able to choose an appropriate study design, calculate the sample size needed to complete a study, analyze the collected data, and communicate the results from their experiment.