Graduate Programs
Master of Science in Data Science (MSDS)
The Master of Science in Data Science is an interdisciplinary program administered by the Columbia University Data Science Institute (DSI) and jointly offered in collaboration with Columbia Engineering’s Departments of Computer Science and Industrial Engineering and Operations Research and the Graduate School of Arts and Sciences Department of Statistics. Graduates earn a Master of Science degree from Columbia Engineering.
The program prepares students to design, analyze, and deploy data-driven systems across scientific, engineering, and societal domains. Combining rigorous theoretical foundations with applied training, the MSDS curriculum reflects current and emerging advances in machine learning, artificial intelligence, statistical modeling, scalable systems, optimization, and computational methods.
Instruction is provided by world-class Columbia faculty and industry practitioners, and students engage in coursework and applied experiences that emphasize both methodological depth and real-world implementation.
Garud Iyengar
Avanessians Director of the Data Science Institute
Nakul Verman
MS Data Science Program Director
Eleni Drinea
Senior Lecturer in the Discipline of Computer Science
Daniel Hsu
Associate Professor of Computer Science
Yining Liu
Lecturer in the Discipline of Data Science
Dobrin Marchev
Lecturer in the Discipline of Statistics
Joyce Robbins
Lecturer in the Discipline of Statistics
Daniel Bauer
Senior Lecturer in the Discipline of Computer Science
Hardeep Johar
Teaching Professor of Industrial Engineering and Operations Research
Kaizhang Wang
Assistant Professor of Industrial Engineering and Operations Research
Curriculum Requirements
The Master of Science in Data Science requires the successful completion of 30 points of graduate coursework, including core courses, electives, and an integrated capstone project, as specified below.
Students enter the program with diverse academic backgrounds. Prior coursework in areas such as statistics, computer science, or mathematics may allow students to waive or test out of certain core requirements; waivers are evaluated on a case-by-case basis and must be reviewed and approved by the program faculty. Students who are approved for a waiver should work with the MSDS academic advising team to identify an appropriate replacement course.
M.S. students must also complete the professional development and leadership course, ENGI E4000 PROF DEVELOPMENT&LEADERSHIP, as a graduation requirement.
Core Curriculum
The core curriculum (21 credits) provides a comprehensive foundation in the mathematical, statistical, computational, and systems components of modern data science. Required courses include:
STAT GR5701 PROBABILITY & STAT FOR DATA SC. 3.00 points.
This course covers the following topics: fundamentals of probability theory and statistical inference used in data science; probabilistic models, random variables, useful distributions, expectations, law of large numbers, central limit theorem; statistical inference: point and confidence interval estimation, hypothesis tests, linear regression.
Fall 2026: STAT GR5701

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| STAT 5701 | 001/14642 | M W 6:10pm - 7:25pm Room TBA | Dobrin Marchev | 3.00 | 0/125 |
| STAT 5701 | 002/14643 | T Th 4:10pm - 5:25pm Room TBA | Dobrin Marchev | 3.00 | 2/125 |
STAT GR5702 EXPLORATORY DATA ANALYSIS/VISUAL. 3.00 points.
This course covers the following topics: fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.
Fall 2026: STAT GR5702

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| STAT 5702 | 001/14644 | M W 4:10pm - 5:25pm Room TBA | Joyce Robbins | 3.00 | 0/115 |
| STAT 5702 | 002/14645 | T Th 4:10pm - 5:25pm Room TBA | Joyce Robbins | 3.00 | 0/115 |
STAT GR5703 STAT INFERENCE & MODELING. 3.00 points.
Prerequisites: (STAT GR5701) working knowledge of calculus and linear algebra (vectors and matrices), and STAT GR5203 or the equivalent.
Prerequisites: working knowledge of calculus and linear algebra (vectors and matrices), STAT GR5701 or equivalent, and familiarity with a programming language (e.g., R, Python) for statistical data analysis. This course systematically covers the fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. It focuses on inference and modeling approaches such as the EM algorithm, MCMC methods and Bayesian modeling, linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. In addition, the course provides an introduction to statistical methods and modeling that address practical issues such as design of experiments, analysis of time-dependent data, and missing values. Throughout the course, real-data examples are used in lecture discussion and homework problems. This course lays the statistical foundation for inference and modeling using data, preparing MS in Data Science students for other courses in machine learning, data mining, and visualization.
Spring 2026: STAT GR5703

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| STAT 5703 | 001/14271 | T Th 5:40pm - 6:55pm 301 Uris Hall | Dobrin Marchev | 3.00 | 216/250 |

Fall 2026: STAT GR5703

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| STAT 5703 | 001/14646 | T Th 6:10pm - 7:25pm Room TBA | Dobrin Marchev | 3.00 | 11/50 |
CSOR W4246 ALGORITHMS FOR DATA SCIENCE. 3.00 points.
Prerequisites: Basic knowledge of programming (e.g., at the level of COMS W1007) and a basic grounding in calculus and linear algebra.
Corequisites: COMS W4121
Methods for organizing data, e.g., hashing, trees, queues, lists, priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating point arithmetic, stability of numerical algorithms, eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.
Fall 2026: CSOR W4246

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| CSOR 4246 | 001/12497 | T Th 11:40am - 12:55pm Room TBA | Eleni Drinea | 3.00 | 0/120 |
| CSOR 4246 | 002/12498 | T Th 1:10pm - 2:25pm Room TBA | Eleni Drinea | 3.00 | 3/120 |
COMS W4121 COMPUTER SYSTEMS FOR DATA SCIENCE. 3.00 points.
Prerequisites: CSOR W4246 or STAT W4203, or equivalent as approved by faculty advisor; background in computer system organization and good working knowledge of C/C++.
Corequisites: CSOR W4246, STAT GU4203
An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics include fundamental tradeoffs in computer systems; hardware and software techniques for exploiting instruction-level, data-level, and task-level parallelism; scheduling, caching, and prefetching; network and memory architecture; latency and throughput optimizations; specialization; and an introduction to programming data center computers.
COMS W4721 MACHINE LEARNING FOR DATA SCI. 3.00 points.
Spring 2026: COMS W4721

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| COMS 4721 | 001/11556 | T Th 1:10pm - 2:25pm 142 Uris Hall | Yining Liu | 3.00 | 97/100 |
| COMS 4721 | 002/12505 | T Th 2:40pm - 3:55pm 142 Uris Hall | Yining Liu | 3.00 | 106/100 |
| COMS 4721 | C01/17685 | | Yining Liu | 3.00 | 2/99 |
ENGI E4800 DATA SCIENCE CAPSTONE&ETHICS. 3.00 points.
Not offered during 2024-2025 academic year.
Spring 2026: ENGI E4800

| Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment |
|---|---|---|---|---|---|
| ENGI 4800 | 001/11786 | F 1:10pm - 3:00pm 633 Seeley W. Mudd Building | Eleni Drinea | 3.00 | 29/40 |
Electives
Students must complete a minimum of 9 elective credits chosen from advanced data science, AI, machine learning, systems, and application domains. Electives allow students to tailor the program based on interest and career goals.
Electives may be offered through the Data Science Institute or in departments across the University, including Computer Science, Statistics, Industrial Engineering and Operations Research, Business, Public Health, and Economics. Eligible electives must be graduate-level, taken for a letter grade, and approved by the program.
