Graduate Programs

Master of Science in Data Science (MSDS)

The Master of Science in Data Science is an interdisciplinary program administered by the Columbia University Data Science Institute (DSI) and offered jointly with Columbia Engineering’s Departments of Computer Science and of Industrial Engineering and Operations Research and the Graduate School of Arts and Sciences’ Department of Statistics. Graduates earn a Master of Science degree from Columbia Engineering.

The program prepares students to design, analyze, and deploy data-driven systems across scientific, engineering, and societal domains. Combining rigorous theoretical foundations with applied training, the MSDS curriculum reflects current and emerging advances in machine learning, artificial intelligence, statistical modeling, scalable systems, optimization, and computational methods.

Instruction is provided by world-class Columbia faculty and industry practitioners, and students engage in coursework and applied experiences that emphasize both methodological depth and real-world implementation.

 

Garud Iyengar

Avanessians Director of the Data Science Institute

Nakul Verma

MS Data Science Program Director

Eleni Drinea 

Senior Lecturer in the Discipline of Computer Science

Daniel Hsu

Associate Professor of Computer Science

Yining Liu

Lecturer in the Discipline of Data Science

Dobrin Marchev

Lecturer in the Discipline of Statistics

Joyce Robbins 

Lecturer in the Discipline of Statistics

Daniel Bauer

Senior Lecturer in the Discipline of Computer Science

Hardeep Johar

Teaching Professor of Industrial Engineering and Operations Research

Kaizheng Wang

Assistant Professor of Industrial Engineering and Operations Research

Curriculum Requirements

The Master of Science in Data Science requires the successful completion of 30 points of graduate coursework, including core courses, electives, and an integrated capstone project, as specified below.

Students enter the program with diverse academic backgrounds. Prior coursework in areas such as statistics, computer science, or mathematics may allow students to waive or test out of certain core requirements; waivers are evaluated on a case-by-case basis and must be reviewed and approved by the program faculty. Students who are approved for a waiver should work with the MSDS academic advising team to identify an appropriate replacement course.

M.S. students must also complete the professional development and leadership course, ENGI E4000 PROF DEVELOPMENT&LEADERSHIP, as a graduation requirement. 

Core Curriculum

The core curriculum (21 credits) provides a comprehensive foundation in the mathematical, statistical, computational, and systems components of modern data science. Required courses include:

STAT GR5701 PROBABILITY & STAT FOR DATA SC. 3.00 points.

This course covers the following topics: fundamentals of probability theory and statistical inference used in data science; probabilistic models, random variables, useful distributions, expectations, the law of large numbers, and the central limit theorem; statistical inference, including point and confidence interval estimation, hypothesis tests, and linear regression.

Fall 2026: STAT GR5701
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
STAT 5701 | 001/14642 | M W 6:10pm - 7:25pm, Room TBA | Dobrin Marchev | 3.00 | 0/125
STAT 5701 | 002/14643 | T Th 4:10pm - 5:25pm, Room TBA | Dobrin Marchev | 3.00 | 2/125

STAT GR5702 EXPLORATORY DATA ANALYSIS/VISUAL. 3.00 points.

This course covers the following topics: fundamentals of data visualization, layered grammar of graphics, perception of discrete and continuous variables, introduction to Mondrian, mosaic plots, parallel coordinate plots, introduction to GGobi, linked plots, brushing, dynamic graphics, model visualization, clustering and classification.

Fall 2026: STAT GR5702
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
STAT 5702 | 001/14644 | M W 4:10pm - 5:25pm, Room TBA | Joyce Robbins | 3.00 | 0/115
STAT 5702 | 002/14645 | T Th 4:10pm - 5:25pm, Room TBA | Joyce Robbins | 3.00 | 0/115

STAT GR5703 STAT INFERENCE & MODELING. 3.00 points.

Prerequisites: STAT GR5701 or equivalent; STAT GR5203 or the equivalent; working knowledge of calculus and linear algebra (vectors and matrices); and familiarity with a programming language (e.g., R, Python) for statistical data analysis.

This course systematically covers the fundamentals of statistical inference and modeling, with special attention to models and methods that address practical data issues. The course focuses on inference and modeling approaches such as the EM algorithm, MCMC methods and Bayesian modeling, linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. In addition, the course provides an introduction to statistical methods and modeling that address practical issues such as design of experiments, analysis of time-dependent data, and missing values. Throughout the course, real-data examples are used in lecture discussion and homework problems. This course lays the statistical foundation for inference and modeling using data, preparing MS in Data Science students for other courses in machine learning, data mining, and visualization.

Spring 2026: STAT GR5703
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
STAT 5703 | 001/14271 | T Th 5:40pm - 6:55pm, 301 Uris Hall | Dobrin Marchev | 3.00 | 216/250
Fall 2026: STAT GR5703
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
STAT 5703 | 001/14646 | T Th 6:10pm - 7:25pm, Room TBA | Dobrin Marchev | 3.00 | 11/50

CSOR W4246 ALGORITHMS FOR DATA SCIENCE. 3.00 points.

Prerequisites: Basic knowledge of programming (e.g., at the level of COMS W1007) and a basic grounding in calculus and linear algebra.
Corequisites: COMS W4121

Methods for organizing data, e.g., hashing, trees, queues, lists, and priority queues. Streaming algorithms for computing statistics on the data. Sorting and searching. Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming. Floating point arithmetic and stability of numerical algorithms. Eigenvalues, singular values, PCA, gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton, and quasi-Newton methods. Large-scale applications from signal processing, collaborative filtering, recommendation systems, etc.

Fall 2026: CSOR W4246
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
CSOR 4246 | 001/12497 | T Th 11:40am - 12:55pm, Room TBA | Eleni Drinea | 3.00 | 0/120
CSOR 4246 | 002/12498 | T Th 1:10pm - 2:25pm, Room TBA | Eleni Drinea | 3.00 | 3/120

COMS W4121 COMPUTER SYSTEMS FOR DATA SCIENCE. 3.00 points.

Prerequisites: CSOR W4246 or STAT W4203, or equivalent as approved by faculty advisor. Background in computer system organization and good working knowledge of C/C++.
Corequisites: CSOR W4246, STAT GU4203

An introduction to computer architecture and distributed systems with an emphasis on warehouse-scale computing systems. Topics include fundamental tradeoffs in computer systems; hardware and software techniques for exploiting instruction-level, data-level, and task-level parallelism; scheduling, caching, and prefetching; network and memory architecture; latency and throughput optimizations; specialization; and an introduction to programming data center computers.

COMS W4721 MACHINE LEARNING FOR DATA SCI. 3.00 points.

Spring 2026: COMS W4721
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
COMS 4721 | 001/11556 | T Th 1:10pm - 2:25pm, 142 Uris Hall | Yining Liu | 3.00 | 97/100
COMS 4721 | 002/12505 | T Th 2:40pm - 3:55pm, 142 Uris Hall | Yining Liu | 3.00 | 106/100
COMS 4721 | C01/17685 | | Yining Liu | 3.00 | 2/99

ENGI E4800 DATA SCIENCE CAPSTONE&ETHICS. 3.00 points.

Not offered during 2024-2025 academic year.

Spring 2026: ENGI E4800
Course Number | Section/Call Number | Times/Location | Instructor | Points | Enrollment
ENGI 4800 | 001/11786 | F 1:10pm - 3:00pm, 633 Seeley W. Mudd Building | Eleni Drinea | 3.00 | 29/40

Electives

Students must complete a minimum of 9 elective credits chosen from advanced data science, AI, machine learning, systems, and application domains. Electives allow students to tailor the program based on interest and career goals.

Electives may be offered through the Data Science Institute or in departments across the University, including Computer Science, Statistics, Industrial Engineering and Operations Research, Business, Public Health, and Economics. Eligible electives must be graduate-level, taken for a letter grade, and approved by the program.
