Certificate in Data Science

The Certificate in Data Science provides you with the knowledge to draw reliable, robust conclusions from data.

Courses in Computer Science, Statistics and Management will teach you to implement an array of data science/statistics methods, models, and tools in any business. 

Program overview

Term 1: 2 foundation courses + 1 elective or certificate course
Term 2: 1 required course + 2 elective or certificate courses
Term 3: DGIN 7000: Internship (internship students) or DGIN 9000: Master’s Thesis (thesis students)
Term 4: DGIN 5001: Capstone (internship students) or DGIN 5002: Research Methods (thesis students) + 2 elective or certificate courses

Customize your degree

Although all Master of Digital Innovation (MDI) students follow a required program outline, you can create your own degree with a wide variety of elective options that support the Certificate in Data Science.

One of the following courses:

Assigned by the graduate committee based on academic background and goals.

DGIN 5100 Foundations in Web Technologies

This hands-on course examines the technologies and infrastructure required to support digital innovation. The course examines the major components of the information technology infrastructure, such as networks, databases and data warehouses, electronic payment, security, and human-computer interfaces. The course covers key web concepts and skills for designing, creating and maintaining websites, such as Grid Theory, HTML5, CSS, JavaScript, AJAX theory, PHP, SQL and NoSQL databases. Other principles such as Web Accessibility, Usability and User Experience, as well as best security practices, are explored in detail through a combination of lectures, in-class examples, individual lab work and assignments, and a final group project.

DGIN 5200 Foundations in Business

The overall aim of this course is to develop a high-level understanding of the dynamics of innovation, the distribution and outcomes of the strategic management of innovation and the relationships that are important in developing high-impact organizations. 

DGIN 5300 Law, Policy, and Ethics in Emerging Technologies

Emerging technologies, such as digital media, the “internet of things”, artificial intelligence (AI), and financial tech, are playing an increasingly central role in how individuals live and interact with each other; how businesses innovate and create new opportunities; and how governments function and serve their populations. But the unrestrained development and use of these technologies can raise complex legal, policy, and ethical challenges. This course offers students an introduction to foundational legal, policy, and ethical issues raised by emerging technologies in a variety of contexts, with special consideration for digital innovation and commerce. On completion, students will be able to better identify, understand, and critically assess these issues and also more effectively manage and resolve them in the course of their professional pursuits.

DGIN 5400 Statistics for Health Informatics

This course covers essential statistical methods for medical research. Topics include descriptive analysis techniques and basic principles of statistical inference for comparison of means, proportions and investigation of relationships between variables using regression modeling techniques. Students will also become familiar with nonparametric tests and power and sample size calculations.


One core course:

DGIN 5201 Digital Transformation

This core digital innovation course focuses on the design and management of digital innovation projects for both public sector and private sector organizations. Specifically, this course provides students with knowledge and skills to initiate and execute digital innovation and transformation projects in existing organizations or new start-ups.


The following certificate courses:

STAT 5620 Data Analysis

This course begins with a thorough description of the multi-disciplinary field of data science, making clear the role of statistics therein. Issues surrounding data ethics and reproducibility will then be discussed, followed by an extensive review of tools for exploratory data analysis (EDA). Statistical models will be described, commencing with linear models (LMs) and generalized linear models (GLMs). Next, additive and generalized additive models (GAMs) will be introduced, followed by their mixed model extensions. Tree-based methods, longitudinal models and spatial statistics will be demonstrated with a view to completing students' statistical toolbox. Emphasis will be placed on understanding model assumptions and method implementation. Real and relevant data sets will be used throughout the course to demonstrate best practices for data analysis. The R programming language will be used exclusively.

CSCI 6409 The Process of Data Science

The advent of low-cost storage and processing power coupled with ever-increasing amounts of "born digital" data has created the new field of data science. The ability to achieve a specific goal or answer a business question by crunching through very large and complex databases is becoming a competitive advantage for businesses and leads to new discoveries in science and medicine. This course is an overview of the different processes that make up a data science project. While other fields concentrate on finding previously unknown knowledge or searching for a specific pattern, data science focuses on answering deep questions and making the conclusions accessible to the rest of the organization. This course requires the implementation of software and experimental design in order to complete the assignments.

CSCI 6505 Machine Learning

Machine Learning is the area of Artificial Intelligence concerned with the problem of building computer programs that automatically improve with experience. The intent of this course is to present a broad introduction to the principles and paradigms underlying machine learning, including discussions of each of the major approaches currently being investigated. Main topics covered in the course include a review of information theory, unsupervised learning or clustering (the K-means family, co-clustering, mixture models and the EM algorithm), supervised learning or classification (support vector machines, decision trees, rule learning, Bayesian learners, maximum entropy, ensemble methods), feature selection and feature transformations. The applications discussed will focus on text classification and clustering.

 

Your choice of two elective courses from the following:

BUSI/INFO 6513 Business Analytics and Data Visualization

This course provides an introduction to Business Analytics and Data Visualization. It covers the processes, methodologies and practices used to transform large amounts of business and public data into useful information to support business decision-making. Students will learn how to extract and manipulate data from these systems. They will also acquire basic knowledge of data mining and statistical analysis, with a focus on data visualization. Students will also learn to build and use management dashboards and balanced scorecards using a variety of data design and visualization tools. The course will be made up of a combination of conceptual and applied topics, with classes being held in a computer lab. Technologies to be used will be focused on end-user analytics and data visualization and will include state-of-the-art tools for self-serve business analytics.

INFO 6681 Geospatial Information Management

Spatial information is the air and water that makes mapping and spatial analysis possible. Mobile applications using maps are some of the most popular and frequently used web-based applications; they are also cloud-based, which adds another layer of management issues. Maps, GIS and the use of spatial information have never been more popular or public. This course addresses the effective management of spatial information. The course covers principles and practices associated with metadata, GIS, licensing, spatial information databases, map libraries and archives, spatial data infrastructures and web-based delivery of products and services, as well as distributed systems such as geolibraries, ‘digital earth’ and the development of the ‘spatial cloud’. This course is geared towards the manager who seeks to deploy services associated with spatial information and effectively develop an enterprise approach to managing spatial information. The course will also provide hands-on experience in using GIS and related technologies so as to be able to better understand how to deploy services, especially over the web.

STAT 5130 Bayesian Data Analysis

STAT 5130 is intended to make advanced Bayesian methods genuinely accessible to graduate students. The course covers all the fundamental concepts of Bayesian methods, and works from the simplest ideas (characterizations of probability; comparative inference; prior, posterior and predictive distributions) up through hierarchical models applied to various data. Computational methods include MCMC for posterior simulation.

STAT 5350 Applied Multivariate Analysis

This course deals with the stochastic behaviour of several variables in systems where their interdependence is the object of analysis. Greater emphasis is placed on practical application than on mathematical refinement. Topics include classification, cluster analysis, categorized data, analysis of interdependence, structural simplification by transformation or modelling, and hypothesis construction and testing.

STAT 5390 Time Series Analysis

Time series analysis in both the time and frequency domains is introduced. The course is applied, and students are required to develop their own computer programs for the analysis of time series drawn from real problems. Topics to be discussed include the nature of time series, stationarity, auto- and cross-covariance functions, the Box-Jenkins approach to model identification and fitting, power and cross spectra, and the analysis of linear time-invariant relationships between pairs of series.

STAT 5550 Longitudinal Data Analysis

This course is concerned with statistical techniques for the analysis of longitudinal data, that is, data collected repeatedly over time on a number of subjects. Topics include generalized estimating equations; fixed, random and mixed effects linear models; generalized linear models; diagnostics and model checking; as well as missing data issues.

CSCI 6405 Data Mining and Data Warehousing

This course gives a basic exposition of the goals and methods of data mining and data warehouses, including concepts, principles, architectures, algorithms, implementations, and applications. The main topics include an overview of databases, data warehouses and data mining technology, data warehousing and online analytical processing (OLAP), concept mining, association mining, classification and prediction, and clustering. Software tools for data mining and data warehousing and their design will also be introduced.

CSCI 6406 Visualization

This course focuses on graphical techniques for data visualization that assist in the extraction of meaning from datasets. This involves the design and development of efficient tools for the exploration of large and often complex information domains. Applications of visualization are broad, including computer science, geography, the social sciences, mathematics, science and medicine, as well as architecture and design. The course will cover all aspects of visualization including fundamental concepts, algorithms, data structures, and the role of human perception.

CSCI 6509 Advanced Topics in Natural Language Processing

Natural Language Processing (NLP) is an area of Artificial Intelligence concerned with the problem of automatically analyzing and generating a natural language, such as English or French, in written or spoken form. It is a relatively old area of computer science, but it is still a very active research area. This course introduces fundamental concepts and principles used in NLP, with emphasis on statistical approaches to NLP and unification-based grammars. In the application part of the course, we discuss problems including question answering, machine translation, text classification, information extraction, grammar induction, and dictionary generation.

CSCI 6515 Machine Learning for Big Data

In this course, we will focus on Big Data and the pillars of that emerging discipline: machine learning/data mining, elements of high-performance computing, and data visualization. A significant part of the course will be devoted to selected, efficient methods for building models from large datasets using machine learning techniques.

CSCI 6612 Visual Analytics

This course will introduce the concepts of Visual Analytics (VA). VA is a multi-disciplinary domain that combines data visualization with machine learning and other automated techniques to help people make sense of data. Students will be introduced to the design of visual representations supporting tasks to go from findings to insights based on data. Topics include basic concepts of information visualization and machine learning; visual analytics of evolving phenomena; analysis of spatial and temporal data sets; visual social media analytics; and the visual analytics of text and multimedia collections. Students will prototype visual analytics applications using existing toolkits, coupling machine learning and visualization methods. Students will gain competence in performing data analysis and visualization tasks in different application domains.
NOTES: Students must be proficient in at least one programming language that supports the design of interactive visual interfaces and the execution of data mining/machine learning libraries and toolkits.

DGIN 5401 Operationalized Machine Learning in Healthcare

This course provides a broad overview of machine learning and machine learning operations in healthcare contexts. We begin by studying how healthcare data is unique, and how machine learning methods have been applied to clinical and medical tasks. We focus on various graphical, deep learning, time-series, and transfer learning models and unique aspects of their application in healthcare. We cover concepts of fairness, privacy, trust, explainability, and other human factors. We discuss implementation techniques, including ‘MLOps’ for healthcare, and opportunities for real-world deployment. Much of the course will be seminar-based, including guest lectures and descriptions of research papers. Students will choose and complete a commensurate research project. The course expects and requires a familiarity with programming and core concepts in data mining or data science. It is strongly recommended that Master of Digital Innovation students take this in their final semester.