My current research interests are Large-Scale Optimization, Machine Learning, Transportation, Data Analytics and applications to Public Policy.

Click on the papers to expand and get more information.

Published Work

Boston Public Schools

Abstract: In the winter of 2016, Boston Public Schools (BPS) launched a crowdsourcing national competition to create a better way to construct bus routes to improve efficiency, deepen the ability to model policy changes, and realign school start times. The winning team came from the Massachusetts Institute of Technology (MIT). They developed an algorithm to construct school bus routes, by assigning students to stops, combining stops into routes, and optimally assigning vehicles to routes. BPS has used this algorithm for two years running; in the summer of 2017, this led to a 7% reduction in the BPS bus fleet. Bus routing optimization also gives BPS the unprecedented ability to understand the financial impact of new policies that affect transportation. In particular, the MIT research team developed a new mathematical model to select start times for all schools in the district in a way that takes transportation into account. Using this methodology, BPS proposed a solution that would have saved an additional $12M annually, and shifted students to more developmentally appropriate school start times (by reducing the number of high school students starting before 8:00 a.m. from 74% to 6% and the average number of elementary school students dismissed after 4:00 pm from 33% to 15%). However, 85% of the schools’ start times would have been changed, with a median change of one hour. This magnitude of change was too dramatic for the district to absorb, as evidenced by strong vocal opposition from some of the school communities who were negatively affected, and the plan was not implemented.

D. Bertsimas, A. Delarue, S. Martin. Optimizing schools' start time and bus routes (2019) Proceedings of the National Academy of Sciences (PNAS) March 26, 2019 116 (13) 5943-5948
bookmark Franz Edelman Award Finalist (2019)
bookmark MIT ORC Best Student Paper (2018)
bookmark 2nd place of Doing Good with Good OR INFORMS award (2018)
Bus routes in Boston

Abstract: Maintaining a fleet of buses to transport students to school is a major expense for U.S. school districts. In order to reduce transportation costs by allowing buses to be reused between schools, many districts spread start times across the morning. However, assigning each school a time involves both estimating the impact on transportation costs and reconciling additional competing objectives. Facing this intricate optimization problem, school districts must resort to ad hoc approaches, which are often expensive, inequitable, and even detrimental to student health. For example, medical evidence overwhelmingly indicates that early high school starts are severely impacting the development of an entire generation of students and constitute a major public health crisis. We present the first algorithm to jointly optimize school bus routing and bell time assignment. Our method leverages a natural decomposition of the routing problem and uses mixed-integer optimization to compute and combine subproblem solutions. Our algorithm significantly outperforms state-of-the-art school bus routing methods. The routing engine can be used to formulate a tractable proxy to transportation costs, which allows the formulation of the bell time assignment problem as a multi-objective Generalized Quadratic Assignment Problem. Local search methods provide high-quality solutions, allowing school districts to explore the tradeoffs between competing priorities and choose the solution that best suits the needs of the community. Our application in Boston led to $5 million in yearly savings (maintaining service quality despite a 50-bus fleet reduction) and to the unanimous approval of the first school start time reform in thirty years.

Travel time estimation in NYC

Abstract: Twenty-first century urban planners have identified the understanding of complex city traffic patterns as a major priority, leading to a sharp increase in the amount and the diversity of traffic data being collected. For instance, taxi companies in an increasing number of major cities have started recording metadata for every individual car ride. In this paper, we show that we can leverage network optimization insights to extract accurate travel time estimations from such origin-destination data, using information from a large number of taxi trips to reconstruct the traffic patterns in an entire city on a variety of timescales. We develop a method that tractably exploits origin-destination data, and draws from its optimization framework the flexibility needed to take advantage of other sources of traffic information. Using synthetic data, we establish the robustness of our algorithm to uncertainty, and display its ability to significantly reduce input variance. We show empirically that our algorithm can leverage any available amount of data, even in a high-variance environment, to provide insights about urban traffic patterns on different scales, leading to accurate travel time estimations throughout the network.

Taxi routing in NYC

Abstract: With the emergence of ride-sharing companies that offer transportation on demand at a large scale and the increasing availability of corresponding demand datasets, new challenges arise to develop routing optimization algorithms that can solve massive problems in real time. In this paper, we develop an optimization framework, coupled with a novel and generalizable backbone algorithm, that allows us to dispatch in real time thousands of taxis serving more than 25,000 customers per hour. We provide evidence from historical simulations using New York City routing network and yellow cabs data to show that our algorithms improve upon the performance of existing heuristics in such real-world settings.

Cal logo reproduced with congestion patterns

Abstract: This article presents a study on freeway networks instrumented with coordinated ramp metering and the ability of such control systems to produce arbitrarily complex congestion patterns within the dynamical limits of the traffic system. The developed method is used to evaluate the potential for an adversary with access to control infrastructure to enact high-level attacks on the underlying freeway system. The attacks are executed using a predictive, coordinated ramp metering controller based on finite-horizon optimal control and multi-objective optimization techniques. The efficacy of the control schemes in carrying out the prescribed attacks is determined via simulations of traffic network models based on the cell transmission model with onramps modeled as queue buffers. Freeway attacks with high-level objectives are presented on two illustrative examples: congestion-on-demand, which aims to create precise, user-specified pockets of congestion, and catch-me-if-you-can, which attempts to aid a fleeing vehicle from pursuant vehicles.


Shared rides in Manhattan.

Abstract: Detours are considered key for the efficient operation of a shared rides service, but are also the major pain point for consumers of such services. This paper studies the relationship between the value generated by shared rides and the detours they create for riders. We establish a fundamental limit on the sum of value and detour, and prove this leads to a tight bound on the Pareto frontier of values and detours in a general setting with an arbitrary number requests. We explicitly compute the Pareto frontier for one family of city topologies, and construct it via simulation for several more networks, including one based on ridesharing data from commute hours in Manhattan. We find that average detours are surprisingly small even in low demand density settings. We also find that by carefully choosing the match objective, detours can be significantly reduced with little impact on value, and that the density of ride requests is far more important than detours for the effective operations of a shared rides service. In response to these findings, we propose that platforms implement a two-product version of a shared rides service.

D. Bertsimas, A. Delarue, P. Jaillet, S. Martin. The Price of Interpretability (2019) arXiv:1907.03419
bookmark INFORMS Data Mining Section Best Student Paper Finalist (2019)
Examples of incremental model spaces.

Abstract: When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks. However, the concept of interpretability remains loosely defined and application-specific. In this paper, we introduce a mathematical framework in which machine learning models are constructed in a sequence of interpretable steps. We show that for a variety of models, a natural choice of interpretable steps recovers standard interpretability proxies (e.g., sparsity in linear models). We then generalize these proxies to yield a parametrized family of consistent measures of model interpretability. This formal definition allows us to quantify the ``price'' of interpretability, i.e., the tradeoff with predictive accuracy. We demonstrate practical algorithms to apply our framework on real and synthetic datasets.

Coordinate path illustration.

Abstract: When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at interpreting models are often ad hoc and application-specific, and the concept of interpretability itself is not well-defined. We propose a general optimization framework to create explanations for linear models. Our methodology decomposes a linear model into a sequence of models of increasing complexity using coordinate updates on the coefficients. Computing this decomposition optimally is a difficult optimization problem for which we propose exact algorithms and scalable heuristics. By solving this problem, we can derive a parametrized family of interpretability metrics for linear models that generalizes typical proxies, and study the tradeoff between interpretability and predictive accuracy.


PhD thesis: The Edge of Large-Scale Optimization in Transportation and Machine Learning
bookmark George B. Dantzig Dissertation Award (2019)
bookmark TSL Dissertation Prize (2019)
This thesis focuses on impactful applications of large-scale optimization in transportation and machine learning. Using both theory and computational experiments, we introduce novel optimization algorithms to overcome the tractability issues that arise in real world applications. We work towards the implementation of these algorithms, through software contributions, public policy work, and a formal study of machine learning interpretability. Our implementation in Boston Public Schools generates millions of dollars in yearly transportation savings and led to important public policy consequences in the United States. This work is motivated by large-scale transportation problems that present significant optimization challenges. In particular, we study the problem of ride-sharing, the online routing of hundreds of thousands of customers every day in New York City. We also contribute to travel time estimation from origin-destination data, on city routing networks with tens of thousands of roads. We additionally consider the problem of school transportation, the scheduling of hundreds of buses to send tens of thousands of children to school everyday. This transportation problem is related to the choice of school start times, for which we also propose an optimization framework. Building on these applications, we present methodological contributions in large-scale optimization. We introduce state-of-the-art algorithms for scheduling problems with time-window (backbone) and for school bus routing (BiRD). Our work on travel time estimation tractably produces solutions to the inverse shortest path length problem, solving a sequence of second order cone problems. We also present a theoretical and empirical study of the stochastic proximal point algorithm, an alternative to stochastic gradient methods (the de-facto algorithm for large-scale learning). We also aim at the implementation of these algorithms, through software contributions, public policy work (together with stakeholders and journalists), and a collaboration with the city of Boston. Explaining complex algorithms to decision-makers is a difficult task, therefore we introduce an optimization framework to decomposes models into a sequence of simple building blocks. This allows us to introduce formal measure of the "interpretability" of a large class of machine learning models, and to study tradeoffs between this measure and model performance, the price of interpretability.