Exploring Mathematical Optimization: A New Tool for Data Scientists
Chapter 1: Introduction to Optimization
In the current landscape of Data Science, there seems to be an overemphasis on machine learning techniques. It's akin to the old adage: to someone wielding a hammer, everything looks like a nail. Consequently, modern Data Scientists often view every challenge through the lens of machine learning. This singular focus is unfortunate because it overlooks a plethora of valuable approaches within the data science domain.
This article aims to shed light on an essential yet often neglected aspect of Data Science: Mathematical Optimization, particularly Constraint Programming. By incorporating these techniques into your skill set, you can significantly enhance your career prospects, even if your mathematical background isn't particularly strong. I, for instance, studied Geography but found it surprisingly accessible to dive into Mathematical Optimization using Google's open-source library, OR-Tools, which will be introduced in this beginner-friendly guide.
If you're eager to broaden your Data Science toolkit and acquire this in-demand skill, let's get started!
Section 1.1: What is Optimization?
Optimization encompasses a range of techniques designed to determine the best possible solution from a vast array of alternatives. This might involve identifying the optimal solution to a problem or simply listing all feasible solutions. Consider a scenario where you're part of the Data Science team at an Amazon distribution center. You have 100 packages to deliver and three drivers, all needing to complete their deliveries within a two-hour timeframe. This presents an optimization problem, necessitating the creation of the most efficient delivery schedule for each driver.
Alternatively, think about a teacher planning a group activity for ten students. The teacher needs to divide the students into three groups but faces specific constraints: (1) Timmy cannot be grouped with Jimmy, (2) Billy must be in the same group as Willy, and (3) each group must include at least one of Mickey, Dickie, Ricky, or Vicky. This represents a constraint programming problem, where the objective is to find a viable grouping that adheres to all established constraints.
These scenarios highlight the nature of optimization problems, where the goal is to sift through an immense number of potential solutions to find the most viable or optimal one.
Section 1.2: Understanding Constraint Programming
To illustrate the concept, let’s delve into the student grouping example discussed earlier. We’ll work with the following conditions:
- Students 1 and 2 cannot be in the same group.
- Student 3 must be grouped with Student 8.
- Each group must include at least one of the following students: 7, 8, 9, or 10.
Additionally, we have standard constraints:
- Each group must contain a minimum of three students.
- Each student must be assigned to exactly one group.
These constraints create a complex situation that is challenging to resolve through mental calculations alone. Fortunately, Constraint Programming can help us derive a possible solution using OR-Tools, an open-source library developed by Google.
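To get a feel for the size of the search space, here is a minimal brute-force sketch in plain Python (no OR-Tools). It enumerates every way to place ten students into three labeled groups and counts how many placements satisfy the five constraints above. The helper name `is_feasible` is purely illustrative and does not come from any library.

```python
from itertools import product

STUDENTS = range(1, 11)  # students 1..10
GROUPS = (1, 2, 3)


def is_feasible(assignment):
    """Return True if a student -> group mapping satisfies every constraint."""
    # Students 1 and 2 cannot be in the same group.
    if assignment[1] == assignment[2]:
        return False
    # Student 3 must be grouped with Student 8.
    if assignment[3] != assignment[8]:
        return False
    for g in GROUPS:
        members = [s for s in STUDENTS if assignment[s] == g]
        # Each group must contain at least three students.
        if len(members) < 3:
            return False
        # Each group must include at least one of students 7, 8, 9, 10.
        if not any(s in (7, 8, 9, 10) for s in members):
            return False
    return True


total = 0
feasible = 0
# Each tuple gives one group per student, so "exactly one group per
# student" holds by construction.
for groups in product(GROUPS, repeat=len(STUDENTS)):
    total += 1
    if is_feasible(dict(zip(STUDENTS, groups))):
        feasible += 1

print(f"{feasible} of {total} assignments satisfy every constraint")
```

With ten students there are only 3^10 = 59,049 candidate assignments, so enumeration is still practical. The count grows as 3^n, though: thirty students would already mean roughly 2 × 10^14 candidates, which is where a dedicated solver earns its keep.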
Chapter 2: Implementing OR-Tools for Optimization
To begin, install the OR-Tools Python library with the following command:
pip install ortools
Next, import the constraint programming module and initialize a CP Model:
from ortools.sat.python import cp_model
model = cp_model.CpModel()
Now, let's add decision variables and constraints to our model. Each student must be assigned to exactly one of the three groups. We model this with one Boolean variable per student-group pair, plus one linear equality per student.
For Student 1, the variables and the constraint look like this:
student1_group1 = model.NewBoolVar("student1_group1")
student1_group2 = model.NewBoolVar("student1_group2")
student1_group3 = model.NewBoolVar("student1_group3")
model.Add(student1_group1 + student1_group2 + student1_group3 == 1)
This implies that if Student 1 is assigned to Group 1 (i.e., student1_group1 = 1), then the other two variables must be 0.
The same pattern repeats for Students 2 through 10: three Boolean variables per student, plus one equality that forces those three variables to sum to 1.
Section 2.1: Adding More Constraints
Once the decision variables are established, we can add further constraints, such as requiring each group to have at least three members. For Group 1, this is a single linear inequality over all ten students:
model.Add(student1_group1 + student2_group1 + student3_group1 + student4_group1 + student5_group1 + student6_group1 + student7_group1 + student8_group1 + student9_group1 + student10_group1 >= 3)
We will replicate this for each group to guarantee compliance with the minimum requirement.
Section 2.2: Solving the Model
With all constraints and decision variables in place, we can instruct OR-Tools to solve the problem:
solver = cp_model.CpSolver()
status = solver.Solve(model)
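Solve returns a status code. If it equals cp_model.OPTIMAL or cp_model.FEASIBLE, a solution exists, and we can read each variable with solver.Value(...) to print the group assignment for each student.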
The output might look something like this:
Student 1: Group 1
Student 2: Group 2
...
Conclusion: The Value of Optimization
Optimization techniques are not new, yet they often remain underutilized in Data Science compared to analytics and machine learning. I believe it’s crucial for Data Scientists to embrace these methodologies. I hope you found this overview useful, and I'm eager to hear about any innovative use cases you might have!