Skip to content

A brief guide to Python's PuLP

PuLP is a Python library for linear programming written in a pythonic way.

Taxonomy and general approach

Linear, Integer and Mixed-integer programming

For a mathematical program to be a linear program you need the following conditions to be true:

  • the decision variables must be real variables
  • the objective must be a linear expression (wrt. the decision variables)
  • the constraints must be linear expressions (wrt. the decision variables)

An integer program is a linear program in which all the decision variables need to have only integer values. Since most integer programs contain a mix of continuous variables and integer variables they are often known as mixed integer programs.

Note

Integer programs can be very difficult problems to solve and there is a lot of current research finding “good” ways to solve integer programs. Integer programs can be solved using the branch-and-bound process. For MIPs of any reasonable size the solution time grows exponentially as the number of integer variables increases.

The Optimisation Process

source: PuLP docs

  1. Getting the problem description
  2. Formulating the (equivalent) mathematical program

    • Identify the Decision Variables paying particular attention to units
    • Formulate the Objective Function using the decision variables, we can construct a minimise or maximise objective function. The objective function typically reflects the total cost, or total profit, for a given value of the decision variables
    • Formulate the Constraints, either logical or explicit to the problem description. Again, the constraints are expressed in terms of the decision variables.
    • Identify the Data needed for the objective function and constraints. To solve your mathematical program you will need to have some "hard numbers" as variable bounds and/or variable coefficients in your objective function and/or constraints
  3. Solving the mathematical program

  4. Performing some post-optimal analysis

    • how the optimal solution would change under various changes to the formulation?
    • what the solution’s variable values mean in terms of the original problem description?

PuLP syntax

Variables definition

To define a set of variables we can declare a dictionary through LpVariable.dicts(var_name, var_index, lowBound, upBound, cat) where:

  • var_name is a string containing the name of the variable
  • var_index contains the indices on which variables will be indexed (and will be used to set keys of the dictionary)
  • lowBound is the lower bound of the domain of the variables
  • upBound is the upper bound of the domain of the variables
  • cat is the category of the variable (Integer, Binary or Continuous)

Example

To define an 10x1 array of binary variables \(x\) we can declare

x = LpVariable.dicts(
    'my_binary_array', 
    range(10), 
    lowBound=0, 
    upBound=1, 
    cat='Binary'
    )

To define \(X\) as a 5x4 matrix of integer variables \(x_{ij}\) such that \(0\leq x_{ij}\leq 7\) we can declare

X = LpVariable.dicts(
    'my_integer_matrix', 
    [(i, j) for i in range(5) for j in range(4)], 
    lowBound=0, 
    upBound=7, 
    cat='Integer'
    )

Problem definition

To define an instance of a given problem we first need to initialize it through problem = LpProblem(problem_name, problem_sense) where:

  • problem_name is a string containing the name of the problem
  • problem_sense can be either LpMinimize (default) or LpMaximize

Example

To define a minimization problem, we can declare

my_problem = LpProblem("The most difficult problem ever", LpMinimize)

To add the objective function into the defined instance, we can use the following syntax problem += objective_function, objective_function_desc where:

  • objective_function is the objective function expressed in terms of decision variables and possibly through the use of PuLP built in classes like lpSum and/or LpAffineExpression
  • objective_function_desc is a string containing the description of the objective function to be minimized/maximized

Example

To add the (possible) function to be minimized \(\sum_{i\in I}a_ix_i\) to our problem we can set

my_problem += lpSum([a[i]*x[i] for i in my_indices]),\
"minimized this affine expression with coefficients a_i and variables x_i"

Constraints

For our problem to be completely defined we must add also the constraints related to the initialized instance using problem += constraint, constraint_desc where:

  • constraint is the expression which implements the constraint to be added possibly via the usage of the built-in class LpConstraint
  • constraint_desc is a string containing the description of the constraint

Example

To add to our problem a constraint in the form \(\sum_{i\in I}x_i\leq b\) we can set

my_problem += lpSum([x[i] for i in my_indices]) <= b

Solution & debug

To write a (verbose) file with the complete representation of the given problem, we can use problem.writeLP("problem_name.lp"). Similarly, to solve the problem and obtain a log with the trace of the optimization process we can use the following code:

my_problem.writeLP("my_problem_name.lp")

from os import dup, dup2, close

logf = open('./optimization_log.txt', 'w')
orig_std_out = dup(1)
dup2(logf.fileno(), 1)

my_problem.solve(pulp.PULP_CBC_CMD(maxSeconds=90, msg=True)) 

dup2(orig_std_out, 1)
close(orig_std_out)
logf.close()

where problem.solve(pulp.PULP_CBC_CMD(maxSeconds=max_seconds, msg=True)) corresponds to the actual call for the solution of the problem, where the default back-end solver has been called with the msg keyword set to True (to actually produce the output log) and the maxSeconds keyword set to 90 seconds to limit the calculation time.

After the resolution, the objective function value can be obtained by calling:

my_problem.objective.value()

In a similar way, we can then check the value(s) assumed by the various decision variables by calling for example:

for i in my_indices:
    print(x[i].value())

To check the status of the problem, both before and after the resolution process, we can use:

LpStatus[my_problem.status]

which passes to the built-in dictionary LpStatus the actual problem status, which can assume one of the following value: Optimal, Not Solved, Infeasible, Unbounded, Undefined.

Tips & Tricks

Dummy is good

After the (mathematical) formulation of the problem and the related PuLP implementation, it's very useful to test the program over several dummy input datasets of growing complexity having a known ground-truth. This helps to identify possible implementation mistakes and/or to unveil limitations in the theoretical formulation.

In doing this, the creation of auxiliary data structures to store decision variables output values is often very useful to not get lost within the (possibly) counterintuitive formulation of the problem (see tip below). The following two functions might be useful in monitor variables values:

def print_variable_values(LpVariable, exception=0):

    '''
    this function can be used to print the values of a given LpVariable
    which do not equal the exception value (default=0)

    INPUT
    LpVariable (dict) --> an LpVariable defined through the related PuLP class
    exception --> the exception value that will cause the variable value not to be printed
    '''

    for key in LpVariable:
        value = LpVariable.get(key).value()
        if value != exception:
            print(key,':',value) 
    return

def filter_variable_values(LpVariable, exception=0):

    '''
    this function can be used to filter the values of a given LpVariable
    which do not equal the exception value (default=0)

    INPUT
    LpVariable (dict) --> an LpVariable defined through the related PuLP class
    exception --> the exception value that will cause the variable value to be filtered out

    OUTPUT
    filtered_values (dict) --> a filtered dictionary which contains only the values of the given variable which do not equal the exception value
    '''

    filtered_values = {}
    for key in LpVariable:
        value = LpVariable.get(key).value()
        if value != exception:
            filtered_values[key] = value 
    return filtered_values

Change the point of view

The most straightforward implementation of a given minimization (resp. maximization) problem might sometimes involve a non-linear function of the decision variables. For example, we can think about a bin-packing problem in which the cost to be minimized depends in a non-linear way on the number of items contained in each bin. In a more detailed way, we are referring to the problem

\(\min\limits_{x}\sum\limits_{j\in\text{bins}} c\left(\sum\limits_{i\in\text{items}}x_{ij}\right)\)

being \(c(\cdot)\) a non-linear function and \(x_{ij}\) a binary variable which indicates if the item \(i\) belongs to the bin \(j\).

In such occasions, the linearity of the problem can be "manually" restored with the introduction of indicator variables defined over the possible values assumed by the above summation. Given for example that \(\sum_{i\in\text{items}}x_{ij}\in\{0,\dots,N\}\), the problem can be written as

\(\min \sum\limits_{j\in\text{bins}}\sum\limits_{n=0}^Nc(n)y_{nj}\)

where \(y_{nj}\) is a binary variable such that

\(y_{nj}=1\Leftrightarrow \sum\limits_{i\in\text{items}}x_{ij}=n\)

subject to the following constraint

\(\sum\limits_{n=0}^Ny_{nj}=1\;\forall j\in\text{bins}\)

The more the better

In the first PuLP implementation, I found useful to write "more" constraints (even if redundant) rather than "less". This forced me to carefully reason about relations between decision variables during the formulation phase, as well as to be sure that I had implemented all the constraints required without missing any (even if in two or more different ways each).

LaTeX is your friend

Before the implementation phase I found extremely useful to write down the full problem formulation in a clean and clear way (for example, with usage of \(\LaTeX\) within Jupyter notebook markdown cells). Again, this forced me to be really sure about the theoretical framework of the problem, making clear all the assumptions and the modelization choices I have made. Moreover, this habit really helps the implementation serving as a to-do-list in terms of constraints to be added within the PuLP problem.

In doing so, I strongly recommend to use a coherent indexing and notation between the theory (markdown cells) and the practice (PuLP code), to avoid transliteration mistakes and help your System 1 in doing its job.

Linearize, linearize, linearize

If you desire (or must) stick to PuLP, you want every non-linear part of your formulation to disappear. In doing so, be confident and optimistic that many non-linear formulations can be properly linearized! For example:

In a similar way, you can bypass some PuLP limitations in the implementation. For example:

  • to implement absolute values (refused by PuLP by default through a raise Error) you can refer to this as well as this
  • to implement strict inequalities (refused by PuLP by default through a raise Error), if the involved variables can assume only integer values you can follow this tip.

Resources

General

CBC optimal status

I discover that CBC solver used via PuLP sometimes exits with OPTIMAL status even if the actual cause is the given time limit: this lead to an incoherent solver status with a globally non-optimum solution.

Below some possible explanations.

Following this issue, one of PuLP authors (at least from v2.1 on) states:

In the case of GUROBI and CPLEX solvers (at least in the CMD interface), pulp returns 1 ('optimal') when an integer solution has been found in the time limit.

This is not the behavior I get from the CBC solver, which returns 0 ['not solved', ndr].

Since having found an integer solution when the time stops is neither "optimal" nor "not solved", both are at the same time misleading and inconsistent.

One possibility is to at least make all solvers return the same status when finishing in the same status. This would make CBC return 1 instead of 0 when at least one integer > solution was found. This is easy to change.

The second, more ambitious possibility, would be to create a new "Integer solution" status in pulp that differentiates from "optimal" and "no solution". This would then be used in the solver interfaces as an additional status to return.

A double check on PuLP constants confirms that:

  • exists a dictionary of PuLP problem statuses with values in [Not Solved, Optimal, Infeasible, Unbounded, Undefined]
  • exists a dictionary of PuLP solution statuses with values in [No Solution Found, Optimal Solution Found, Solution Found, No Solution Exists, Solution is Unbounded]
  • LpSolutionIntegerFeasible = 2 has been mapped to solution status Solution Found but it's not the map value for any problem status in LpStatusToSolution

Looking into the source code of PuLP interface to CBC, we can check the get_status function definition:

def get_status(self, filename):
    cbcStatus = {
        "Optimal": constants.LpStatusOptimal,
        "Infeasible": constants.LpStatusInfeasible,
        "Integer": constants.LpStatusInfeasible,
        "Unbounded": constants.LpStatusUnbounded,
        "Stopped": constants.LpStatusNotSolved,
    }

    cbcSolStatus = {
        "Optimal": constants.LpSolutionOptimal,
        "Infeasible": constants.LpSolutionInfeasible,
        "Unbounded": constants.LpSolutionUnbounded,
        "Stopped": constants.LpSolutionNoSolutionFound,
    }

    with open(filename) as f:
        statusstrs = f.readline().split()

    status = cbcStatus.get(statusstrs[0], constants.LpStatusUndefined)
    sol_status = cbcSolStatus.get(
        statusstrs[0], constants.LpSolutionNoSolutionFound
    )
    # here we could use some regex expression.
    # Not sure what's more desirable
    if status == constants.LpStatusNotSolved and len(statusstrs) >= 5:
        if statusstrs[4] == "objective":
            status = constants.LpStatusOptimal
            sol_status = constants.LpSolutionIntegerFeasible
    return status, sol_status

In one of the last lines we can clearly see that if the problem status is LpStatusNotSolved, the same is reassigned to LpSolutionOptimal but solution status is not (it is override with LpSolutionIntegerFeasible).