Solve For Y on Solve For Y
© 2018
Tue, 27 Mar 2018 00:00:00 +0000

MLtoolkit
/project/mltoolkit/
Fri, 07 Dec 2018 00:00:00 +0000

Exploring Leverage in Multivariable Linear Regression
/post/exploringleverageinmultivariablelinearregression/
Sun, 02 Sep 2018 00:00:00 +0000
<p>In the context of multivariable linear regression, leverage is a distance measure that shows how far an observation is from the center of the multivariate predictor space. Observations with high leverage values have the <strong>potential</strong> to strongly influence the regression model, while observations with low leverage values do not. Additionally, leverage can be used to determine whether a new observation is close to the predictor space of the observations used to create the model, in order to avoid extrapolation.</p>
<p>Once a linear regression model is built, we can get the leverage values of all the observations used in the model from the diagonal values of its hat matrix. However, a key point to remember is that leverage as a measure of distance is not Euclidean, so we shouldn’t expect the distance to be a straight line to the center of the multivariate predictor space. Instead, leverage takes the predictor correlations into consideration and is therefore able to detect multivariate outliers, which are outliers in the full predictor space taken together even though they may not be outliers for any predictor individually!</p>
<p>The following example demonstrates the idea. Let’s say we have 17 labeled observations in the predictor space consisting of 2 predictors <span class="math inline">\(x_1\)</span> and <span class="math inline">\(x_2\)</span>. The below scatter plot shows the predictor space:</p>
<pre class="r"><code>library(tidyverse)
theme_set(theme_bw())
# 17 observations: 1-15 follow the x1/x2 correlation, 16 and 17 are the unusual ones
df <- tibble(x1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,11,25),
             x2 = c(2,2.5,3.2,4.6,5.3,6.4,7.9,8.4,9.3,10.2,11.5,12.3,13.7,14.1,15.6,4,26),
             y = c(1,3,2,4.3,5.3,7.3,6.1,8.8,9.3,10.9,11.1,12.5,13.2,14.9,15.8,5,25.4),
             obs = 1:17)
ggplot(df, aes(x1, x2)) + geom_point() +
  geom_label(aes(label = obs), nudge_y = 2) +
  scale_x_continuous(breaks = seq(0, 25, 5)) +
  scale_y_continuous(breaks = seq(0, 25, 5)) +
  labs(title = "Our Predictor Space", x = expression("x"[1]), y = expression("x"[2]))</code></pre>
<p><img src="/post/20180902exploringleverageinlinearregression_files/figurehtml/unnamedchunk11.png" width="672" /></p>
<p>Notice that observation 16 above is not an outlier in the <span class="math inline">\(x_1\)</span> space by itself or in the <span class="math inline">\(x_2\)</span> space by itself. However, it is a multivariate outlier in the joint predictor space of <span class="math inline">\(x_1\)</span> and <span class="math inline">\(x_2\)</span> because it falls outside the “oval” region that the 2 predictors form together. That region is represented below with an 80% confidence level ellipse for demonstrative purposes:</p>
<pre class="r"><code>ggplot(df, aes(x1, x2)) + geom_point() +
geom_label(aes(label = obs), nudge_y = 3) +
scale_x_continuous(breaks = seq(0, 25, 5)) +
scale_y_continuous(breaks = seq(0, 30, 5)) +
stat_ellipse(level = 0.80) +
labs(title = "Our Predictor Space", x = expression("x"[1]), y = expression("x"[2]))</code></pre>
<p><img src="/post/20180902exploringleverageinlinearregression_files/figurehtml/unnamedchunk21.png" width="672" /></p>
<p>Observation 17, on the other hand, is an outlier in both the <span class="math inline">\(x_1\)</span> and <span class="math inline">\(x_2\)</span> predictor spaces separately. It is also an outlier in the joint region of both predictors since it’s far away from the center of the multivariate predictor space, despite being true to the correlation relationship between the predictors.</p>
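<p>As a rough numeric check of the two observations discussed above, the sketch below standardizes each predictor on its own and prints the z-scores of observations 16 and 17. Using z-scores as the yardstick for a univariate outlier is an assumption made for this illustration (the post itself judges this visually), and the helper name <code>z_score_by_obs</code> is hypothetical.</p>

```python
from statistics import mean, stdev

# Predictor values copied from the example data (observations 1 to 17)
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 11, 25]
x2 = [2, 2.5, 3.2, 4.6, 5.3, 6.4, 7.9, 8.4, 9.3, 10.2,
      11.5, 12.3, 13.7, 14.1, 15.6, 4, 26]


def z_score_by_obs(values):
    """Return {observation number: z-score} for one predictor taken alone."""
    centre, spread = mean(values), stdev(values)
    return {obs: (v - centre) / spread for obs, v in enumerate(values, start=1)}


z_scores = {"x1": z_score_by_obs(x1), "x2": z_score_by_obs(x2)}

for obs in (16, 17):
    print(f"obs {obs}: z(x1) = {z_scores['x1'][obs]:.2f}, "
          f"z(x2) = {z_scores['x2'][obs]:.2f}")
```

<p>Observation 16 stays within one standard deviation of the mean on each predictor taken alone, while observation 17 is more than two standard deviations out on both, which matches the description above.</p>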
<p>Now let’s fit a linear model between a third variable <span class="math inline">\(y\)</span> as the outcome and our variables <span class="math inline">\(x_1\)</span> and <span class="math inline">\(x_2\)</span> as predictors. The model equation will therefore be: <span class="math display">\[\hat{y} = b_0 + b_1x_1 + b_2x_2\]</span></p>
<p>Below is a bar plot showing the leverage of the observations in the model:</p>
<pre class="r"><code>model <- lm(y ~ x1 + x2, data = df)
# hatvalues() returns the diagonal of the hat matrix, i.e. the leverage of each observation
lev_obs <- tibble(leverage = hatvalues(model), obs = 1:17)
ggplot(lev_obs, aes(obs, leverage)) + geom_col() +
  scale_x_continuous(breaks = seq(1, 17)) +
  labs(title = "Observation Leverage Values", x = "Observation", y = "Leverage")</code></pre>
<p><img src="/post/20180902exploringleverageinlinearregression_files/figurehtml/unnamedchunk31.png" width="672" /></p>
<p>Notice how the leverage metric (from the diagonal of the hat matrix) was able to successfully detect that both observations 16 and 17 were far from the center of the multivariate oval region of the predictors.</p>
<p>Observations that are not outliers for the predictors individually but are outliers for the predictors when taken together (the oval) will still be detected by the leverage metric. Therefore, when we have a new observation that we want to predict, we need to make sure that its leverage (<span class="math inline">\(h_0\)</span>) is less than the maximum leverage value (<span class="math inline">\(h_{max}\)</span>) among the observations used in the model. This ensures that the new observation is within the multivariate predictor region covered by the model and that no extrapolation will take place.</p>
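<p>For reference, the standard definitions behind this check can be written out as follows. The symbols <span class="math inline">\(X\)</span> (the <span class="math inline">\(N \times 3\)</span> design matrix whose rows are <span class="math inline">\(x_i^\top = (1, x_{i1}, x_{i2})\)</span>), <span class="math inline">\(x_0\)</span> (the same vector for the new observation), and <span class="math inline">\(p\)</span> (the number of model coefficients, 3 here) are notation introduced for this note only: <span class="math display">\[H = X(X^\top X)^{-1}X^\top, \qquad h_{ii} = x_i^\top (X^\top X)^{-1} x_i, \qquad h_0 = x_0^\top (X^\top X)^{-1} x_0\]</span></p>
<p>Because the diagonal of the hat matrix sums to <span class="math inline">\(p\)</span>, the average leverage is <span class="math inline">\(p/N\)</span>, which would be <span class="math inline">\(3/17 \approx 0.18\)</span> in this example. The extrapolation check then amounts to comparing <span class="math inline">\(h_0\)</span> with <span class="math inline">\(h_{max} = \max_i h_{ii}\)</span>.</p>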
<p>Tangentially, the Mahalanobis distance is a similar distance measure that we can use for the same purpose as it can also measure distances from the center of multivariate predictor regions where predictor correlations are also taken into account. It is also very easy to calculate for new observations and compare to the Mahalanobis distances of the observations used in a model.</p>
<p>Below are the Mahalanobis distances of our example above:</p>
<pre class="r"><code>df_temp <- select(df, x1, x2)
means <- colMeans(df_temp)      # center of the predictor space
cov_matrix <- cov(df_temp)      # predictor covariance matrix
mahalanobis_distances <- mahalanobis(df_temp, means, cov_matrix)
mh <- tibble(obs = 1:17, maha = mahalanobis_distances)
ggplot(mh, aes(obs, maha)) + geom_col() +
  scale_x_continuous(breaks = seq(1, 17)) +
  labs(title = "Observation Mahalanobis Distance Values",
       x = "Observation", y = "Mahalanobis Distance")</code></pre>
<p><img src="/post/20180902exploringleverageinlinearregression_files/figurehtml/unnamedchunk41.png" width="672" /></p>
<p>Notice that while the Mahalanobis distance values are different from the leverage values, the shape of the plot is identical, which indicates that both metrics measure the same phenomenon. Therefore, we should be able to compare either metric for a new observation to the maximum value of that metric in our data to determine whether we are extrapolating. However, this can be misleading if the data used to build the model contain outliers, as in our example above, because the maximum is then set by an outlier. In such cases, it is better to express the new observation’s metric value (leverage or Mahalanobis distance) as a multiple of the average value of that metric. If observations 16 and 17 had been excluded before building the model, then comparing the new observation to the maximum metric value would be sensible.</p>
<p>In fact, leverage and the Mahalanobis distance are related through the equation: <span class="math display">\[ \text{Mahalanobis}_i=(N-1)\left(H_{ii}-\frac{1}{N}\right)\]</span></p>
<p>where:</p>
<ul>
<li><span class="math inline">\(N\)</span> is the total number of observations used in the model</li>
<li><span class="math inline">\(H_{ii}\)</span> is the diagonal value of the hat matrix of the model providing the leverage value of observation <span class="math inline">\(i\)</span></li>
</ul>
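<p>The relationship can be checked numerically on the example data. The sketch below is a plain-Python illustration, not the code used for the plots: the helpers <code>invert</code> and <code>quad_form</code> are hypothetical names, and the leverage is computed from its textbook definition with an intercept column. Note that R’s <code>mahalanobis()</code> returns the squared distance, which is the quantity that appears in the equation.</p>

```python
# Predictor values copied from the example data (observations 1 to 17)
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 11, 25]
x2 = [2, 2.5, 3.2, 4.6, 5.3, 6.4, 7.9, 8.4, 9.3, 10.2,
      11.5, 12.3, 13.7, 14.1, 15.6, 4, 26]
n = len(x1)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    size = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def quad_form(vec, matrix):
    """Return vec' * matrix * vec."""
    k = len(vec)
    return sum(vec[i] * matrix[i][j] * vec[j] for i in range(k) for j in range(k))


# Leverage: diagonal of X (X'X)^-1 X', with X = [1, x1, x2]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xtx_inv = invert(xtx)
leverage = [quad_form(r, xtx_inv) for r in rows]

# Squared Mahalanobis distance from the predictor means (sample covariance)
m1, m2 = sum(x1) / n, sum(x2) / n
centered = [[a - m1, b - m2] for a, b in zip(x1, x2)]
cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(2)]
       for i in range(2)]
cov_inv = invert(cov)
maha = [quad_form(c, cov_inv) for c in centered]

# Largest gap between the two sides of the equation across all observations
max_gap = max(abs(d - (n - 1) * (h - 1 / n)) for d, h in zip(maha, leverage))
print(f"largest difference between the two sides: {max_gap:.2e}")
```

<p>The printed difference should be at floating-point noise level, which is why the two bar plots have the same shape.</p>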

Supply Source & Route Selection with Linear Programming
/post/supplysourceandrouteselectionwithlinearprogramming/
Sun, 24 Jun 2018 00:00:00 +0000
<script src="/rmarkdownlibs/kePrint/kePrint.js"></script>
<p>Businesses face various types of problems that require making optimal decisions in order to achieve a certain objective. One such problem is the transportation / assignment problem I came across when I was reading the excellent “<a href="https://www.amazon.com/SpreadsheetModelingDecisionAnalysisIntroduction/dp/130594741X/">Spreadsheet Modeling and Decision Analysis</a>” book by Cliff T. Ragsdale. I found this type of problem interesting because the concept can be generalized to many areas of the business.</p>
<p>In this post, I will discuss a transportation / assignment problem scenario where we need to select the optimal supply sources & routes that would meet as much of the demand as possible while minimizing the cost of shipping. I will use network modeling & linear programming with <a href="http://www.gurobi.com/">Gurobi</a> Python to model & solve this problem.</p>
<p>An electronics company has decided to build 3 factories for its flagship product. There are 4 cities being considered as locations for those factories — Boston, Miami, Seattle, and Los Angeles. A maximum of one factory can be built in a city and each factory option has a different production capacity.</p>
<p>Additionally, the company has identified 4 cities with high demand for its products — Minneapolis, Chicago, Denver, and Dallas. The shipping cost per unit differs depending on the origin & destination cities.</p>
<p>The production capacities, demand levels, & shipping costs are in the table below:</p>
<table>
<thead>
<tr>
<th style="border-bottom:hidden" colspan="1"></th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px;">Supply City Options (shipping cost per unit)</div></th>
<th style="border-bottom:hidden" colspan="1"></th>
</tr>
<tr>
<th style="text-align:left;">Demand Cities</th>
<th style="text-align:right;">Boston</th>
<th style="text-align:right;">Miami</th>
<th style="text-align:right;">Seattle</th>
<th style="text-align:right;">Los Angeles</th>
<th style="text-align:right;">Demand (units)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">Minneapolis</td>
<td style="text-align:right;">$20</td>
<td style="text-align:right;">$24</td>
<td style="text-align:right;">$22</td>
<td style="text-align:right;">$26</td>
<td style="text-align:right;">4500</td>
</tr>
<tr>
<td style="text-align:left;">Chicago</td>
<td style="text-align:right;">$18</td>
<td style="text-align:right;">$20</td>
<td style="text-align:right;">$23</td>
<td style="text-align:right;">$24</td>
<td style="text-align:right;">7000</td>
</tr>
<tr>
<td style="text-align:left;">Denver</td>
<td style="text-align:right;">$24</td>
<td style="text-align:right;">$25</td>
<td style="text-align:right;">$16</td>
<td style="text-align:right;">$15</td>
<td style="text-align:right;">4500</td>
</tr>
<tr>
<td style="text-align:left;">Dallas</td>
<td style="text-align:right;">$23</td>
<td style="text-align:right;">$17</td>
<td style="text-align:right;">$23</td>
<td style="text-align:right;">$17</td>
<td style="text-align:right;">5500</td>
</tr>
<tr>
<td style="text-align:left;font-weight: bold;">Supply (units)</td>
<td style="text-align:right;font-weight: bold;">3500</td>
<td style="text-align:right;font-weight: bold;">5000</td>
<td style="text-align:right;font-weight: bold;">4000</td>
<td style="text-align:right;font-weight: bold;">6000</td>
<td style="text-align:right;font-weight: bold;"></td>
</tr>
</tbody>
</table>
<p>Notice that the total demand of the demand cities exceeds the total supply of all 4 potential factories of which we can only select 3.</p>
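<p>The arithmetic behind that observation can be spelled out with a few lines of Python. This is only an illustrative sketch using the figures from the table; the dictionary and variable names are assumptions made for the example.</p>

```python
from itertools import combinations

# Production capacities and demand levels copied from the table above
capacity = {"Boston": 3500, "Miami": 5000, "Seattle": 4000, "Los Angeles": 6000}
demand = {"Minneapolis": 4500, "Chicago": 7000, "Denver": 4500, "Dallas": 5500}

total_demand = sum(demand.values())
total_capacity_all_four = sum(capacity.values())

# Total capacity of every possible choice of 3 factory cities
capacity_by_choice = {
    cities: sum(capacity[c] for c in cities)
    for cities in combinations(capacity, 3)
}
best_choice = max(capacity_by_choice, key=capacity_by_choice.get)

print("total demand:", total_demand)
print("capacity of all 4 factories:", total_capacity_all_four)
for cities, total in capacity_by_choice.items():
    print(cities, total)
print("largest 3-factory capacity:", best_choice, capacity_by_choice[best_choice])
```

<p>Even the largest 3-factory combination falls short of the total demand, so some demand will go unmet no matter which cities are selected.</p>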
<p>We need to decide which 3 cities should be selected for the factories as well as which routes should be used from each of them (& the number of units to ship) in order to minimize the cost of shipping while meeting as much of the demand as possible.</p>
<p>The information in the above table can be visualized as a network plot. We’ll use <a href="https://datastormopen.github.io/visNetwork/">visNetwork</a>, which is a great R package for interactive network visualization.</p>
<p>The production capacity values of the supply nodes are represented by negative values while the demand values of the demand cities are represented by positive values.</p>
<div class="figure">
<img src="/post/20180619sourcerouteselectionwithlinearprogramming_files/network_plot1.png" />
</div>
<p>Note that we’re not just trying to distribute our supply at the cheapest cost possible but rather trying to meet as much of the demand as possible with the least shipping costs.</p>
<p>If we try to solve this problem directly and minimize shipping costs, the model might prefer factories with smaller production capacities since distributing their supply may result in lower shipping costs. However, that would meet less of the demand, and our objective would not be achieved. To meet as much of the demand as possible at the lowest possible cost, we need to introduce an artificial supply node with an arbitrarily large supply quantity and a large shipping cost per unit for all its possible routes. This addition makes our total supply exceed the total demand, which forces the model to fill all of the demand. Because the artificial node’s routes are so expensive, the model uses them only when the real supply is exhausted. We can then ignore the parts of the solution related to the artificial supply node and its routes and use the remaining parts of the solution.</p>
<p>The new network model is shown below:</p>
<div class="figure">
<img src="/post/20180619sourcerouteselectionwithlinearprogramming_files/network_plot2.png" />
</div>
<div id="mathematicalformulationmodelbuilding" class="section level2">
<h2>Mathematical Formulation & Model Building</h2>
<p>Let’s formulate the model mathematically and build the model with Gurobi Python.</p>
<div id="decisionvariables" class="section level3">
<h3>Decision Variables</h3>
<p>There are two groups of decision variables needed here. First, a group of binary decision variables, one for each of the supply cities that would take a value of 1 if that city is selected and 0 otherwise. Second, a group of integer decision variables that represent the number of units to send for each possible route. If that value is 0, then the route will not be used. These can be represented mathematically as:</p>
<p>Binary variables that equal 1 if node <span class="math inline">\(i\)</span> is selected as a supply node:<br />
<span class="math inline">\(y_i ~\forall ~i \in \{1,2,3,4\}\)</span></p>
<p>The number of units to send from supply node <span class="math inline">\(i\)</span> to demand node <span class="math inline">\(j\)</span>:<br />
<span class="math inline">\(x_{ij} ~\forall ~i \in \{1,2,3,4,9\}, ~j \in \{5,6,7,8\}\)</span></p>
<p>Let’s create those decision variables in our Gurobi Python model:</p>
<pre class="python"><code>from gurobipy import *
m = Model("M1") # Setting up the model
# Creating the decision variables
y1 = m.addVar(name = "y1", vtype=GRB.BINARY) # Will node 1 be used for supply? 1/0
y2 = m.addVar(name = "y2", vtype=GRB.BINARY) # Will node 2 be used for supply? 1/0
y3 = m.addVar(name = "y3", vtype=GRB.BINARY) # Will node 3 be used for supply? 1/0
y4 = m.addVar(name = "y4", vtype=GRB.BINARY) # Will node 4 be used for supply? 1/0
x15 = m.addVar(name = "x15", vtype=GRB.INTEGER) # Num. units sent from node 1 to node 5
x16 = m.addVar(name = "x16", vtype=GRB.INTEGER) # Num. units sent from node 1 to node 6
x17 = m.addVar(name = "x17", vtype=GRB.INTEGER) # Num. units sent from node 1 to node 7
x18 = m.addVar(name = "x18", vtype=GRB.INTEGER) # Num. units sent from node 1 to node 8
x25 = m.addVar(name = "x25", vtype=GRB.INTEGER) # Num. units sent from node 2 to node 5
x26 = m.addVar(name = "x26", vtype=GRB.INTEGER) # Num. units sent from node 2 to node 6
x27 = m.addVar(name = "x27", vtype=GRB.INTEGER) # Num. units sent from node 2 to node 7
x28 = m.addVar(name = "x28", vtype=GRB.INTEGER) # Num. units sent from node 2 to node 8
x35 = m.addVar(name = "x35", vtype=GRB.INTEGER) # Num. units sent from node 3 to node 5
x36 = m.addVar(name = "x36", vtype=GRB.INTEGER) # Num. units sent from node 3 to node 6
x37 = m.addVar(name = "x37", vtype=GRB.INTEGER) # Num. units sent from node 3 to node 7
x38 = m.addVar(name = "x38", vtype=GRB.INTEGER) # Num. units sent from node 3 to node 8
x45 = m.addVar(name = "x45", vtype=GRB.INTEGER) # Num. units sent from node 4 to node 5
x46 = m.addVar(name = "x46", vtype=GRB.INTEGER) # Num. units sent from node 4 to node 6
x47 = m.addVar(name = "x47", vtype=GRB.INTEGER) # Num. units sent from node 4 to node 7
x48 = m.addVar(name = "x48", vtype=GRB.INTEGER) # Num. units sent from node 4 to node 8
x95 = m.addVar(name = "x95", vtype=GRB.INTEGER) # Num. units sent from node 9 to node 5
x96 = m.addVar(name = "x96", vtype=GRB.INTEGER) # Num. units sent from node 9 to node 6
x97 = m.addVar(name = "x97", vtype=GRB.INTEGER) # Num. units sent from node 9 to node 7
x98 = m.addVar(name = "x98", vtype=GRB.INTEGER) # Num. units sent from node 9 to node 8
m.update() # Updating the model </code></pre>
</div>
<div id="objectivefunction" class="section level3">
<h3>Objective Function</h3>
<p>We need to multiply the shipping cost per unit of each route by the number of units to be sent on that route and minimize the sum of those products.</p>
<p>Therefore, the objective function can be expressed mathematically as:</p>
<p>Minimize: <span class="math inline">\(z = 20x_{15} + 18x_{16} + 24x_{17} + 23x_{18} +\)</span><br />
<span class="math inline">\(~~~~~~~~~~~~~~~~~~~~~~24x_{25} + 20x_{26} + 25x_{27} + 17x_{28} +\)</span><br />
<span class="math inline">\(~~~~~~~~~~~~~~~~~~~~~~22x_{35} + 23x_{36} + 16x_{37} + 23x_{38} +\)</span><br />
<span class="math inline">\(~~~~~~~~~~~~~~~~~~~~~~26x_{45} + 24x_{46} + 15x_{47} + 17x_{48} +\)</span><br />
<span class="math inline">\(~~~~~~~~~~~~~~~~~~~~~~9999x_{95} + 9999x_{96} + 9999x_{97} + 9999x_{98}\)</span></p>
<p>Let’s add this objective function to our Gurobi Python model:</p>
<pre class="python"><code># Setting up the objective function
m.setObjective(20*x15 + 18*x16 + 24*x17 + 23*x18 + 24*x25 + 20*x26 + 25*x27 + 17*x28 + 22*x35 + 23*x36 + 16*x37 + 23*x38 + 26*x45 + 24*x46 + 15*x47 + 17*x48 + 9999*x95 + 9999*x96 + 9999*x97 + 9999*x98, GRB.MINIMIZE)
m.update() # Updating the model </code></pre>
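<p>To make the structure of the objective explicit, here is a small solver-free sketch that stores the per-unit costs in a dictionary keyed by (supply node, demand node) and evaluates <span class="math inline">\(z\)</span> for a given set of flows. The names <code>cost</code>, <code>BIG_M</code>, and <code>objective</code> are hypothetical, and the example flows are the values reported in the Results section of this post.</p>

```python
BIG_M = 9999  # per-unit penalty cost on every route out of the artificial node 9

# Per-unit shipping costs keyed by (supply node, demand node)
cost = {
    (1, 5): 20, (1, 6): 18, (1, 7): 24, (1, 8): 23,
    (2, 5): 24, (2, 6): 20, (2, 7): 25, (2, 8): 17,
    (3, 5): 22, (3, 6): 23, (3, 7): 16, (3, 8): 23,
    (4, 5): 26, (4, 6): 24, (4, 7): 15, (4, 8): 17,
}
cost.update({(9, j): BIG_M for j in (5, 6, 7, 8)})


def objective(flows):
    """Sum of (cost per unit * units shipped) over the routes that carry units."""
    return sum(cost[route] * units for route, units in flows.items())


# Flows reported by the solver in the Results section (routes not listed carry 0)
reported_flows = {
    (2, 6): 5000, (3, 7): 4000, (4, 7): 500, (4, 8): 5500,
    (9, 5): 4500, (9, 6): 2000,
}
z = objective(reported_flows)
print("objective value:", z)
```

<p>Keeping the costs in one dictionary means the objective is a single sum over routes, which is easier to audit against the cost table than a long hand-written expression.</p>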
</div>
<div id="constraints" class="section level3">
<h3>Constraints</h3>
<div id="balanceofflowconstraints" class="section level4">
<h4>Balance of Flow Constraints</h4>
<p>This group of constraints ensures a balance of flow in our network. Since our total supply now exceeds the total demand, we need to place a constraint on each node such that its Inflow - Outflow <span class="math inline">\(\ge\)</span> its Supply or Demand, where supply is expressed as a negative value and demand as a positive value.</p>
<p>In our case, all the supply nodes have only outflow and all of our demand nodes have only inflow.</p>
<p><span class="math inline">\(-x_{15} - x_{16} - x_{17} - x_{18} \ge -3500\)</span> <span class="math inline">\(~~~~~~~~~~~~\)</span>} Node 1<br />
<span class="math inline">\(-x_{25} - x_{26} - x_{27} - x_{28} \ge -5000\)</span> <span class="math inline">\(~~~~~~~~~~~~\)</span>} Node 2<br />
<span class="math inline">\(-x_{35} - x_{36} - x_{37} - x_{38} \ge -4000\)</span> <span class="math inline">\(~~~~~~~~~~~~\)</span>} Node 3<br />
<span class="math inline">\(-x_{45} - x_{46} - x_{47} - x_{48} \ge -6000\)</span> <span class="math inline">\(~~~~~~~~~~~~\)</span>} Node 4</p>
<p><span class="math inline">\(x_{15} + x_{25} + x_{35} + x_{45} + x_{95} \ge 4500\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 5<br />
<span class="math inline">\(x_{16} + x_{26} + x_{36} + x_{46} + x_{96} \ge 7000\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 6<br />
<span class="math inline">\(x_{17} + x_{27} + x_{37} + x_{47} + x_{97} \ge 4500\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 7<br />
<span class="math inline">\(x_{18} + x_{28} + x_{38} + x_{48} + x_{98} \ge 5500\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 8</p>
<p><span class="math inline">\(-x_{95} - x_{96} - x_{97} - x_{98} \ge -99999\)</span> <span class="math inline">\(~~~~~~~~~~\)</span>} Node 9 (Artificial Supply Node)</p>
</div>
<div id="selectionofsupplynodesconstraints" class="section level4">
<h4>Selection of Supply Nodes Constraints</h4>
<p>Since we need to select 3 supply nodes out of 4, we need constraints on those nodes to ensure that each of their respective binary decision variables gets set to 1 if any of its supplies are sent on any routes to the demand nodes. This is done by using the equation Outflow + Supply*Binary Var <span class="math inline">\(\le0\)</span> for each supply node.</p>
<p><span class="math inline">\(x_{15} + x_{16} + x_{17} + x_{18} - 3500y_1 \le 0\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 1 Binary<br />
<span class="math inline">\(x_{25} + x_{26} + x_{27} + x_{28} - 5000y_2 \le 0\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 2 Binary<br />
<span class="math inline">\(x_{35} + x_{36} + x_{37} + x_{38} - 4000y_3 \le 0\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 3 Binary<br />
<span class="math inline">\(x_{45} + x_{46} + x_{47} + x_{48} - 6000y_4 \le 0\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Node 4 Binary</p>
<p>Looking at the above inequalities, note that since supply is a negative number, this forces the binary variable for a supply node to become 1 if any of that node’s routes is set to carry any units.</p>
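<p>The effect of this linking constraint is easy to see by evaluating it for a few hand-picked cases. The function below is a hypothetical helper written for this illustration; it takes the capacity as a positive number and subtracts it, which is the same as adding the negative supply value in the inequality above.</p>

```python
def linking_constraint_holds(outflows, capacity, selected):
    """Check Outflow - capacity * y <= 0 for one supply node.

    outflows: units sent on each route leaving the node
    capacity: the node's production capacity (positive number)
    selected: the node's binary variable y (0 or 1)
    """
    return sum(outflows) - capacity * selected <= 0


# Node 1 (Boston, capacity 3500) under four scenarios
unused_and_unselected = linking_constraint_holds([0, 0, 0, 0], 3500, 0)
used_but_unselected = linking_constraint_holds([100, 0, 0, 0], 3500, 0)
used_and_selected = linking_constraint_holds([100, 0, 0, 0], 3500, 1)
over_capacity = linking_constraint_holds([3000, 1000, 0, 0], 3500, 1)

print("no flow, y = 0:", unused_and_unselected)
print("some flow, y = 0:", used_but_unselected)
print("some flow, y = 1:", used_and_selected)
print("flow above capacity, y = 1:", over_capacity)
```

<p>Any positive flow with <code>selected = 0</code> violates the inequality, so the solver can only ship from a node after setting its binary variable to 1, and even then the flow is capped at the node’s capacity.</p>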
<p>Next, to ensure that exactly 3 supply nodes are selected, we need to add the following constraint:</p>
<p><span class="math inline">\(y_1 + y_2 + y_3 + y_4 = 3\)</span></p>
</div>
<div id="nonnegativityintegralitybinaryconstraints" class="section level4">
<h4>Nonnegativity, Integrality, & Binary Constraints</h4>
<p>These constraints ensure that the decision variables can only take on values that make sense in our model.</p>
<p><span class="math inline">\(y_i \in \{0,1\} ~\forall ~i \in \{1,2,3,4\}\)</span> <span class="math inline">\(~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\)</span>} Binary Constraints<br />
<span class="math inline">\(x_{ij} \ge 0 ~\forall ~i \in \{1,2,3,4,9\}, ~j \in \{5,6,7,8\}\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Nonnegativity Constraints<br />
<span class="math inline">\(x_{ij} \in \mathbb{Z} ~\forall ~i \in \{1,2,3,4,9\}, ~j \in \{5,6,7,8\}\)</span> <span class="math inline">\(~~~~~~~~\)</span>} Integrality Constraints</p>
<p>We have already set the nonnegativity, integrality, & binary constraints in our model while we were creating the decision variables. Now, we shall add the rest of the constraints to our model:</p>
<pre class="python"><code># Adding the constraints to the model
# Supply nodes: -outflow >= -capacity, i.e. a node cannot ship more than it can produce
m.addConstr(-x15 - x16 - x17 - x18 >= -3500, "c1") # Constraint for node 1
m.addConstr(-x25 - x26 - x27 - x28 >= -5000, "c2") # Constraint for node 2
m.addConstr(-x35 - x36 - x37 - x38 >= -4000, "c3") # Constraint for node 3
m.addConstr(-x45 - x46 - x47 - x48 >= -6000, "c4") # Constraint for node 4
# Demand nodes: inflow >= demand
m.addConstr(x15 + x25 + x35 + x45 + x95 >= 4500, "c5") # Constraint for node 5
m.addConstr(x16 + x26 + x36 + x46 + x96 >= 7000, "c6") # Constraint for node 6
m.addConstr(x17 + x27 + x37 + x47 + x97 >= 4500, "c7") # Constraint for node 7
m.addConstr(x18 + x28 + x38 + x48 + x98 >= 5500, "c8") # Constraint for node 8
m.addConstr(-x95 - x96 - x97 - x98 >= -99999, "c9") # Constraint for node 9
# Forcing the binary var. y1 to 1 if any of x15, x16, x17, or x18 is > 0
m.addConstr(x15 + x16 + x17 + x18 - 3500*y1 <= 0, "c10")
# Forcing the binary var. y2 to 1 if any of x25, x26, x27, or x28 is > 0
m.addConstr(x25 + x26 + x27 + x28 - 5000*y2 <= 0, "c11")
# Forcing the binary var. y3 to 1 if any of x35, x36, x37, or x38 is > 0
m.addConstr(x35 + x36 + x37 + x38 - 4000*y3 <= 0, "c12")
# Forcing the binary var. y4 to 1 if any of x45, x46, x47, or x48 is > 0
m.addConstr(x45 + x46 + x47 + x48 - 6000*y4 <= 0, "c13")
# Ensuring that exactly 3 of the variables y1, y2, y3 and y4 are equal to 1
m.addConstr(y1 + y2 + y3 + y4 == 3, "c14")
m.update() # Updating the model </code></pre>
</div>
</div>
</div>
<div id="results" class="section level2">
<h2>Results</h2>
<p>Finally, let’s run the optimization and find the optimal values of our decision variables:</p>
<pre class="python"><code>m.optimize() # Finding the optimal solution</code></pre>
<pre><code>## Optimize a model with 14 rows, 24 columns and 64 nonzeros
## Variable types: 0 continuous, 24 integer (4 binary)
## Coefficient statistics:
## Matrix range [1e+00, 6e+03]
## Objective range [2e+01, 1e+04]
## Bounds range [1e+00, 1e+00]
## RHS range [3e+00, 1e+05]
## Found heuristic solution: objective 2.149785e+08
## Presolve removed 5 rows and 0 columns
## Presolve time: 0.15s
## Presolved: 9 rows, 24 columns, 44 nonzeros
## Variable types: 0 continuous, 24 integer (4 binary)
##
## Root relaxation: objective 6.525850e+07, 15 iterations, 0.04 seconds
##
## Nodes  Current Node  Objective Bounds  Work
## Expl Unexpl  Obj Depth IntInf  Incumbent BestBd Gap  It/Node Time
##
## * 0 0 0 6.525850e+07 6.5258e+07 0.00%  0s
##
## Explored 0 nodes (15 simplex iterations) in 0.36 seconds
## Thread count was 8 (of 8 available processors)
##
## Solution count 2: 6.52585e+07 2.14978e+08
##
## Optimal solution found (tolerance 1.00e-04)
## Best objective 6.525850000000e+07, best bound 6.525850000000e+07, gap 0.0000%</code></pre>
<pre class="python"><code>m.printAttr("X") # Printing the decision variable values</code></pre>
<pre><code>##
## Variable X
## 
## y2 1
## y3 1
## y4 1
## x26 5000
## x37 4000
## x47 500
## x48 5500
## x95 4500
## x96 2000</code></pre>
<p>The above solution shows that Miami, Seattle, and Los Angeles (nodes 2, 3, and 4) have been selected as the 3 supply nodes. We can also see that some routes from those supply nodes were assigned values while others were not. The artificial supply node was also used to fulfill some demand, through routes <span class="math inline">\(x_{95}\)</span> and <span class="math inline">\(x_{96}\)</span>. We should ignore the values related to the artificial node as well as the objective function value in the output, since the model only used the artificial node to fulfill the demand we could not cover with our real supply. Additionally, the results above show that we could not cover Minneapolis at all, as no routes from Miami, Seattle, or Los Angeles to Minneapolis were assigned positive values. It was simply not economical to fulfill Minneapolis’s demand given that the demand of cities that are cheaper to serve already exhausts the entire supply of the selected supply nodes. Chicago’s demand was only partially met: 5000 of its 7000 units are covered by real supply.</p>
<p>The above decision variable values allow us to meet as much of the demand as possible (15000 units) at the lowest cost of shipping: <span class="math inline">\(20*5000 + 16*4000 + 15*500 + 17*5500=\)</span> $265,000.</p>
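<p>These figures can be reproduced from the solver output by dropping the artificial node’s routes and summarizing what is left. The sketch below is an illustrative post-processing step, not part of the original model; the variable names are assumptions, and the flows and costs are copied from the output and the cost table above.</p>

```python
ARTIFICIAL_NODE = 9

# Demand per node, from the problem table
demand = {
    5: ("Minneapolis", 4500),
    6: ("Chicago", 7000),
    7: ("Denver", 4500),
    8: ("Dallas", 5500),
}

# Per-unit costs of the real routes that appear in the solution
cost = {(2, 6): 20, (3, 7): 16, (4, 7): 15, (4, 8): 17}

# Non-zero route values printed by m.printAttr("X")
solution = {
    (2, 6): 5000, (3, 7): 4000, (4, 7): 500, (4, 8): 5500,
    (9, 5): 4500, (9, 6): 2000,
}

# Keep only the flows that come from real factories
real_flows = {route: units for route, units in solution.items()
              if route[0] != ARTIFICIAL_NODE}

units_shipped = sum(real_flows.values())
shipping_cost = sum(cost[route] * units for route, units in real_flows.items())

# Demand left uncovered in each city once the artificial flows are ignored
unmet = {}
for node, (city, required) in demand.items():
    received = sum(units for (_, dst), units in real_flows.items() if dst == node)
    unmet[city] = required - received

print("units shipped:", units_shipped)
print("shipping cost:", shipping_cost)
print("unmet demand by city:", unmet)
```

<p>The unmet demand per city equals the flow the solver assigned to the corresponding artificial route, which is a quick consistency check on the solution.</p>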
<p>The network given by the solution can be seen in the network plot below:</p>
<div class="figure">
<img src="/post/20180619sourcerouteselectionwithlinearprogramming_files/network_plot3.png" />
</div>
<p>Extensions of the techniques shown here can be used to solve other types of problems, including transshipment problems (where some nodes can both send units to and receive units from other nodes) and shortest path problems (where the objective is to find the shortest path from an origin node to a destination node with multiple nodes and route options between them). Additionally, nonlinear programming can be used when the objective function and/or constraints are nonlinear.</p>
<p>Have you come across similar problems in your line of work where using these techniques is needed to reach the optimal solution?</p>
<p>Please share your thoughts in the comments.</p>
</div>

Word Predictor
/project/wordpredictor/
Tue, 27 Mar 2018 00:00:00 +0000

Predicting Exercise Manner
/project/predictingexercisemanner/
Wed, 27 Apr 2016 00:00:00 +0000