Weighing options with Python and petersburg

In the past few weeks I've posted a number of examples of applications for petersburg. You can check them out here, here and here. In short, petersburg is an extension of probabilistic decision graphs and Bayesian networks that allows for some interesting analysis of decision-theoretic problems.

The interesting thing that arises out of this kind of analysis is the importance of the cost of exploration. When the outcomes of a choice are uncertain, and the costs of finding the true outcomes (either by trying out the choice explicitly, or by investing more time and resources into analysis and research) are non-zero, real world solutions often hinge on the relative costs of exploration. If one choice has a satisfactory payoff and can be cheaply investigated, it is often prioritized. But the interplay between payoff and cost to explore is often over-simplified, due to a lack of respect for the real non-linearity present in scoring and risk.

Let's take, for example, the path of a startup in a technical and nascent market. A choice must be made as to how to develop the product (recall the previous discussion of difficult-to-scope in-house development versus 3rd party tools and services). The choice comes down to 2 primary options:

  • Build the product in house
  • Attempt to use 3rd party tools and services to accelerate development

Both paths have uncertainty, but they tend to have different kinds of uncertainties. When developing from scratch, the certainty gained in control and flexibility is paid for in labor, time, and cost. Conversely, the faster iterations enabled by 3rd party tools and services are paid for if and when they don't work. The uncertainty here is the possibility of having to revert to in-house. For each step of iteration, the 3rd party tools are faster and cheaper (for the sake of our example), but there is a non-zero risk of being forced to revert at an increasing cost (a mature product is more expensive to refactor than a prototype).

So we are faced with a binary problem: pursue the more certain quality of in house work at short term cost, or risk the 3rd party route to reap possible speed to market? It comes down to the likelihood of having to revert at each step, it would seem, but this number can't be reliably determined ahead of time, especially in a nascent technical space.

How do we reason with this? Well, first, let's build a graph:


[Figure: the decision graph, with the in-house path B-C-D and the 3rd party path E-F-G]

In this DAG, we have 4 main parameters to worry about. M is the cost to progress from one step to the next along the in-house path, and N is the corresponding cost along the 3rd party path. Once on the in-house path, it is certain that the end will be reached (along B-C-D), but the cost is higher than along E-F-G because M>N. So if the outcome were certain for both paths, it trivially falls out that E-F-G is better than B-C-D. But at each of the nodes E, F, and G, there is some probability, \rho_{switch}, of being forced back onto the in-house track (B-C-D), at a cost c. The cost to revert, c, is an increasing function of the stage of the job, x. For the case where the project has to be completely restarted, c(x)=M*x. For real world projects, c is usually less than that.
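To make the parameters concrete, here is a minimal sketch of the cost of a single walk through the graph. It is plain Python and does not use petersburg. The function names, the 3-stage layout, and the rule that a reverted project pays c(x) and then M for every remaining stage are my assumptions about how to read the graph, not something taken from the original code.

```python
def revert_cost(stage, m, c_switch):
    """Cost to revert at a given stage: c(x) = c_switch * M * x.

    c_switch = 1.0 is the full-restart case c(x) = M * x.
    """
    return c_switch * m * stage


def path_cost(switch_stage, m, n, c_switch, stages=3):
    """Total cost of one walk through the graph (assumed structure).

    switch_stage=None -> the 3rd party route never fails (E-F-G).
    switch_stage=0    -> in-house from the very start (B-C-D).
    switch_stage=k    -> k 3rd party steps are paid for, then the project
                         is forced back in-house: it pays c(k) and then M
                         for each remaining stage.
    """
    if switch_stage is None:
        return n * stages
    if not 0 <= switch_stage <= stages:
        raise ValueError("switch_stage must be between 0 and stages")
    third_party_spend = n * switch_stage
    penalty = revert_cost(switch_stage, m, c_switch)
    remaining_in_house = m * (stages - switch_stage)
    return third_party_spend + penalty + remaining_in_house


if __name__ == "__main__":
    m, n, c_switch = 1.0, 0.8, 0.5
    print("in-house only:", path_cost(0, m, n, c_switch))
    print("3rd party, no failure:", path_cost(None, m, n, c_switch))
    for k in (1, 2, 3):
        print("forced to revert at stage", k, ":", path_cost(k, m, n, c_switch))
```

Under these assumptions a clean 3rd party run is the cheapest outcome, and any forced reversion costs more than having gone in-house from the start. The question the rest of the post asks is how often that reversion can happen before the gamble stops paying.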

For most business decisions, \rho_{switch} is not actually known. Within an order of magnitude, a good technical manager or architect can estimate M and N, and have a ballpark for how c(x) might behave, but the probability of failure is much more difficult to pin down. Using a Monte Carlo simulation, though, we can simulate the expected outcomes at a set of different \rho_{switch} values.
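A rough sketch of what such a simulation could look like is below. It uses only Python's standard library rather than petersburg. The structure (3 stages, a failure check after each 3rd party step, and a reverted project paying c(x) plus M per remaining stage) is an assumption on my part, as is the "relative strength" measure, so the numbers it prints need not match the plots in this post.

```python
import random


def simulate_third_party(p_switch, m, n, c_switch, stages, rng):
    """Cost of one simulated attempt at the 3rd party route (assumed model)."""
    cost = 0.0
    for stage in range(1, stages + 1):
        cost += n  # pay for this 3rd party step
        if rng.random() < p_switch:
            # Forced back in-house: pay c(x) = c_switch * M * x,
            # then M for every stage that is still left.
            cost += c_switch * m * stage
            cost += m * (stages - stage)
            return cost
    return cost


def mean_third_party_cost(p_switch, m=1.0, n=0.8, c_switch=0.5,
                          stages=3, trials=20000, seed=0):
    """Monte Carlo estimate of the expected cost of starting on the 3rd party route."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    total = 0.0
    for _ in range(trials):
        total += simulate_third_party(p_switch, m, n, c_switch, stages, rng)
    return total / trials


if __name__ == "__main__":
    in_house = 1.0 * 3  # M * stages: the in-house path has no uncertainty
    for p in (0.0, 0.1, 0.2, 0.3, 0.5):
        # Hypothetical "relative strength" of in-house: positive means
        # in-house is cheaper on average at this probability.
        strength = mean_third_party_cost(p) - in_house
        print(f"rho_switch={p:.1f}  in-house advantage={strength:+.3f}")
```

Sweeping p_switch this way stands in for the one number nobody can estimate well: instead of guessing it, we look at how the comparison behaves across its whole range.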

First we will use the parameters M=1, N=0.8, c(x)=0.5*M*x, which means the 3rd party route is 20% cheaper, and if we have to revert, 50% of the work done is still usable. After simulation, we get this plot:


[Figure: relative strength of the in-house option versus \rho_{switch}, for c(x)=0.5*M*x]

In this plot, we have \rho_{switch} on the x-axis, and the relative strength of the in-house option on the y-axis. This shows, for any given set of simulations at a given \rho_{switch}, whether the in-house option turned out to be stronger or weaker than the 3rd party option. For the first case, we learn that if the probability of having to revert is over about 30%, it is more prudent to stay in-house. In the next two plots we try the same simulations with cheaper and more expensive models for reversion cost, respectively.


[Figure: relative strength of the in-house option versus \rho_{switch}, with almost none of the 3rd party work salvageable]

In the plot above, with almost none of the work from the 3rd party route salvageable upon reversion, we can see that the \rho_{switch} at which the 3rd party route becomes profitable drops to under 20%.

[Figure: relative strength of the in-house option versus \rho_{switch}, with almost all of the 3rd party work salvageable]

Finally, in the figure above, where almost all of the work from the 3rd party route is salvaged, the in-house option is never profitable.

These three examples further illustrate the unintuitive non-linearity of complex decisions under uncertainty. Most observers would see the drop of 10 percentage points in the decision point between the first two cases and think that the final plot would break even somewhere around 40-50%: a proportional difference. But this isn't the case at all.


[Figure: breakeven probability versus c_{switch}]

This plot shows the breakeven probability for 50 discrete values of c_{switch} in the cost function c(x) = c_{switch} * M * x. In the mid-60%s for percent of work salvaged, the computed breakeven probability diverges, and there is no valid breakeven point for any higher values.
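A sweep like this can be sketched without simulation by writing the expected cost in closed form and scanning for the first probability at which the 3rd party route stops being cheaper. The model below is my own simplified reading of the graph (3 stages, a failure check after each 3rd party step, c(x) plus M per remaining stage on reversion), and the function names are hypothetical, so the exact breakeven values it produces should not be expected to match the plot above.

```python
def expected_third_party_cost(p, m, n, c_switch, stages=3):
    """Expected cost of starting on the 3rd party route (assumed model)."""
    total = 0.0
    reach = 1.0  # probability of still being on the 3rd party path
    for stage in range(1, stages + 1):
        # Cost incurred if the project is forced in-house after this stage.
        forced = c_switch * m * stage + m * (stages - stage)
        total += reach * (n + p * forced)
        reach *= 1.0 - p
    return total


def breakeven_probability(m, n, c_switch, stages=3, grid=1000):
    """Smallest p on a grid where 3rd party is no longer cheaper, else None."""
    in_house = m * stages
    for i in range(grid + 1):
        p = i / grid
        if expected_third_party_cost(p, m, n, c_switch, stages) >= in_house:
            return p
    return None  # 3rd party is cheaper at every probability


if __name__ == "__main__":
    # 50 discrete values of c_switch between 0 (everything salvaged)
    # and 1 (complete restart).
    for i in range(50):
        c_switch = i / 49
        p_star = breakeven_probability(1.0, 0.8, c_switch)
        label = "no breakeven" if p_star is None else f"{p_star:.3f}"
        print(f"c_switch={c_switch:.2f}  salvaged={1 - c_switch:.0%}  breakeven={label}")
```

Returning None rather than a number for the "no breakeven" case keeps the two regimes separate: below some reversion cost, no value of \rho_{switch} makes in-house the better choice, so there is nothing to plot.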

So what is the takeaway from this? In complex decision making, the outcomes are usually non-linear in ways that are very hard to grasp. In spite of this, if some parameters are known, we can infer something about the rest. With only knowledge of the relative costs between 3rd party and in-house options, and a ballpark idea of how much work could be salvaged if we had to switch, we can back into a required confidence.

If the ballpark amount of work saved on revert is 50%, and if the probability of failure in the 3rd party option (\rho_{switch}) is anything over 30%, in house is the better choice. Alternatively, if the ballpark amount of work saved on reversion is only 10%, in house is the better choice for all but the most certain of 3rd party cases, and if it's 90%, 3rd party is better no matter what. We can make informed decisions with incomplete information by understanding the structure of the problem, even if we can't know the exact parameters.


Check out the code and play with it yourself here:



Will has a background in Mechanical Engineering from Auburn, but mostly just writes software now. He was the first employee at Predikto, and is currently building out the premiere platform for predictive maintenance in heavy industry there as Chief Scientist. When not working on that, he is generally working on something related to python, data science or cycling.
