Introduction to Kelly Criterion
Kelly’s formula is a theoretical benchmark for deciding the appropriate position size when investing, trading or gambling. A divergence in attitude towards this theory illustrates the disconnect between academicians and practitioners, and the necessity of closer collaboration between the two circles.
To understand the essence of Kelly’s formula, let us consider the question: Can one lose money in a game in which one has a favorable probability of winning? The answer is, absolutely yes. To see why, think of the simple game of tossing a biased coin: heads means that the player wins the bet, and tails that he loses. Even when the player has a 90% winning probability, if he bets all he has every time, then sooner or later he will inevitably encounter one losing game and go bankrupt. This is a simplified example of gambler’s ruin. Yet, since the odds are so much in favor of the player, it is unreasonable not to play. The reasonable middle ground between not playing and betting everything is to come up with an optimal bet size. Kelly’s formula determines such an optimal size.
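Sketching the log return per play in such a game as a function of the bet size (as a percentage f of the total wealth of the player), we get the picture in the figure below: the curve starts at zero for f = 0, rises to a single peak, and then falls away steeply as f approaches 100%.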
The log return function is essentially what Kelly employed to solve for the unique optimal betting percentage. In our example we can observe the theoretical optimal bet size to be 80%. In general, in such a coin-toss game, if the probabilities of winning and losing are p and q = 1 - p, respectively, then Kelly’s formula tells us the theoretical optimal betting size should be f* = p - q = 2p - 1 (with p = 0.9 this gives 0.9 - 0.1 = 0.8).
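As a minimal sketch of the curve described above, assuming an even-money bet (the player wins or loses exactly the stake), the expected log return per play is g(f) = p ln(1 + f) + q ln(1 - f). The names log_growth and kelly_even_money are illustrative only and do not come from Kelly’s paper.

```python
import math

def log_growth(f, p):
    """Expected log return per play when betting fraction f at even money."""
    q = 1.0 - p
    return p * math.log(1.0 + f) + q * math.log(1.0 - f)

def kelly_even_money(p):
    """Fraction that maximizes log_growth for an even-money bet: f* = p - q."""
    return p - (1.0 - p)

p = 0.9
f_star = kelly_even_money(p)

# Tabulate a few bet sizes on either side of the Kelly fraction.
for f in (0.2, 0.5, f_star, 0.95):
    print(f"f = {f:.2f}  expected log return per play = {log_growth(f, p):.4f}")
```

Bet sizes on both sides of f_star give a lower expected log return, which is the single peak of the curve.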
More interesting is that Kelly was motivated to explain Shannon’s information rate in information theory. As a result, the average optimal log return at the Kelly bet size is actually proportional to the information rate, and can be viewed as the information in this game favorable to the player. The subsequent development of this important result within the circles of practitioners and academicians is rather interesting.
Practitioners regard the Kelly bet size as a red line that should never be crossed. In fact, experienced traders and investors have long known the importance of being conservative in allocating capital into risky assets, even without knowing Kelly’s formula. They have learned from trial and error, although many seem to have forgotten this principle during the past two decades.
Coming back to our coin toss game, if you used the bet size of 80%, then (with probability 0.1) in one losing game you could lose 80% of your total wealth. If you are unlucky and encounter two losing games consecutively, the total loss will be 96%. Gamblers with a little common sense don’t need any formula to know this is too risky. Thus, for practitioners, Kelly’s formula provides a useful guide for the upper bound of allocating capital to the risky assets. The emphasis has always been how to reduce the risk from there.
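The arithmetic behind those drawdowns is simple compounding: each loss leaves a fraction (1 - f) of whatever wealth remains. A small sketch, with wealth_after_losses as a hypothetical helper name:

```python
def wealth_after_losses(f, n_losses):
    """Fraction of starting wealth left after n consecutive losing bets of size f."""
    return (1.0 - f) ** n_losses

# Cumulative loss after one, two, and three straight losing bets at f = 0.8.
for n in (1, 2, 3):
    lost = 1.0 - wealth_after_losses(0.8, n)
    print(f"{n} straight losses at f = 0.8: {lost:.1%} of wealth gone")
```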
Edward Thorp, a mathematics professor turned legendary blackjack player and the pioneer of the basic system for playing blackjack, was a leading practitioner of Kelly’s formula. He first applied Kelly’s formula in managing bet size in blackjack and later generalized the principle to money management in trading. Thorp’s view is representative of that of a practitioner. Recently, answering a question on whether he uses Kelly’s formula in asset allocation, he replied, “... if you bet half the Kelly amount, you get about three-quarters of the return with half the volatility. So it is much more comfortable to trade. I believe that betting half Kelly is psychologically much better ...”
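Thorp’s rule of thumb can be checked on a toy case. The sketch below assumes an even-money coin with a modest 55% win probability (a number chosen here for illustration, not taken from Thorp). The three-quarters figure is an approximation that holds best when the edge is small, so other inputs will give somewhat different ratios.

```python
import math

def log_growth(f, p):
    """Expected log return per play for an even-money bet of fraction f."""
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)

def per_play_volatility(f, p):
    """Standard deviation of the simple return per play (+f or -f)."""
    return 2.0 * f * math.sqrt(p * (1.0 - p))

p = 0.55                      # assumed win probability, small edge
full_kelly = 2 * p - 1        # even-money Kelly fraction
half_kelly = full_kelly / 2

growth_ratio = log_growth(half_kelly, p) / log_growth(full_kelly, p)
vol_ratio = per_play_volatility(half_kelly, p) / per_play_volatility(full_kelly, p)

print(f"growth at half Kelly / growth at full Kelly: {growth_ratio:.3f}")
print(f"volatility at half Kelly / full Kelly:       {vol_ratio:.3f}")
```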
Indeed, recent research of Vince and Zhu shows that when incorporating the practical consideration of adjusting for risk, and realizing that one only gambles over a finite horizon of time, Kelly’s suggested bet size needs to be adjusted down considerably. This confirms what practitioners have long been doing in the real world.
For many decades, Kelly’s formula was dismissed as irrelevant by leading academics. These included Nobel Prize Laureate Paul Samuelson, who went as far as calling the Kelly criterion a fallacy, and derided it in an article consisting entirely of one-syllable words (presumably so that his benighted opponents could understand it). This led to the bitter Samuelson controversy, which positioned Markowitz’s portfolio theory against the Growth Optimal Portfolio (GOP), each side claiming superiority over the other.
This controversy seems fueled more by partisanship (economists vs. mathematicians) than by practical considerations. Some of the most successful investors have reportedly applied the Kelly criterion, including Warren Buffett and Bill Gross. The caveat is that to construct and to manage the growth portfolio (or alternatively the Markowitz portfolio), one needs to know the joint probability distribution of the price processes of all the assets involved. This is of course a tall order. In practice, all one has to work with is the price history of the assets involved, which is used to estimate the true joint probability distribution. However, searching for optimal strategies through historical simulations leads to the trap of backtest overfitting (see the paper Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance).
These two tales of Kelly’s formula illustrate the dangers of naively following theoretical financial results in investment practice. This is not to say that theoretical results are not useful. For example, the Sharpe ratio, which was developed in the context of Markowitz’s portfolio theory, has become an industry standard in measuring the (past) performance of mutual funds and hedge funds. Similar usage of the idea of the growth portfolio is also useful, especially given the close relationship between the Kelly formula and information theory. As always, caveat emptor!
Analysis of Multiple Simultaneous Non-Independent Investment Opportunities with Multiple Possible Outcomes
Many papers have been published about figuring out how much money to invest in a business opportunity given your level of wealth, its odds of success, and its estimated returns in the events of success and failure. Usually the further assumption is made that your decision to invest or not doesn’t affect the price of the issue or the odds of success.
Frequently I’ve seen papers analyzing two or more such opportunities in terms of finding an optimal strategy for investing in both, and these are good work too as far as they go. They’re accurate in the way that physicists’ models that involve frictionless surfaces and treat planetary masses as points are accurate: within the limitations of their simplifying assumptions.
The problem is, that’s not how business opportunities look in the real world.
In this paper I’m going to review the math that governs the simple cases and then move on to introduce the math that governs the more complicated cases.
Everybody knows, or everybody should know, the Kelly Criterion for making optimal investments in a single two-valued opportunity. It’s very simple; you invest your wealth, times the edge, divided by the odds.
“The edge”, in this case, is the return you’d get per dollar invested in the average case. “The odds” are the amount you win per dollar bet when the bet pays off.
So let’s say I offer you a simple coin toss where I offer to double the money you bet on it if you win the toss. Since you win or lose exactly the amount bet, the edge is zero. The Kelly Criterion says you don’t take this bet because there is no long-term edge to justify the risk of losing your money.
So I’ll get a little stupid in order to make a point, and make it sweeter. Let’s say I triple your money if you win and take your money if you lose. Now you’ll win one bet of every two and lose one bet of every two, winning twice the amount bet and losing the amount bet. So, you come out ahead by the amount bet every two bets, and the edge is 1/2. The odds are 2, since a win pays twice the amount bet. Since 1/2 divided by 2 is 1/4, the Kelly Criterion says you bet one-quarter of your money on this bet.
Let’s make it even sweeter. Now I offer to quadruple your money if you win (I must be really amazingly stupid to make you such an offer) and take your money if you lose. Now your edge is 1 since, on average, you come out ahead by the full amount bet on every toss. The odds are 3. Since 1 divided by 3 is 1/3, the Kelly Criterion says you bet one-third of your money on this bet.
Now let’s get ridiculous. Let’s say I offer you a hundred times your money back if you win a coin toss, and take your money if you lose. Now how much should you risk? Well, your edge is now 49 and the odds are 99. So you wind up betting 49/99 of your money on this bet, which is just under half.
By now you’ve probably noticed the point I’m heading for: No matter how big your edge gets, the Kelly Criterion says you never *EVER* invest a fraction of your wealth greater than the probability of winning. Even if you are fifty percent likely to win a billion dollars for every dollar bet, you don’t bet more than half your money. State lotteries that cost a dollar and have chances of winning of one in ten million are bad investments for anyone who has less than ten million dollars, no matter how many billions large the jackpot may be.
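One way to check the ceiling described above is the textbook closed form for a two-outcome bet, f* = (b*p - q)/b = p - q/b, where p is the probability of winning, q = 1 - p, and b is the net amount won per dollar staked. Since q/b is always positive, f* stays below p, the probability of winning (for a coin toss that is also the probability of losing). The sketch below is only an illustration: kelly_fraction is a hypothetical helper, and the billion-to-one lottery payoff is an assumed number.

```python
from fractions import Fraction

def kelly_fraction(p, b):
    """Two-outcome Kelly bet: win b per unit staked with probability p, else lose the stake."""
    q = 1 - p
    return (b * p - q) / b  # edge divided by net odds, equal to p - q/b

# Coin toss with ever larger payoffs: the fraction creeps toward 1/2 but never reaches it.
half = Fraction(1, 2)
for b in (99, 10**3, 10**9):
    f = kelly_fraction(half, b)
    print(f"net odds {b}: bet {f} of wealth (about {float(f):.6f})")

# Lottery: a one-in-ten-million chance at an assumed billion-to-one net payoff.
p_win = Fraction(1, 10**7)
f_lotto = kelly_fraction(p_win, 10**9)
min_wealth = 1 / f_lotto  # wealth at which a $1 ticket equals the Kelly bet
print(f"a $1 ticket is a Kelly-sized bet only at about ${float(min_wealth):,.0f} of wealth")
```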
It turns out that the Kelly Criterion tells you EXACTLY how much of your wealth you should invest in order to maximize long-term growth of your money. If you bet more, you have more risk but don’t make as much money. If you bet less, you have less risk and don’t make as much money. Now, if you need to actually take income out of your investing wealth every so often, then you should be investing less than the Kelly Criterion says; the money you take out isn’t contributing to longer term growth, so it doesn’t justify as much risk as the Kelly Criterion accepts. But there is never, under any circumstances, any reason to bet more than the Kelly Criterion suggests; in terms of money management, the Kelly Criterion is the bright clear line between aggressive long-term investing that undertakes exactly as much risk as necessary to absolutely maximize growth, and insane investing that accepts more risk than needed and by doing so impairs the long-term growth of funds.
All this is very simple when there are just two possible outcomes. More gamblers than investors know the Kelly Criterion, because they’re more familiar with the simplified, two-valued kind of investment opportunities where it’s easy to calculate.
But real business opportunities don’t look like that.
How do you calculate the Kelly Criterion when you’re looking at Acme Widgets and the government is seeking bids on a big widget contract, and you figure they have about
- a 15% chance of landing the contract and making a 50% return,
- a 20% chance of being a supplier to the company that lands the contract and making a 30% return,
- a 55% chance of having no contract awarded and doing business as usual making a 10% return, and
- a 10% chance of having a competitor get the contract and losing 70% of the money invested?
What’s the most you should put into this company? The math is a bit more complicated now, and there isn’t a straightforward way to find an answer. But there is a straightforward way (well, only mildly complicated) to check how good a possible answer is.
What the Kelly Criterion does is to maximize the expected logarithm of wealth (the probability-weighted average of the logarithms of the possible ending wealths, which is not the same thing as the logarithm of the expected wealth). By maximizing this repeatedly and compounding your earnings, maximum long-term growth of wealth is achieved. So, while it’s no longer straightforward to directly calculate the Kelly threshold for this more complicated situation, you can still iteratively maximize the expected logarithm of wealth to find the optimal Kelly-Criterion investment. Here’s an example.
Let’s say you have a million dollars to manage.
The natural logarithm of 1000000 is 13.8155, so that’s the benchmark for making no investment at all.
Now, if you contemplate putting all of your money into Acme Widgets, then you have to figure the different outcomes and likelihoods and take the weighted average of their logarithms. So:
0.15 * ln(1000000 * 1.50) +
0.20 * ln(1000000 * 1.30) +
0.55 * ln(1000000 * 1.10) +
0.10 * ln(1000000 * 0.30) = 13.8608.
Since making no investment at all gave a logarithm of 13.8155, investing all your money in Acme Widgets is seen as being better than investing none of it.
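The weighted average of logarithms used above is easy to put into a small function. This is a sketch under the same assumptions as the text (uninvested money sits idle as cash); expected_log_wealth and ACME_OUTCOMES are illustrative names.

```python
import math

# (probability, gross multiple returned per dollar invested) from the Acme Widgets example
ACME_OUTCOMES = [(0.15, 1.50), (0.20, 1.30), (0.55, 1.10), (0.10, 0.30)]

def expected_log_wealth(wealth, fraction, outcomes=ACME_OUTCOMES):
    """Probability-weighted average of ln(ending wealth) when `fraction` of wealth is invested."""
    cash = wealth * (1.0 - fraction)       # money left uninvested
    invested = wealth * fraction           # money put into the opportunity
    return sum(p * math.log(cash + invested * mult) for p, mult in outcomes)

none_invested = expected_log_wealth(1_000_000, 0.0)
all_invested = expected_log_wealth(1_000_000, 1.0)
print(f"invest nothing:    {none_invested:.4f}")   # 13.8155
print(f"invest everything: {all_invested:.4f}")    # 13.8608
```

Any other fraction between 0 and 1 can be scored the same way by changing the second argument.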
But is that all there is to the story? What if you only invest half your money?
0.15 * ln (500000 + 500000 * 1.50) +
0.20 * ln (500000 + 500000 * 1.30) +
0.55 * ln (500000 + 500000 * 1.10) +
0.10 * ln (500000 + 500000 * 0.30) = 13.8607.
This is very close to being as good as investing all your money. Let’s try three-quarters:
0.15 * ln (250000 + 750000 * 1.50) +
0.20 * ln (250000 + 750000 * 1.30) +
0.55 * ln (250000 + 750000 * 1.10) +
0.10 * ln (250000 + 750000 * 0.30) = 13.8692.
That’s better than either all or half, so let’s see what happens if we invest seven-eighths of our wealth, which is halfway between the two best scores we’ve seen so far:
0.15 * ln (125000 + 875000 * 1.50) +
0.20 * ln (125000 + 875000 * 1.30) +
0.55 * ln (125000 + 875000 * 1.10) +
0.10 * ln (125000 + 875000 * 0.30) = 13.8679.
That’s not as good as investing three-quarters of our wealth, so it’s too much. We can back off a little bit and try investing 13/16 of our wealth:
0.15 * ln (187500 + 812500 * 1.50) +
0.20 * ln (187500 + 812500 * 1.30) +
0.55 * ln (187500 + 812500 * 1.10) +
0.10 * ln (187500 + 812500 * 0.30) = 13.8691.
This is a hair below the 13.8692 we got for three-quarters, which is still the best score we’ve seen so far. The optimal amount to invest is going to be right around here somewhere; we could carry this search out ten more steps and have it correct to within one part in about 20000. But we don’t have to; I’ve just shown the first few steps to illustrate the process.
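The hand search above can be automated. The sketch below swaps in a ternary search rather than the halving steps shown in the text; it relies on the expected-log curve for this example having a single peak between investing nothing and investing everything. The function names and the tolerance are assumptions made for illustration.

```python
import math

# (probability, net return per dollar invested) for the Acme Widgets example
OUTCOMES = [(0.15, 0.50), (0.20, 0.30), (0.55, 0.10), (0.10, -0.70)]

def growth(f):
    """Expected log of ending wealth per dollar of starting wealth at fraction f."""
    return sum(p * math.log(1.0 + f * r) for p, r in OUTCOMES)

def ternary_search_max(func, lo, hi, tol=1e-6):
    """Locate the maximizer of a single-peaked function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1   # the peak cannot lie to the left of m1
        else:
            hi = m2   # the peak cannot lie to the right of m2
    return (lo + hi) / 2

best_f = ternary_search_max(growth, 0.0, 1.0)
print(f"best fraction found: {best_f:.4f}")
print(f"expected ln(wealth) on $1,000,000: {math.log(1_000_000) + growth(best_f):.4f}")
```

The fraction it reports falls between three-quarters and 13/16, in line with the scores computed by hand.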
The problem is real business opportunities don’t look like that either.
In the real world, you’re never looking at a situation where you’re deciding how much money to put into your only investment opportunity. At the very least, the money you don’t invest in that opportunity may usefully be placed in a risk-free investment like gold or a low-risk investment like Treasury Bills. You should also be considering Acme Widgets’ competitor, the Klein Brush & Bottle Company, because if Acme doesn’t get that contract, Klein is a whole lot more likely than otherwise to get it. You know they won’t both get the contract, although they may both be suppliers if someone else gets it. And you also know that if you invest in both companies, you run less risk because the downside risk at Acme is coincident with a much higher probability of a high return at Klein and vice versa. And finally, since Acme is a fairly small company and you’re looking at a medium-sized fund, the amount you invest may drive up the price you have to pay for its stock, which will drive down your effective return.
But, the technique above turns out to be something you can generalize:
The General Form of the Kelly Criterion is:
Sum for all X of (probability of X * ln (ending wealth if X happens))
This is how you can calculate the degree to which your growth opportunities are being maximized. Now, if you’ve done a lot of math, you’re already looking at the generalized form of the Kelly Criterion, above, and setting up integrals in your head to deal with continuous probability distribution functions and reward levels and differentials to help find the optimum points, but it turns out that depending on what the probability and return formulas are like, it may not be generally or easily integrable or differentiable. In fact it’s usually not.
This is a most excellent formula, because you can use it to evaluate investment strategies involving making lots of different investments simultaneously: For example, you might try different investment levels in Acme and Klein and Gold, and account for such things as the difference in the tax bite that depend on how well you do.
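To make the general form concrete, here is a rough sketch that scores a split between Acme, Klein, and idle cash over a handful of joint scenarios. The Acme returns and the probabilities are the ones used earlier; the Klein returns are invented purely for illustration, as are the names SCENARIOS and expected_log_growth.

```python
import math

# (probability, Acme net return, Klein net return); Klein figures are made up
SCENARIOS = [
    (0.15, 0.50, -0.60),   # Acme lands the contract
    (0.20, 0.30, 0.25),    # someone else lands it, both are suppliers
    (0.55, 0.10, 0.08),    # no contract awarded
    (0.10, -0.70, 0.60),   # Klein lands the contract
]

def expected_log_growth(w_acme, w_klein, scenarios=SCENARIOS):
    """Sum over scenarios of probability * ln(ending wealth per dollar of starting wealth)."""
    cash = 1.0 - w_acme - w_klein   # whatever is not invested stays in cash
    total = 0.0
    for prob, r_acme, r_klein in scenarios:
        ending = cash + w_acme * (1.0 + r_acme) + w_klein * (1.0 + r_klein)
        total += prob * math.log(ending)
    return total

acme_only = expected_log_growth(0.75, 0.0)
split = expected_log_growth(0.50, 0.40)
print(f"75% Acme, nothing in Klein: {acme_only:.4f}")
print(f"50% Acme, 40% Klein:        {split:.4f}")
```

With these invented numbers the split scores higher than Acme alone, because Klein does well in exactly the scenario where Acme does badly.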
But this complicates things, because if we substitute in complicated formulas that depend on our investment for the probability of X, and we substitute in complicated formulas that depend on our investment for the rate of return if X happens, we wind up with functions in multiple variables that are no longer guaranteed to have a single peak.
And with functions like that, you can’t easily just “home in on it” the way I did above, because the function may have several local maxima, local minima, and discontinuities.
Optimization as Search and Simplifying Assumptions
In this case, optimization becomes a search, and the more complex the set of outcomes you’re looking at and the greater the number of investments you’re trying to optimize the distribution of money between, the harder the search becomes. Here is where you make a lot of simplifying assumptions, aggregating companies into industries and risk profiles to try to reduce the number of variables you have to work with. Here is where you assume things operate independently, even though sometimes they may not, because analysis of independent variables can be carried out separately from each other.
But very complex search spaces are, in fact, what genetic algorithms, stochastic searches, and multivariate regressions are for, and computer code can be your tool to cut through a whole lot of fog here, seeking the best investment levels in all these different opportunities.
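As a very small example of such a search, the sketch below uses plain random sampling (one simple kind of stochastic search) over allocations to Acme and Klein. Everything beyond the Acme returns is an assumption made for illustration: the Klein returns, the price-impact rule (buying more Acme raises the price paid), the PRICE_IMPACT constant, and the sample count.

```python
import math
import random

# (probability, Acme net return, Klein net return); Klein figures are made up
SCENARIOS = [(0.15, 0.50, -0.60), (0.20, 0.30, 0.25), (0.55, 0.10, 0.08), (0.10, -0.70, 0.60)]
PRICE_IMPACT = 0.05   # hypothetical: price paid for Acme rises 5% per 100% of wealth invested

def score(w_acme, w_klein):
    """Expected log growth when the Acme return depends on how much we buy."""
    cash = 1.0 - w_acme - w_klein
    cost_factor = 1.0 + PRICE_IMPACT * w_acme   # paying more per share shrinks the payoff
    total = 0.0
    for prob, r_acme, r_klein in SCENARIOS:
        ending = cash + w_acme * (1.0 + r_acme) / cost_factor + w_klein * (1.0 + r_klein)
        total += prob * math.log(ending)
    return total

def random_search(n_samples=20000, seed=0):
    """Sample allocations at random and keep the best one seen."""
    rng = random.Random(seed)
    best, best_score = (0.0, 0.0), score(0.0, 0.0)
    for _ in range(n_samples):
        w_acme, w_klein = rng.random(), rng.random()
        if w_acme + w_klein > 1.0:
            continue   # no borrowing: skip allocations that exceed total wealth
        s = score(w_acme, w_klein)
        if s > best_score:
            best, best_score = (w_acme, w_klein), s
    return best, best_score

best, best_score = random_search()
print(f"best allocation found: Acme {best[0]:.2f}, Klein {best[1]:.2f}, score {best_score:.4f}")
```

Random sampling makes no use of the shape of the function, which is why it still works when there are several local maxima, but it only ever gives the best point it happened to try.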
Usually you have to pick and choose which simplifying assumptions you’re making and which you’re throwing out. Analyzing the situation of Acme and Klein and that big contract, clearly you shouldn’t assume that they’re independent. But analyzing, say, an oil-rig firefighting company and a boot and shoe dealer, you can be pretty comfortable assuming that their relative performances have nothing to do with each other. It may turn out that the Boot and Shoe Company makes a lot of its money manufacturing protective boots for the rig firefighters so it may not be true — but it’s a pretty comfortable assumption, and I’d make it in a heartbeat. There’s really no way you can capture every detail of every possible interdependence; you just have to wind up ignoring some of them.
The Value of Conservative Assumptions
Remember what I said earlier about the Kelly Criterion being the clear bright line between aggressive investing and insane investing?
If you overestimate the amount you should invest, you expose yourself to more risk, and simultaneously reduce the long-term growth of your wealth. That is insane. If you underestimate the amount you should invest, you make less money, which is bad, but you also expose yourself to less risk and adjust for eventually taking some income from your wealth, which is good. Investing more than the Kelly Criterion says is clearly insane, but there are good reasons why most people should want to invest less.
That is why conservative assumptions — those which would lead you to invest less — are generally better than assumptions which would lead you to invest more. It’s clear that accurate assumptions are the best assumptions of all, but when you are forced to deviate from accuracy, it’s best to deviate in a conservative direction.
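A toy check of that asymmetry, assuming an even-money coin whose true win probability is 90% and two bettors whose estimates are off by the same amount in opposite directions. The numbers and names are illustrative only.

```python
import math

def log_growth(f, p_true):
    """Expected log return per play at bet fraction f, given the true win probability."""
    return p_true * math.log(1.0 + f) + (1.0 - p_true) * math.log(1.0 - f)

def kelly_even_money(p_assumed):
    """Bet size chosen by someone who believes the win probability is p_assumed."""
    return 2.0 * p_assumed - 1.0

p_true = 0.90
results = {}
for p_assumed in (0.85, 0.90, 0.95):
    f = kelly_even_money(p_assumed)
    results[p_assumed] = log_growth(f, p_true)
    print(f"assumed p = {p_assumed:.2f}  bet = {f:.2f}  true growth per play = {results[p_assumed]:.4f}")
```

Both mistaken bettors grow more slowly than the accurate one, but the one who erred on the low side gives up less growth than the one who erred on the high side.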
The Art of Picking Your Assumptions
And this is where hard math meets art, science, and experience. We have a tool we can use to analyze any investment strategy, under a given set of assumptions. But we have to make assumptions which we know aren’t completely true all the time to control the complexity of the analysis, and the assumptions we make and don’t make govern the accuracy of our analysis. And this, traditionally, is the part of securities analysis you can’t automate; there’s no way an automaton can adjust for things it just plain doesn’t know.
Closing The Loop
Or is there? We have been talking about optimizing systems based on predictions; and we’re already used to the idea of optimizing ex ante prediction systems based on ex post performance. What if we make a dozen different systems that use a dozen different sets of assumptions, and turn them all loose trying to figure out optimal investment strategies in hundreds or thousands of ex ante scenarios drawn from real life? Then we could come back with the ex post performance numbers from those scenarios and figure out how well each of the robot portfolio managers would have done.
Clearly, the robot whose portfolio did the best must, ipso facto, have been the one whose predictions were most useful on the scenarios presented.
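A toy version of that comparison, assuming even-money bets, a made-up ex post record in which nine bets in ten were won, and four hypothetical robots that differ only in the win probability they assume. Because the record is constructed this way, the outcome is built in; the point is only to show the scoring mechanics.

```python
import math

# Hypothetical ex post record of even-money bets: +1 = won, -1 = lost (90% wins)
RECORD = [1, 1, 1, 1, -1, 1, 1, 1, 1, 1] * 20

# Each robot's assumption about the win probability (names are illustrative)
ROBOT_ASSUMPTIONS = {"cautious": 0.60, "moderate": 0.75, "accurate": 0.90, "overconfident": 0.97}

def realized_log_growth(p_assumed, record=RECORD):
    """Average realized log return per bet for a robot betting its own Kelly fraction."""
    f = 2.0 * p_assumed - 1.0   # even-money Kelly bet under the robot's assumption
    return sum(math.log(1.0 + f * outcome) for outcome in record) / len(record)

scores = {name: realized_log_growth(p) for name, p in ROBOT_ASSUMPTIONS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name:14s} realized log growth per bet = {scores[name]:.4f}")
```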
Now, what if we do it iteratively, performing a cluster analysis on the ex ante information and ex post performance pairs to find out which set of assumptions tends to do best in what clusters? As a cluster analysis on regression criteria, it’s going to be an expensive computation; it could tie up a good workstation for several weeks. But the results would continue to be useful for years.
But anyway, that is a topic for a different paper. In this paper I have introduced the Kelly Criterion itself and how to apply it in complicated situations. This creates a need to make simplifying assumptions, but they don’t have to be the same assumptions that the authors of so many other papers have made. My point is that you have to pick which simplifying assumptions you make based on the business situations you’re presented with, and you can frequently do better in some situations by using different sets of assumptions more appropriate to those situations.
But picking the assumptions is not, as usually presented, a problem that completely defies analysis either. There is a possibility of “closing the loop” and creating a system that picks and chooses its assumptions without human input, based on the business situations presented.
-By Ray Dillinger