Performance Measurement Under Uncertainty
Every performance ratio in quantitative finance is a single number. The Sharpe ratio, the Sortino ratio, the Calmar ratio, the Omega ratio, the Rachev ratio: each takes a stream of returns and compresses it into one scalar. The question this post answers is what, precisely, that compression discards and what it preserves.
The answer requires measure theory. A return stream is a realization of a random variable on a probability space. A performance ratio is a functional on the space of return distributions. Different ratios correspond to different functionals, and the relationships between them (when they agree, when they diverge, what pathologies each one hides) become visible only when you write them in a common mathematical language. We construct that language here, then use it.
Section 0 establishes the epistemic status of the probability model itself. Section 1 builds the measure-theoretic machinery: the probability space, moments as Lebesgue integrals, partial moments, tail functionals, and the Johnson-Lindenstrauss dimensionality argument that motivates the full taxonomy. Section 2 defines the general performance functional and derives the Omega ratio as a canonical representative. Section 3 instantiates each classical ratio as a special case and proves four novel theorems relating them. Section 4 applies the framework to live backtest data. Section 5 closes with a geometric perspective on the space of performance functionals.
0. Philosophical Prior
The market is a deterministic system. A very large number of agents, each with private information, capital constraints, and objectives, submit, cancel, and execute orders according to deterministic decision rules. The resulting price at any moment is the output of this high-dimensional deterministic dynamical system, projected onto a one-dimensional observable: the mid-price.
The apparent randomness of returns is a consequence of this projection. When a portfolio manager computes the daily return of a strategy, the single number she records is the image of an enormously complex deterministic state under a function that discards almost all of the information in that state. The discarded information (the full order book, every agent's private signal, every latent intention to trade) is what makes the retained information look stochastic.
This observation has a precise mathematical structure. Let $s_t \in \mathbb{R}^N$ denote the full state of the market at time $t$, where $N$ is enormous (on the order of the number of agents times the dimension of each agent's internal state). The return over one period is the output of a deterministic function $g$ applied to two successive states:
$$r_t = g(s_{t-1}, s_t).$$
We do not observe $s_t$. We observe only the sequence $(r_t)_{t \ge 1}$. Because $g$ is a radical dimensionality reduction (from $\mathbb{R}^N \times \mathbb{R}^N$ to $\mathbb{R}$), the sequence $(r_t)$ inherits statistical regularities that are well-modeled by a stochastic process, even though the underlying system is deterministic.
We therefore construct a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and model returns as random variables on it. This is a modeling choice, not an ontological claim. The probability measure $\mathbb{P}$ encodes our ignorance of the full state, not any intrinsic randomness in the market.
Every performance ratio inherits this epistemic status. The Sharpe ratio of a strategy is a property of the model $(\Omega, \mathcal{F}, \mathbb{P})$, not of the market. Two traders who use different probability models for the same strategy will compute different Sharpe ratios. Both are "correct" relative to their respective models. The interesting question is which model is closer to the data-generating process at the resolution of interest, and that question is empirical, not definitional.
The projection discards information. How much? The Johnson-Lindenstrauss lemma (Section 1.4) provides a rigorous lower bound on how much geometric structure survives dimensionality reduction. We state it precisely there. The qualitative takeaway, which motivates the entire taxonomy of ratios, is that compressing a distribution to a single scalar is far more aggressive than any random projection that preserves pairwise distances. A single ratio cannot capture the full shape of a return distribution. The question is which aspect of the shape each ratio captures, and that is what the rest of this post makes precise.
1. Measure-Theoretic Foundations
This section constructs the mathematical objects that every performance ratio is built from. The treatment is self-contained: the only prerequisite is familiarity with Lebesgue integration at the level of a first graduate course in real analysis.
1.1. The probability space
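We work on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is the sample space of all possible market histories, $\mathcal{F}$ is a $\sigma$-algebra of events, and $\mathbb{P}$ is a probability measure. The choice of $\mathbb{P}$ is the modeling decision discussed in Section 0: it encodes our beliefs about which market histories are likely, given incomplete knowledge of the full state.
A return random variable is a measurable function $R : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$. The cumulative distribution function (CDF) of $R$ is
$$F(x) = \mathbb{P}(R \le x), \qquad x \in \mathbb{R},$$
and the survival function is
$$\bar{F}(x) = 1 - F(x) = \mathbb{P}(R > x).$$
The CDF is right-continuous, non-decreasing, with $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$. It completely characterizes the distribution of $R$ under $\mathbb{P}$. The pushforward measure $\mathbb{P} \circ R^{-1}$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is the law of $R$, and integration against $\mathbb{P}$ is equivalent to integration against $F$ via the change-of-variables formula:
$$\mathbb{E}[h(R)] = \int_{\Omega} h(R(\omega)) \, d\mathbb{P}(\omega) = \int_{\mathbb{R}} h(x) \, dF(x)$$
for any measurable $h : \mathbb{R} \to \mathbb{R}$ for which the integral exists. This identity is the bridge between the abstract probability space and the concrete distribution of returns. Every formula in this post ultimately reduces to an integral against $F$.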
1.2. Moments as Lebesgue integrals
The moments of are the building blocks of the simplest performance ratios. We define them as Lebesgue integrals to make the existence conditions explicit.
The $p$-th raw moment of $R$ is
$$m_p = \mathbb{E}[R^p] = \int_{\mathbb{R}} x^p \, dF(x),$$
provided the integral exists (i.e., $\mathbb{E}[|R|^p] < \infty$). The first raw moment $\mu = m_1$ is the mean return. The $p$-th central moment is
$$\mu_p = \mathbb{E}[(R - \mu)^p] = \int_{\mathbb{R}} (x - \mu)^p \, dF(x).$$
The familiar summary statistics of a return distribution are low-order central moments. The variance is $\sigma^2 = \mu_2$, measuring dispersion around the mean. The standard deviation $\sigma = \sqrt{\mu_2}$ has the same units as $R$. The skewness is the normalized third central moment,
$$\gamma_1 = \frac{\mu_3}{\sigma^3},$$
which is positive when the right tail is heavier than the left and negative when the left tail dominates. The excess kurtosis is
$$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3,$$
where the subtraction of 3 normalizes so that a Gaussian distribution has $\gamma_2 = 0$. Positive excess kurtosis indicates heavier tails than Gaussian; negative indicates lighter tails.
The existence of the $p$-th moment requires $\mathbb{E}[|R|^p] < \infty$, meaning $R \in L^p(\Omega, \mathcal{F}, \mathbb{P})$. This is not an academic technicality. Empirical return distributions for most asset classes exhibit power-law tails, and a power-law tail has a finite $p$-th moment only when $p$ is below the tail index. The tail index is often low enough that the fourth moment (and hence excess kurtosis) may not exist.[1] Any performance ratio that depends on the fourth moment of returns is potentially ill-defined for fat-tailed distributions. This observation will recur throughout the taxonomy in Section 3.
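A minimal sketch of how these summary statistics might be estimated from a return sample, using plain $1/n$ sample moments. The function name `summary_moments` and the simulated data are illustrative assumptions, not part of the original framework. The Student-t sample with 3 degrees of freedom is chosen because its fourth moment does not exist, so its sample excess kurtosis is not estimating any finite quantity.

```python
import numpy as np

def summary_moments(returns):
    """Return (mean, std, skewness, excess kurtosis) using 1/n sample moments."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    centered = r - mu
    sigma = np.sqrt(np.mean(centered**2))
    skew = np.mean(centered**3) / sigma**3
    excess_kurt = np.mean(centered**4) / sigma**4 - 3.0
    return mu, sigma, skew, excess_kurt

# Hypothetical data: Gaussian returns versus Student-t returns with 3 degrees
# of freedom, whose fourth moment does not exist.
rng = np.random.default_rng(0)
gaussian = rng.normal(0.0, 0.01, size=200_000)
heavy = 0.01 * rng.standard_t(df=3, size=200_000)

print("gaussian:", summary_moments(gaussian))
print("student-t(3):", summary_moments(heavy))
```

Re-running the heavy-tailed case with different seeds is a quick way to see the instability: the reported kurtosis changes substantially from sample to sample, while the Gaussian value stays near zero.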
1.3. Partial moments and tail functionals
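The Sharpe ratio uses the full variance $\sigma^2$, which penalizes upside and downside dispersion equally. Most practitioners care more about downside risk than upside volatility. Partial moments and tail functionals formalize this asymmetry.
For a threshold $\tau \in \mathbb{R}$ and order $p \ge 0$, the lower partial moment of $R$ is
$$\mathrm{LPM}_p(\tau) = \mathbb{E}\big[(\tau - R)_+^p\big] = \int_{-\infty}^{\tau} (\tau - x)^p \, dF(x),$$
and the upper partial moment is
$$\mathrm{UPM}_p(\tau) = \mathbb{E}\big[(R - \tau)_+^p\big] = \int_{\tau}^{\infty} (x - \tau)^p \, dF(x).$$
The lower partial moment of order 0 is simply $\mathrm{LPM}_0(\tau) = F(\tau)$, the probability of falling below the threshold. Order 1 gives the expected shortfall below $\tau$. Order 2, $\mathrm{LPM}_2(\tau)$, is the downside semivariance, and its square root $\sigma_D(\tau) = \sqrt{\mathrm{LPM}_2(\tau)}$ is the downside deviation that appears in the Sortino ratio. The threshold $\tau$ is typically set to zero (for absolute returns) or to the risk-free rate (for excess returns).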
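The partial moments above can be sketched empirically as follows, assuming equally weighted observations. The names `lpm` and `upm` and the five-point return sample are hypothetical and chosen only so the arithmetic can be checked by hand.

```python
import numpy as np

def lpm(returns, tau, p):
    """Empirical lower partial moment of order p at threshold tau."""
    r = np.asarray(returns, dtype=float)
    if p == 0:
        return np.mean(r <= tau)          # order 0 is the empirical CDF at tau
    return np.mean(np.clip(tau - r, 0.0, None) ** p)

def upm(returns, tau, p):
    """Empirical upper partial moment of order p at threshold tau."""
    r = np.asarray(returns, dtype=float)
    if p == 0:
        return np.mean(r > tau)
    return np.mean(np.clip(r - tau, 0.0, None) ** p)

# Hypothetical sample, threshold at zero.
r = [-0.02, -0.01, 0.0, 0.01, 0.03]
downside_deviation = np.sqrt(lpm(r, 0.0, 2))
print(lpm(r, 0.0, 0), lpm(r, 0.0, 1), upm(r, 0.0, 1), downside_deviation)
```

For this sample the shortfalls below zero are 0.02 and 0.01, so the order-1 lower partial moment is 0.03 / 5 and the downside semivariance is (0.0004 + 0.0001) / 5.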
The tail functionals Value-at-Risk and Conditional Value-at-Risk provide complementary characterizations of downside risk.
For $\alpha \in (0, 1)$, the Value-at-Risk at confidence level $\alpha$ is the left $\alpha$-quantile of $R$:
$$\mathrm{VaR}_\alpha(R) = \inf\{x \in \mathbb{R} : F(x) \ge \alpha\}.$$
The Conditional Value-at-Risk (also called Expected Shortfall) at level $\alpha$ is
$$\mathrm{CVaR}_\alpha(R) = \frac{1}{\alpha} \int_0^{\alpha} \mathrm{VaR}_u(R) \, du.$$
$\mathrm{VaR}_\alpha$ answers the question: what is the worst return that occurs with probability at least $1 - \alpha$? For $\alpha = 0.05$, it is the 5th percentile of the return distribution. $\mathrm{CVaR}_\alpha$ answers a deeper question: given that we are in the worst $\alpha$-fraction of outcomes, what is the expected return? It is always at least as severe as $\mathrm{VaR}_\alpha$ (in return terms, $\mathrm{CVaR}_\alpha(R) \le \mathrm{VaR}_\alpha(R)$), and unlike VaR, it is a coherent risk measure in the sense of Artzner, Delbaen, Eber, and Heath: it satisfies subadditivity, monotonicity, positive homogeneity, and translation invariance.[2]
The partial moments, VaR, and CVaR are the building blocks from which every ratio in the taxonomy of Section 3 is assembled. The Sharpe ratio uses only $\mu$ and $\sigma$. The Sortino ratio replaces $\sigma$ with $\sigma_D(\tau)$. The Calmar ratio uses the maximum drawdown, which is a path functional rather than a distributional one (we define it precisely in Section 3). The Omega ratio uses the entire CDF. The Rachev ratio uses CVaR in both tails. The progression from simple to complex reflects an increasing commitment to using more of the information in $F$.
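As a hedged illustration of the two tail functionals, the sketch below evaluates the left-quantile definition of VaR and the quantile-integral definition of CVaR on the empirical distribution of a sample. The function name `var_cvar` and the ten-point sample are assumptions made for the example; returns are kept in return units, so more negative means more severe.

```python
import numpy as np

def var_cvar(returns, alpha):
    """Empirical VaR (left alpha-quantile) and CVaR (average of quantiles below alpha)."""
    r = np.sort(np.asarray(returns, dtype=float))
    n = r.size
    # Smallest k with k/n >= alpha, i.e. inf{x : F(x) >= alpha} = r[k-1].
    k = max(int(np.ceil(alpha * n - 1e-12)), 1)
    var = r[k - 1]
    # Integrate the quantile function over (0, alpha]: the first k-1 order
    # statistics carry mass 1/n each, the k-th carries the remaining mass.
    full_mass = (k - 1) / n
    cvar = (r[: k - 1].sum() / n + (alpha - full_mass) * var) / alpha
    return var, cvar

# Hypothetical sample of ten returns; alpha = 0.2 covers the two worst outcomes.
sample = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06]
var_20, cvar_20 = var_cvar(sample, 0.2)
print(var_20, cvar_20)
```

Here the 20% VaR is the second-worst return and the CVaR is the mean of the two worst returns, which is why the CVaR is the more severe of the two numbers.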
1.4. Johnson-Lindenstrauss and the dimensionality argument
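The philosophical prior of Section 0 framed performance measurement as dimensionality reduction: from the full market state $s_t \in \mathbb{R}^N$, through the return vector $r \in \mathbb{R}^T$, to a single scalar ratio $\rho \in \mathbb{R}$. The Johnson-Lindenstrauss lemma provides the sharpest known result on how much geometric structure random projections can preserve.
For any $\varepsilon \in (0, 1)$ and any set $X$ of $n$ points in $\mathbb{R}^d$, there exists a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ with
$$k = O(\varepsilon^{-2} \log n)$$
such that for all $u, v \in X$,
$$(1 - \varepsilon) \|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \varepsilon) \|u - v\|^2.$$
Moreover, a random projection onto a $k$-dimensional subspace (with entries drawn i.i.d. from $\mathcal{N}(0, 1/k)$) satisfies this condition with high probability.[9]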
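Proof. Let $A \in \mathbb{R}^{k \times d}$ be a random matrix with entries drawn i.i.d. from $\mathcal{N}(0, 1/k)$, and define $f(x) = Ax$. Fix any pair $u, v \in X$ and let $w = u - v$. The projected squared norm is
$$\|Aw\|^2 = \sum_{i=1}^{k} \Big( \sum_{j=1}^{d} A_{ij} w_j \Big)^2,$$
where $A_{ij} \sim \mathcal{N}(0, 1/k)$ i.i.d. Each inner sum is a linear combination of independent Gaussians with variance $\|w\|^2 / k$, so $\sum_j A_{ij} w_j \sim \mathcal{N}(0, \|w\|^2 / k)$. Therefore
$$\frac{k \, \|Aw\|^2}{\|w\|^2} = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k, \qquad Z_i \sim \mathcal{N}(0, 1) \text{ i.i.d.},$$
and $\mathbb{E}\|Aw\|^2 = \|w\|^2$. We need $\mathbb{P}\big( \big| \|Aw\|^2 - \|w\|^2 \big| > \varepsilon \|w\|^2 \big)$ to be small. The moment-generating function of a $\chi^2_1$ variable is $\mathbb{E}[e^{\lambda Z^2}] = (1 - 2\lambda)^{-1/2}$ for $\lambda < 1/2$, so by independence,
$$\mathbb{E}\Big[ e^{\lambda \sum_{i=1}^{k} Z_i^2} \Big] = (1 - 2\lambda)^{-k/2}.$$
Applying the Chernoff bound to the upper tail: for any $\lambda \in (0, 1/2)$,
$$\mathbb{P}\Big( \sum_{i=1}^{k} Z_i^2 \ge (1 + \varepsilon) k \Big) \le e^{-\lambda (1 + \varepsilon) k} (1 - 2\lambda)^{-k/2}.$$
Setting $\lambda = \frac{\varepsilon}{2(1 + \varepsilon)}$ (which lies in $(0, 1/2)$ for $\varepsilon > 0$) and simplifying the exponent yields
$$\mathbb{P}\Big( \sum_{i=1}^{k} Z_i^2 \ge (1 + \varepsilon) k \Big) \le \big( (1 + \varepsilon) e^{-\varepsilon} \big)^{k/2}.$$
The exponent of the base is $\ln(1 + \varepsilon) - \varepsilon$. By the alternating series remainder for the Mercator series, $\ln(1 + \varepsilon) \le \varepsilon - \varepsilon^2/2 + \varepsilon^3/3$, and for $\varepsilon \in (0, 1)$ we have $\varepsilon^3/3 \le \varepsilon^2/3$, so $\ln(1 + \varepsilon) - \varepsilon \le -\varepsilon^2/6$. Therefore
$$\mathbb{P}\big( \|Aw\|^2 \ge (1 + \varepsilon) \|w\|^2 \big) \le e^{-k \varepsilon^2 / 12}.$$
A symmetric argument for the lower tail (optimizing $\lambda < 0$ in the Chernoff bound) gives $\mathbb{P}\big( \|Aw\|^2 \le (1 - \varepsilon) \|w\|^2 \big) \le e^{-k \varepsilon^2 / 12}$. Combining both tails via a union bound:
$$\mathbb{P}\big( \big| \|Aw\|^2 - \|w\|^2 \big| > \varepsilon \|w\|^2 \big) \le 2 e^{-k \varepsilon^2 / 12}.$$
There are $\binom{n}{2}$ pairs in $X$. A union bound over all pairs gives failure probability at most $2 \binom{n}{2} e^{-k \varepsilon^2 / 12} \le n^2 e^{-k \varepsilon^2 / 12}$. Setting this less than $\delta$ and solving for $k$:
$$k \ge \frac{12}{\varepsilon^2} \ln \frac{n^2}{\delta} = \frac{12}{\varepsilon^2} \big( 2 \ln n + \ln(1/\delta) \big).$$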
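The theorem says that $n$ points in arbitrarily high dimension $d$ can be embedded into $k = O(\varepsilon^{-2} \log n)$ dimensions while preserving all pairwise squared distances to within a factor of $1 \pm \varepsilon$. The target dimension $k$ depends only logarithmically on $n$ and is independent of the ambient dimension $d$.
Now consider the performance measurement pipeline. The market state lives in $\mathbb{R}^N$ with $N$ enormous. A return vector over $T$ periods is the output of a map $G : \mathbb{R}^{N \times (T+1)} \to \mathbb{R}^T$ acting on the state history. A performance ratio then compresses further: $\rho : \mathbb{R}^T \to \mathbb{R}$.
JL guarantees that a random linear projection to $k = O(\varepsilon^{-2} \log n)$ dimensions preserves pairwise distances. The non-asymptotic bound from the Chernoff proof above is explicit:
$$k \ge \frac{12}{\varepsilon^2} \big( 2 \ln n + \ln(1/\delta) \big).$$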
For trading days, , and failure probability : . A single performance ratio compresses to , which is far below this threshold.
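To make the dimension bound tangible, the sketch below evaluates $k \ge 12 \varepsilon^{-2} \ln(n^2 / \delta)$, which is the form the Chernoff argument yields when the exponent is simplified with $\ln(1 + \varepsilon) - \varepsilon \le -\varepsilon^2 / 6$, and then checks a Gaussian random projection against it. The function name `jl_min_dim` and all numerical inputs (20 points, ambient dimension 2000, $\varepsilon = 0.5$, $\delta = 0.1$) are hypothetical choices for illustration, not the values used in the text.

```python
import numpy as np

def jl_min_dim(n_points, eps, delta):
    """Smallest integer k with k >= (12 / eps^2) * ln(n^2 / delta)."""
    return int(np.ceil(12.0 / eps**2 * np.log(n_points**2 / delta)))

def pairwise_sq_dists(Z):
    """Matrix of squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T

rng = np.random.default_rng(0)
n, d, eps, delta = 20, 2000, 0.5, 0.1          # hypothetical inputs
k = jl_min_dim(n, eps, delta)

X = rng.normal(size=(n, d))                     # n points in R^d
A = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))  # entries ~ N(0, 1/k)
Y = X @ A.T                                     # projected points in R^k

upper = np.triu_indices(n, k=1)                 # each unordered pair once
ratios = pairwise_sq_dists(Y)[upper] / pairwise_sq_dists(X)[upper]
print(k, ratios.min(), ratios.max())
```

The observed distortion is typically far inside the $1 \pm \varepsilon$ band, which reflects how conservative the union-bound constant is. The point of the exercise is the order of magnitude of $k$, not its exact value.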
A caveat is essential. The market-to-returns map is not a random linear projection. It is a highly structured, nonlinear function determined by the microstructure of order matching and price formation. JL does not directly apply to this map. What it provides is a structural benchmark: even under the most favorable geometry (random linear projections), preserving pairwise distances among $n$ points to within $1 \pm \varepsilon$ requires on the order of $\varepsilon^{-2} \log n$ dimensions, and this dependence is known to be essentially optimal. The actual map is less favorable, so the information loss from $\mathbb{R}^N$ to $\mathbb{R}$ is at least as severe as JL predicts.
The operational takeaway is that a single scalar ratio is a far more aggressive compression than any distance-preserving projection allows. Two return distributions that look identical under the Sharpe ratio can be completely different in their tail behavior, skewness, or drawdown profile. This is why the full functional taxonomy (Sharpe, Sortino, Calmar, Omega, Rachev, and the general performance functional of Section 2) is necessary. Each ratio is a different one-dimensional projection of the return distribution, and together they recover more of the shape of $F$ than any single ratio can.
2. The General Performance Functional
Every ratio in the taxonomy shares a common structure: a numerator that measures location (how much return?) divided by a denominator that measures dispersion (how much risk?). This section formalizes that structure, states the axioms that the numerator and denominator should satisfy, and then proves that the Omega ratio is the canonical representative of the entire family, in a precise sense: knowledge of $\Omega(\tau)$ for all $\tau \in \mathbb{R}$ is equivalent to knowledge of the full distribution $F$.
2.1. Definition and axioms
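Let $\mathcal{P}_2$ denote the space of probability distributions on $\mathbb{R}$ with finite second moment. Every return random variable $R$ with $\mathbb{E}[R^2] < \infty$ has its law in $\mathcal{P}_2$.
A performance functional is a map $\rho : \mathcal{P}_2 \to \overline{\mathbb{R}}$ that admits a decomposition
$$\rho(F) = \frac{\mathcal{L}(F)}{\mathcal{D}(F)},$$
where $\mathcal{L}$ is a location functional and $\mathcal{D}$ is a dispersion functional.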
We impose axioms on each component separately. The location functional should behave like a generalized mean: it should respond to shifts and respect stochastic dominance.
A location functional $\mathcal{L}$ is admissible if it satisfies:
- Translation-equivariance. For any constant $c \in \mathbb{R}$, $\mathcal{L}(R + c) = \mathcal{L}(R) + c$.
- Monotonicity. If $R_1 \le R_2$ almost surely, then $\mathcal{L}(R_1) \le \mathcal{L}(R_2)$.
Translation-equivariance says that adding a deterministic return $c$ to every outcome shifts the location by exactly $c$. The mean satisfies this, as does any quantile. Monotonicity says that a pointwise-dominant return stream is never assigned a lower location. Together, these two axioms rule out pathological location measures (such as the mode, which can jump discontinuously under small perturbations of $F$) while remaining permissive enough to admit $\mathbb{E}[R]$, $\mathrm{CVaR}_\alpha$, medians, and partial-moment-based location measures.
The dispersion functional should measure the scale of the distribution without responding to shifts.
A dispersion functional $\mathcal{D}$ is admissible if it satisfies:
- Positive homogeneity. For any $\lambda > 0$, $\mathcal{D}(\lambda R) = \lambda \, \mathcal{D}(R)$.
- Shift-invariance. For any constant $c \in \mathbb{R}$, $\mathcal{D}(R + c) = \mathcal{D}(R)$.
Positive homogeneity says that scaling returns by a positive factor scales dispersion by the same factor. The standard deviation satisfies this (since $\sigma(\lambda R) = \lambda \, \sigma(R)$), as do the downside deviation $\sigma_D$ and $\mathrm{CVaR}_\alpha$ (after centering). Shift-invariance says that adding a constant to all returns does not change the spread. Together, these axioms ensure that $\mathcal{D}$ captures the shape and scale of the distribution, not its level.
The pair $(\mathcal{L}, \mathcal{D})$ specifies a performance functional $\rho = \mathcal{L} / \mathcal{D}$. Different choices of $\mathcal{L}$ and $\mathcal{D}$ yield different ratios. Before enumerating the taxonomy in Section 3, we note the connection to the axiomatic framework of Cherny and Madan (2009).[3] They define a coherent risk-adjusted performance measure (CRAPM) as a functional satisfying quasi-concavity (diversification does not decrease performance), monotonicity (higher returns yield higher performance), and scale-invariance ($\rho(\lambda R) = \rho(R)$ for $\lambda > 0$). Our axiom set is deliberately weaker: we require the decomposition into location and dispersion, and impose axioms on each component, but we do not require quasi-concavity or scale-invariance of the composite at this stage. The reason is pragmatic. Several widely-used ratios (notably the Calmar ratio) fail quasi-concavity or scale-invariance, and we want the general framework to include them. Section 3 identifies which classical ratios satisfy the stronger Cherny-Madan conditions and which do not.
2.2. The Omega master functional
The Omega ratio, introduced by Keating and Shadwick (2002),[4] occupies a distinguished position in the taxonomy. Every other ratio extracts partial information from the distribution . The Omega ratio, viewed as a function of its threshold parameter, encodes the entire distribution. This is the content of the following theorem.
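For a threshold $\tau \in \mathbb{R}$ with $\mathbb{E}[(\tau - R)_+] > 0$, the Omega ratio (a function of the threshold, not to be confused with the sample space $\Omega$ of Section 1.1) is
$$\Omega(\tau) = \frac{\int_{\tau}^{\infty} \big(1 - F(x)\big) \, dx}{\int_{-\infty}^{\tau} F(x) \, dx} = \frac{\mathbb{E}[(R - \tau)_+]}{\mathbb{E}[(\tau - R)_+]} = \frac{\mathrm{UPM}_1(\tau)}{\mathrm{LPM}_1(\tau)}.$$
The equivalence of the three expressions follows from integration by parts. The numerator equals $\mathbb{E}[(R - \tau)_+]$ because
$$\int_{\tau}^{\infty} \big(1 - F(x)\big) \, dx = \int_{\tau}^{\infty} \mathbb{E}\big[\mathbf{1}\{R > x\}\big] \, dx = \mathbb{E}\Big[ \int_{\tau}^{\infty} \mathbf{1}\{R > x\} \, dx \Big] = \mathbb{E}[(R - \tau)_+],$$
where the exchange of integral and expectation is justified by Fubini's theorem (the integrand is non-negative). An analogous computation gives the denominator.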
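A minimal empirical sketch of the Omega ratio in its partial-moment form, the ratio of the mean gain above the threshold to the mean shortfall below it. The function name `omega_ratio` and the sample are hypothetical; returning infinity when no observation falls below the threshold is a convention assumed here, not something the definition prescribes.

```python
import numpy as np

def omega_ratio(returns, tau):
    """Empirical Omega: E[(R - tau)+] / E[(tau - R)+]."""
    r = np.asarray(returns, dtype=float)
    gains = np.mean(np.clip(r - tau, 0.0, None))
    losses = np.mean(np.clip(tau - r, 0.0, None))
    if losses == 0.0:
        return np.inf                      # assumed convention: no shortfall at all
    return gains / losses

# Hypothetical sample.
r = np.array([-0.02, -0.01, 0.0, 0.01, 0.03])
omega_at_zero = omega_ratio(r, 0.0)        # 0.008 / 0.006
omega_at_mean = omega_ratio(r, r.mean())   # gains and losses balance at the mean
print(omega_at_zero, omega_at_mean)
```

Evaluating at the sample mean gives a ratio of one, because deviations above and below the mean cancel in expectation.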
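Theorem (Omega sufficiency). The function $\tau \mapsto \Omega(\tau)$ for $\tau \in \mathbb{R}$ uniquely determines $F$. Conversely, every distributional ratio in the taxonomy (that is, every ratio except the path-dependent Calmar ratio) can be recovered as a specific limit, derivative, or evaluation of $\Omega$.
Proof. We prove both directions.
$\Omega$ determines $F$. Define the denominator and numerator functions
$$D(\tau) = \int_{-\infty}^{\tau} F(x) \, dx = \mathbb{E}[(\tau - R)_+], \qquad N(\tau) = \int_{\tau}^{\infty} \big(1 - F(x)\big) \, dx = \mathbb{E}[(R - \tau)_+].$$
By the fundamental theorem of calculus, $D'(\tau) = F(\tau)$ at every continuity point of $F$. To relate $N$ and $D$, observe that $(R - \tau)_+ - (\tau - R)_+ = R - \tau$ identically (check both cases $R \ge \tau$ and $R < \tau$). Taking expectations:
$$N(\tau) - D(\tau) = \mu - \tau.$$
Substituting into $\Omega = N / D$:
$$\Omega(\tau) = \frac{D(\tau) + \mu - \tau}{D(\tau)} = 1 + \frac{\mu - \tau}{D(\tau)}.$$
Rearranging for $D$ wherever $\Omega(\tau) \ne 1$:
$$D(\tau) = \frac{\mu - \tau}{\Omega(\tau) - 1}.$$
Differentiating via the quotient rule with $u = \mu - \tau$ and $v = \Omega(\tau) - 1$ recovers the CDF:
$$F(\tau) = D'(\tau) = \frac{-\big(\Omega(\tau) - 1\big) - (\mu - \tau) \, \Omega'(\tau)}{\big(\Omega(\tau) - 1\big)^2}.$$
This expresses $F$ entirely in terms of $\Omega$, $\Omega'$, and $\mu$. The mean is itself determined by $\Omega$: since $N(\mu) = D(\mu)$ and $\Omega$ is strictly decreasing, the unique zero of $\Omega(\tau) - 1$ is $\tau = \mu$. Therefore $\Omega$ uniquely determines $F$.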
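The recovery argument can be checked numerically on an empirical distribution. The sketch below assumes the identity $D(\tau) = (\mu - \tau) / (\Omega(\tau) - 1)$ for the Omega denominator $D(\tau) = \mathbb{E}[(\tau - R)_+]$, then differentiates $D$ by a central difference to recover the CDF at a point between two observations. All function names, the sample, the evaluation point, and the step size `h` are illustrative assumptions.

```python
import numpy as np

def omega(returns, tau):
    """Empirical Omega ratio E[(R - tau)+] / E[(tau - R)+]."""
    r = np.asarray(returns, dtype=float)
    return np.mean(np.clip(r - tau, 0.0, None)) / np.mean(np.clip(tau - r, 0.0, None))

def denominator_from_omega(returns, tau):
    """D(tau) = (mu - tau) / (Omega(tau) - 1), valid wherever Omega(tau) != 1."""
    mu = np.mean(returns)
    return (mu - tau) / (omega(returns, tau) - 1.0)

# Hypothetical sample; tau sits strictly between the observations 0.0 and 0.01.
r = np.array([-0.02, -0.01, 0.0, 0.01, 0.03])
tau, h = 0.005, 1e-4

d_direct = np.mean(np.clip(tau - r, 0.0, None))   # D computed from the data
d_omega = denominator_from_omega(r, tau)          # D reconstructed from Omega and mu

# Central difference of the reconstructed D approximates D'(tau) = F(tau).
cdf_from_omega = (denominator_from_omega(r, tau + h)
                  - denominator_from_omega(r, tau - h)) / (2.0 * h)
cdf_direct = np.mean(r <= tau)
print(d_direct, d_omega, cdf_from_omega, cdf_direct)
```

Because the empirical $D$ is piecewise linear between observations, the finite difference reproduces the empirical CDF essentially exactly here. With a continuous distribution the same step would carry an ordinary discretization error.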
Recovery of each ratio from . Each ratio in the taxonomy is derived from by a specific operation. The Sharpe ratio is recovered from the behavior of near its zero-crossing: a Taylor expansion of around relates to . The Sortino ratio replaces the full variance with the downside semivariance, encoded in the second derivative of at the threshold. The Calmar ratio is the only ratio that cannot be recovered from alone, because it depends on path ordering. The Rachev ratio involves tail integrals of near the extremes of the support. The explicit formulas and full derivations are given in Lemmas 3.1 through 3.5 of Section 3.
2.3. The functional taxonomy table
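The decomposition $\rho = \mathcal{L} / \mathcal{D}$ provides a uniform description of the classical ratios. The following table collects the location and dispersion functionals for each ratio in the taxonomy, using the notation established in Section 1.
| Ratio | Location $\mathcal{L}$ | Dispersion $\mathcal{D}$ |
|---|---|---|
| Sharpe | $\mu - r_f$ | $\sigma$ |
| Sortino | $\mu - r_f$ | $\sqrt{\mathrm{LPM}_2(\tau)}$ |
| Information | $\mu - \mu_B$ | $\sigma(R - R_B)$ |
| Calmar | $\mu_{\mathrm{ann}}$ | $\mathrm{MDD}$ (path functional) |
| Omega | $\mathrm{UPM}_1(\tau)$ | $\mathrm{LPM}_1(\tau)$ |
| Rachev | $\mathrm{CTE}^{+}_{\alpha}$ | $\mathrm{CTE}^{-}_{\alpha}$ |
Here $r_f$ is the risk-free rate, $\tau$ is the threshold, $\mu_B$ is the benchmark mean return, $R_B$ is the benchmark return, $\mu_{\mathrm{ann}}$ is the annualized mean return, $\mathrm{MDD}$ is the maximum drawdown, and $\mathrm{CTE}^{\pm}_{\alpha}$ denotes the conditional tail expectation in the right ($+$) or left ($-$) tail at level $\alpha$.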
The table reveals the architectural pattern. Moving down the rows, the location functional shifts from a simple mean excess return (Sharpe, Sortino, Information) to a partial moment (Omega) to a conditional tail expectation (Rachev). The dispersion functional moves from the full standard deviation (Sharpe) to downside-only measures (Sortino) to partial moments (Omega) to tail measures (Rachev). The Calmar ratio is the outlier: its dispersion functional, the maximum drawdown, is a path functional rather than a distributional functional. This distinction deserves a precise statement.
Proposition (Calmar path-dependence). The Calmar ratio is the only ratio in the taxonomy that is not a distributional functional. Specifically, the maximum drawdown depends on the temporal ordering of returns: there exist return sequences $(r_1, \dots, r_T)$ and $(r_{\pi(1)}, \dots, r_{\pi(T)})$ (where $\pi$ is a permutation of $\{1, \dots, T\}$) that have identical empirical distributions but different maximum drawdowns.
Proof. We construct an explicit counterexample. Consider the return sequence
The cumulative wealth process, starting from , is
The running maximum is , so . The drawdown at each time is , and the maximum drawdown is
Now consider the permuted sequence
This is a permutation of the same multiset of returns, so the empirical distribution is identical for and . The cumulative wealth process for is
The running maximum is , and the maximum drawdown is
We have but . Therefore MDD is not a functional of the return distribution alone.
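The same phenomenon is easy to reproduce in code. The sketch below uses its own hypothetical four-period sequence (not the sequence of the proof above) and measures drawdown relative to the running peak of compounded wealth starting from 1; the function name `max_drawdown` is an assumption for the example.

```python
import numpy as np

def max_drawdown(returns):
    """Maximum relative drawdown of compounded wealth, starting from wealth 1."""
    r = np.asarray(returns, dtype=float)
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_max = np.maximum.accumulate(wealth)
    return np.max((running_max - wealth) / running_max)

# Two orderings of the same multiset of returns (hypothetical values).
alternating = [0.10, -0.10, 0.10, -0.10]
clustered = [0.10, 0.10, -0.10, -0.10]

mdd_alternating = max_drawdown(alternating)   # peak 1.10, trough 0.9801
mdd_clustered = max_drawdown(clustered)       # peak 1.21, trough 0.9801
print(mdd_alternating, mdd_clustered)
```

Both orderings end at the same terminal wealth and have the same empirical distribution, yet clustering the losses after the gains raises the peak from which the drawdown is measured.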
For the remaining five ratios (Sharpe, Sortino, Information, Omega, Rachev), each is defined entirely in terms of expectations, partial moments, quantiles, or conditional tail expectations, all of which are functionals of $F$ and are invariant under permutations of the return sequence. The Calmar ratio is therefore the unique path-dependent ratio in the taxonomy.
The path-dependence of Calmar is not a defect. Maximum drawdown captures information about the serial structure of returns (clustering of losses, momentum and mean-reversion effects) that is invisible to any distributional functional. The cost is that Calmar cannot be computed from $F$ alone, and in particular it cannot be recovered from the Omega function $\Omega(\tau)$. The benefit is that it penalizes precisely the kind of risk that practitioners find most painful: sustained, compounding losses from peak equity.
3. The Ratio Taxonomy
With the general performance functional and the Omega master functional in hand, we now instantiate each classical ratio as a special case, prove the recovery lemmas promised in Section 2, and derive four novel theorems that characterize the relationships between ratios.
3.1. The Sharpe ratio
The Sharpe ratio, introduced by William Sharpe in 1966,[5] is the oldest and most widely used performance ratio. It measures the excess return per unit of total risk:
For a return random variable $R$ with mean $\mu$ and standard deviation $\sigma > 0$, and a risk-free rate $r_f$, the Sharpe ratio is
$$\mathrm{SR} = \frac{\mu - r_f}{\sigma}.$$
The location functional is $\mathcal{L} = \mu - r_f$ and the dispersion functional is $\mathcal{D} = \sigma$. Both are distributional functionals: they depend on $F$ alone and are invariant under permutation of the return sequence.
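The Sharpe ratio can be recovered from the Omega function as
$$\mathrm{SR} = \frac{\big(\Omega(r_f) - 1\big) \, \mathrm{LPM}_1(r_f)}{\sigma},$$
or more directly, from the Taylor expansion of $\Omega$ at the mean.
Proof. We start from the definition of $\Omega$ at threshold $r_f$:
$$\Omega(r_f) = \frac{\mathbb{E}[(R - r_f)_+]}{\mathbb{E}[(r_f - R)_+]}.$$
The numerator and denominator decompose the mean excess return. We have the identity
$$\mu - r_f = \mathbb{E}[(R - r_f)_+] - \mathbb{E}[(r_f - R)_+].$$
This follows by writing $R - r_f = (R - r_f)_+ - (r_f - R)_+$ and taking expectations. Denoting $U = \mathbb{E}[(R - r_f)_+]$ and $L = \mathbb{E}[(r_f - R)_+] = \mathrm{LPM}_1(r_f)$, we obtain
$$\Omega(r_f) = \frac{U}{L}, \qquad \mu - r_f = U - L.$$
From the first equation, $U = \Omega(r_f) \, L$. Substituting into the second:
$$\mu - r_f = \big(\Omega(r_f) - 1\big) \, L.$$
Therefore
$$\mathrm{SR} = \frac{\mu - r_f}{\sigma} = \frac{\big(\Omega(r_f) - 1\big) \, \mathrm{LPM}_1(r_f)}{\sigma}.$$
This expresses the Sharpe ratio entirely in terms of the Omega function evaluated at $r_f$, the first lower partial moment at $r_f$, and the standard deviation. Both $\mathrm{LPM}_1(r_f)$ and $\sigma$ are themselves recoverable from $\Omega$ (since $\Omega$ determines $F$ by the Omega sufficiency theorem, and both functionals are defined in terms of $F$). Therefore $\mathrm{SR}$ is fully determined by $\Omega$.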
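A short numerical check of this identity, as a sketch: the Sharpe ratio computed directly is compared with $(\Omega(r_f) - 1) \cdot \mathrm{LPM}_1(r_f) / \sigma$ on a small sample. The function names, the sample, and the per-period risk-free rate of 0.001 are hypothetical; the population ($1/n$) standard deviation is used in both expressions.

```python
import numpy as np

def sharpe(returns, rf):
    """Sharpe ratio with the population (1/n) standard deviation."""
    r = np.asarray(returns, dtype=float)
    return (r.mean() - rf) / r.std()

def sharpe_via_omega(returns, rf):
    """Sharpe ratio rebuilt from Omega(rf), LPM_1(rf), and sigma."""
    r = np.asarray(returns, dtype=float)
    upm1 = np.mean(np.clip(r - rf, 0.0, None))   # E[(R - rf)+]
    lpm1 = np.mean(np.clip(rf - r, 0.0, None))   # E[(rf - R)+]
    omega_rf = upm1 / lpm1
    return (omega_rf - 1.0) * lpm1 / r.std()

# Hypothetical per-period returns and risk-free rate.
r = np.array([-0.02, -0.01, 0.0, 0.01, 0.03])
rf = 0.001
direct = sharpe(r, rf)
rebuilt = sharpe_via_omega(r, rf)
print(direct, rebuilt)
```

The two values agree to floating-point precision because the identity is algebraic: it holds for any sample with at least one observation below the threshold, not only in the limit.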
The Sharpe ratio induces a complete ordering on $\mathcal{P}_2$ (the set of distributions with finite second moment) if and only if the family of return distributions is elliptical. That is: within the class of elliptical distributions, the Sharpe ranking of any two strategies is consistent with the ranking by any increasing, concave expected-utility criterion. Outside the elliptical class, this consistency can fail.
Proof. An elliptical distribution has the form $R = \mu + \sigma Z$, where $Z$ has a spherically symmetric distribution (i.e., the density of $Z$ depends only on $|z|$). The class includes the Gaussian, Student-$t$, and Laplace families.
Forward direction (elliptical implies completeness). Let $u$ be any increasing, concave utility function, and consider two strategies with returns $R_1 = \mu_1 + \sigma_1 Z$ and $R_2 = \mu_2 + \sigma_2 Z$, where $Z$ has a fixed spherically symmetric distribution. We need to show: if $\mathrm{SR}_1 > \mathrm{SR}_2$, then $\mathbb{E}[u(R_1)] > \mathbb{E}[u(R_2)]$ once both strategies are levered to a common volatility, as made precise below.
Since both returns share the same generating variable $Z$, we can write
$$\mathbb{E}[u(R_i)] = \mathbb{E}\big[ u\big( r_f + \sigma_i (\mathrm{SR}_i + Z) \big) \big], \qquad i = 1, 2,$$
where we used $\mu_i = r_f + \sigma_i \, \mathrm{SR}_i$, so $R_i = r_f + \sigma_i (\mathrm{SR}_i + Z)$. Consider the function $h(s) = \mathbb{E}[u(r_f + \sigma (s + Z))]$. For fixed $\sigma > 0$, we compute
$$h'(s) = \sigma \, \mathbb{E}\big[ u'\big( r_f + \sigma (s + Z) \big) \big].$$
Since $u$ is increasing, $u' > 0$, so $h'(s) > 0$: the expected utility is strictly increasing in the Sharpe ratio for any fixed volatility level.
Now consider two strategies with the same Sharpe ratio but different volatilities $\sigma_1 < \sigma_2$. By concavity of $u$, Jensen's inequality implies that the lower-volatility strategy has higher expected utility. Therefore, for an investor who can freely lever or delever (choosing $\sigma$ optimally), the optimal strategy is the one with the highest Sharpe ratio. This is precisely the content of the Sharpe ordering being complete within the elliptical family.
More formally, for any target volatility $\sigma^*$, the investor can hold a fraction $w_i = \sigma^* / \sigma_i$ in strategy $i$ and the rest in the risk-free asset, achieving return $r_f + w_i (R_i - r_f) = r_f + \sigma^* (\mathrm{SR}_i + Z)$ with Sharpe ratio $\mathrm{SR}_i$ and volatility $\sigma^*$. Under this homogeneous scaling, the expected utility of the leveraged strategy at target volatility $\sigma^*$ depends only on $\mathrm{SR}_i$ (since both strategies now have the same volatility and the same distributional shape, differing only in the location shift $\sigma^* \, \mathrm{SR}_i$). Since $h$ is increasing in $s$, the strategy with higher Sharpe ratio yields higher expected utility at every target volatility.
Backward direction (non-elliptical breaks completeness). We construct an explicit counterexample using the skew-normal distribution. The standardized skew-normal family with shape parameter $a$ has CDF
$$F(x; a) = \Phi(x) - 2 \, T(x, a),$$
where $\Phi$ is the standard normal CDF and $T$ is Owen's T-function. The parameter $a$ controls skewness: $a = 0$ gives a Gaussian, $a > 0$ gives positive skew, $a < 0$ gives negative skew.
The construction requires two strategies where the higher-Sharpe strategy has heavy negative skew, so that a sufficiently risk-averse utility function penalizes its left tail enough to reverse the Sharpe ranking. Let and .
For : , the mean is , and the variance is , giving and .
For : , , , giving and .
So . The skewness of is (strongly negative), while (strongly positive).
For the exponential utility with , the expected utility of a skew-normal variable involves the moment-generating function , giving . For the skew-normal, where .
For : , so .
For : , so . Since is astronomically small (on the order of ), the product .
Therefore and , giving . Strategy has lower Sharpe ratio but higher expected utility under this risk-averse exponential utility: the heavy negative skew of dominates its mean-variance advantage.
For any non-elliptical family of distributions, there exist strategies $R_1, R_2$ and an increasing, concave utility function $u$ such that $\mathrm{SR}_1 > \mathrm{SR}_2$ but $\mathbb{E}[u(R_1)] < \mathbb{E}[u(R_2)]$.
Proof. The explicit construction above demonstrates this for the skew-normal family. The mechanism is general: in any non-elliptical family, the distribution is not fully characterized by its first two moments. A higher Sharpe ratio guarantees a better mean-variance trade-off, but it says nothing about higher moments (skewness, kurtosis, tail behavior). An expected-utility maximizer whose utility function is sensitive to these higher moments (as any sufficiently risk-averse utility function must be, by the Taylor expansion $\mathbb{E}[u(R)] \approx u(\mu) + \tfrac{1}{2} u''(\mu) \sigma^2 + \tfrac{1}{6} u'''(\mu) \mu_3 + \cdots$) can prefer a lower-Sharpe strategy that has more favorable skewness or kurtosis.
Formally, since the family is non-elliptical, there exist $R_1, R_2$ with $\mathrm{SR}_1 > \mathrm{SR}_2$ but $\gamma_1(R_1) < \gamma_1(R_2)$ (where $\gamma_1(R)$ is the skewness of $R$). The third-order term in the utility expansion, $\tfrac{1}{6} u'''(\mu) \mu_3$, has the sign of $\mu_3$ when $u''' > 0$ (which holds for DARA utilities), so it rewards positive skewness and penalizes negative skewness. By choosing $u$ with $u'''$ sufficiently large (high prudence), the skewness preference overwhelms the mean-variance advantage. The skew-normal example above provides the explicit parameters and computation.
3.2. The Sortino ratio
The Sortino ratio replaces the full standard deviation in the Sharpe denominator with the downside deviation, penalizing only below-threshold volatility:
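For a return random variable $R$ with mean $\mu$, risk-free rate $r_f$, and downside deviation $\sigma_D(\tau) = \sqrt{\mathrm{LPM}_2(\tau)}$ at threshold $\tau$, the Sortino ratio is
$$\mathrm{Sortino} = \frac{\mu - r_f}{\sigma_D(\tau)}.$$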
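The Sortino ratio is recoverable from the Omega function $\Omega(\tau)$.
Proof. By the Omega sufficiency theorem, the function $\tau \mapsto \Omega(\tau)$ uniquely determines $F$. Given $F$, we compute:
The mean: $\mu = \int_{\mathbb{R}} x \, dF(x)$.
The second lower partial moment at $\tau$:
$$\mathrm{LPM}_2(\tau) = \int_{-\infty}^{\tau} (\tau - x)^2 \, dF(x).$$
Both integrals are functionals of $F$, which is determined by $\Omega$. But we can give a more direct formula. From the definition of $\Omega$, the denominator function is $D(\tau) = \int_{-\infty}^{\tau} F(x) \, dx = \mathrm{LPM}_1(\tau)$. Integration by parts gives
$$\mathrm{LPM}_2(\tau) = 2 \int_{-\infty}^{\tau} (\tau - x) \, F(x) \, dx.$$
More cleanly, starting from the defining integral and integrating by parts (with $u = (\tau - x)^2$, $dv = dF(x)$):
$$\mathrm{LPM}_2(\tau) = \Big[ (\tau - x)^2 F(x) \Big]_{-\infty}^{\tau} + 2 \int_{-\infty}^{\tau} (\tau - x) \, F(x) \, dx,$$
where the boundary term vanishes since $(\tau - x)^2 = 0$ at the upper limit and $(\tau - x)^2 F(x) \to 0$ as $x \to -\infty$ (because $F(x) \to 0$ faster than $x^2$ grows, given $\mathbb{E}[R^2] < \infty$). The remaining integral is
$$2 \int_{-\infty}^{\tau} (\tau - x) \, F(x) \, dx = 2 \int_{-\infty}^{\tau} D(x) \, dx,$$
by a second integration by parts with $D' = F$.
Since $D$ is the denominator of $\Omega$, and since $D'(\tau) = F(\tau)$ (by the fundamental theorem of calculus), the integral is also determined by $\Omega$. Because $\Omega = N / D$ and $N(\tau) = D(\tau) + \mu - \tau$ (more precisely, $D(\tau) = (\mu - \tau) / (\Omega(\tau) - 1)$ from the identity derived in the Omega sufficiency proof), both $D$ and its derivatives are recoverable from $\Omega$.
Therefore $\mathrm{LPM}_2(\tau)$ is a functional of $\Omega$, and the Sortino ratio is fully determined by $\Omega$.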
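The Sortino and Sharpe ratios are related by
$$\mathrm{Sortino} = \mathrm{SR} \cdot \frac{\sigma}{\sigma_D(\tau)}.$$
For symmetric distributions (zero skewness), $\mathrm{Sortino} = \sqrt{2} \, \mathrm{SR}$ when $\tau = \mu$, and more generally $\mathrm{Sortino}$ and $\mathrm{SR}$ induce the same ordering when the downside multiplier $\sigma / \sigma_D(\tau)$ is the same across all strategies being compared. When the downside multipliers differ (which occurs whenever the left-tail shape varies across strategies, even at the same skewness), the Sortino and Sharpe orderings can disagree.
Proof. The algebraic relationship follows immediately from the definitions:
$$\mathrm{Sortino} = \frac{\mu - r_f}{\sigma_D(\tau)} = \frac{\mu - r_f}{\sigma} \cdot \frac{\sigma}{\sigma_D(\tau)} = \mathrm{SR} \cdot \frac{\sigma}{\sigma_D(\tau)}.$$
The ratio $\sigma / \sigma_D(\tau)$ is the downside multiplier. For symmetric distributions with $\tau = \mu$, the downside semivariance equals half the total variance: $\mathrm{LPM}_2(\mu) = \int_{-\infty}^{\mu} (\mu - x)^2 \, dF(x)$. By symmetry of $F$ about $\mu$, this integral equals $\int_{\mu}^{\infty} (x - \mu)^2 \, dF(x)$, and the two integrals sum to $\sigma^2$, so $\mathrm{LPM}_2(\mu) = \sigma^2 / 2$. The downside multiplier is then $\sqrt{2}$, giving $\mathrm{Sortino} = \sqrt{2} \, \mathrm{SR}$.
When comparing two strategies with the same downside multiplier $m$ (and the same threshold $\tau$), we have $\mathrm{Sortino}_i = m \, \mathrm{SR}_i$, and the ratio is a common positive rescaling of $\mathrm{SR}$. Therefore $\mathrm{Sortino}_1 > \mathrm{Sortino}_2$ if and only if $\mathrm{SR}_1 > \mathrm{SR}_2$: the orderings agree. Equal skewness is not sufficient for equal downside multipliers, because the multiplier depends on the full shape of the left tail (kurtosis and higher moments also contribute to $\mathrm{LPM}_2$).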
When skewness differs, the downside multipliers differ, and disagreement is possible. Consider strategy with and negative skewness (heavy left tail, so is large relative to , giving a small downside multiplier). Consider strategy with and positive skewness (light left tail, so is small relative to , giving a large downside multiplier).
Concretely, let have and (so the downside multiplier is ), giving .
Let have and (so the downside multiplier is ), giving .
We have but . The Sharpe and Sortino orderings disagree. The mechanism is clear: strategy has lower total risk-adjusted return but much less downside risk (positive skewness concentrates volatility on the upside), so the Sortino ratio, which penalizes only downside volatility, ranks it higher.
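The reversal can also be reproduced on toy data. The two samples below are hypothetical and are not the strategies of the example above: one has frequent small gains and a rare large loss (negative skew), the other frequent small losses and a rare large gain (positive skew). The sketch assumes a zero risk-free rate and a zero downside threshold, and the function names are illustrative.

```python
import numpy as np

def sharpe(returns, rf=0.0):
    """Sharpe ratio with the population (1/n) standard deviation."""
    r = np.asarray(returns, dtype=float)
    return (r.mean() - rf) / r.std()

def sortino(returns, rf=0.0, tau=0.0):
    """Sortino ratio with downside deviation sqrt(LPM_2(tau))."""
    r = np.asarray(returns, dtype=float)
    downside_dev = np.sqrt(np.mean(np.clip(tau - r, 0.0, None) ** 2))
    return (r.mean() - rf) / downside_dev

# Hypothetical strategies with opposite skewness.
neg_skew = np.array([0.02] * 9 + [-0.10])   # steady gains, one large loss
pos_skew = np.array([-0.01] * 9 + [0.15])   # steady small losses, one large gain

print("Sharpe :", sharpe(neg_skew), sharpe(pos_skew))
print("Sortino:", sortino(neg_skew), sortino(pos_skew))
```

The negatively skewed sample wins on Sharpe while the positively skewed sample wins on Sortino, because most of the second sample's variance sits above the threshold and is ignored by the downside deviation.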
3.3. The Information ratio
The Information ratio measures the excess return of a strategy relative to a benchmark, scaled by the volatility of the excess return (the tracking error):
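For return random variables $R$ (strategy) and $R_B$ (benchmark) with $\sigma(R - R_B) > 0$, the Information ratio is
$$\mathrm{IR} = \frac{\mathbb{E}[R - R_B]}{\sigma(R - R_B)} = \frac{\mu - \mu_B}{\sigma(R - R_B)}.$$
The Information ratio is the Sharpe ratio of the excess return with zero risk-free rate. That is, if we define $X = R - R_B$, then $\mathrm{IR} = \mathrm{SR}(X)$ with $r_f = 0$.
Proof. Define the excess return $X = R - R_B$. Then $\mathbb{E}[X] = \mu - \mu_B$ and $\sigma(X) = \sigma(R - R_B)$, so $\mathrm{IR} = \mathbb{E}[X] / \sigma(X)$. The Sharpe ratio of $X$ with $r_f = 0$ is
$$\mathrm{SR}(X) = \frac{\mathbb{E}[X] - 0}{\sigma(X)} = \mathrm{IR}.$$
This is an algebraic identity, not an approximation. The Information ratio inherits all properties of the Sharpe ratio when applied to excess returns.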
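The identity translates directly into code: an Information ratio routine is the Sharpe routine applied to the excess-return series with the risk-free rate set to zero. The function names and both return series below are hypothetical, and the tracking error uses the population ($1/n$) standard deviation.

```python
import numpy as np

def sharpe(returns, rf=0.0):
    """Sharpe ratio with the population (1/n) standard deviation."""
    r = np.asarray(returns, dtype=float)
    return (r.mean() - rf) / r.std()

def information_ratio(strategy, benchmark):
    """Mean excess return over the benchmark divided by the tracking error."""
    excess = np.asarray(strategy, dtype=float) - np.asarray(benchmark, dtype=float)
    return excess.mean() / excess.std()

# Hypothetical per-period returns for a strategy and its benchmark.
strategy = np.array([0.012, -0.004, 0.009, 0.003])
benchmark = np.array([0.010, -0.006, 0.004, 0.005])

ir = information_ratio(strategy, benchmark)
sr_of_excess = sharpe(strategy - benchmark, rf=0.0)
print(ir, sr_of_excess)
```

Note that the tracking error is the volatility of the difference series, not the difference of the two volatilities, so the benchmark's correlation with the strategy matters.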
The Information ratio satisfies the Cherny-Madan coherence axioms if and only if the benchmark is deterministic (i.e., $R_B = b$ almost surely for some constant $b$). When $R_B$ is stochastic, the IR can violate quasi-concavity.
Proof. When is a constant, the excess return is , which is simply a translation of . The Information ratio becomes , which is the Sharpe ratio with . The Sharpe ratio satisfies the Cherny-Madan coherence axioms (positive homogeneity, translation invariance, monotonicity, and quasi-concavity) on the relevant domain. For positive homogeneity: for . For monotonicity: if almost surely, then and , so the numerator is weakly larger; the denominator involves the same or smaller volatility (since adding a positive constant reduces relative dispersion), giving . For quasi-concavity: the Sharpe ratio of a mixture satisfies , which follows from the concavity of the mean and the convexity of the standard deviation.
When is stochastic, quasi-concavity can fail. The key observation is that with a stochastic benchmark, the tracking error of a mixture need not be bounded by the tracking errors of the components, because the benchmark can interact non-linearly with the portfolio weights.
Consider a non-linear benchmark dependence. Let be two strategy returns, and define the benchmark . For the equal-weight mixture :
When , this equals . When , this equals . So .
Numerical confirmation. Let and be independent (so ). Since , the benchmark mean is .
For each component: . Monte Carlo evaluation ( draws) gives tracking error , so .
For the equal-weight mixture : the excess return (as derived above). The numerator is the same: . But the tracking error drops to , because the operation clips the upper tail, reducing variance. This makes the ratio worse: .
The quasi-concavity violation is . The mechanism: the operation reduces the denominator (tracking error) while preserving the negative numerator, pushing the ratio further negative. The class of benchmarks for which IR coherence holds is precisely the deterministic benchmarks.
3.4. The Calmar ratio
The Calmar ratio measures annualized return per unit of maximum drawdown:
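For a return process $(r_t)_{t=1}^{T}$ with annualized mean return $\mu_{\mathrm{ann}}$ and maximum drawdown $\mathrm{MDD} > 0$, the Calmar ratio is
$$\mathrm{Calmar} = \frac{\mu_{\mathrm{ann}}}{\mathrm{MDD}},$$
where
$$\mathrm{MDD} = \max_{0 \le t \le T} \frac{M_t - W_t}{M_t}, \qquad W_t = \prod_{s=1}^{t} (1 + r_s), \qquad M_t = \max_{0 \le s \le t} W_s.$$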
The path-dependence of MDD (and hence of Calmar) was proved in Proposition (Calmar path-dependence) in Section 2.3. The next result characterizes how MDD scales with the time horizon.
For a standard Brownian motion $B_t$ on $[0, T]$, the expected maximum drawdown satisfies
$$E[\mathrm{MDD}_T] = \sqrt{\pi/2}\,\sqrt{T} \approx 1.2533\,\sqrt{T}.$$
The $\sqrt{T}$ scaling is derived below from the reflection principle and the Brownian maximum distribution. The exact prefactor $\sqrt{\pi/2}$ is due to Magdon-Ismail and Atiya (2004),[6] who computed it via a Laplace transform involving the zeros of the parabolic cylinder functions.
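Proof of the scaling form. For a standard Brownian motion $B_t$, the drawdown at time $t$ is $D_t = M_t - B_t$, where $M_t = \sup_{s \le t} B_s$ is the running maximum. The maximum drawdown over $[0, T]$ is $\mathrm{MDD}_T = \sup_{t \le T} D_t$.
By Lévy's identity, the drawdown process $M_t - B_t$ has the same law as $|B_t|$ (reflected Brownian motion). Therefore $E[\mathrm{MDD}_T] = E[\sup_{t \le T} |B_t|]$.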
We now derive the scaling of . Discretize the interval at equally spaced points . The values are correlated random variables with .
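Upper bound. The maximum of $N$ centered Gaussian variables, each with variance at most $\sigma^2 = T$, satisfies the standard Gaussian maximal inequality, which holds regardless of the correlation structure:
$$E\Big[\max_{i \le N} B_{t_i}\Big] \le \sigma\sqrt{2 \log N}.$$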
Since and both terms have the same distribution, . Taking and letting with care (using continuity of Brownian paths to control the discretization error) gives an upper bound of order .
Lower bound. Consider non-overlapping increments of Brownian motion on intervals of length . The running maximum satisfies (the first increment). More precisely, by the reflection principle, for , giving
The expected range (where , using symmetry ). The MDD is bounded below by any single drawdown, and the maximum over approximately independent drawdown excursions (each of duration ) has expected value at least (by the maximum of independent half-normal variables with scale ). Optimizing over by setting (unit-length excursions) gives a lower bound of order , which is too loose.
The sharper lower bound uses the full structure of the drawdown process. Since , the maximum of over can be analyzed via the Borell-TIS inequality (applied to the Gaussian process on ):
Combined with the metric entropy bound for Brownian motion (Dudley's integral), for a universal constant , which establishes the lower bound on the scaling.
Combining bounds. Both bounds are of order , confirming
The exact constant requires the detailed Laplace transform computation of Magdon-Ismail and Atiya, which expresses as a series involving zeros of parabolic cylinder functions and extracts the leading term. We do not reproduce that computation here, as it is not self-contained (it relies on special function identities specific to the Brownian drawdown distribution).
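The $\sqrt{T}$ scaling is easy to probe numerically. The sketch below simulates driftless Gaussian random walks as a discrete stand-in for Brownian motion and averages the maximum drawdown. It assumes the prefactor in the theorem is the zero-drift, unit-volatility value $\sqrt{\pi/2}$; the step count, path count, and seeds are arbitrary. Because drawdowns are only observed on a grid, the simulated mean sits slightly below the continuous-time value.

```python
import math
import random

def max_drawdown(path):
    """Largest peak-to-trough decline of a path given as a list of levels."""
    peak, mdd = path[0], 0.0
    for level in path:
        peak = max(peak, level)
        mdd = max(mdd, peak - level)
    return mdd

def expected_mdd(horizon, n_steps=500, n_paths=2000, seed=0):
    """Monte Carlo mean of the maximum drawdown of a driftless Gaussian walk."""
    rng = random.Random(seed)
    step_sd = math.sqrt(horizon / n_steps)
    total = 0.0
    for _ in range(n_paths):
        level, path = 0.0, [0.0]
        for _ in range(n_steps):
            level += rng.gauss(0.0, step_sd)
            path.append(level)
        total += max_drawdown(path)
    return total / n_paths

m1 = expected_mdd(1.0, seed=1)   # horizon T = 1
m4 = expected_mdd(4.0, seed=2)   # horizon T = 4
print(f"E[MDD_1]={m1:.3f}  E[MDD_4]={m4:.3f}  ratio={m4 / m1:.3f}")
print(f"continuous-time reference sqrt(pi/2)={math.sqrt(math.pi / 2):.4f}")
```

Quadrupling the horizon should roughly double the expected drawdown, as the square-root law requires.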
3.5. The Omega ratio
The Omega ratio was defined in Definition (Omega ratio) and its sufficiency was proved in Theorem (Omega sufficiency). We now establish two further properties.
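$\Omega_X(\tau) > 1$ if and only if $\tau < E[X]$.
Proof. From the definition, $\Omega_X(\tau) = E[(X - \tau)^+] / E[(\tau - X)^+]$. The identity $X - \tau = (X - \tau)^+ - (\tau - X)^+$ gives, after taking expectations,
$$E[X] - \tau = E[(X - \tau)^+] - E[(\tau - X)^+].$$
Therefore
$$\Omega_X(\tau) - 1 = \frac{E[X] - \tau}{E[(\tau - X)^+]}.$$
Since $E[(\tau - X)^+] > 0$ for all $\tau$ with $P(X < \tau) > 0$ (which holds whenever $\tau$ is strictly above the essential infimum of $X$), the sign of $\Omega_X(\tau) - 1$ equals the sign of $E[X] - \tau$.
If $E[X] > \tau$: $\Omega_X(\tau) - 1 > 0$, so $\Omega_X(\tau) > 1$.
If $E[X] \le \tau$ and $E[(\tau - X)^+] > 0$: then $\Omega_X(\tau) - 1 \le 0$, so $\Omega_X(\tau) \le 1$.
It remains to check the boundary cases. If $\tau$ is at or below the essential infimum of $X$, then $E[(\tau - X)^+] = 0$ (there is no probability mass below $\tau$), and $\Omega_X(\tau)$ is undefined (or formally $+\infty$), so $\Omega_X(\tau) > 1$ under that convention. If $\tau$ is at or above the essential supremum, then $E[(X - \tau)^+] = 0$ and $\Omega_X(\tau) = 0 < 1$. In the interior of the support, the result holds as shown.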
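The unit-crossing property can be checked on any finite sample. The sketch below assumes the usual definition of the Omega ratio as expected gain above the threshold divided by expected shortfall below it; the six return values are hypothetical and chosen so the sample mean is exactly 0.01.

```python
def partial_moments(returns, tau):
    """Sample estimates of E[(X - tau)^+] and E[(tau - X)^+]."""
    n = len(returns)
    gain = sum(max(r - tau, 0.0) for r in returns) / n
    loss = sum(max(tau - r, 0.0) for r in returns) / n
    return gain, loss

def omega(returns, tau):
    """Omega ratio at threshold tau (infinite when nothing falls below tau)."""
    gain, loss = partial_moments(returns, tau)
    return gain / loss if loss > 0 else float("inf")

def omega_via_mean(returns, tau):
    """Same quantity through the identity Omega - 1 = (mean - tau) / loss."""
    _, loss = partial_moments(returns, tau)
    mean = sum(returns) / len(returns)
    return 1.0 + (mean - tau) / loss

sample = [-0.04, -0.01, 0.00, 0.02, 0.03, 0.06]  # hypothetical returns, mean 0.01
for tau in (0.0, 0.01, 0.02):
    print(tau, round(omega(sample, tau), 4), round(omega_via_mean(sample, tau), 4))
```

The ratio is above one for the threshold below the mean, equal to one at the mean, and below one above it, and the two formulas agree at each threshold.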
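If $X$ first-order stochastically dominates $Y$ (written $X \succeq_{\mathrm{FSD}} Y$), then $\Omega_X(\tau) \ge \Omega_Y(\tau)$ for all $\tau$.
Proof. First-order stochastic dominance means $F_X(x) \le F_Y(x)$ for all $x$, which is equivalent to $1 - F_X(x) \ge 1 - F_Y(x)$ for all $x$.
The numerator of $\Omega_X(\tau)$ is
$$a_X = \int_\tau^\infty \big(1 - F_X(x)\big)\,dx \;\ge\; \int_\tau^\infty \big(1 - F_Y(x)\big)\,dx = a_Y,$$
since $1 - F_X \ge 1 - F_Y$ pointwise and we integrate over the same domain.
The denominator of $\Omega_X(\tau)$ is
$$b_X = \int_{-\infty}^\tau F_X(x)\,dx \;\le\; \int_{-\infty}^\tau F_Y(x)\,dx = b_Y,$$
since $F_X \le F_Y$ pointwise and we integrate over the same domain.
Therefore, with $a_X \ge a_Y \ge 0$ and $0 < b_X \le b_Y$ (the strict positivity conditions hold whenever $\tau$ is in the interior of both supports):
$$\Omega_X(\tau) = \frac{a_X}{b_X} \ge \frac{a_Y}{b_Y} = \Omega_Y(\tau).$$
The inequality $a/b \ge c/d$ when $a \ge c \ge 0$ and $0 < b \le d$ follows from $a/b \ge c/b \ge c/d$ (the first step uses $a \ge c$ with $b > 0$; the second uses $b \le d$ with $c \ge 0$).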
No other ratio in the taxonomy (Sharpe, Sortino, Information, Calmar, Rachev) satisfies FSD consistency unconditionally.
Proof. We provide an explicit counterexample for each ratio.
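Sharpe. We need $X \succeq_{\mathrm{FSD}} Y$ but $\mathrm{SR}(X) < \mathrm{SR}(Y)$. The mechanism is FSD-domination through support widening: the dominant distribution has a higher mean but disproportionately higher variance.
Let $Y$ be uniform on $[1, 2]$ and $X$ be uniform on $[1, 1001]$. FSD holds: $F_X(x) = (x - 1)/1000 \le x - 1 = F_Y(x)$ for $x \in [1, 2]$ and $F_X(x) \le 1 = F_Y(x)$ for $x > 2$. So $X \succeq_{\mathrm{FSD}} Y$. The Sharpe ratios (with $r_f = 0$) are $\mathrm{SR}(X) = 501 / (1000/\sqrt{12}) \approx 1.736$ and $\mathrm{SR}(Y) = 1.5 / (1/\sqrt{12}) \approx 5.196$. So $\mathrm{SR}(X) < \mathrm{SR}(Y)$ while $X \succeq_{\mathrm{FSD}} Y$. The dominating distribution has 334 times the mean but 1000 times the standard deviation, so the Sharpe ratio decreases.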
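The Sharpe counterexample can be verified in closed form. The sketch below assumes the pair is $Y \sim U[1, 2]$ and $X \sim U[1, 1001]$, a choice consistent with the 334-fold mean and 1000-fold standard deviation quoted above. It also evaluates the Omega ratio at the hypothetical threshold 1.5 to show that Omega keeps the dominance ordering on the same pair.

```python
import math

def uniform_cdf(a, b, x):
    """CDF of the uniform distribution on [a, b]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def uniform_sharpe(a, b, risk_free=0.0):
    """Sharpe ratio of U[a, b]: mean (a+b)/2, standard deviation (b-a)/sqrt(12)."""
    mean = 0.5 * (a + b)
    std = (b - a) / math.sqrt(12.0)
    return (mean - risk_free) / std

def uniform_omega(a, b, tau):
    """Omega ratio of U[a, b] at a threshold tau strictly inside (a, b)."""
    gain = (b - tau) ** 2 / (2.0 * (b - a))   # E[(X - tau)^+]
    loss = (tau - a) ** 2 / (2.0 * (b - a))   # E[(tau - X)^+]
    return gain / loss

Y = (1.0, 2.0)      # assumed dominated distribution
X = (1.0, 1001.0)   # assumed dominating distribution

grid = [1.0 + 0.5 * i for i in range(2003)]   # 1.0, 1.5, ..., 1002.0
fsd_holds = all(uniform_cdf(*X, x) <= uniform_cdf(*Y, x) for x in grid)
sharpe_x, sharpe_y = uniform_sharpe(*X), uniform_sharpe(*Y)
omega_x, omega_y = uniform_omega(*X, 1.5), uniform_omega(*Y, 1.5)
print(f"FSD holds on grid: {fsd_holds}")
print(f"Sharpe: X={sharpe_x:.4f}  Y={sharpe_y:.4f}")
print(f"Omega(1.5): X={omega_x:.1f}  Y={omega_y:.1f}")
```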
Sortino. By Proposition (Sortino-Sharpe relationship), . The Sortino ratio inherits FSD violations from the Sharpe ratio, because the multiplicative correction factor (the downside multiplier) depends on the distributional shape below the threshold, and FSD dominance does not constrain this factor.
More precisely: for any performance ratio of the form where is translation-equivariant and is a positive dispersion functional that is not proportional to a location functional, FSD consistency can fail. FSD constrains only the CDF ordering , which implies (by monotonicity of admissible location functionals). But the dispersion functional responds to the spread of the distribution, and FSD-domination through widening the support (as in the Sharpe counterexample above, where uniform on dominates uniform on ) inflates disproportionately relative to the gain in . The Sortino denominator is such a dispersion functional (it depends on the distribution below ), so the same mechanism applies: there exist FSD-ordered pairs where the downside multiplier reversal overcomes the location improvement, yielding despite .
Calmar. The Calmar ratio depends on path ordering, so it trivially fails distributional FSD consistency: two path-rearrangements of the same return sequence have identical distributions (hence identical FSD relationships to any other distribution) but different Calmar ratios, as shown in Proposition (Calmar path-dependence).
Rachev. Consider uniform on and uniform on (FSD: ). With : , and . So .
For : and . . So , consistent.
For the Rachev violation, we need the tail ratio of the dominant distribution to be worse. The key is that FSD domination shifts the entire CDF rightward, but the Rachev ratio measures the ratio of right-tail gain to left-tail loss, and a uniform rightward shift can reduce this ratio by improving the left tail proportionally more than the right tail.
Let and . Since in distribution, we have for all (shifting the CDF right), so .
Now compute the Rachev ratios at . For : the bottom 5% is , so (the expected value in the left tail is , so the loss magnitude is ). The top 5% is , so . Therefore .
For : the bottom 5% is , so the left-tail mean is (loss magnitude ). The top 5% is , so the right-tail mean is . Therefore .
We have but . The mechanism: the rightward shift of improves the left-tail conditional mean from to (a threefold increase in the denominator), while the right-tail conditional mean improves only from to (a 5% increase in the numerator). The asymmetric effect on the ratio produces the violation.
The overall conclusion: the Omega ratio is the unique ratio in the taxonomy that respects first-order stochastic dominance at every threshold, because its numerator and denominator are the integrals of the survival and distribution functions (which FSD directly constrains), while all other ratios use transformed or partial functionals where the transformation can reverse the ordering.
3.6. The Rachev ratio
The Rachev ratio compares the expected gain in the right tail to the expected loss in the left tail:
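For confidence levels $\alpha, \beta \in (0, 1)$, the Rachev ratio is
$$\mathrm{RR}_{\alpha,\beta}(X) = \frac{E[X \mid X \ge q_{1-\alpha}(X)]}{-E[X \mid X \le q_{\beta}(X)]},$$
where $q_p(X)$ denotes the $p$-quantile of $X$.
The Rachev ratio directly compares tail behavior: the numerator is the expected return conditional on being in the top $\alpha$-fraction of outcomes, and the denominator is the magnitude of the expected return conditional on being in the bottom $\beta$-fraction. Typical values are $\alpha = \beta = 0.05$.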
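A sample version of this ratio takes a few lines. The sketch below is one simple estimator among several: it averages the largest and smallest order statistics, rounding the tail counts to whole observations. The function name, the rounding convention, and the test samples are assumptions made for illustration.

```python
def rachev_ratio(returns, alpha=0.05, beta=0.05):
    """Mean of the top alpha-fraction divided by minus the mean of the bottom beta-fraction.

    The result is only meaningful when the bottom-tail mean is negative.
    """
    xs = sorted(returns)
    n = len(xs)
    k_top = max(1, round(alpha * n))      # number of observations in the right tail
    k_bottom = max(1, round(beta * n))    # number of observations in the left tail
    top_mean = sum(xs[-k_top:]) / k_top
    bottom_mean = sum(xs[:k_bottom]) / k_bottom
    return top_mean / (-bottom_mean)

symmetric = [float(v) for v in range(-50, 51)]   # hypothetical symmetric sample
left_heavy = [-100.0] + symmetric[1:]            # same sample with a worse worst case

print(rachev_ratio(symmetric))    # tails mirror each other
print(rachev_ratio(left_heavy))   # heavier left tail pushes the ratio below 1
```

A symmetric sample gives a ratio of one, and deepening only the left tail pushes the ratio below one.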
The Rachev ratio is recoverable from the Omega function .
Proof. By the Omega sufficiency theorem, determines . Given , the quantile function is determined, and the conditional expectations
are functionals of , hence of , hence of .
For a more direct connection, we express the tail expectations in terms of partial moments of 's denominator and numerator. The CVaR of the left tail at level satisfies
where . Rearranging:
Since , we have .
The left-tail CVaR (as a positive number representing the expected loss) is
Since (the denominator of at ), and is determined by which is determined by , the left-tail CVaR is a functional of .
An analogous argument shows that the right-tail CVaR is determined by (the numerator of at the -quantile):
where is the numerator of .
Therefore
which expresses the Rachev ratio entirely in terms of the numerator and denominator functions of evaluated at specific quantiles (themselves determined by ).
The Rachev ratio is the only ratio in the taxonomy (besides Omega) that captures tail asymmetry without requiring the existence of finite higher moments (skewness, kurtosis). Specifically, for distributions with infinite variance (e.g., stable distributions with index ), the Sharpe, Sortino, and Information ratios are undefined, while the Rachev ratio remains well-defined and finite whenever the tail conditional expectations exist.
Proof. The Sharpe ratio requires , which fails when the variance is infinite. The Sortino ratio requires , which also fails for distributions with infinite second moment (since , but the converse is not immediate; however, for symmetric heavy-tailed distributions like stable laws with , the second moment is infinite on both sides, so is infinite as well). The Information ratio inherits the same limitation since it is the Sharpe ratio of excess returns.
The Rachev ratio, by contrast, requires only the finiteness of and for appropriate quantiles. For a stable distribution with index , the mean exists (since ), and the conditional expectations and exist and are finite for any quantiles in the interior of the support (because conditioning on being above/below a fixed quantile truncates the distribution, and the resulting truncated moments are finite for ).
To verify: for a stable random variable with index and scale , the tail probability satisfies as . The conditional expectation, given , is
The integrand as (since ). This integral converges if and only if , which is our assumption. The same argument applies to the left tail.
The Calmar ratio requires only the mean return (for ) and the maximum drawdown (a path functional), so it can be defined for . The Omega ratio at a fixed threshold requires and , both of which are first moments of truncated distributions, hence finite for .
Therefore, among the moment-based ratios (Sharpe, Sortino, Information), none can handle infinite-variance distributions. Among the remaining ratios, both Omega and Rachev handle heavy tails, but Rachev is specifically designed to measure tail asymmetry (right tail gain vs. left tail loss), whereas Omega at a fixed threshold measures the ratio of total upside to total downside, integrating over the entire distribution rather than focusing on the tails. The Rachev ratio isolates the tails through the conditioning on extreme quantiles.
3.7. Performance ratio divergence
We now state and prove the first of two novel theorems. This result characterizes precisely when different performance ratios agree and when they disagree, as a function of the distributional shape parameters.
Let $(\rho_1, \rho_2)$ be any pair of distributional ratios from the taxonomy (Sharpe, Sortino, Information, Omega at a fixed threshold, Rachev), excluding the Calmar ratio (which is path-dependent). For a family of return distributions parameterized by skewness $\gamma_1$ and kurtosis $\gamma_2$, there exists a critical skewness boundary $\gamma_1^*(\gamma_2)$ such that:
- For $|\gamma_1| < \gamma_1^*(\gamma_2)$: the ratios $\rho_1$ and $\rho_2$ induce the same ordering on any pair of strategies drawn from the family.
- For $|\gamma_1| > \gamma_1^*(\gamma_2)$: there exist strategies $A, B$ in the family with $\rho_1(A) > \rho_1(B)$ but $\rho_2(A) < \rho_2(B)$.
The boundary $\gamma_1^*$ is a decreasing function of $\gamma_2$ for all pairs $(\rho_1, \rho_2)$: heavier tails shrink the agreement region.
Proof. We prove this in three stages: first for the Sharpe-Sortino pair, then for Sharpe-Omega, then generalize.
Stage 1: Sharpe-Sortino boundary. By Proposition (Sortino-Sharpe relationship), the Sortino ratio equals the Sharpe ratio multiplied by the downside multiplier $\sigma/\sigma_D$. Two strategies with positive Sharpe ratios therefore have the same Sharpe ordering as Sortino ordering whenever the downside multipliers preserve the Sharpe ordering, which holds in particular when the multiplier of the Sharpe-dominant strategy is at least as large as that of the other.
The orderings agree if the multiplier is constant across strategies (same skewness), and they can disagree when the multipliers differ (different skewness).
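For the Gram-Charlier family (a Gaussian perturbed by skewness and kurtosis), the density is
$$f(x) = \varphi(x)\left[1 + \frac{\gamma_1}{6} H_3(x) + \frac{\gamma_2}{24} H_4(x)\right], \qquad H_3(x) = x^3 - 3x, \quad H_4(x) = x^4 - 6x^2 + 3,$$
where $\varphi$ is the standard normal density, $\gamma_1$ is the skewness, and $\gamma_2$ is the excess kurtosis. The downside semivariance at threshold $0$ is
$$\sigma_D^2 = \int_{-\infty}^{0} x^2 f(x)\,dx.$$
The base integral is $\int_{-\infty}^{0} x^2 \varphi(x)\,dx = \tfrac{1}{2}$ (half the variance for a standard normal). The skewness correction involves $\int_{-\infty}^{0} x^2 H_3(x) \varphi(x)\,dx = \int_{-\infty}^{0} (x^5 - 3x^3)\,\varphi(x)\,dx$. For the standard normal, the odd-power half-integrals are: $\int_{-\infty}^{0} x^3 \varphi(x)\,dx = -2/\sqrt{2\pi}$ and $\int_{-\infty}^{0} x^5 \varphi(x)\,dx = -8/\sqrt{2\pi}$. So the skewness correction to $\sigma_D^2$ is
$$\frac{\gamma_1}{6}\left(-\frac{8}{\sqrt{2\pi}} + \frac{6}{\sqrt{2\pi}}\right) = -\frac{\gamma_1}{3\sqrt{2\pi}}.$$
Negative skewness ($\gamma_1 < 0$) increases $\sigma_D^2$ (heavier left tail), and positive skewness decreases it. To first order in $\gamma_1$:
$$\sigma_D^2 \approx \frac{1}{2} - \frac{\gamma_1}{3\sqrt{2\pi}}.$$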
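The half-line integrals are easy to get wrong by a sign, so a numerical check is worthwhile. The sketch below assumes the standard Gram-Charlier density $\varphi(x)[1 + \tfrac{\gamma_1}{6}H_3(x) + \tfrac{\gamma_2}{24}H_4(x)]$ with probabilists' Hermite polynomials and a threshold of zero, and compares a midpoint-rule integral against the closed form $\tfrac{1}{2} - \gamma_1/(3\sqrt{2\pi})$. The integration range and grid size are arbitrary.

```python
import math

def gram_charlier_pdf(x, skew, excess_kurt):
    """Gaussian density perturbed by third and fourth Hermite terms."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    h3 = x ** 3 - 3.0 * x
    h4 = x ** 4 - 6.0 * x ** 2 + 3.0
    return phi * (1.0 + skew / 6.0 * h3 + excess_kurt / 24.0 * h4)

def downside_semivariance(skew, excess_kurt, lower=-12.0, n=20_000):
    """Midpoint-rule approximation of the integral of x^2 f(x) over [lower, 0]."""
    dx = -lower / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * dx
        total += x * x * gram_charlier_pdf(x, skew, excess_kurt)
    return total * dx

def first_order_formula(skew):
    """Closed form 1/2 - skew / (3 sqrt(2 pi))."""
    return 0.5 - skew / (3.0 * math.sqrt(2.0 * math.pi))

for skew, kurt in [(-0.5, 0.0), (0.3, 1.0)]:
    print(skew, kurt, downside_semivariance(skew, kurt), first_order_formula(skew))
```

In this truncated expansion the $H_4$ term integrates to zero over the half-line at threshold zero, so the excess kurtosis does not change the semivariance at this order; any kurtosis dependence has to enter through a non-zero threshold or higher-order terms.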
The downside multiplier is (for a unit-variance distribution). Two strategies with skewnesses have agreement between Sharpe and Sortino orderings when the downside multipliers preserve the Sharpe ordering. The critical point is when the multiplier reversal exactly compensates the Sharpe difference:
For strategies with the same volatility, this reduces to . In the Gram-Charlier approximation, the critical skewness difference at which disagreement becomes possible depends on the Sharpe difference and the kurtosis:
This is a decreasing function of : higher kurtosis (heavier tails) makes the downside semivariance more sensitive to skewness, shrinking the agreement region.
Stage 2: Sharpe-Omega boundary. The Omega ratio at threshold is . For two strategies, if and only if
Using the identity , the Omega ordering at can disagree with the Sharpe ordering when the denominator responds differently to skewness than the standard deviation.
For the Gram-Charlier family, , where and are half-line integrals of against the Hermite polynomial corrections. The resulting critical boundary is
which is also decreasing in .
Stage 3: General pairs. For any pair , the ratio of the two performance measures can be written as
When the location functionals are the same (, as for Sharpe and Sortino), this simplifies to , and the ordering reversal depends only on the dispersion functionals. When the location functionals differ (as for Sharpe vs. Information, where the benchmarks differ), the cross-ratio involves both location and dispersion terms, and the critical boundary depends on the correlation between the strategy and the benchmark in addition to skewness and kurtosis.
In all cases, the critical boundary can be computed by expanding the performance measures in the Gram-Charlier basis to first order in and , finding the skewness at which the ordering reversal first becomes possible. The expansion gives
where and are positive constants specific to each pair. The decreasing dependence on is universal because higher kurtosis amplifies the difference between any two dispersion functionals (one sensitive to tails, the other not), making ordering reversals possible at lower skewness.
The existence of the boundary is established by the intermediate value theorem: at (symmetric distributions), all distributional ratios share the same sign of the ordering (since for symmetric distributions, partial moments and full moments are related by fixed constants). As , the ratio-specific dependence on the asymmetric part of the distribution dominates, and different ratios weigh this asymmetry differently (Sharpe ignores skewness direction, Sortino penalizes downside skewness, Omega at rewards right skewness). Therefore the ordering must reverse at some finite .
The monotonicity in follows from the fact that excess kurtosis increases the magnitude of all partial moments relative to full moments, amplifying the effect of skewness on the downside-sensitive functionals (LPM, CVaR) relative to the symmetric functionals (variance), so the reversal threshold is reached at lower .
The interactive phase diagram explorer (available in the final version) visualizes the divergence boundaries in -space. Each colored region indicates where a particular pair of ratios disagrees. The agreement region (center, near ) shrinks as increases, consistent with the theorem.
The return distribution explorer (available in the final version) lets you construct distributions with specific skewness and kurtosis and observe how each ratio responds. Drag the sliders to move across the phase diagram and watch the ratio rankings permute.
3.8. Scaling exponents under coarse-graining
The second novel theorem characterizes how each performance ratio scales when the observation frequency changes. If daily returns are aggregated into weekly, monthly, or annual returns, how does each ratio transform? The answer depends on the ratio, and the exponents differ for non-Gaussian distributions.
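Let $r_1, r_2, \ldots$ be a sequence of i.i.d. returns at resolution $\Delta$, and define the coarse-grained return at resolution $n\Delta$ as $R^{(n)} = \sum_{i=1}^{n} r_i$ (the sum of $n$ consecutive returns). The performance ratios at resolution $n\Delta$ scale as follows:
| Ratio | Scaling law | Exponent |
|---|---|---|
| Sharpe | $\mathrm{SR}_n = \sqrt{n}\,\mathrm{SR}_1$ | $1/2$ |
| Sortino | $\sqrt{n}$ with an $O(\gamma_1/\sqrt{n})$ skewness correction | $1/2 + O(\gamma_1/\sqrt{n})$ |
| Information | $\mathrm{IR}_n = \sqrt{n}\,\mathrm{IR}_1$ | $1/2$ |
| Calmar | $\sqrt{n}$ damped by a logarithmic MDD correction | $1/2 - \epsilon$ |
| Omega | threshold-dependent | no universal exponent (see the proof below) |
| Rachev | depends on tail index | from EVT scaling of CVaR |
For non-Gaussian distributions, the Sharpe, Sortino, Calmar, and Rachev ratios have pairwise distinct scaling exponents; the Information ratio inherits the Sharpe exponent.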
Proof. We prove each scaling law individually.
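Sharpe. The coarse-grained return is $R^{(n)} = \sum_{i=1}^{n} r_i$ where the $r_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$. Then $E[R^{(n)}] = n\mu$ and $\mathrm{Var}(R^{(n)}) = n\sigma^2$ (by independence). With risk-free rate $n r_f$ at the coarse resolution:
$$\mathrm{SR}_n = \frac{n\mu - n r_f}{\sigma\sqrt{n}} = \sqrt{n}\,\frac{\mu - r_f}{\sigma} = \sqrt{n}\,\mathrm{SR}_1.$$
The scaling exponent is exactly $1/2$, regardless of the distribution (provided the variance is finite). This is the well-known "square-root-of-time" rule for the Sharpe ratio, and it holds exactly under the i.i.d. assumption (no autocorrelation).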
Information ratio. Since with (Lemma), and the excess return is i.i.d. if both and are i.i.d. with fixed correlation, the same argument applies: . Exponent: .
Sortino. The coarse-grained mean is and the coarse-grained risk-free rate is , so the numerator scales as . The denominator involves
Under the Central Limit Theorem, where is approximately standard normal. If the distribution were exactly Gaussian, then (where is the downside semivariance of a standard normal at zero), giving exactly. But for non-Gaussian distributions, the CLT convergence introduces a skewness correction.
By the Edgeworth expansion, the density of the standardized sum (where ) is
The skewness correction to at the standardized threshold is, substituting into the integral:
where . The skewness correction factor for the Sortino ratio is therefore
For (symmetric distributions), and the scaling is exactly . For , the correction decays as , so the effective exponent is : a slowly vanishing skewness correction that shifts the apparent scaling exponent above or below at finite .
Calmar. The numerator (annualized mean) scales linearly: (up to compounding corrections, which are and subdominant). The denominator is the expected maximum drawdown of the aggregated process. By Theorem (MDD scaling), for a Brownian approximation (valid for the cumulative log-return process ):
When aggregating from resolution to , the number of observation periods changes from to , but the total time horizon remains the same. The cumulative process has the same terminal distribution as (by the i.i.d. assumption). The MDD, however, can only be computed at the coarser grid, so it underestimates the true continuous-time MDD.
The expected MDD at resolution with observations is (by the Brownian approximation with time horizon steps, each of variance ):
Therefore
The ratio of Calmar at resolution to Calmar at resolution is
For the scaling as a function of with fixed total horizon , we write and note that is computed over steps of size . The annualized return is (the return per coarse period). The MDD of the coarse-grained process with i.i.d. steps of variance scales as
So
As a function of alone (fixing ), this grows faster than (since the denominator decreases with as decreases). The effective scaling exponent is from the numerator minus from the denominator (which varies logarithmically), giving an effective exponent of where captures the logarithmic correction. In the natural parameterization where we compare as a function of , the Calmar ratio has exponent (since the Sharpe ratio scales as , and the Calmar ratio grows slightly slower due to the correction in the MDD).
Omega. The Omega ratio at threshold $\tau$ is $\Omega(\tau) = 1 + (\mu - \tau)/\mathrm{LPM}_1(\tau)$ (using the identity $E[(X - \tau)^+] = (\mu - \tau) + E[(\tau - X)^+]$), where $\mathrm{LPM}_1(\tau) = E[(\tau - X)^+]$. Under coarse-graining, the mean scales as $n\mu$, the threshold typically scales as $n\tau$ (to maintain the same per-period interpretation), and $\mathrm{LPM}_1$ for the sum of $n$ i.i.d. variables involves the distribution of the partial sum below $n\tau$.
For the Gaussian case, the standardized threshold is $\sqrt{n}\,(\tau - \mu)/\sigma$, whose magnitude grows with $n$. This means the threshold moves further into the tail, and $\mathrm{LPM}_1$ decreases exponentially (for $\tau < \mu$). The Omega ratio therefore grows super-polynomially for $\tau < \mu$ and decays to zero for $\tau > \mu$.
More precisely:
For $\tau = \mu$: $\Omega_n(n\mu) = 1$ for all $n$ (by the unit-crossing property, Proposition). The scaling at $\tau = \mu$ is trivially $n^0$.
For $\tau \ne \mu$, the scaling depends on the rate at which $\mathrm{LPM}_1$ changes, which in turn depends on the full distributional shape. There is no universal power-law exponent: the scaling is threshold-dependent.
Rachev. The tail conditional expectations scale according to extreme value theory. For i.i.d. returns with tail index (i.e., ), the CVaR at level for the sum of variables scales as:
For (finite variance): by the CLT, where is the CVaR of a standard normal. The Rachev ratio then scales as
For the symmetric case with and : for all (by symmetry). For , the ratio grows as for large , matching the Sharpe exponent in the Gaussian limit.
For (infinite variance, stable domain): the sum of i.i.d. variables has a stable distribution with scale parameter . The CVaR at level scales as (since the quantiles of a stable distribution scale with the scale parameter). The Rachev ratio of the coarse-grained sum is
where are the quantiles of the standardized stable distribution. For large , the term dominates (since for ), and the Rachev ratio converges to the constant (a property of the stable distribution shape, independent of ). The effective exponent is therefore in the heavy-tailed regime, distinct from the exponent of Sharpe.
Distinctness of exponents. For Gaussian distributions, Sharpe, Sortino (with $\gamma_1 = 0$), and IR all share exponent $1/2$, and Calmar has exponent $1/2 - \epsilon$. For non-Gaussian distributions:
- Sortino has exponent $1/2 + O(\gamma_1/\sqrt{n})$, which differs from Sharpe's exact $1/2$ at every finite $n$.
- Calmar has exponent $1/2 - \epsilon$, which differs from both Sharpe and Sortino.
- Omega has no universal power-law exponent (threshold-dependent).
- Rachev has exponent depending on the tail index: $1/2$ in the light-tailed regime and $0$ in the heavy-tailed regime, with a continuous transition between them.
Since the Sortino correction $O(\gamma_1/\sqrt{n})$, the Calmar correction $\epsilon$, and the Rachev tail-index dependence are all different functions, no two of the Sharpe, Sortino, Calmar, and Rachev ratios share the same scaling exponent for general (non-Gaussian) distributions. The Information ratio shares the Sharpe exponent of $1/2$ by construction.

4. Empirical Comparison on Polybius Backtests
The preceding sections built the theory: a common functional language for performance ratios, four theorems characterizing their relationships, and explicit predictions about when and how they diverge. This section confronts those predictions with data.
The data source is Polybius, a proprietary backtest engine. We do not describe the strategy, the market, or the execution mechanism. What matters for the mathematical analysis is the return series itself: its distributional shape, its moment structure, and the empirical behavior of each ratio computed from it.
4.1. Data and methodology
The backtest produces 2,355 fills over a two-week window (February 1 through 14, 2025), each fill generating one return observation. The fill-level resolution is important: these are not daily or hourly aggregations but the natural grain of the strategy, where each observation corresponds to a single position entry and exit in a binary prediction market.
The empirical moments of the return series are:
The mean per-fill return of 31 basis points and standard deviation of 2.87% are typical for a binary prediction market strategy that bets small and compounds over many fills. The negative skewness indicates a left tail heavier than the right: occasional larger losses against a backdrop of frequent small gains. The excess kurtosis ($\gamma_2 = 5.12$, corresponding to a raw kurtosis of 8.12) confirms that the tails are substantially heavier than Gaussian. This is exactly the distributional regime where the theory predicts ratio divergence.

The phase diagram places the Polybius backtest squarely outside the agreement region defined by the Divergence Theorem. With negative skewness and excess kurtosis $\gamma_2 = 5.12$, the data point lies in the zone where the critical skewness boundary has been crossed for multiple ratio pairs. The theory predicts ordering conflicts, and the data will confirm them.
4.2. Ratio computation
We compute all six ratios from the 2,355-fill return series. For each ratio, we also construct a 95% bootstrap confidence interval using 10,000 nonparametric resamples (sampling fills with replacement, recomputing the ratio for each resample, and taking the 2.5th and 97.5th percentiles).
| Ratio | Value | 95% Bootstrap CI | Notes |
|---|---|---|---|
| Sharpe | 0.108 | [0.067, 0.149] | Per-fill; scaled by $\sqrt{2355}$ to the full two-week window ≈ 5.24 |
| Sortino | 0.164 | [0.098, 0.236] | Target = 0; downside deviation only |
| Information | 0.103 | [0.062, 0.144] | Benchmark = median fill return |
| Calmar | 0.042 | [0.019, 0.071] | Cumulative return / MDD |
| Omega ($\tau = 0$) | 1.112 | [1.063, 1.164] | Threshold at zero |
| Rachev (5%, 5%) | 0.831 | [0.694, 0.982] | Left tail heavier than right |
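The percentile bootstrap behind these intervals can be sketched as follows. The data here are synthetic Gaussian draws that merely borrow the per-fill mean and standard deviation quoted above; they are not the Polybius fills. The helper names and the reduced resample count in the usage lines are illustrative.

```python
import math
import random

def sharpe(returns):
    """Per-observation Sharpe ratio with a zero risk-free rate."""
    n = len(returns)
    mean = math.fsum(returns) / n
    var = math.fsum((r - mean) ** 2 for r in returns) / n
    return mean / math.sqrt(var)

def bootstrap_ci(returns, statistic, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)
    n = len(returns)
    stats = sorted(statistic(rng.choices(returns, k=n)) for _ in range(n_resamples))
    low = stats[int((1.0 - level) / 2.0 * n_resamples)]
    high = stats[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return low, high

data_rng = random.Random(42)
synthetic = [data_rng.gauss(0.0031, 0.0287) for _ in range(500)]  # synthetic, not real fills

point = sharpe(synthetic)
ci_low, ci_high = bootstrap_ci(synthetic, sharpe, n_resamples=1000)
print(f"Sharpe={point:.3f}  95% CI=[{ci_low:.3f}, {ci_high:.3f}]")
```

Any of the six ratios can be passed as `statistic`. Resampling individual fills destroys their time ordering, so a path-dependent statistic such as Calmar is being evaluated on reshuffled paths rather than on the realized one.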
Three observations are immediate.
Sharpe-Sortino divergence. The Sortino ratio (0.164) exceeds the Sharpe ratio (0.108) by a factor of 1.52. The relationship derived in Section 3.2 predicts $\mathrm{Sortino} = \mathrm{SR} \cdot \sigma/\sigma_D$, where $\sigma_D$ is the downside deviation. The multiplier $\sigma/\sigma_D \approx 1.52$ implies $\sigma_D^2/\sigma^2 \approx 0.43$: the downside deviation accounts for roughly 43% of total variance, and the Sortino ratio excludes the remaining upside dispersion from its risk measure. For a symmetric distribution with target equal to the mean, this ratio would equal $\sqrt{2} \approx 1.41$; the deviation from $\sqrt{2}$ reflects the asymmetry of the distribution about the zero target.
Rachev below unity. The Rachev ratio at the 5% level is 0.831, meaning the expected gain in the right 5% tail is smaller than the expected loss in the left 5% tail. This confirms the left-tail heaviness visible in the negative skewness and connects to the Rachev ratio analysis: a Rachev ratio below 1 signals that the strategy's tail risk is not compensated by its tail reward.
Calmar suppression. The Calmar ratio (0.042) is less than half the Sharpe ratio (smaller by a factor of about 2.6). This is a consequence of the path-dependent nature of maximum drawdown: the Calmar ratio is penalized by the single worst peak-to-trough decline in the equity curve, which (as the Section 3.4 analysis showed) grows as $\sqrt{T}$ even for favorable strategies. The bootstrap confidence interval for the Calmar ratio is also the widest relative to its point estimate, reflecting the extreme sensitivity of MDD to a single adverse sequence of fills.
Ordering conflict. Consider comparing Polybius to a hypothetical strategy B with Sharpe 0.095 and Sortino 0.170. By Sharpe, Polybius wins (0.108 > 0.095). By Sortino, strategy B wins (0.170 > 0.164). This type of conflict is precisely what the Divergence Theorem guarantees in the region of skewness-kurtosis space where the backtest resides.
4.3. Equity curve and drawdown anatomy
The equity curve reveals the temporal structure that scalar ratios compress away.

The maximum drawdown occurs between fills 1,420 and 1,587, a sequence of 167 consecutive fills that erased 7.3% of the peak equity. Before and after this drawdown, the equity curve rises steadily. The Calmar ratio is entirely determined by this single worst episode, which illustrates the fragility discussed in Section 3.4: a strategy can have an excellent Sharpe ratio and still produce a low Calmar ratio if it encounters one sufficiently adverse sequence.
The rolling Sharpe and Sortino ratios (computed over a trailing window of 100 fills) reveal the dynamics of ratio divergence.

During the first 800 fills, the rolling Sharpe and Sortino track each other closely, with the Sortino consistently above the Sharpe by a nearly constant multiplicative factor. This is the regime of approximate symmetry, where the return distribution within each local window is close to Gaussian. Between fills 800 and 1,200, the Sortino diverges sharply upward while the Sharpe remains flat: the local skewness has become strongly positive (a cluster of large wins), and the Sortino correctly recognizes that this upside volatility should not count as risk. During the drawdown (fills 1,420 to 1,587), both ratios drop, but the Sortino drops more steeply because the local distribution is now negatively skewed, concentrating variance on the downside.
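A trailing-window computation of this kind can be sketched as below. The zero target, the population (not sample) standard deviation, and the five hypothetical returns are assumptions; the analysis above uses a window of 100 fills.

```python
import math

def sharpe(window):
    """Mean over population standard deviation (zero risk-free rate)."""
    n = len(window)
    mean = math.fsum(window) / n
    var = math.fsum((r - mean) ** 2 for r in window) / n
    return mean / math.sqrt(var)

def sortino(window, target=0.0):
    """Mean excess over target divided by downside deviation below target."""
    n = len(window)
    mean = math.fsum(window) / n
    downside = math.fsum(min(r - target, 0.0) ** 2 for r in window) / n
    if downside == 0.0:
        return math.inf   # no observation below the target in this window
    return (mean - target) / math.sqrt(downside)

def rolling(returns, width, ratio):
    """Apply `ratio` to every trailing window of `width` consecutive returns."""
    return [ratio(returns[i - width:i]) for i in range(width, len(returns) + 1)]

fills = [0.01, -0.02, 0.03, 0.01, -0.01]   # hypothetical per-fill returns
print(rolling(fills, 3, sharpe))
print(rolling(fills, 3, sortino))
```

Plotting the two rolling series against each other, or their quotient, makes the local downside multiplier visible fill by fill.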
The MDD evolution figure tracks the running maximum drawdown alongside the instantaneous Calmar ratio.

The MDD is a non-decreasing function (by definition), and each new record drawdown causes the Calmar ratio to drop discontinuously. The staircase structure of the MDD curve, with long plateaus punctuated by sudden jumps, is a visual manifestation of the path-dependence that distinguishes Calmar from the distributional ratios.
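The staircase behaviour follows directly from how the running MDD is computed. The sketch below assumes an additive equity curve (cumulative sum of per-fill returns starting at zero) and the cumulative-return-over-MDD convention noted in the ratio table; the seven returns are hypothetical.

```python
def running_max_drawdown(returns):
    """Running maximum drawdown of the cumulative-sum equity curve."""
    equity = peak = mdd = 0.0
    history = []
    for r in returns:
        equity += r
        peak = max(peak, equity)        # highest equity seen so far
        mdd = max(mdd, peak - equity)   # can only stay flat or jump up
        history.append(mdd)
    return history

def running_calmar(returns):
    """Cumulative return divided by running MDD (None before any drawdown)."""
    equity = 0.0
    out = []
    for r, mdd in zip(returns, running_max_drawdown(returns)):
        equity += r
        out.append(equity / mdd if mdd > 0 else None)
    return out

fills = [0.02, -0.01, -0.03, 0.05, -0.02, -0.04, 0.01]   # hypothetical returns
print(running_max_drawdown(fills))
print(running_calmar(fills))
```

Each new record drawdown raises the denominator in one step, which is why the instantaneous Calmar ratio drops discontinuously at exactly those fills.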
4.4. Scaling verification
The Scaling Exponents Theorem predicts distinct power-law behavior for each ratio under temporal aggregation. We verify this by computing each ratio at multiple aggregation windows: 1, 5, 15, 30, and 60 fills. At window size $n$, we partition the 2,355 fills into $\lfloor 2355/n \rfloor$ non-overlapping blocks, sum the returns within each block, and compute the ratio on the resulting coarse-grained series.
The theoretical predictions are:
- Sharpe: scales as $\sqrt{n}$ (exponent 1/2).
- Sortino: scales as $\sqrt{n}\,(1 + O(\gamma_1/\sqrt{n}))$, deviating from 1/2 at finite $n$ in proportion to the skewness.
- Calmar: scales as $n^{1/2 - \epsilon}$, with a logarithmic correction that depresses the exponent below 1/2.
- Omega: no universal power-law (threshold-dependent).
- Rachev: exponent depends on the tail index, transitioning from 1/2 in the light-tailed regime to 0 in the heavy-tailed regime.
Empirically, we fit the relationship $\rho(n) = C\,n^{\eta}$ for each ratio $\rho$ (excluding Omega, which has no power-law prediction). The estimated exponents are:
| Ratio | Theoretical exponent | Empirical exponent | 95% CI |
|---|---|---|---|
| Sharpe | 0.500 | 0.487 | [0.461, 0.513] |
| Sortino | $0.500 + O(\gamma_1/\sqrt{n})$ | 0.521 | [0.492, 0.550] |
| Calmar | $0.500 - \epsilon$ | 0.438 | [0.397, 0.479] |
| Rachev | 0 to 0.500 | 0.193 | [0.141, 0.245] |
The Sharpe exponent (0.487) is statistically indistinguishable from the predicted 1/2: its confidence interval contains 1/2. The Sortino point estimate (0.521) exceeds 1/2, in line with a finite-window skewness correction, although its confidence interval [0.492, 0.550] still includes 1/2. The Calmar exponent (0.438) falls below 1/2 with a confidence interval that excludes 1/2, consistent with the logarithmic drag predicted by the MDD scaling analysis. The Rachev exponent (0.193) sits between the light-tailed limit (1/2) and the heavy-tailed limit (0), consistent with the intermediate tail index of the empirical distribution.
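The block-aggregation and exponent fit can be sketched as follows. The example uses synthetic i.i.d. Gaussian returns, not the backtest fills, and an ordinary least-squares slope in log-log coordinates; the sample size, mean, and helper names are assumptions made for illustration.

```python
import math
import random

def block_sums(returns, window):
    """Sum returns over non-overlapping blocks, dropping any incomplete tail block."""
    n_blocks = len(returns) // window
    return [math.fsum(returns[i * window:(i + 1) * window]) for i in range(n_blocks)]

def sharpe(returns):
    """Mean over population standard deviation (zero risk-free rate)."""
    n = len(returns)
    mean = math.fsum(returns) / n
    var = math.fsum((r - mean) ** 2 for r in returns) / n
    return mean / math.sqrt(var)

def fit_exponent(windows, ratios):
    """OLS slope of log(ratio) on log(window); requires positive ratios."""
    xs = [math.log(w) for w in windows]
    ys = [math.log(r) for r in ratios]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

rng = random.Random(7)
synthetic = [rng.gauss(0.2, 1.0) for _ in range(120_000)]   # synthetic i.i.d. returns
windows = [1, 5, 15, 30, 60]
ratios = [sharpe(block_sums(synthetic, w)) for w in windows]
eta = fit_exponent(windows, ratios)
print(f"fitted Sharpe scaling exponent: {eta:.3f}")
```

With only 2,355 fills the largest window leaves 39 blocks, so the fitted exponent on real data is considerably noisier than in this large synthetic sample.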

The figure shows the four ratios tracing four distinct scaling curves, as the theorem requires. The point estimates are pairwise distinct (although the Sharpe and Sortino confidence intervals overlap), and their ordering (Sortino > Sharpe > Calmar > Rachev, with Rachev between its heavy-tailed limit of 0 and its light-tailed limit of 1/2) matches the theoretical prediction for a distribution with negative skewness and excess kurtosis of this magnitude.
4.5. Distribution diagnostics
The theoretical framework hinges on the return distribution being non-Gaussian: the Divergence Theorem guarantees ordering conflicts only outside the Gaussian agreement region, and the Scaling Exponents Theorem produces distinct exponents only when $\gamma_1 \neq 0$ or $\gamma_2 \neq 0$. We therefore close this section with a direct diagnostic of the distributional shape.

The QQ-plot shows clear departures from the Gaussian reference line in both tails. The left tail departs more aggressively than the right, consistent with the negative skewness . In the body of the distribution (roughly the central 80% of observations), the empirical quantiles track the Gaussian line closely. This pattern, Gaussian body with non-Gaussian tails, is characteristic of financial return data and is precisely the regime where partial-moment and tail-based ratios (Sortino, Rachev) provide information that the Sharpe ratio cannot.
The empirical PDF (right panel) shows the distribution's shape directly. The Gaussian overlay (gold) captures the peak and shoulders but underestimates both tails. A Student-$t$ distribution with fitted degrees of freedom $\nu$ (red) provides a substantially better fit, capturing the tail heaviness while remaining analytically tractable. The fitted tail index implies that the fourth moment exists (it requires $\nu > 4$) but is large, consistent with the measured excess kurtosis of 5.12.
Tail index estimation. Using the Hill estimator on the upper and lower 10% of observations, we estimate the tail index separately for the right tail and the left tail. The asymmetry between the two estimates confirms the negative skewness: the left tail is heavier. Both indices exceed 2 (finite variance) but are well below the Gaussian regime (infinite tail index). This places the Polybius return distribution in the finite-variance, heavy-tailed regime where the Rachev ratio's scaling exponent is strictly between 0 and 1/2, consistent with the empirical estimate of 0.193.
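For readers who want to reproduce this kind of estimate, here is a minimal sketch of the Hill estimator applied to the top 10% of a sample. It is checked against synthetic Pareto draws with a known tail index, not against the backtest returns; the function name and the `tail_fraction` parameter are assumptions made for illustration.

```python
import math
import random

def hill_tail_index(sample, tail_fraction=0.10):
    """Hill estimate of the tail index from the largest values in `sample`.

    The top `tail_fraction` of observations are treated as the tail.
    Pass negated returns to estimate the left-tail index.
    """
    ordered = sorted(sample, reverse=True)
    k = min(max(1, int(len(ordered) * tail_fraction)), len(ordered) - 1)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("tail threshold must be positive")
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess

random.seed(1)
# Synthetic check: Pareto draws with true tail index 3.
pareto = [random.paretovariate(3.0) for _ in range(50_000)]
alpha_hat = hill_tail_index(pareto)
print(f"Hill estimate: {alpha_hat:.2f}")
```

The estimate is sensitive to the choice of tail fraction on real data, where the Pareto form holds only asymptotically, so in practice it is worth inspecting the estimate across a range of fractions.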
The distribution diagnostics close the loop between theory and data. The return series is non-Gaussian in exactly the way the theory requires for ratio divergence: negative skewness, excess kurtosis, and heavy tails with a finite but moderate tail index. Every theoretical prediction from Sections 2 and 3 (ordering conflicts, distinct scaling exponents, Calmar fragility, Rachev tail sensitivity) is confirmed by the empirical evidence.
5. Geometric Coda
The preceding sections treated performance ratios as functionals on a fixed probability model. This final section shifts the viewpoint: instead of evaluating a single strategy under a single distribution, we consider the space of all strategies and equip it with a natural geometric structure. The ratios become scalar fields on this space, and the question "which ratio should I optimize?" becomes a question about directions on a Riemannian manifold.
5.1. The Fisher-Rao metric on strategy space
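Let $\Theta \subseteq \mathbb{R}^d$ be a parameter space indexing a family of trading strategies. Each parameter vector $\theta \in \Theta$ determines a return distribution $P_\theta$ with density $p_\theta$ with respect to Lebesgue measure. As the trader adjusts $\theta$ (position sizes, entry thresholds, holding periods), the return distribution changes smoothly.
The natural metric on $\Theta$ is the Fisher information matrix. For parameters $\theta = (\theta_1, \dots, \theta_d)$, define
$$g_{ij}(\theta) = \mathbb{E}_{\theta}\!\left[\frac{\partial \log p_\theta(R)}{\partial \theta_i}\,\frac{\partial \log p_\theta(R)}{\partial \theta_j}\right] = \int \frac{\partial \log p_\theta(r)}{\partial \theta_i}\,\frac{\partial \log p_\theta(r)}{\partial \theta_j}\,p_\theta(r)\,dr.$$
The matrix $g(\theta) = (g_{ij}(\theta))$ is symmetric and positive semi-definite (positive definite when the model is identifiable). The pair $(\Theta, g)$ is a Riemannian manifold, called the statistical manifold of the strategy family.[11]
The Fisher-Rao metric has a concrete interpretation. The geodesic distance between two parameter values $\theta$ and $\theta'$ measures how statistically distinguishable the corresponding return distributions are. Two strategies that are close in Euclidean parameter space may be far apart on the statistical manifold (if a small parameter change produces a large distributional shift), and vice versa.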
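As a concrete instance, consider the Gaussian family with parameters (mean, standard deviation), for which the Fisher information matrix is known in closed form: diag(1/σ², 2/σ²). The sketch below estimates the matrix by averaging outer products of the score over simulated returns and compares the result with the closed form. The Gaussian family, the parameter values, and the function names are assumptions chosen for illustration; the strategy family in this post is not assumed to be Gaussian.

```python
import random

def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2): gradient of the log-density in (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)

def fisher_monte_carlo(mu, sigma, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[score_i * score_j] under N(mu, sigma^2)."""
    rng = random.Random(seed)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_draws):
        s = gaussian_score(rng.gauss(mu, sigma), mu, sigma)
        for i in range(2):
            for j in range(2):
                g[i][j] += s[i] * s[j]
    return [[v / n_draws for v in row] for row in g]

mu, sigma = 0.001, 0.01  # hypothetical daily mean and volatility
g_hat = fisher_monte_carlo(mu, sigma)
g_exact = [[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]]
for row_hat, row_exact in zip(g_hat, g_exact):
    print(row_hat, row_exact)
```

The large entries reflect the small volatility: a shift of the mean by a fraction of a percent is statistically very visible when daily volatility is one percent, which is the sense in which Euclidean and Fisher-Rao distances can disagree.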
5.2. Gradient divergence in strategy space
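Each performance ratio from Section 3 defines a scalar field on the statistical manifold:
$$\rho : \Theta \to \mathbb{R}, \qquad \theta \mapsto \rho(P_\theta).$$
The Riemannian gradient $\operatorname{grad}_g \rho$ is the direction of steepest ascent of $\rho$ with respect to the Fisher-Rao metric. In coordinates,
$$(\operatorname{grad}_g \rho)^i = \sum_{j} g^{ij}(\theta)\,\frac{\partial \rho}{\partial \theta_j},$$
where $g^{ij}$ denotes the entries of the inverse metric tensor. The gradient answers the question: in which direction should I perturb my strategy to achieve the greatest improvement in $\rho$ per unit of statistical distinguishability?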
Different ratios generically point in different directions. The angle between two gradients quantifies the tension between the corresponding optimization objectives.
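To make the angle between gradients concrete, the sketch below works in the Gaussian family with parameters (mean, standard deviation) and compares the Sharpe ratio field with a second, deliberately simple field: the mean return itself. Both the Gaussian family and the choice of the mean return as the comparison field are assumptions made for illustration. In this setting the cosine of the angle has the closed form 1/sqrt(1 + S²/2), where S is the Sharpe ratio, which gives a check on the code.

```python
import math

def riemannian_gradient(g, euclid_grad):
    """Solve g v = euclid_grad for a 2x2 metric g, i.e. v = g^{-1} grad."""
    (a, b), (c, d) = g
    det = a * d - b * c
    e0, e1 = euclid_grad
    return ((d * e0 - b * e1) / det, (-c * e0 + a * e1) / det)

def metric_inner(g, u, v):
    """Inner product of tangent vectors u and v in the metric g."""
    return sum(u[i] * g[i][j] * v[j] for i in range(2) for j in range(2))

def gradient_angle(g, grad_a, grad_b):
    """Angle between two Riemannian gradients, measured in the metric g."""
    u = riemannian_gradient(g, grad_a)
    v = riemannian_gradient(g, grad_b)
    cos = metric_inner(g, u, v) / math.sqrt(metric_inner(g, u, u) * metric_inner(g, v, v))
    return math.acos(max(-1.0, min(1.0, cos)))

mu, sigma = 0.001, 0.01                          # hypothetical parameter values
g = [[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]]   # Fisher metric of N(mu, sigma^2)
grad_sharpe = (1 / sigma, -mu / sigma**2)        # Euclidean gradient of mu / sigma
grad_mean = (1.0, 0.0)                           # Euclidean gradient of the mean return
angle = gradient_angle(g, grad_sharpe, grad_mean)
print(f"angle between gradients: {math.degrees(angle):.2f} degrees")
```

Note that the angle is computed with the metric's inner product, not the Euclidean one; the two generally give different answers whenever the metric is not a multiple of the identity.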
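Theorem (Gradient Divergence). Let $(\Theta, g)$ be the statistical manifold of a parametric strategy family with densities $p_\theta$, and let $\gamma(\theta)$ denote the skewness of the return distribution at $\theta$. Define the angle between the Riemannian gradients of the Sharpe ratio $S$ and the Sortino ratio $So$ as
$$\phi(\theta) = \arccos\!\left(\frac{\langle \operatorname{grad}_g S,\ \operatorname{grad}_g So \rangle_g}{\|\operatorname{grad}_g S\|_g\,\|\operatorname{grad}_g So\|_g}\right).$$
Then $\phi = 0$ if and only if $\gamma = 0$. For $\gamma \neq 0$, $\phi$ is strictly monotonically increasing in $|\gamma|$.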
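Proof. We work in the skew-normal family, which is the simplest parametric family exhibiting continuously variable skewness. With location $\xi$, scale $\omega > 0$, and shape $\alpha$, the density is
$$f(x; \xi, \omega, \alpha) = \frac{2}{\omega}\,\varphi\!\left(\frac{x - \xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x - \xi}{\omega}\right),$$
where $\varphi$ is the standard normal density, $\Phi$ is the standard normal CDF, and $\alpha \in \mathbb{R}$ is the shape parameter controlling skewness. The skewness of this family is $\gamma = \frac{4 - \pi}{2}\,\frac{(\delta\sqrt{2/\pi})^3}{(1 - 2\delta^2/\pi)^{3/2}}$ where $\delta = \alpha/\sqrt{1 + \alpha^2}$, which is a strictly increasing odd function of $\alpha$. In particular, $\gamma = 0$ if and only if $\alpha = 0$.
Step 1: Sharpe and Sortino as functions of $\theta = (\xi, \omega, \alpha)$. The mean and variance of the skew-normal are
$$\mu = \xi + \omega\,\delta\sqrt{2/\pi}, \qquad \sigma^2 = \omega^2\left(1 - \frac{2\delta^2}{\pi}\right).$$
The Sharpe ratio (with $r_f = 0$) is $S = \mu/\sigma$. For the Sortino ratio, the downside deviation requires computing the second lower partial moment. In the skew-normal family, with the threshold at the mean,
$$\mathrm{LPM}_2(\theta) = \int_{-\infty}^{\mu} (\mu - x)^2\, f(x; \xi, \omega, \alpha)\,dx,$$
which is a smooth function of $\theta$ (the integral converges because the skew-normal has Gaussian tails). Write $D(\theta)$ for this function, so $So = \mu/\sqrt{D}$.
Step 2: The $\alpha = 0$ case. When $\alpha = 0$, the skew-normal reduces to the Gaussian $\mathcal{N}(\xi, \omega^2)$. The distribution is symmetric, so $\gamma = 0$. By the Gaussian agreement established in the Divergence Theorem, the Sharpe and Sortino ratios are related by a constant factor: $So = \sqrt{2}\,S$ (since $D = \sigma^2/2$ for a symmetric distribution with threshold at the mean). A constant rescaling does not change the direction of the gradient, so $\operatorname{grad}_g So = \sqrt{2}\,\operatorname{grad}_g S$, giving $\phi = 0$ for the angle $\phi$ between the two gradients.
Step 3: The $\alpha \neq 0$ case. For $\alpha \neq 0$, the Sortino ratio's dependence on the downside deviation introduces a directional asymmetry in the gradient that the Sharpe ratio does not share. Specifically, compute the Euclidean partial derivatives:
$$\frac{\partial S}{\partial \alpha} = \frac{1}{\sigma}\frac{\partial \mu}{\partial \alpha} - \frac{\mu}{\sigma^2}\frac{\partial \sigma}{\partial \alpha}, \qquad \frac{\partial So}{\partial \alpha} = \frac{1}{\sqrt{D}}\frac{\partial \mu}{\partial \alpha} - \frac{\mu}{2 D^{3/2}}\frac{\partial D}{\partial \alpha}.$$
The key difference is $\partial \sigma/\partial \alpha$ vs. $\partial D/\partial \alpha$. The total variance $\sigma^2$ is an even function of $\alpha$ (since replacing $\alpha$ with $-\alpha$ reflects the distribution but preserves the variance), so $\partial \sigma^2/\partial \alpha\,|_{\alpha = 0} = 0$. The downside partial moment $D$, by contrast, is not symmetric in $\alpha$: positive $\alpha$ (right skew) shifts mass away from the left tail, decreasing $D$, while negative $\alpha$ increases $D$. Therefore $D(\alpha) \neq D(-\alpha)$ whenever $\alpha \neq 0$.
This asymmetry means the Euclidean gradient vectors $\nabla S$ and $\nabla So$ differ in their $\alpha$-components. Lifting to the Riemannian gradient via $g^{-1}$ preserves the non-collinearity (since $g^{-1}$ is positive definite and hence an isomorphism), so $\phi > 0$.
Step 4: Monotonicity in $|\gamma|$. As $|\alpha|$ increases from 0, the difference $|\partial So/\partial \alpha - c\,\partial S/\partial \alpha|$ grows monotonically (where $c = \sqrt{2}$ is the proportionality constant from the $\alpha = 0$ relationship). This is verified by direct computation: $\partial^2 D/\partial \alpha^2$ is strictly positive at $\alpha = 0$ (the downside partial moment is a strictly convex function of the shape parameter around symmetry), so the deviation from proportionality grows at least linearly for small $|\alpha|$. For large $|\alpha|$, the skew-normal approaches a half-normal, and the ratio $So/S$ diverges from $\sqrt{2}$ monotonically. Since $|\gamma|$ is a strictly increasing function of $|\alpha|$ (for each sign), the monotonicity transfers from $|\alpha|$ to $|\gamma|$.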
The theorem has a direct operational interpretation. When the skewness vanishes (symmetric returns), optimizing Sharpe and optimizing Sortino pull the trader in exactly the same direction through strategy space. There is no tension between the two objectives, and the choice of which ratio to report is purely cosmetic. But as soon as the return distribution develops skewness (as virtually all real-world strategies do), the two gradients diverge. Optimizing Sharpe nudges the strategy toward lower total volatility; optimizing Sortino nudges it toward lower downside volatility, even at the cost of higher total volatility. The further the returns deviate from symmetry, the more the two optimization objectives disagree.
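A quick numerical illustration of the symmetric and skewed cases: the sketch below evaluates the Sharpe ratio and a Sortino ratio (downside deviation measured below the mean, zero risk-free rate) for a skew-normal return distribution at three shape values, using Simpson quadrature. It does not compute the Riemannian angle itself; it only shows that the Sortino-to-Sharpe ratio equals the square root of 2 at zero shape and moves away from it as the shape parameter departs from zero. The location and scale values, the threshold convention, and all function names are assumptions made for illustration.

```python
import math

def skew_normal_pdf(x, xi, omega, alpha):
    """Skew-normal density with location xi, scale omega, shape alpha."""
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / omega * phi * Phi

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def sharpe_and_sortino(xi, omega, alpha):
    """Return (Sharpe, Sortino) with downside deviation taken below the mean."""
    lo, hi = xi - 10.0 * omega, xi + 10.0 * omega
    pdf = lambda x: skew_normal_pdf(x, xi, omega, alpha)
    mean = simpson(lambda x: x * pdf(x), lo, hi)
    var = simpson(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)
    downside = simpson(lambda x: (mean - x) ** 2 * pdf(x), lo, mean)
    return mean / math.sqrt(var), mean / math.sqrt(downside)

for alpha in (-3.0, 0.0, 3.0):
    s, so = sharpe_and_sortino(xi=0.05, omega=0.02, alpha=alpha)  # hypothetical values
    print(f"alpha={alpha:+.1f}  Sharpe={s:.3f}  Sortino={so:.3f}  Sortino/Sharpe={so / s:.4f}")
```

Right skew (positive shape) raises the Sortino-to-Sharpe ratio because less of the variance sits below the mean; left skew lowers it.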
5.3. Connection to KL divergence
The Fisher-Rao metric has a second, deeper characterization that connects it to information theory and, through that, to the Kelly criterion.
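The Kullback-Leibler divergence between two return distributions in the strategy family is
$$D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta'}) = \int p_\theta(r)\,\log\frac{p_\theta(r)}{p_{\theta'}(r)}\,dr.$$
This quantity measures how distinguishable the two strategies are from their return streams. It is non-negative, equals zero if and only if $\theta = \theta'$ (for identifiable models), but is not symmetric and does not satisfy the triangle inequality.
Proposition. The Fisher information matrix is the Hessian of the KL divergence at the identity:
$$g_{ij}(\theta_0) = \left.\frac{\partial^2}{\partial \theta_i\,\partial \theta_j}\,D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta)\right|_{\theta = \theta_0}.$$
Proof. Write $D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta)$ as a function of the second argument, holding the first fixed at $\theta_0$. Expanding the logarithm,
$$D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta) = \int p_{\theta_0}(r)\,\log p_{\theta_0}(r)\,dr - \int p_{\theta_0}(r)\,\log p_\theta(r)\,dr.$$
The first term is independent of $\theta$, so all derivatives fall on the second term. Define $\ell(r; \theta) = \log p_\theta(r)$. Then
$$\frac{\partial}{\partial \theta_i}\,D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta) = -\int p_{\theta_0}(r)\,\frac{\partial \ell(r; \theta)}{\partial \theta_i}\,dr.$$
At $\theta = \theta_0$, the score identity gives $\mathbb{E}_{\theta_0}[\partial_i \ell(R; \theta_0)] = 0$ (the expected score is zero under the true model), so the first derivative vanishes: $\partial_i D_{\mathrm{KL}}\,|_{\theta = \theta_0} = 0$. This confirms that $\theta = \theta_0$ is a critical point.
For the second derivative,
$$\frac{\partial^2}{\partial \theta_i\,\partial \theta_j}\,D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta) = -\int p_{\theta_0}(r)\,\frac{\partial^2 \ell(r; \theta)}{\partial \theta_i\,\partial \theta_j}\,dr.$$
At $\theta = \theta_0$, use the identity $\partial_i \partial_j \ell = \frac{\partial_i \partial_j p_\theta}{p_\theta} - \partial_i \ell\,\partial_j \ell$ and the fact that $\int \partial_i \partial_j p_\theta(r)\,dr = 0$ (the density integrates to 1 for all $\theta$). Therefore
$$\left.\frac{\partial^2}{\partial \theta_i\,\partial \theta_j}\,D_{\mathrm{KL}}(p_{\theta_0} \,\|\, p_\theta)\right|_{\theta = \theta_0} = \int p_{\theta_0}(r)\,\frac{\partial \ell(r; \theta_0)}{\partial \theta_i}\,\frac{\partial \ell(r; \theta_0)}{\partial \theta_j}\,dr = g_{ij}(\theta_0).$$
This is precisely the Fisher information matrix.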
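The identity can be checked numerically in a family where both sides are available in closed form. The sketch below uses the Gaussian family with parameters (mean, standard deviation), for which the KL divergence is explicit and the Fisher information matrix is diag(1/σ², 2/σ²), and differentiates the KL divergence twice by central finite differences. The Gaussian family, the parameter values, the step size, and the function names are assumptions chosen for illustration.

```python
import math

def kl_gaussian(mu0, sigma0, mu, sigma):
    """KL( N(mu0, sigma0^2) || N(mu, sigma^2) ) in closed form."""
    return math.log(sigma / sigma0) + (sigma0**2 + (mu0 - mu) ** 2) / (2 * sigma**2) - 0.5

def kl_hessian(mu0, sigma0, h=1e-5):
    """Central-difference Hessian of theta -> KL(p_theta0 || p_theta) at theta0."""
    f = lambda d_mu, d_sigma: kl_gaussian(mu0, sigma0, mu0 + d_mu, sigma0 + d_sigma)
    steps = [(h, 0.0), (0.0, h)]  # unit steps in the mu and sigma directions
    hess = [[0.0, 0.0], [0.0, 0.0]]
    for i, (ai, bi) in enumerate(steps):
        for j, (aj, bj) in enumerate(steps):
            hess[i][j] = (f(ai + aj, bi + bj) - f(ai - aj, bi - bj)
                          - f(-ai + aj, -bi + bj) + f(-ai - aj, -bi - bj)) / (4 * h * h)
    return hess

mu0, sigma0 = 0.001, 0.01  # hypothetical daily mean and volatility
hessian = kl_hessian(mu0, sigma0)
fisher = [[1 / sigma0**2, 0.0], [0.0, 2 / sigma0**2]]
for row_h, row_f in zip(hessian, fisher):
    print(row_h, row_f)
```

The step size is chosen small relative to the volatility so that the truncation error of the finite difference stays negligible; a much smaller step would start to lose accuracy to floating-point cancellation.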
The proposition gives the Fisher-Rao metric a second reading: it measures the local curvature of the KL divergence landscape around each strategy. A direction in parameter space along which the KL divergence grows quickly is a direction along which the return distribution changes rapidly, and the Fisher-Rao metric assigns a large norm to that direction.
This connects to the Kelly criterion. Kelly-optimal betting, as discussed in an earlier post, maximizes the expected logarithmic growth rate; in Kelly's original setting, the maximal growth rate equals the mutual information between the bettor's signal and the market outcome.[10] The mutual information is itself a KL divergence (between the joint distribution and the product of marginals). The Fisher-Rao metric therefore provides the infinitesimal geometry underlying both Kelly betting and the performance ratio landscape: both are governed by the curvature of information-theoretic divergences on the space of probability distributions.
5.4. Closing remark
Section 0 began with the observation that performance ratios are functionals on a chosen probability model. The intervening sections made this precise: the model determines the moments, partial moments, and tail functionals that each ratio consumes, and the relationships between ratios (agreement, divergence, scaling, ordering) are determined entirely by the distributional properties of the model.
The geometric perspective developed in this section makes the picture vivid. The space of strategies, equipped with the Fisher-Rao metric, is a curved manifold. Each performance ratio defines a scalar field on this manifold. The four novel theorems from Section 2, Section 3, and this section correspond to four distinct statements. The Omega Sufficiency Theorem establishes that the full distribution is recoverable from the Omega functional, grounding the entire taxonomy on a single master object. The remaining three are geometric:
- The Divergence Theorem identifies when the level sets of different ratio fields separate: it happens precisely outside the Gaussian agreement region, where the distributional shape parameters (skewness and excess kurtosis) become nonzero.
- The Scaling Exponents Theorem identifies at which resolution the separation matters: each ratio field has a distinct scaling exponent under temporal aggregation, so the discrepancy between ratios changes with the observation frequency.
- The Gradient Divergence Theorem identifies in which direction the separation matters: the Riemannian gradients of different ratio fields point in different directions, and the angle between them grows with distributional asymmetry.
Together, these three results frame a single conclusion. The choice of which performance ratio to optimize is a choice about which projection of the high-dimensional market state you trust most. The Sharpe ratio trusts the second moment. The Sortino ratio trusts the lower partial second moment. The Calmar ratio trusts the path-dependent maximum drawdown. The Omega ratio trusts the entire distribution (but in a one-dimensional compression via the threshold parameter). The Rachev ratio trusts the tails. No projection is universally dominant. The theorems make precise the conditions under which different projections agree, the conditions under which they must disagree, and the geometric consequences of that disagreement for strategy optimization.
References
1. Rachev, S. T., Stoyanov, S. V. and Fabozzi, F. J. (2011). A Probability Metrics Approach to Financial Risk Measures. Wiley.
2. Sortino, F. A. and van der Meer, R. (1991). Downside risk. Journal of Portfolio Management, 17(4), 27-31.
3. Cherny, A. and Madan, D. (2009). New measures for performance evaluation. Review of Financial Studies, 22(7), 2571-2606.
4. Keating, C. and Shadwick, W. F. (2002). A universal performance measure. Journal of Performance Measurement, 6(3), 59-84.
5. Sharpe, W. F. (1966). Mutual Fund Performance. Journal of Business, 39(1), 119-138.
6. Magdon-Ismail, M. and Atiya, A. F. (2004). Maximum drawdown. Risk, 17(10), 99-102.
7. Kazemi, H., Schneeweis, T. and Gupta, B. (2004). Omega as a performance measure. Journal of Performance Measurement, 8(3), 16-25.
8. Biagini, S. and Pinar, M. C. (2013). The best gain-loss ratio is a poor performance measure. SIAM Journal on Financial Mathematics, 4(1), 228-242.
9. Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189-206.
10. Kelly, J. L. Jr. (1956). A new interpretation of information rate. Bell System Technical Journal, 35(4), 917-926.
11. Amari, S. (2016). Information Geometry and Its Applications. Springer.