
Price as Geometry: Resolution, Coarse-Graining, and the Structure of Market Noise

March 17, 2026

The central question of quantitative finance is deceptively simple: does the price process have memory? The answer depends entirely on the timescale at which you ask it. At millisecond resolution, returns are mean-reverting. At hourly resolution, they behave approximately as geometric Brownian motion. No single model captures both regimes. The transition between them is a geometric fact about coarse-graining.

This observation is not specific to any one market. Lo and MacKinlay applied variance ratio tests to CRSP daily equity returns in 1988 and found systematic departures from the random walk null.[1] Cont's survey of equity stylized facts documented the same resolution-dependent pattern (short-horizon mean reversion, vanishing autocorrelation beyond a few minutes, fat tails) across dozens of equity markets.[21] Lo's 1991 study of stock market prices found Hurst exponents significantly below one-half at short horizons using the rescaled range statistic, directly analogous to the DFA results here.[22] What this post adds is a unified geometric framework that explains why these patterns must appear, together with high-resolution empirical measurements at millisecond precision that resolve the fine structure invisible to daily studies.

The organizing variable throughout is the temporal resolution $\delta$: the coarse-graining map that projects the microscopic trade-by-trade price process onto a sequence of binned log-returns. Every statistical property we examine turns out to be a function of $\delta$. Stationarity, volatility scaling, autocorrelation, spectral structure, and information flow between assets all change as $\delta$ varies.

The empirical measurements use BTC and ETH tick data from Binance spot: 1.3 million BTC trades and 800,000 ETH trades per day, captured at microsecond timestamps, covering the full calendar year 2025. Crypto markets provide an unusually clean laboratory for this investigation: continuous 24-hour trading with no opening or closing auction artifacts, freely available microsecond tick data, and a well-studied two-asset pair. The qualitative findings are consistent with the equity microstructure literature throughout.

1. Philosophical Priors

No market ontology is assumed here. The price of any liquid asset is not "fundamentally" a diffusion, a jump process, a rough path, or any other stochastic object. It is the output of a deterministic system: a very large number of interacting agents, each choosing to submit, cancel, or execute orders according to their own information and objectives. The mid-price at any moment is the projection of that high-dimensional deterministic system onto a single observable scalar.

The apparent stochasticity of prices is an artifact of observation, not of the underlying system. When we bin trades into millisecond buckets and compute log-returns, we are performing a coarse-graining operation. The resulting time series inherits statistical properties that depend on the coarse-graining timescale. At fine resolution, the price at time $t + \delta$ reflects many individual order decisions, each with some structure; at coarse resolution, it averages over so many decisions that the central limit theorem pushes the residual toward Gaussian noise with independent increments.

This prior has a precise mathematical form, developed in Section 3. For now, the key point is that asking "is the market stationary?" or "does price follow a random walk?" without specifying the resolution is not a well-posed question. We will always answer relative to a specific $\delta$.

The philosophical framing carries an operational consequence: any trading strategy, risk model, or derivative pricing formula is implicitly a claim about which stochastic model is closest to the truth at the resolution at which the strategy operates. A market-making strategy that operates at 10 ms is making a different claim from a momentum strategy that operates at one hour. This post measures, rather than assumes, which claims are empirically defensible and at which resolutions.

We return to this framing at the end of Section 8, where the information-geometric perspective makes it especially concrete: the Fisher-Rao metric on the family of price return distributions is itself a function of $\delta$, and the market's information geometry genuinely changes as you coarsen your clock.

2. On-Ramp: The Same Question, Plainly Stated

This section introduces the key objects without set-builder notation. Readers already comfortable with log-returns and the variance ratio test may skip to Section 3.

The mid-price. The price of a traded asset on a limit order book is not a single number. At any moment, there is a best ask $p^{\mathrm{ask}}$ (the cheapest price at which someone is willing to sell) and a best bid $p^{\mathrm{bid}}$ (the highest price at which someone is willing to buy). The mid-price is their average,

$$m \;=\; \frac{p^{\mathrm{ask}} + p^{\mathrm{bid}}}{2}.$$

We use the mid-price rather than the last trade price because it filters out the alternating "uptick/downtick" pattern caused by trades bouncing between the bid and ask, a microstructure artifact that would dominate any short-timescale analysis of the last-trade series.

The order book. The spread $s = p^{\mathrm{ask}} - p^{\mathrm{bid}}$ is the cost of an immediate round-trip: buy at the ask, sell at the bid. Depth refers to the quantity available at each price level. A market order that exceeds the depth at the best bid or ask walks the book, moving the mid-price. The interplay between depth and order flow is what creates price impact, the tendency of large orders to move prices against the trader who placed them.

The log-return. At resolution $\delta$, the log-return at time $t$ is

$$r_t \;=\; \log m_{t+\delta} - \log m_t.$$

We use log-returns rather than percentage returns for two reasons. First, they are additive over time: the log-return over two periods is the sum of the two single-period log-returns, which makes multi-scale analysis algebraically clean. Second, they are approximately symmetric around zero for small moves, whereas percentage returns have an upward bias for large positive moves. At millisecond timescales the distinction barely matters, but at longer horizons it is material.

Volatility and the $\sqrt{\delta}$ prediction. The realized volatility at resolution $\delta$ over a window of $n$ periods is

$$\sigma_\delta \;=\; \mathrm{std}(r_t).$$

If returns were independent and identically distributed, the volatility at resolution $k\delta$ would satisfy $\sigma_{k\delta} = \sqrt{k}\,\sigma_\delta$. This is the square-root-of-time scaling, the signature of a process with no memory. It is the null hypothesis. We will test it directly in Section 4 using the variance ratio statistic.
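The square-root-of-time prediction is easy to check on simulated data. A minimal sketch, using synthetic i.i.d. Gaussian returns as a stand-in for binned log-returns (the base volatility of 1e-4 and the scales are illustrative, not calibrated to BTC):

```python
import numpy as np

# Sketch: check sqrt-of-time scaling on simulated i.i.d. log-returns.
rng = np.random.default_rng(0)
r = rng.normal(0.0, 1e-4, size=1_000_000)   # i.i.d. returns at base resolution

def vol_at_scale(r, k):
    """Aggregate base-resolution returns into k-period returns, take the std."""
    n = len(r) // k
    return np.std(r[: n * k].reshape(n, k).sum(axis=1))

sigma_1 = vol_at_scale(r, 1)
for k in (4, 16, 64):
    # for memoryless returns the ratio tracks sqrt(k)
    print(k, vol_at_scale(r, k) / sigma_1, np.sqrt(k))
```

For real tick data the interesting output is the departure of the ratio from $\sqrt{k}$, which is exactly what the variance ratio statistic of Section 4 quantifies.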

Stationarity. A time series is stationary if its statistical properties do not change over time. A river in steady flow is stationary: the distribution of water heights at noon today is the same as at noon a year ago. A river during a flood is not. For price processes, stationarity means that the distribution of log-returns at any given resolution does not drift as time passes. Whether BTC log-returns are stationary is a resolution-dependent question: they may be stationary at a one-hour timescale while exhibiting systematic trends at a five-year timescale.

The random walk. The canonical null hypothesis in finance is that the price follows a random walk: tomorrow's price is today's price plus an unpredictable shock drawn independently from some fixed distribution. Under the random walk null, no strategy that looks only at past prices can forecast future returns. The variance ratio equals one at every timescale, the autocorrelation of returns is zero at every lag, and the Hurst exponent is exactly one-half.

Why this matters. Every trading strategy, risk model, and option pricing formula is an implicit bet on one of two things: either the random walk null holds (at the relevant timescale), or it fails in a specific exploitable direction. Market-making strategies assume short-timescale mean reversion and would lose money if the null held exactly. Trend-following strategies assume persistent autocorrelation and would lose money if the null held exactly. The goal of this post is to establish, empirically and geometrically, which assumptions are warranted at which resolutions.

3. The Coarse-Graining Map

We now develop the precise mathematical framework. All of what follows rests on one idea: the temporal resolution $\delta$ is a fundamental parameter that changes the statistical model at every level.

Let $(\Omega, \mathcal{F}, P)$ be a probability space supporting the microscopic price process $(S_t)_{t \ge 0}$, where $S_t$ is the mid-price at time $t$.

Definition (Filtration and coarsened filtration).

The natural filtration of the price process is the family $\mathcal{F}_t = \sigma(S_u : u \le t)$, encoding all information up to time $t$. For a resolution parameter $\delta > 0$, the $\delta$-coarsened filtration is

$$\mathcal{F}^{(\delta)}_t \;=\; \mathcal{F}_{\lfloor t/\delta \rfloor \delta},$$

the information available at the most recent grid point at or before $t$.

Definition (Coarse-graining projection).

The coarse-graining projection at resolution $\delta$ is the conditional expectation operator

$$\Pi_\delta : L^2(\Omega, \mathcal{F}_t) \;\to\; L^2(\Omega, \mathcal{F}^{(\delta)}_t), \qquad \Pi_\delta[X] = \mathbb{E}[X \mid \mathcal{F}^{(\delta)}_t].$$

By the tower property, $\Pi_{\delta_1} \circ \Pi_{\delta_2} = \Pi_{\delta_1}$ whenever $\delta_1 \ge \delta_2$, so coarser projections dominate finer ones.

Definition (Resolution manifold).

The resolution manifold of the price process is the one-parameter family

$$\mathcal{R} \;=\; \bigl\{(\Omega,\, \mathcal{F}^{(\delta)},\, P^\delta)\bigr\}_{\delta > 0},$$

where $P^\delta$ is the law of the coarse-grained process $(S^{(\delta)}_t)_{t \ge 0}$ with $S^{(\delta)}_t = S_{\lfloor t/\delta\rfloor\delta}$. The resolution manifold is fibered over the positive real line $\delta > 0$.

The question "is the market stationary?" is the question whether $P^\delta$ varies with calendar time for a fixed $\delta$. A conceptually distinct question is whether the family $\{P^\delta\}_{\delta > 0}$ is constant in $\delta$ after the diffusive rescaling of returns by $\sqrt{\delta}$. If it is, the process has scale-invariant statistics: the random walk null and geometric Brownian motion both satisfy this. If it is not, the process has resolution-dependent structure.
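The coarse-graining map $S^{(\delta)}_t = S_{\lfloor t/\delta\rfloor\delta}$ has a direct computational form: at each grid point, sample the most recent tick at or before it. A minimal sketch on synthetic ticks (timestamps in seconds; the data is made up, not Binance ticks):

```python
import numpy as np

# Sketch of the coarse-graining map: given irregularly timestamped ticks,
# return the last observed price at each grid point 0, delta, 2*delta, ...
def coarse_grain(ts, prices, delta, t_end):
    grid = np.arange(0.0, t_end, delta)
    # index of the last tick at or before each grid point
    idx = np.searchsorted(ts, grid, side="right") - 1
    idx = np.clip(idx, 0, len(ts) - 1)
    return grid, prices[idx]

ts = np.array([0.0, 0.3, 0.9, 1.4, 2.2])
px = np.array([100.0, 100.5, 100.2, 100.8, 101.0])
grid, s_delta = coarse_grain(ts, px, delta=1.0, t_end=3.0)
print(s_delta)  # [100.0, 100.2, 100.8]: last tick at or before t = 0, 1, 2
```

Running the same map at two resolutions $\delta_1 \ge \delta_2$ reproduces the tower-property ordering: the coarser grid's values are a subsample of information already present in the finer one.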

Theorem (Markov property under coarse-graining).

The coarse-grained process $(S^{(\delta)}_t)$ is Markov if and only if the transition semigroup $(P_t)$ of $(S_t)$ commutes with $\Pi_\delta$:

$$P_s \circ \Pi_\delta \;=\; \Pi_\delta \circ P_s \quad \text{for all } s > 0.$$

Proof. We must show that the two conditions are equivalent: (i) $S^{(\delta)}$ is Markov, and (ii) $P_s \circ \Pi_\delta = \Pi_\delta \circ P_s$ for all $s > 0$.

Recall that $S^{(\delta)}_t = S_{\lfloor t/\delta \rfloor \delta}$, so the process only changes at the grid points $\{n\delta : n \in \mathbb{N}\}$. The $k$-step transition density of $S^{(\delta)}$ is obtained by integrating out the continuous path between grid points: by the Chapman-Kolmogorov identity applied to the semigroup $(P_t)$,

$$p^{(\delta)}_k(x, y) \;=\; p_{k\delta}(x, y),$$

where $p_t(x,y)$ is the transition density of the original process $S$. This holds because $\mathbb{E}[f(S_{k\delta}) \mid S_0 = x] = P_{k\delta} f(x)$ by the semigroup property, and evaluating at the grid point eliminates all intermediate-time information.

The Markov property for $S^{(\delta)}$ states that for all bounded measurable $f$ and all grid times $m\delta, n\delta$ with $n > m$,

$$\mathbb{E}\!\left[f(S^{(\delta)}_{n\delta}) \,\big|\, \mathcal{F}^{(\delta)}_{m\delta}\right] \;=\; \mathbb{E}\!\left[f(S^{(\delta)}_{n\delta}) \,\big|\, S^{(\delta)}_{m\delta}\right].$$

The left side equals $\Pi_\delta[P_{(n-m)\delta} f](S_{m\delta})$ (condition on the coarsened filtration, then apply the semigroup). The right side equals $P_{(n-m)\delta}[\Pi_\delta f](S_{m\delta})$ (apply the semigroup first, then condition on the current grid value). These are equal for all $f, m, n$ if and only if the two operators commute:

$$\Pi_\delta \circ P_s \;=\; P_s \circ \Pi_\delta \quad \text{for all } s = (n-m)\delta > 0.$$

The converse is immediate: if the operators commute then both sides of the conditional expectation identity agree, establishing the Markov property. $\square$

Corollary (GBM is closed under coarse-graining).

Geometric Brownian motion is closed under $\Pi_\delta$: the coarse-grained log-return process has i.i.d. increments with variance scaling as $\sigma^2 \delta$, so $\mathrm{Var}(r^{(\delta)}_t) = \sigma^2\delta$ and the variance ratio $VR(k) = 1$ for all $k$.

Proof. Under GBM, the log-price satisfies $\log S_t = \log S_0 + (\mu - \tfrac{1}{2}\sigma^2)t + \sigma W_t$ by Ito's formula. For any two times $s < t$, the log-return over $[s,t]$ is

$$r^{(t-s)} \;=\; \log S_t - \log S_s \;=\; \bigl(\mu - \tfrac{1}{2}\sigma^2\bigr)(t-s) + \sigma(W_t - W_s).$$

Since $W_t - W_s \sim \mathcal{N}(0, t-s)$ by the definition of Brownian motion, the $\delta$-increment has distribution

$$r^{(\delta)}_t \;\sim\; \mathcal{N}\!\left((\mu - \tfrac{1}{2}\sigma^2)\delta,\; \sigma^2\delta\right),$$

so $\mathrm{Var}(r^{(\delta)}_t) = \sigma^2\delta$. Moreover, increments over non-overlapping intervals are independent: for $[t_1, t_1+\delta]$, $[t_2, t_2+\delta]$ with $t_2 \ge t_1 + \delta$, the increments are driven by disjoint pieces of the Brownian path.

The $k\delta$-return is the sum of $k$ consecutive $\delta$-returns:

$$r^{(k\delta)}_t \;=\; \sum_{j=0}^{k-1} r^{(\delta)}_{t+j\delta} \;=\; \bigl(\mu - \tfrac{1}{2}\sigma^2\bigr)k\delta + \sigma\bigl(W_{t+k\delta} - W_t\bigr).$$

By independence and identical distribution of the summands,

$$\mathrm{Var}\!\left(r^{(k\delta)}_t\right) \;=\; k\,\mathrm{Var}\!\left(r^{(\delta)}_t\right) \;=\; k\sigma^2\delta.$$

Substituting into the variance ratio definition,

$$VR(k) \;=\; \frac{\mathrm{Var}(r^{(k\delta)})}{k\,\mathrm{Var}(r^{(\delta)})} \;=\; \frac{k\sigma^2\delta}{k\,\sigma^2\delta} \;=\; 1.$$

This holds for every $k \ge 1$ and every $\delta > 0$, confirming that GBM is closed under coarse-graining with exact unit variance ratio at every scale. $\square$

The corollary makes the connection between theory and empirics precise. We return to this framing in the conclusion.
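The corollary can be checked numerically: simulated GBM log-returns should give $VR(k) \approx 1$ at every $k$. A minimal sketch, with illustrative (uncalibrated) parameters:

```python
import numpy as np

# Sketch: simulate GBM log-returns and verify VR(k) ~ 1 per the corollary.
# mu, sigma (annualized) and delta (in years) are illustrative values.
rng = np.random.default_rng(1)
mu, sigma, delta, n = 0.05, 0.3, 1 / (365 * 24), 500_000
r = rng.normal((mu - 0.5 * sigma**2) * delta, sigma * np.sqrt(delta), size=n)

def variance_ratio(r, k):
    m = len(r) // k
    rk = r[: m * k].reshape(m, k).sum(axis=1)  # non-overlapping k-period returns
    return np.var(rk) / (k * np.var(r))

for k in (2, 8, 32):
    print(k, variance_ratio(r, k))  # each close to 1 under the GBM null
```

On real tick data the same function applied across resolutions produces the empirical VR curves of Section 4, where the departures from 1 carry the signal.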

4. The Random Walk Null

We now operationalize the null hypothesis and test it at nine resolutions, from 1 ms to 4 hr.

Definition (Geometric Brownian motion).

The geometric Brownian motion (GBM) model postulates

$$\mathrm{d}S \;=\; \mu S\,\mathrm{d}t + \sigma S\,\mathrm{d}W_t,$$

where $W_t$ is a standard Brownian motion. Under GBM, the log-price $X_t = \log S_t$ follows $\mathrm{d}X_t = (\mu - \tfrac{1}{2}\sigma^2)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$, so log-returns at any resolution are i.i.d. Gaussian.

Definition (Variance ratio statistic).

Let $X_t^{(1)}$ denote unit-period log-returns and $X_t^{(k)} = \sum_{j=0}^{k-1} X_{t-j}^{(1)}$ the $k$-period return. The Lo-MacKinlay variance ratio at horizon $k$ is

$$VR(k) \;=\; \frac{\mathrm{Var}\!\left(X_t^{(k)}\right)}{k\,\mathrm{Var}\!\left(X_t^{(1)}\right)}.$$

Under GBM, $VR(k) = 1$ at every $k$ by the Corollary in Section 3.

Theorem (Asymptotic distribution of VR under the random walk null).

Under the null hypothesis that $X_t^{(1)}$ are i.i.d. with finite fourth moment, and with $n$ observations,

$$\sqrt{n}\bigl(VR(k) - 1\bigr) \;\xrightarrow{d}\; \mathcal{N}\!\left(0,\; \frac{2(2k-1)(k-1)}{3k}\right) \quad \text{as } n \to \infty.$$ [1]

Proof sketch. The variance ratio can be written as

$$VR(k) - 1 \;=\; \frac{\mathrm{Var}(X^{(k)}) - k\,\mathrm{Var}(X^{(1)})}{k\,\mathrm{Var}(X^{(1)})} \;=\; \frac{1}{\hat{\sigma}^2_1}\!\left(\hat{\sigma}^2_k - \hat{\sigma}^2_1\right),$$

where $\hat{\sigma}^2_j = \frac{1}{nj}\sum_{t=j}^{n} (X_t^{(j)})^2$ (demeaned). Under the null $X_t^{(1)}$ are i.i.d. with mean zero and variance $\sigma^2$, so the numerator becomes a linear combination of sample second moments at overlapping horizons. Expanding the $k$-period variance estimator,

$$\hat{\sigma}^2_k \;=\; \hat{\sigma}^2_1 + \frac{2}{k}\sum_{j=1}^{k-1}(k-j)\,\hat{\gamma}(j),$$

where $\hat{\gamma}(j) = \frac{1}{n}\sum_t X_t^{(1)} X_{t+j}^{(1)}$ are the sample autocovariances. Under the null, $\hat{\gamma}(j) \to 0$ for all $j \ge 1$, and by the CLT for sample autocovariances of i.i.d. sequences, $\sqrt{n}\,\hat{\gamma}(j) \xrightarrow{d} \mathcal{N}(0, \sigma^4)$ jointly for $j = 1, \ldots, k-1$, with uncorrelated components. Applying the delta method and summing the covariance contributions from the $(k-j)^2$ weighting in the expansion yields variance

$$\mathrm{Var}\!\left(\sqrt{n}\,(VR(k)-1)\right) \;\to\; \frac{4}{\sigma^4}\cdot\sigma^4\cdot\frac{1}{k^2}\sum_{j=1}^{k-1}(k-j)^2 \;=\; \frac{4}{k^2}\cdot\frac{k(k-1)(2k-1)}{6} \;=\; \frac{2(2k-1)(k-1)}{3k}.$$

The sum identity $\sum_{j=1}^{k-1}(k-j)^2 = k(k-1)(2k-1)/6$ is the standard sum-of-squares formula. The Lindeberg CLT applied to the triangular array of summands completes the argument, giving the stated Gaussian limiting distribution.[1] $\square$
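The statistic and its z-score can be sketched in a few lines. This is a simplified version using overlapping $k$-period sums; Lo and MacKinlay's finite-sample bias and heteroskedasticity corrections are omitted:

```python
import numpy as np

def vr_test(r, k):
    """Variance ratio VR(k) and the asymptotic z-score from the theorem above
    (simplified: no bias or heteroskedasticity correction)."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    mu = r.mean()
    var1 = np.mean((r - mu) ** 2)
    rk = np.convolve(r, np.ones(k), mode="valid")   # overlapping k-period sums
    vark = np.mean((rk - k * mu) ** 2) / k
    vr = vark / var1
    se = np.sqrt(2 * (2 * k - 1) * (k - 1) / (3 * k * n))
    return vr, (vr - 1) / se

rng = np.random.default_rng(2)
vr, z = vr_test(rng.normal(size=100_000), k=10)
print(vr, z)  # VR near 1 and |z| moderate for i.i.d. Gaussian input
```

A strongly negative $z$ on binned returns is the formal rejection reported in the figures below the millisecond-to-minute range.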

Definition (Hurst exponent).

The Hurst exponent $H$ characterizes long-range dependence via the variance scaling

$$\mathrm{Var}\!\left(X_t^{(\delta)}\right) \;\sim\; \delta^{2H} \quad \text{as } \delta \to \infty.$$

The three regimes are: $H = \tfrac{1}{2}$ (GBM, uncorrelated increments), $H < \tfrac{1}{2}$ (mean-reverting, negatively correlated increments), $H > \tfrac{1}{2}$ (trending, positively correlated increments).

Theorem (DFA consistency).

The detrended fluctuation analysis (DFA) estimator $\hat{H}_{\mathrm{DFA}}$ of the Hurst exponent is consistent and asymptotically normal under weak dependence conditions.[2]

Proof sketch. DFA proceeds as follows. Given observations $X_1, \ldots, X_N$, form the profile $Y_k = \sum_{i=1}^k (X_i - \bar{X})$. Partition $\{1,\ldots,N\}$ into $\lfloor N/s \rfloor$ non-overlapping windows of size $s$. In each window, fit a local polynomial trend and compute the root-mean-square residual $F(s)$. The DFA scaling hypothesis is

$$F(s) \;\sim\; C \cdot s^{H} \quad \text{as } s \to \infty,$$

so $H$ is estimated as the slope of the log-log regression of $F(s)$ against $s$.

The consistency argument has two steps. First, one shows that $F(s)^2$ concentrates around its expectation $\mathbb{E}[F(s)^2]$ at rate $1/\sqrt{N/s}$ by an LLN argument applied to the squared detrended residuals, using weak dependence (short memory of squared increments). Second, one identifies $\mathbb{E}[F(s)^2] \sim C^2 s^{2H}$ via the structure of the covariance function of the profile process: for a long-memory process with covariance $\gamma(k) \sim L(k)|k|^{2H-2}$, the variance of the detrended profile over a window of size $s$ grows as $s^{2H}$ by the Karamata theorem for regularly varying functions. The log-log slope estimator is then consistent by the delta method applied to the OLS regression of $\log F(s)$ on $\log s$ across the window sizes $s_1 < \cdots < s_m$, as $N \to \infty$ with $s_m = o(N)$.[2] $\square$
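The DFA recipe (profile, windowed detrending, log-log slope) is short to implement. A minimal sketch with linear detrending and illustrative window sizes:

```python
import numpy as np

def dfa_hurst(x, scales=(16, 32, 64, 128, 256)):
    """Minimal DFA-1: profile, per-window linear detrend, RMS fluctuation F(s),
    log-log slope as the Hurst estimate. scales are illustrative choices."""
    y = np.cumsum(x - np.mean(x))          # profile Y_k
    fs = []
    for s in scales:
        nw = len(y) // s
        segs = y[: nw * s].reshape(nw, s)
        t = np.arange(s)
        # least-squares linear trend per window, removed before taking the RMS
        trends = np.array([np.polyval(np.polyfit(t, seg, 1), t) for seg in segs])
        fs.append(np.sqrt(np.mean((segs - trends) ** 2)))
    slope, _ = np.polyfit(np.log(scales), np.log(fs), 1)
    return slope

rng = np.random.default_rng(3)
print(dfa_hurst(rng.normal(size=50_000)))  # ~0.5 for white-noise increments
```

Applied to binned log-returns at each resolution, this slope traces the $\hat{H}$-versus-$\delta$ profile shown in the DFA figure below.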

Empirical results. The figures below show the variance ratio curve and DFA log-log plot for BTC and ETH over the full year 2025, at scales 1 ms to 4 hr. The interactive component lets you explore the VR curve and Hurst estimate at each scale.

Variance ratio curve for BTC and ETH at nine timescales
Variance ratio $VR(k)$ for BTC (orange) and ETH (blue) at resolutions 1 ms to 4 hr, with 95% confidence bands. The dashed line marks $VR = 1$ (the GBM null). Both assets reject the null strongly at short timescales, with $VR < 1$ indicating mean reversion. The null cannot be rejected at hourly scales.
DFA log-log plot for BTC showing Hurst exponent
DFA log-log plot for BTC. The slope of the fitted line is the estimated Hurst exponent $\hat{H}$. Values below $\tfrac{1}{2}$ indicate mean reversion; values above $\tfrac{1}{2}$ indicate trending behavior.


Key finding. The Hurst exponent is below $\tfrac{1}{2}$ at sub-second scales for both assets, confirming mean reversion at millisecond resolution. At hourly scales, $\hat{H} \approx \tfrac{1}{2}$, consistent with GBM. The random walk null fails at the scales where high-frequency strategies operate. This pattern agrees with Lo's equity findings[22] and with the cross-market stylized facts surveyed by Cont.[21]

Equity comparison. The VR profile shape, below 1 at sub-minute horizons and converging to 1 at hourly horizons, is a universal feature of liquid equity markets. Lo and MacKinlay documented it across 2000 CRSP-listed stocks including small-cap portfolios where mean reversion is most pronounced.[1] Hasbrouck's detailed analysis of NYSE blue chips including IBM, GE, and Merck found intraday VR profiles qualitatively identical to those measured here: the random walk null is rejected below the 5-minute horizon for every major name examined, but cannot be rejected at 1-hour resolution.[24] The crossover scale from mean-reverting to diffusive differs by asset (roughly 2-10 minutes for large-cap equities versus 10-60 seconds for BTC/ETH), reflecting the different market structures and participant compositions, but the qualitative shape of the VR curve is the same.

5. Stationary Alternatives: Mean Reversion as Geometry

The empirical rejection of the random walk null at short timescales invites the question: which model fits better? We survey the main stationary alternatives, framed geometrically.

Definition (Ornstein-Uhlenbeck process).

The Ornstein-Uhlenbeck (OU) process is the solution to

$$\mathrm{d}X_t \;=\; -\kappa(X_t - \mu)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, \qquad \kappa > 0.$$

The drift term $-\kappa(X_t - \mu)$ is the restoring force: it pulls the process back toward the long-run mean $\mu$ with speed $\kappa$. The stationary distribution is $\mathcal{N}(\mu, \sigma^2/(2\kappa))$.

Theorem (Unique stationary Gaussian Markov process).

The OU process is, up to affine transformation, the unique continuous-time stationary Gaussian Markov process.[4]

Proof sketch. We characterize all continuous-time stationary Gaussian Markov processes. A process $(X_t)_{t \ge 0}$ that is simultaneously Gaussian and Markov is determined by its mean function $m(t) = \mathbb{E}[X_t]$ and covariance function $K(s,t) = \mathrm{Cov}(X_s, X_t)$.

Stationarity requires $m(t) = \mu$ (constant) and $K(s,t) = \gamma(|t-s|)$ for some function $\gamma$.

The Markov property for a Gaussian process is equivalent to requiring that the partial correlation between $X_s$ and $X_t$ (for $s < u < t$) vanishes when conditioning on $X_u$. For a Gaussian process this means

$$\mathrm{Cov}(X_s, X_t \mid X_u) \;=\; 0 \quad \text{for all } s < u < t.$$

Using the Gaussian conditional covariance formula and the stationarity condition $K(s,t) = \gamma(|t-s|)$, this becomes the functional equation

$$\gamma(t-s) \;=\; \frac{\gamma(t-u)\,\gamma(u-s)}{\gamma(0)} \quad \text{for all } 0 \le s < u < t.$$

Setting $h = t - u$, $l = u - s$, and $c = 1/\gamma(0)$, the equation reads $\gamma(h+l) = c\,\gamma(h)\gamma(l)$, which is Cauchy's multiplicative functional equation on $(0,\infty)$. Under mild regularity (measurability or monotonicity, both satisfied by any covariance function), the only solutions are $\gamma(\tau) = \gamma(0)\,e^{-\kappa\tau}$ for some $\kappa \ge 0$.

The exponential covariance $\gamma(\tau) = \frac{\sigma^2}{2\kappa}e^{-\kappa\tau}$ corresponds exactly to the stationary distribution $\mathcal{N}(\mu, \sigma^2/(2\kappa))$ of the OU process $\mathrm{d}X_t = -\kappa(X_t - \mu)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t$. Brownian motion ($\kappa = 0$) is non-stationary and excluded. Thus, up to the affine reparametrization $(X - \mu)/\sqrt{\gamma(0)}$, the OU process is the unique member of this class.[4] $\square$

The geometric interpretation reveals why mean reversion is a universal feature at short timescales. The key is the flat geometry of the OU process versus the potentially curved geometry of more general diffusions.

Definition (Black-Scholes as flat geometry).

Under the change of variables $x = \log S$, the Black-Scholes PDE

$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV \;=\; 0$$

reduces to the heat equation $\partial_t u = \frac{1}{2}\sigma^2 \partial_{xx} u$ on $(\mathbb{R}, \mathrm{d}x^2)$. The Black-Scholes formula is the Gaussian heat kernel on flat $\mathbb{R}$: zero curvature, vanishing Christoffel symbols, no Ito correction.[5]

Definition (Curved Riemannian diffusion).

A diffusion on a Riemannian manifold $(M, g)$ obeys the Stratonovich SDE

$$\mathrm{d}X^i_t \;=\; -\frac{1}{2}g^{ij}\partial_j V\,\mathrm{d}t + \sigma\,\mathrm{d}W^i_t,$$

with the Ito form acquiring a correction term from the Christoffel symbols $\Gamma^i_{jk}$:

$$\text{Ito correction:}\quad \frac{\sigma^2}{2}\,\Gamma^i_{jk}\,g^{jk}.$$

The correction is zero if and only if $(M, g)$ is flat.

Corollary (Curvature generates systematic drift).

The Ito correction from non-zero curvature generates a systematic drift that a flat-geometry model misattributes to the mean-reversion parameter $\kappa$. Fitting an OU process to data generated by a curved diffusion will overestimate $\kappa$ by an amount proportional to the Ricci curvature of $(M, g)$ at the current point.

One model in the table warrants a preliminary definition. Fractional Brownian motion (fBm) with Hurst parameter $H \in (0,1)$ is the unique (up to scaling) continuous Gaussian process with stationary increments satisfying $\mathrm{Var}(B^H_t - B^H_s) = |t - s|^{2H}$. Standard Brownian motion is the special case $H = \tfrac{1}{2}$, where increments are independent. For $H \neq \tfrac{1}{2}$ the increments are correlated across all lags: positively for $H > \tfrac{1}{2}$ (long-range dependence) and negatively for $H < \tfrac{1}{2}$ (anti-persistence). The covariance matrix of fBm increments $(\Delta B^H_0, \Delta B^H_1, \ldots, \Delta B^H_{n-1})$ is Toeplitz with entries $\gamma_H(k) = \tfrac{1}{2}\bigl(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\bigr)$; for $n = 3$ this is:

$$\Gamma_H \;=\; \sigma^2 \begin{pmatrix} 1 & \gamma_H(1) & \gamma_H(2) \\ \gamma_H(1) & 1 & \gamma_H(1) \\ \gamma_H(2) & \gamma_H(1) & 1 \end{pmatrix}.$$

At $H = \tfrac{1}{2}$ we have $\gamma_H(k) = 0$ for all $k \ge 1$, recovering the identity covariance of standard Brownian increments. At $H = 0.1$ the off-diagonal entries are negative, encoding the anti-persistent structure of rough volatility. The rough volatility model of Gatheral, Jaisson, and Rosenbaum calibrated the log-volatility process of equity and index options and found $H \approx 0.1$, far below the Brownian value.[7] At $H = 0.1$ the sample paths of fBm are far rougher than Brownian motion: the classical Ito calculus does not apply, and a theory of integration against such paths requires the rough paths framework.[8]
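The Toeplitz covariance $\gamma_H(k)$ can be built directly, and a fractional-Gaussian-noise path sampled by Cholesky factorization. A sketch (the $O(n^3)$ Cholesky route is the simplest correct method; practical fBm samplers use circulant embedding instead):

```python
import numpy as np

def fgn_cov(H, n):
    """Toeplitz covariance of fBm increments with entries gamma_H(k)."""
    k = np.arange(n)
    g = 0.5 * (np.abs(k + 1) ** (2 * H)
               - 2 * np.abs(k) ** (2 * H)
               + np.abs(k - 1) ** (2 * H))
    i = np.arange(n)
    return g[np.abs(i[:, None] - i[None, :])]

H = 0.1
C = fgn_cov(H, 512)
print(C[0, 1])  # negative: anti-persistent increments for H < 1/2
# sample one fGn path: Cholesky factor times standard normals
increments = np.linalg.cholesky(C) @ np.random.default_rng(5).normal(size=512)
```

Setting `H = 0.5` makes every off-diagonal entry vanish, recovering i.i.d. Brownian increments as the definition requires.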

The table below summarizes the geometric interpretation of the standard models.

| Model | Geometric interpretation |
| --- | --- |
| GBM / Black-Scholes | Flat $\mathbb{R}$, BS formula = Gaussian heat kernel, $VR(k) = 1$ |
| Avellaneda-Stoikov | Flat $\mathbb{R}$ mid-price; affine inventory correction; symmetric optimal quotes |
| OU / Vasicek | Flat $\mathbb{R}$, quadratic potential $V = \frac{\kappa}{2}(x-\mu)^2$, Gaussian stationary measure |
| Heston | State space $\mathbb{R} \times \mathbb{R}_{>0}$ with metric from the vol process; Ito correction from Christoffel symbols modifies BS drift |
| Rough volatility | Log-vol is fBm with $H \approx 0.1$; path space requires rough paths calculus |
| General curved diffusion | $(M, g)$ arbitrary; Ito correction from $\Gamma^i_{jk}$ gives observable drift signature |

The Avellaneda-Stoikov market-making model[6] is worth examining in this context. It assumes a flat GBM mid-price and derives the optimal bid and ask quotes by solving a Hamilton-Jacobi-Bellman equation. The reservation price is

$$r \;=\; S - q\gamma\sigma^2(T-t),$$

a linear (affine) correction for inventory $q$ with risk aversion $\gamma$. The optimal spread is

$$\delta^* \;=\; \gamma\sigma^2(T-t) + \frac{2}{\gamma}\ln\!\left(1 + \frac{\gamma}{\kappa}\right).$$

The linearity of the inventory correction and the symmetry of the optimal quotes are both consequences of the flat geometry. On a curved state space, the Hamilton-Jacobi-Bellman equation acquires curvature corrections, the inventory correction becomes nonlinear, and the optimal bid and ask are no longer symmetric around the mid-price. Whether these corrections are material at millisecond scales is an open question.
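The two formulas combine into a quote rule in a few lines. A sketch with illustrative parameter values (not calibrated to any market):

```python
import numpy as np

def as_quotes(S, q, gamma, sigma, T_minus_t, kappa):
    """Avellaneda-Stoikov reservation price and symmetric quotes around it."""
    r = S - q * gamma * sigma**2 * T_minus_t                       # reservation price
    spread = gamma * sigma**2 * T_minus_t + (2 / gamma) * np.log(1 + gamma / kappa)
    return r - spread / 2, r + spread / 2                          # (bid, ask)

bid, ask = as_quotes(S=100.0, q=0, gamma=0.1, sigma=2.0, T_minus_t=0.5, kappa=1.5)
print(bid, ask)  # symmetric around S when inventory q = 0
```

With nonzero inventory the reservation price shifts linearly away from the mid, but the spread stays fixed; on a curved state space neither property would survive, per the discussion above.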

6. Autocorrelation Structure Across Scales

The variance ratio test detects memory in aggregate. The autocorrelation function resolves it by lag.

Definition (Autocorrelation function).

For a wide-sense stationary process $X_t$, the autocorrelation function at lag $\tau$ is

$$\rho_X(\tau) \;=\; \frac{\mathrm{Cov}(X_t, X_{t+\tau})}{\mathrm{Var}(X_t)} \;=\; \frac{\langle X_t, X_{t+\tau}\rangle_{L^2}}{\|X_t\|^2_{L^2}},$$

the normalized inner product on $L^2(\Omega, P)$.

Theorem (Wiener-Khinchin theorem).

For a wide-sense stationary process $X_t$ with absolutely summable autocorrelations, the power spectral density $S_X(\omega)$ and the autocorrelation function form a Fourier pair:

$$\rho_X(\tau) \;=\; \int_{-\infty}^{\infty} e^{i\omega\tau} S_X(\omega)\,\mathrm{d}\omega, \qquad S_X(\omega) \;=\; \int_{-\infty}^{\infty} e^{-i\omega\tau} \rho_X(\tau)\,\mathrm{d}\tau.$$ [14][15]

Definition (Short and long memory).

A stationary process has short memory if $\sum_{\tau=0}^\infty |\rho_X(\tau)| < \infty$ and long memory if the sum diverges. Fractional Gaussian noise, the stationary increment process of fBm, with Hurst exponent $H > \tfrac{1}{2}$ is the canonical long-memory model.

Theorem (Spectral characterization of long memory).

A stationary process has Hurst exponent $H$ if and only if its power spectral density satisfies

$$S_X(\omega) \;\sim\; |\omega|^{-(2H-1)} \quad \text{as } \omega \to 0.$$ [3]

Empirical results. We compute the signed-volume autocorrelation at sub-second lags (1 ms to 1 s) and the log-return autocorrelation at super-second lags (1 s to 1 hr) for both assets. The heatmap below shows both dimensions simultaneously.

Signed-volume autocorrelation at 1ms resolution for BTC and ETH
Signed-volume autocorrelation at 1 ms resolution. The strong negative autocorrelation at lag 1 ms (BTC: approximately −0.05, ETH: approximately −0.04) is consistent with the lead-lag reversal finding from the companion post: high signed volume in one direction at 1 ms is systematically followed by a reversal. This is the microstructure signature of mean reversion.
Autocorrelation heatmap across lag and resolution for BTC and ETH
Autocorrelation heatmap: xx-axis is lag, yy-axis is resolution. Red indicates positive autocorrelation; blue indicates negative. The negative autocorrelation at short lags and fine resolutions is visible across both assets. At coarser resolutions the autocorrelation fades toward zero, consistent with the approach to GBM behavior identified by the variance ratio test.

Interactive: brush a region to zoom in and see the mean autocorrelation and 95% confidence band.

Key finding. The signed-volume autocorrelation at lag 1 ms is negative for both assets, with the lag-1 return autocorrelation at δt=1s\delta t = 1\,\mathrm{s} agreeing with the PolyBackTest finding of −0.048 at p<0.001p < 0.001. The negative autocorrelation decays rapidly: by lag 10 ms it is near zero, and by 1 s it is indistinguishable from zero at the 95% level. Short-horizon negative autocorrelation followed by rapid decay to zero is a universal stylized fact across equity markets.[21]

Equity comparison. Roll derived the mechanism from first principles: in any two-sided market with a positive bid-ask spread ss, the serial covariance of transaction-price changes satisfies Cov(Δpt,Δpt1)=s2/4\mathrm{Cov}(\Delta p_t, \Delta p_{t-1}) = -s^2/4, generating a mechanical negative lag-1 autocorrelation regardless of the underlying fundamental process.[23] For NYSE-listed equities including AAPL, GE, IBM, and MSFT, this bid-ask bounce accounts for the majority of the negative serial correlation at tick-by-tick resolution, with the residual attributable to order-book replenishment dynamics analogous to the microstructure mean reversion measured here. Roll's formula implies that the lag-1 autocorrelation should be more negative for illiquid securities (wider spread) and less negative for highly liquid ones. BTC and ETH on Binance spot have spreads of roughly 0.01-0.02% at millisecond resolution, placing them in the same range as large-cap equities like AAPL and SPY, and the observed autocorrelation magnitudes are consistent with this prediction.
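Roll's covariance identity inverts to a spread estimator, s=2Cov(Δpt,Δpt1)s = 2\sqrt{-\mathrm{Cov}(\Delta p_t, \Delta p_{t-1})}. A minimal sketch on simulated trades that obey Roll's model exactly (random-walk efficient price plus an i.i.d. bid-ask bounce); all parameter values are illustrative.

```python
import numpy as np

def roll_spread(prices):
    """Roll implied spread: s = 2 * sqrt(-Cov(dp_t, dp_{t-1}))."""
    dp = np.diff(prices)
    cov = np.cov(dp[:-1], dp[1:])[0, 1]
    if cov >= 0:          # positive serial covariance: estimator undefined
        return float("nan")
    return 2.0 * np.sqrt(-cov)

# Simulate Roll's model: efficient price random walk plus bid-ask bounce.
rng = np.random.default_rng(1)
n, true_spread = 200_000, 0.02
mid = np.cumsum(rng.normal(0.0, 0.005, n))    # fundamental random walk
side = rng.choice([-1.0, 1.0], n)             # buyer- vs seller-initiated
trades = 100.0 + mid + 0.5 * true_spread * side
est = roll_spread(trades)                     # close to true_spread
```

The guard matters in practice: on windows where momentum dominates bounce, the serial covariance turns positive and the estimator has no real root.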

7. Spectral Decomposition: Wavelets and Geometric Harmonics

The Wiener-Khinchin theorem decomposes variance by frequency. Wavelets refine this by isolating variance at specific timescales without losing time localization.

Definition (Multiresolution analysis).

A multiresolution analysis (MRA) of L2(R)L^2(\mathbb{R}) is a nested sequence of closed subspaces VjL2(R)V_j \subset L^2(\mathbb{R}) satisfying the Mallat axioms: completeness, closure under translation and dilation, and generation by a single scaling function ϕ\phi. The complementary subspaces Wj=Vj+1VjW_j = V_{j+1} \ominus V_j (where \ominus denotes the orthogonal complement: WjW_j is the subspace of Vj+1V_{j+1} orthogonal to VjV_j, so that Vj+1=VjWjV_{j+1} = V_j \oplus W_j) are generated by the mother wavelet ψ\psi.[9]

Theorem (Wavelet variance decomposition).

For a second-order stationary process XtX_t, the total variance decomposes over wavelet scales as

Var(X)  =  j=1νj2,\mathrm{Var}(X) \;=\; \sum_{j=1}^{\infty} \nu^2_j,

where νj2=Var(dj,)\nu^2_j = \mathrm{Var}(d_{j,\cdot}) is the wavelet variance at scale 2j2^j and dj,td_{j,t} is the jj-th level wavelet coefficient.[10]

Proof sketch. The MRA orthogonal decomposition L2(R)=j=1WjL^2(\mathbb{R}) = \bigoplus_{j=1}^{\infty} W_j (where Wj=Vj+1VjW_j = V_{j+1} \ominus V_j) gives an orthogonal direct sum. Any XL2(Ω)X \in L^2(\Omega) can be written as

X  =  j=1dj+limjΠVjX  =  j=1dj,X \;=\; \sum_{j=1}^{\infty} d_j + \lim_{j\to\infty} \Pi_{V_j} X \;=\; \sum_{j=1}^{\infty} d_j,

where the scaling function contribution vanishes in the L2L^2 limit because the approximation spaces VjV_j retain only coarser-and-coarser structure as jj \to \infty (in the convention used here, larger jj means coarser scale; this is the completeness condition of the MRA). The orthogonality of the WjW_j subspaces gives

XL22  =  j=1djL22.\|X\|^2_{L^2} \;=\; \sum_{j=1}^{\infty} \|d_j\|^2_{L^2}.

For a stationary process, E[Xt]=μ\mathbb{E}[X_t] = \mu is constant, so Var(X)=XμL2(Ω)2\mathrm{Var}(X) = \|X - \mu\|^2_{L^2(\Omega)}. The wavelet filter ψj(t)=2j/2ψ(2jt)\psi_j(t) = 2^{-j/2}\psi(2^{-j}t) is a bandpass filter that extracts the variance in the octave band [2(j+1),2j][2^{-(j+1)}, 2^{-j}] of the power spectrum. The wavelet variance at scale 2j2^j is therefore

νj2  =  Var(dj,t)  =  2(j+1)2jSX(f)df  +  2j2(j+1)SX(f)df,\nu^2_j \;=\; \mathrm{Var}(d_{j,t}) \;=\; \int_{2^{-(j+1)}}^{2^{-j}} S_X(f)\,\mathrm{d}f \;+\; \int_{-2^{-j}}^{-2^{-(j+1)}} S_X(f)\,\mathrm{d}f,

where SXS_X is the power spectral density. Summing over all scales and using the Parseval-Wiener relation Var(X)=SX(f)df\mathrm{Var}(X) = \int_{-\infty}^{\infty} S_X(f)\,\mathrm{d}f together with the fact that the wavelet filter banks partition the frequency axis, one recovers Var(X)=j=1νj2\mathrm{Var}(X) = \sum_{j=1}^{\infty} \nu^2_j.[10] \square

The wavelet variance profile jνj2j \mapsto \nu^2_j is the time-localized analogue of the power spectrum: it shows which scales carry the variance, without the stationarity assumption required by the Fourier transform.
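The decomposition can be checked numerically with the Haar filter in place of the D4 wavelet used in the empirical section below; the orthonormal pyramid is a few lines, and the energy-preservation identity is the same for any orthonormal DWT. The input here is synthetic white noise standing in for a window of binned log-returns.

```python
import numpy as np

def haar_detail_energies(x):
    """Orthonormal Haar DWT pyramid: per-level detail energies plus the
    final approximation energy. Dividing an energy by its coefficient
    count gives the empirical wavelet variance nu_j^2 at that scale."""
    a = np.asarray(x, dtype=float)
    energies = []
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2.0)   # detail (bandpass) coeffs
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)   # approximation (lowpass)
        energies.append(float(np.sum(d ** 2)))
    return np.array(energies), float(a[0] ** 2)

rng = np.random.default_rng(2)
x = rng.standard_normal(1024)
detail_energy, approx_energy = haar_detail_energies(x)
total = detail_energy.sum() + approx_energy   # equals ||x||^2 by orthogonality
```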

Definition (Graph Laplacian and diffusion map).

Given a point cloud {xi}i=1nRd\{x_i\}_{i=1}^n \subset \mathbb{R}^d, define the weight matrix Wij=exp(xixj2/ε)W_{ij} = \exp(-\|x_i - x_j\|^2/\varepsilon) and the graph Laplacian Lε=ID1WL_\varepsilon = I - D^{-1}W where DD is the diagonal degree matrix. As ε0,n\varepsilon \to 0, n \to \infty, LεL_\varepsilon converges to the Laplace-Beltrami operator ΔM\Delta_M on the underlying manifold MM. The eigenfunctions {ψk}\{\psi_k\} of LεL_\varepsilon are the geometric harmonics.[11]

Theorem (Diffusion distance).

Euclidean distance in diffusion map coordinates equals the diffusion distance:

Dt(x,y)  =  pt(x,)pt(y,)L2(M),D_t(x, y) \;=\; \bigl\|p_t(x, \cdot) - p_t(y, \cdot)\bigr\|_{L^2(M)},

integrating over all paths connecting xx to yy at diffusion time tt. Two points are close in diffusion distance if there are many short paths between them on the manifold.[11]

Proof sketch. Let S=D1/2WεD1/2S = D^{-1/2} W_\varepsilon D^{-1/2} denote the symmetrized weight matrix, writing WεW_\varepsilon for the Gaussian weight matrix of the definition above. SS is symmetric and similar to the Markov matrix M=D1WεM = D^{-1}W_\varepsilon, so the two share eigenvalues 1=λ0λ1λ201 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge 0, and SS has orthonormal eigenvectors ϕk\phi_k. The diffusion map embeds each point xix_i into Rm\mathbb{R}^m via

Φt(xi)  =  (λ1tψ1(xi),  λ2tψ2(xi),  ,  λmtψm(xi)),\Phi_t(x_i) \;=\; \bigl(\lambda_1^t \psi_1(x_i),\; \lambda_2^t \psi_2(x_i),\; \ldots,\; \lambda_m^t \psi_m(x_i)\bigr),

where ψk=D1/2ϕk\psi_k = D^{-1/2}\phi_k are the (right) eigenvectors of the row-normalized Markov matrix M=D1WεM = D^{-1}W_\varepsilon. The squared Euclidean distance in this embedding is

Φt(x)Φt(y)2  =  k=1mλk2t(ψk(x)ψk(y))2.\|\Phi_t(x) - \Phi_t(y)\|^2 \;=\; \sum_{k=1}^m \lambda_k^{2t}\bigl(\psi_k(x) - \psi_k(y)\bigr)^2.

On the other hand, the tt-step Markov transition kernel satisfies pt(x,)=kλktψk(x)ψk()p_t(x, \cdot) = \sum_k \lambda_k^t \psi_k(x)\psi_k(\cdot) by the spectral expansion. Hence

pt(x,)pt(y,)L22  =  kλk2t(ψk(x)ψk(y))2  =  Φt(x)Φt(y)2.\|p_t(x,\cdot) - p_t(y,\cdot)\|^2_{L^2} \;=\; \sum_k \lambda_k^{2t}\bigl(\psi_k(x) - \psi_k(y)\bigr)^2 \;=\; \|\Phi_t(x) - \Phi_t(y)\|^2.

The equality shows that the Euclidean distance in the diffusion map coordinates is exactly the L2L^2 distance between the diffusion kernels pt(x,)p_t(x,\cdot) and pt(y,)p_t(y,\cdot). As ε0\varepsilon \to 0 and nn \to \infty, the Markov matrix MM converges to the heat semigroup of the Laplace-Beltrami operator on MM, and the diffusion distance converges to the geometric diffusion distance on the underlying manifold.[11] \square

The diffusion distance provides a notion of geometric proximity that is intrinsic to the data manifold, rather than imposed by a Euclidean embedding. For price data, it means that two market states are "similar" not because their log-return values happen to be close, but because the conditional distributions of future returns are close in L2L^2.
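The identity in the proof sketch is checkable on a small point cloud. A minimal sketch, assuming an arbitrary synthetic cloud and ε=1\varepsilon = 1: build the Gaussian-weight Markov matrix, take eigenpairs through its symmetric conjugate, and compare the embedding distance with the kernel distance computed in the L2(1/π)L^2(1/\pi) norm weighted by the stationary distribution (the norm under which the identity is exact).

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.standard_normal((40, 2))               # small synthetic point cloud
eps = 1.0
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / eps)                            # Gaussian weight matrix
d = W.sum(axis=1)
M = W / d[:, None]                               # row-stochastic Markov matrix
pi = d / d.sum()                                 # stationary distribution of M

# Eigenpairs via the symmetric conjugate S = D^{-1/2} W D^{-1/2}.
S = W / np.sqrt(np.outer(d, d))
lam, phi = np.linalg.eigh(S)
lam, phi = lam[::-1], phi[:, ::-1]               # descending: lam[0] = 1
psi = phi / np.sqrt(d)[:, None]                  # right eigenvectors of M
psi /= np.sqrt((pi[:, None] * psi ** 2).sum(axis=0))   # normalize in L^2(pi)

t, i, j = 3, 0, 1
Pt = np.linalg.matrix_power(M, t)
# Diffusion distance: difference of t-step kernels in the 1/pi-weighted norm.
D2_direct = float(np.sum((Pt[i] - Pt[j]) ** 2 / pi))
# Embedding distance: k = 0 (constant eigenvector) contributes nothing.
D2_embed = float(np.sum(lam[1:] ** (2 * t) * (psi[i, 1:] - psi[j, 1:]) ** 2))
```

The two quantities agree to machine precision, eigenvector sign ambiguity notwithstanding, because the identity only involves squared differences.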

Empirical results. We apply a Daubechies-4 (D4) wavelet decomposition to BTC and ETH log-returns, computing the wavelet variance at each of eight decomposition levels in a 30-day rolling window stepped daily across 2025.

Wavelet variance decomposition for BTC and ETH
Wavelet variance by decomposition level for BTC (orange) and ETH (blue), averaged over the full year 2025. Higher levels correspond to coarser timescales. The variance is concentrated at finer scales for both assets, consistent with the mean-reversion evidence at short timescales and the diffuse noise at longer scales.

Interactive: watch how the wavelet variance profile evolves through 2025. Press play or drag the scrubber.

Key finding. The wavelet variance profile is not flat: the bulk of the variance sits at the finest scales (sub-100 ms), consistent with the mean-reversion and negative autocorrelation evidence. The profile shifts slightly toward coarser scales during periods of high market volatility (Q1 2025 BTC bull run), suggesting that trending behaviour temporarily transfers variance from fine scales to coarse ones.

Equity comparison. Andersen, Bollerslev, Diebold, and Ebens computed realized volatility for all 30 DJIA component stocks at 5-minute resolution and found the same qualitative wavelet structure: variance is concentrated at fine timescales under normal conditions, with spectral mass shifting toward daily and weekly scales during volatility episodes such as the 1998 LTCM crisis.[25] Percival and Walden's textbook treatment of wavelet variance uses S&P 500 return data as the primary example, demonstrating the bottom-heavy profile for large-cap equity indices as the canonical case.[10] The BTC/ETH profile here is consistent with what is observed for GE, MSFT, JPMorgan, and the DJIA aggregate, shifted to finer timescales by roughly one order of magnitude owing to the higher trade frequency on cryptocurrency exchanges.

8. Information Geometry and Cross-Asset Flow

The statistical models that describe the price process at different resolutions are themselves elements of a geometric space. This is the domain of information geometry.

Definition (Statistical manifold).

The statistical manifold of price return distributions at resolution δ\delta is the family

Sδ  =  {pθδ:θΘ},\mathcal{S}_\delta \;=\; \{p_\theta^\delta : \theta \in \Theta\},

where θ\theta parameterizes the distribution of the log-return rt(δ)r^{(\delta)}_t. As δ\delta varies, the manifold changes: at fine δ\delta it is parameterized by the mean-reversion parameters; at coarse δ\delta it is approximately a Gaussian family parameterized by (μ,σ)(\mu, \sigma).

Definition (Fisher-Rao metric).

The Fisher information matrix at θ=(θ1,,θn)\theta = (\theta^1, \ldots, \theta^n) is the n×nn \times n matrix whose (i,j)(i,j) entry is the expected product of the ii-th and jj-th score functions:

g(θ)  =  (E ⁣[(θ1 ⁣logpθ)2]E ⁣[θ1 ⁣logpθθ2 ⁣logpθ]E ⁣[θ1 ⁣logpθθn ⁣logpθ]E ⁣[θ2 ⁣logpθθ1 ⁣logpθ]E ⁣[(θ2 ⁣logpθ)2]E ⁣[θ2 ⁣logpθθn ⁣logpθ]E ⁣[θn ⁣logpθθ1 ⁣logpθ]E ⁣[θn ⁣logpθθ2 ⁣logpθ]E ⁣[(θn ⁣logpθ)2]),g(\theta) \;=\; \begin{pmatrix} \mathbb{E}\!\left[(\partial_{\theta^1}\!\log p_\theta)^2\right] & \mathbb{E}\!\left[\partial_{\theta^1}\!\log p_\theta \cdot \partial_{\theta^2}\!\log p_\theta\right] & \cdots & \mathbb{E}\!\left[\partial_{\theta^1}\!\log p_\theta \cdot \partial_{\theta^n}\!\log p_\theta\right] \\[6pt] \mathbb{E}\!\left[\partial_{\theta^2}\!\log p_\theta \cdot \partial_{\theta^1}\!\log p_\theta\right] & \mathbb{E}\!\left[(\partial_{\theta^2}\!\log p_\theta)^2\right] & \cdots & \mathbb{E}\!\left[\partial_{\theta^2}\!\log p_\theta \cdot \partial_{\theta^n}\!\log p_\theta\right] \\[6pt] \vdots & \vdots & \ddots & \vdots \\[6pt] \mathbb{E}\!\left[\partial_{\theta^n}\!\log p_\theta \cdot \partial_{\theta^1}\!\log p_\theta\right] & \mathbb{E}\!\left[\partial_{\theta^n}\!\log p_\theta \cdot \partial_{\theta^2}\!\log p_\theta\right] & \cdots & \mathbb{E}\!\left[(\partial_{\theta^n}\!\log p_\theta)^2\right] \end{pmatrix},

defining a Riemannian metric on Sδ\mathcal{S}_\delta called the Fisher-Rao metric. The matrix is symmetric and positive semi-definite. This is the unique (up to scale) monotone metric on the space of probability distributions.[12]

For the Gaussian family pθ=N(μ,σ2)p_\theta = \mathcal{N}(\mu, \sigma^2) with θ=(μ,σ2)\theta = (\mu, \sigma^2), the two score functions are μlogpθ=(xμ)/σ2\partial_\mu \log p_\theta = (x - \mu)/\sigma^2 and σ2logpθ=(xμ)2/(2σ4)1/(2σ2)\partial_{\sigma^2} \log p_\theta = (x-\mu)^2/(2\sigma^4) - 1/(2\sigma^2). Their expectations give a diagonal matrix:

g(θ)  =  (E ⁣[(Xμ)2σ4]E ⁣[(Xμ)σ2((Xμ)22σ412σ2)]E ⁣[(Xμ)σ2((Xμ)22σ412σ2)]E ⁣[((Xμ)22σ412σ2) ⁣2])  =  (1σ20012σ4).g(\theta) \;=\; \begin{pmatrix} \mathbb{E}\!\left[\dfrac{(X-\mu)^2}{\sigma^4}\right] & \mathbb{E}\!\left[\dfrac{(X-\mu)}{\sigma^2}\cdot\left(\dfrac{(X-\mu)^2}{2\sigma^4} - \dfrac{1}{2\sigma^2}\right)\right] \\[12pt] \mathbb{E}\!\left[\dfrac{(X-\mu)}{\sigma^2}\cdot\left(\dfrac{(X-\mu)^2}{2\sigma^4} - \dfrac{1}{2\sigma^2}\right)\right] & \mathbb{E}\!\left[\left(\dfrac{(X-\mu)^2}{2\sigma^4} - \dfrac{1}{2\sigma^2}\right)^{\!2}\right] \end{pmatrix} \;=\; \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\[8pt] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}.

The diagonal structure means μ\mu and σ2\sigma^2 are orthogonal coordinates on Sδ\mathcal{S}_\delta: estimating the mean carries no Fisher information about the variance and vice versa. The geodesic distance in the Fisher-Rao metric on this family is the hyperbolic distance on the upper half-plane {(μ,σ):σ>0}\{(\mu, \sigma) : \sigma > 0\}.
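A Monte Carlo sanity check of the diagonal Fisher matrix: sample from the Gaussian, evaluate the two score functions above, and estimate their covariance. Parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2 = 0.3, 2.0
x = rng.normal(mu, np.sqrt(sigma2), 2_000_000)

# The two score functions of the Gaussian family at (mu, sigma^2):
score_mu = (x - mu) / sigma2
score_s2 = (x - mu) ** 2 / (2 * sigma2 ** 2) - 1 / (2 * sigma2)

g_hat = np.cov(np.vstack([score_mu, score_s2]))   # estimated Fisher matrix
g_exact = np.array([[1 / sigma2, 0.0],
                    [0.0, 1 / (2 * sigma2 ** 2)]])
```

The off-diagonal estimate hovers near zero because the cross term reduces to an odd moment of the Gaussian.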

Theorem (Cramer-Rao bound as geodesic constraint).

Any unbiased estimator θ^\hat{\theta} of θ\theta satisfies

Var(θ^i)    (g1)ii,\mathrm{Var}(\hat{\theta}^i) \;\ge\; (g^{-1})^{ii},

with equality attained exactly when an efficient estimator exists; for exponential families the maximum likelihood estimator is efficient in the sense of [12, Theorem 2.1], and in general the MLE attains the bound only asymptotically. The Cramer-Rao bound is therefore a statement about the geometry of Sδ\mathcal{S}_\delta: no estimator can achieve precision greater than the inverse Fisher information, just as no path on a Riemannian manifold can be shorter than the geodesic distance.

Proof (scalar case). Let T=T(X)T = T(X) be an unbiased estimator of θ\theta, so Eθ[T]=θ\mathbb{E}_\theta[T] = \theta for all θ\theta. Let sθ=θlogpθs_\theta = \partial_\theta \log p_\theta denote the score function.

Step 1: Score has mean zero. Differentiating the identity pθ(x)dx=1\int p_\theta(x)\,\mathrm{d}x = 1 with respect to θ\theta under the integral sign (justified by dominated convergence under regularity conditions),

θpθ(x)dx  =  0        Eθ[sθ]  =  sθ(x)pθ(x)dx  =  0.\int \partial_\theta p_\theta(x)\,\mathrm{d}x \;=\; 0 \;\implies\; \mathbb{E}_\theta[s_\theta] \;=\; \int s_\theta(x)\, p_\theta(x)\,\mathrm{d}x \;=\; 0.

Step 2: Covariance identity. Differentiating Eθ[T]=θ\mathbb{E}_\theta[T] = \theta with respect to θ\theta,

1  =  ddθT(x)pθ(x)dx  =  T(x)θpθ(x)dx  =  Eθ[Tsθ].1 \;=\; \frac{\mathrm{d}}{\mathrm{d}\theta}\int T(x)\, p_\theta(x)\,\mathrm{d}x \;=\; \int T(x)\,\partial_\theta p_\theta(x)\,\mathrm{d}x \;=\; \mathbb{E}_\theta[T \cdot s_\theta].

Since E[sθ]=0\mathbb{E}[s_\theta] = 0, this gives Cov(T,sθ)=E[Tsθ]E[T]E[sθ]=1\mathrm{Cov}(T, s_\theta) = \mathbb{E}[T \cdot s_\theta] - \mathbb{E}[T]\mathbb{E}[s_\theta] = 1.

Step 3: Cauchy-Schwarz. By the Cauchy-Schwarz inequality for covariances,

(Cov(T,sθ))2    Var(T)Var(sθ)  =  Var(T)I(θ),\bigl(\mathrm{Cov}(T,\, s_\theta)\bigr)^2 \;\le\; \mathrm{Var}(T)\cdot\mathrm{Var}(s_\theta) \;=\; \mathrm{Var}(T)\cdot I(\theta),

where I(θ)=E[sθ2]I(\theta) = \mathbb{E}[s_\theta^2] is the Fisher information (using Step 1: Var(sθ)=E[sθ2]\mathrm{Var}(s_\theta) = \mathbb{E}[s_\theta^2]). Substituting Cov(T,sθ)=1\mathrm{Cov}(T, s_\theta) = 1 gives

1    Var(T)I(θ)        Var(T)    1I(θ)  =  (g1)θθ.1 \;\le\; \mathrm{Var}(T) \cdot I(\theta) \;\implies\; \mathrm{Var}(T) \;\ge\; \frac{1}{I(\theta)} \;=\; (g^{-1})^{\theta\theta}.

Equality. The Cauchy-Schwarz inequality is tight if and only if TθT - \theta is proportional to sθs_\theta almost surely. This means T=θ+c(θ)sθT = \theta + c(\theta)\,s_\theta for some function c(θ)c(\theta). For an exponential family pθ(x)=h(x)exp(η(θ)T(x)A(θ))p_\theta(x) = h(x)\exp(\eta(\theta)T(x) - A(\theta)), the sufficient statistic T(x)T(x) satisfies exactly this condition, so the MLE achieves the bound. \square
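The scalar bound can be watched binding: for XiN(θ,σ2)X_i \sim \mathcal{N}(\theta, \sigma^2) with known variance (an exponential family), the nn-sample Fisher information is n/σ2n/\sigma^2 and the sample mean attains Var=σ2/n\mathrm{Var} = \sigma^2/n exactly. A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2, n, trials = 1.0, 4.0, 50, 100_000
samples = rng.normal(theta, np.sqrt(sigma2), (trials, n))
mle = samples.mean(axis=1)        # the MLE of theta is the sample mean
var_mle = float(mle.var())        # its variance across independent trials
cr_bound = sigma2 / n             # inverse Fisher information of the n-sample
```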

Definition (Transfer entropy).

The transfer entropy from process XX to process YY at resolution δ\delta is

TXY(δ)  =  H(Yt+δYt)H(Yt+δYt,Xt),T_{X \to Y}(\delta) \;=\; H(Y_{t+\delta} \mid Y_t) - H(Y_{t+\delta} \mid Y_t, X_t),

where H()H(\cdot \mid \cdot) denotes conditional Shannon entropy. It quantifies how much knowing the past of XX reduces uncertainty about the future of YY, beyond what is already known from YY's own past.

Theorem (Transfer entropy as KL divergence).

Transfer entropy equals the Kullback-Leibler divergence

TXY  =  DKL ⁣(p(Yt+δYt,Xt)p(Yt+δYt)).T_{X \to Y} \;=\; D_{\mathrm{KL}}\!\left(p(Y_{t+\delta} \mid Y_t, X_t) \,\|\, p(Y_{t+\delta} \mid Y_t)\right).

The Fisher-Rao metric is the unique Riemannian metric for which DKLD_{\mathrm{KL}} is the natural divergence to second order in the parameter displacement [12, Chapter 3]. Transfer entropy is therefore a natural, coordinate-free measure of information flow on the statistical manifold Sδ\mathcal{S}_\delta.

Proof. Denote Y=Yt+δY' = Y_{t+\delta}, Y=YtY = Y_t, X=XtX = X_t for brevity. Starting from the definition,

TXY  =  H(YY)H(YY,X).T_{X \to Y} \;=\; H(Y' \mid Y) - H(Y' \mid Y, X).

Expanding the conditional entropies using the definition H(YZ)=E[logp(YZ)]H(Y' \mid Z) = -\mathbb{E}[\log p(Y' \mid Z)],

TXY  =  E[logp(YY)]+E[logp(YY,X)]  =  E ⁣[logp(YY,X)p(YY)],T_{X \to Y} \;=\; -\mathbb{E}\bigl[\log p(Y' \mid Y)\bigr] + \mathbb{E}\bigl[\log p(Y' \mid Y, X)\bigr] \;=\; \mathbb{E}\!\left[\log \frac{p(Y' \mid Y, X)}{p(Y' \mid Y)}\right],

where the expectation is over the joint distribution p(Y,Y,X)p(Y', Y, X). Writing this expectation as a sum over all values,

TXY  =  y,y,xp(y,y,x)logp(yy,x)p(yy)  =  E(Y,X) ⁣[DKL ⁣(p(YY,X)p(YY))],T_{X \to Y} \;=\; \sum_{y', y, x} p(y', y, x)\,\log\frac{p(y' \mid y, x)}{p(y' \mid y)} \;=\; \mathbb{E}_{(Y,X)}\!\left[D_{\mathrm{KL}}\!\bigl(p(Y' \mid Y, X)\,\big\|\,p(Y' \mid Y)\bigr)\right],

which is exactly the KL divergence between the conditional distribution of YY' given (Y,X)(Y, X) and the conditional distribution of YY' given YY alone, averaged over the marginal p(Y,X)p(Y, X). This is the definition of DKL(p(YY,X)p(YY))D_{\mathrm{KL}}(p(Y' \mid Y, X) \| p(Y' \mid Y)).

Non-negativity. By Gibbs' inequality, DKL(pq)0D_{\mathrm{KL}}(p \| q) \ge 0 with equality if and only if p=qp = q almost everywhere. Gibbs' inequality follows from the convexity of log-\log: by Jensen,

ipilogqipi  =  Ep ⁣[logqp]    logEp ⁣[qp]  =  log1  =  0.-\sum_i p_i \log \frac{q_i}{p_i} \;=\; \mathbb{E}_p\!\left[-\log\frac{q}{p}\right] \;\ge\; -\log\mathbb{E}_p\!\left[\frac{q}{p}\right] \;=\; -\log 1 \;=\; 0.

Equality holds in Jensen if and only if q/pq/p is constant almost surely, i.e., p=qp = q. Therefore TXY0T_{X \to Y} \ge 0, with equality if and only if knowing XtX_t provides no additional information about Yt+δY_{t+\delta} beyond what YtY_t already provides, i.e., Yt+δXtYtY_{t+\delta} \perp X_t \mid Y_t. \square

For the two-asset BTC/ETH system, the transfer entropies at a given resolution δ\delta organize into an information flow matrix:

T(δ)  =  (0TBTCETH(δ)TETHBTC(δ)0).\mathbf{T}(\delta) \;=\; \begin{pmatrix} 0 & T_{\mathrm{BTC}\to\mathrm{ETH}}(\delta) \\[4pt] T_{\mathrm{ETH}\to\mathrm{BTC}}(\delta) & 0 \end{pmatrix}.

The diagonal entries are zero by definition (a process carries no transfer entropy to itself). The dominant direction of information flow at resolution δ\delta is read off the larger off-diagonal entry. The crossover is the value of δ\delta at which the two off-diagonal entries are equal and the dominant direction switches.
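A plug-in estimator follows the definition directly once the series are discretized. The sketch below uses sign-valued series and a synthetic coupling in which YY copies XX's previous value 70% of the time, so TXYT_{X \to Y} comes out decisively larger than TYXT_{Y \to X}; the coupling and sample size are illustrative, and real returns would first be binned at resolution δ\delta and sign-discretized.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of T_{X->Y} in nats, conditioning on one lag of
    each process (the single-lag form used in the definition above)."""
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))     # (y', y, x) counts
    pairs_yx = Counter(zip(y[:-1], x[:-1]))
    pairs_yy = Counter(zip(y[1:], y[:-1]))
    singles = Counter(y[:-1])
    te = 0.0
    for (yp, yc, xc), c in triples.items():
        p_full = c / pairs_yx[(yc, xc)]               # p(y' | y, x)
        p_self = pairs_yy[(yp, yc)] / singles[yc]     # p(y' | y)
        te += (c / n) * np.log(p_full / p_self)
    return te

# Synthetic coupling: Y_{t+1} copies X_t with prob 0.7, flips it with prob 0.3.
rng = np.random.default_rng(6)
n = 100_000
x = rng.choice([-1, 1], n)
flip = rng.random(n - 1) < 0.3
y = np.concatenate([[1], np.where(flip, -x[:-1], x[:-1])])
te_xy = transfer_entropy(x.tolist(), y.tolist())
te_yx = transfer_entropy(y.tolist(), x.tolist())
```

Since all probabilities come from the same counts, the estimate is a conditional mutual information of the empirical distribution and is non-negative by construction; the reverse direction is near zero up to small-sample bias.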

Empirical results. We compute TBTCETH(δ)T_{\mathrm{BTC} \to \mathrm{ETH}}(\delta) and TETHBTC(δ)T_{\mathrm{ETH} \to \mathrm{BTC}}(\delta) at nine resolutions from 1 ms to 1 s, with a fine grid near 15 to 20 ms to resolve the direction-flip crossover. The crossover resolution is identified as the point where TBTCETH=TETHBTCT_{\mathrm{BTC} \to \mathrm{ETH}} = T_{\mathrm{ETH} \to \mathrm{BTC}}, interpolated linearly between the two bracketing grid points.

Transfer entropy between BTC and ETH at nine resolutions
Transfer entropy TBTCETHT_{\mathrm{BTC} \to \mathrm{ETH}} (orange) and TETHBTCT_{\mathrm{ETH} \to \mathrm{BTC}} (blue) at resolutions 1 ms to 1 s. The dashed vertical line marks the crossover resolution, the point at which the dominant direction of information flow reverses. Below the crossover, ETH leads BTC in an information-theoretic sense; above it, BTC leads ETH.

Interactive: drag the slider to change resolution and watch the dominant direction of information flow switch.

Key finding. The information-flow direction reverses at approximately 15 to 20 ms, in quantitative agreement with the lead-lag crossover reported in the companion post (BTC/ETH Lead-Lag, March 2026). At 1 ms, ETH leads BTC in the transfer entropy sense; at 1 s, BTC leads ETH. The Fisher-Rao metric on Sδ\mathcal{S}_\delta changes with δ\delta, and the information-geometric structure of the market at millisecond scales is genuinely different from its structure at second scales. No single flat model captures both.

9. Machine Learning as Data-Driven Geometry

This section surveys how standard machine learning methods implicitly make choices about the geometry of the price process, framed through the resolution lens developed above. No new empirical results appear here; this is a methods catalogue. The synthesis in Section 10 draws only on the empirical findings of Sections 4, 6, 7, and 8.

9.1 Gaussian Process Regression

Definition (GP kernel as Riemannian metric).

A Gaussian process prior over price paths is specified by a positive semi-definite kernel k:R×RRk : \mathbb{R} \times \mathbb{R} \to \mathbb{R}. The kernel defines an inner product on the reproducing kernel Hilbert space Hk\mathcal{H}_k: for functions f=k(,t)α(t)dtf = \int k(\cdot, t)\,\alpha(t)\,\mathrm{d}t and g=k(,t)β(t)dtg = \int k(\cdot, t')\,\beta(t')\,\mathrm{d}t' in the span of kernel sections,

f,gk  =  α(t)k(t,t)β(t)dtdt.\langle f, g \rangle_k \;=\; \iint \alpha(t)\, k(t, t')\, \beta(t')\,\mathrm{d}t\,\mathrm{d}t'.

This is a choice of Riemannian metric on the space of square-integrable price paths. A kernel is stationary if k(t,t)=k(tt)k(t, t') = k(t - t') depends only on the lag τ=tt\tau = t - t', and non-stationary otherwise.

Proposition (Stationary GP misspecification).

A GP with a stationary kernel implies that the ACF of the predicted process depends only on the lag, at every absolute time, and that its sign structure survives coarse-graining at every resolution. The price process violates this: the empirical ACF at 1 ms resolution is negative at short lags; the ACF at 1 hr resolution is indistinguishable from zero. No stationary kernel can simultaneously fit both regimes.

Proof. For a stationary GP with kernel k(τ)k(\tau), the marginal autocorrelation of the GP at lag τ\tau is ρ(τ)=k(τ)/k(0)\rho(\tau) = k(\tau)/k(0), independent of absolute time and of the resolution δ\delta at which the path is sampled. The coarse-grained process at resolution δ\delta has ACF

ρ(δ)(τ)  =  0δ ⁣0δk(τ+su)dsdu0δ ⁣0δk(su)dsdu,\rho^{(\delta)}(\tau) \;=\; \frac{\int_0^\delta\!\int_0^\delta k(\tau + s - u)\,\mathrm{d}s\,\mathrm{d}u}{\int_0^\delta\!\int_0^\delta k(s - u)\,\mathrm{d}s\,\mathrm{d}u},

which is a smoothed version of ρ\rho and inherits the same sign structure. By Section 6, the empirical ρ(1ms)(1)\rho^{(1\,\mathrm{ms})}(1) is negative while ρ(1hr)(τ)0\rho^{(1\,\mathrm{hr})}(\tau) \approx 0 for all lags. A stationary kernel that produces negative ρ(1ms)(1)\rho^{(1\,\mathrm{ms})}(1) via a negative-valued kk will also produce non-zero ACF at hourly resolution, contradicting the empirical findings. The two constraints are incompatible under stationarity. \square

Non-stationary kernels can in principle capture resolution-dependent structure, but specifying them requires an explicit model for how the ACF changes with δ\delta, which is precisely the resolution manifold of Section 3.

9.2 Neural SDEs

Definition (Infinitesimal generator of a neural SDE).

For the Ito neural SDE

dXt  =  fθ(Xt)dt+gθ(Xt)dWt,\mathrm{d}X_t \;=\; f_\theta(X_t)\,\mathrm{d}t + g_\theta(X_t)\,\mathrm{d}W_t,

the infinitesimal generator acting on a twice-differentiable test function φC2(R)\varphi \in C^2(\mathbb{R}) is

Lθφ(x)  =  fθ(x)xφ(x)+12gθ(x)2xxφ(x).\mathcal{L}_\theta\,\varphi(x) \;=\; f_\theta(x)\,\partial_x \varphi(x) + \tfrac{1}{2}\,g_\theta(x)^2\,\partial_{xx} \varphi(x).

The generator encodes the full first- and second-order local behavior of the process and determines the transition semigroup via E[φ(Xt+s)Xt=x]=esLθφ(x)\mathbb{E}[\varphi(X_{t+s}) \mid X_t = x] = e^{s\mathcal{L}_\theta}\varphi(x).

Theorem (Learned generator is resolution-dependent).

Let θ^(δ)\hat{\theta}(\delta) be the maximum likelihood estimator for the neural SDE trained on observations at timestep δ\delta. Then Lθ^(δ)\mathcal{L}_{\hat{\theta}(\delta)} approximates the generator of the coarse-grained process (St(δ))(S^{(\delta)}_t), not the generator of the underlying price process (St)(S_t). In general, Lθ^(δ)Lθ^(δ)\mathcal{L}_{\hat{\theta}(\delta)} \neq \mathcal{L}_{\hat{\theta}(\delta')} for δδ\delta \neq \delta'.[16][17]

Proof. The Euler-Maruyama discretization at step δ\delta gives the transition

Xt+δ  =  Xt+fθ(Xt)δ+gθ(Xt)δεt,εtN(0,1).X_{t+\delta} \;=\; X_t + f_\theta(X_t)\,\delta + g_\theta(X_t)\,\sqrt{\delta}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,1).

The MLE minimizes the KL divergence between the empirical transition density p(δ)(x,y)p^{(\delta)}(x, y) at step δ\delta and the model transition density. By Definition 3, p(δ)(x,y)p^{(\delta)}(x, y) is the δ\delta-step transition density of the coarse-grained process S(δ)S^{(\delta)}. Since Πδ1Πδ2=Πδ1\Pi_{\delta_1} \circ \Pi_{\delta_2} = \Pi_{\delta_1} for δ1δ2\delta_1 \ge \delta_2 (tower property), the coarse-grained process at δ1\delta_1 and δ2\delta_2 differ in their first-order statistics whenever the transition semigroup does not commute with Πδ\Pi_\delta (Theorem 3). Consequently the MLE θ^(δ)\hat{\theta}(\delta) minimizing DKL(p(δ)pδmodel)D_{\mathrm{KL}}(p^{(\delta)} \| p^{\mathrm{model}}_\delta) is resolution-dependent. \square

9.3 Reservoir Computing

Definition (Echo state property).

A reservoir recurrent network with weight matrix WRN×NW \in \mathbb{R}^{N \times N}, input weights WinW_{\mathrm{in}}, and nonlinearity σ\sigma updates its state via

xt  =  σ ⁣(Wxt1+Winut).x_t \;=\; \sigma\!\left(W x_{t-1} + W_{\mathrm{in}}\, u_t\right).

The network has the echo state property if the spectral radius ρ(W)<1\rho(W) < 1. Under this condition, the state xtx_t is a unique asymptotically stable function of the input history (ut,ut1,)(u_t, u_{t-1}, \ldots), with the influence of utku_{t-k} on xtx_t decaying at rate ρ(W)k\rho(W)^k.[18][19]

Proposition (Memory timescale versus input autocorrelation time).

The reservoir's effective memory timescale at input sampling rate δ\delta is

τR  =  δlogρ(W).\tau_R \;=\; \frac{-\delta}{\log \rho(W)}.

The reservoir faithfully propagates autocorrelation structure only when τRτX(δ)\tau_R \approx \tau_X(\delta), where τX(δ)\tau_X(\delta) is the autocorrelation decay time of the input process at resolution δ\delta. When τRτX(δ)\tau_R \ll \tau_X(\delta) the reservoir forgets correlated history before it can influence the readout; when τRτX(δ)\tau_R \gg \tau_X(\delta) the reservoir saturates with stale history.

Proof. The contribution of utku_{t-k} to the current state xtx_t passes through kk applications of WW and is bounded in operator norm by ρ(W)kWinutk\rho(W)^k \|W_{\mathrm{in}}\| \|u_{t-k}\|. Setting this to e1e^{-1} of its initial value gives k=1/(logρ(W))k^* = 1/(-\log \rho(W)) steps, corresponding to elapsed time τR=kδ\tau_R = k^* \delta. The input autocorrelation ρu(k)\rho_u(k) contributes a signal of amplitude ρu(k)\rho_u(k) at lag kk; the reservoir transmits this signal only for kkk \le k^*. If τX(δ)τR\tau_X(\delta) \gg \tau_R, substantial autocorrelation at lag τX(δ)/δ>k\tau_X(\delta)/\delta > k^* is discarded. If τX(δ)τR\tau_X(\delta) \ll \tau_R, the reservoir retains state from lags where ρu(k)0\rho_u(k) \approx 0, adding noise to the readout. \square

From Section 6, τX(δ)\tau_X(\delta) at δ=1ms\delta = 1\,\mathrm{ms} is roughly 5 to 10 ms. A reservoir intended to exploit sub-second mean reversion must be tuned to τR5-10ms\tau_R \approx 5\text{-}10\,\mathrm{ms}, requiring ρ(W)eδ/τRe0.10.9\rho(W) \approx e^{-\delta/\tau_R} \approx e^{-0.1} \approx 0.9. At δ=1hr\delta = 1\,\mathrm{hr} where the ACF is near zero, the optimal reservoir is nearly memoryless.
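Both the echo state property and the memory formula are visible in a toy reservoir: two copies driven by the same input from different initial states converge at a rate governed by ρ(W)\rho(W). The reservoir size, input, and weights below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100
W = rng.standard_normal((N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale so rho(W) = 0.9
w_in = rng.standard_normal(N)

def step(state, u):
    # Reservoir update with tanh nonlinearity.
    return np.tanh(W @ state + w_in * u)

# Two copies, same input, different initial states: the echo state property
# says the gap between them is washed out at rate ~ rho(W)^k.
u_seq = rng.standard_normal(500)
s1, s2 = np.zeros(N), rng.standard_normal(N)
gaps = []
for u in u_seq:
    s1, s2 = step(s1, u), step(s2, u)
    gaps.append(float(np.linalg.norm(s1 - s2)))

tau_R = -1.0 / np.log(0.9)   # effective memory, in units of the input step
```

With ρ(W)=0.9\rho(W) = 0.9 the effective memory is about 9.5 input steps, matching the formula above; the state gap has collapsed to numerical noise long before the 500th step.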

9.4 Signature Methods

Definition (Path signature).

For a continuous bounded-variation path X:[s,t]RmX : [s, t] \to \mathbb{R}^m, the depth-dd signature is the collection of iterated integrals

Sd(X)[s,t]  =  (1,  stdXi1,  st ⁣st2dXi1dXi2,  ,  st ⁣ ⁣stddXi1dXidd integrals)S^{\le d}(X)_{[s,t]} \;=\; \left(1,\; \int_s^t \mathrm{d}X^{i_1},\; \int_s^t\!\int_s^{t_2} \mathrm{d}X^{i_1}\mathrm{d}X^{i_2},\; \ldots,\; \underbrace{\int_s^t\!\cdots\!\int_s^{t_d} \mathrm{d}X^{i_1}\cdots \mathrm{d}X^{i_d}}_{d \text{ integrals}}\right)

over all multi-indices (i1,,ik){1,,m}k(i_1, \ldots, i_k) \in \{1,\ldots,m\}^k and depths k=1,,dk = 1, \ldots, d. The full signature S(X)S^\infty(X) is the limit as dd \to \infty.

Theorem (Chen's identity).

For a path XX and intermediate time u[s,t]u \in [s, t], the signature satisfies

S(X)[s,t]  =  S(X)[s,u]S(X)[u,t],S^\infty(X)_{[s,t]} \;=\; S^\infty(X)_{[s,u]} \otimes S^\infty(X)_{[u,t]},

where \otimes denotes the tensor product of the free tensor algebra. Consequently, the linear functionals of S(X)S^\infty(X) are dense in the space of continuous functions on the space of paths, uniformly over compact sets.[8][20]

Chen's identity is the algebraic backbone of signature universality: it says the signature is a homomorphism from paths under concatenation to the tensor algebra under multiplication, and universality follows from the Stone-Weierstrass theorem applied to the resulting algebra of functions on path space.
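Chen's identity is exact for piecewise-linear paths and checkable at depth 2: each linear segment with increment Δ\Delta contributes 12ΔΔ\tfrac{1}{2}\Delta \otimes \Delta to the level-2 term, and concatenation adds the cross term between the first sub-path's level-1 term and the second's. A minimal sketch on a random 2-D path:

```python
import numpy as np

def signature_depth2(increments):
    """Depth-1 and depth-2 signature of a piecewise-linear path given its
    segment increment vectors."""
    incs = np.asarray(increments, dtype=float)
    S1 = incs.sum(axis=0)
    m = incs.shape[1]
    S2 = np.zeros((m, m))
    run = np.zeros(m)                    # path displacement so far
    for dx in incs:
        # cross term with the path so far, plus the within-segment term
        S2 += np.outer(run, dx) + 0.5 * np.outer(dx, dx)
        run += dx
    return S1, S2

rng = np.random.default_rng(9)
incs = rng.standard_normal((20, 2))      # 20 linear segments in R^2
S1, S2 = signature_depth2(incs)
A1, A2 = signature_depth2(incs[:12])     # first sub-path
B1, B2 = signature_depth2(incs[12:])     # second sub-path
# Chen at depth 2: S2(concat) = S2(first) + S2(second) + S1(first) (x) S1(second)
chen_S2 = A2 + B2 + np.outer(A1, B1)
```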

Proposition (Depth-resolution tradeoff).

Let X(δ)X^{(\delta)} be the price path coarse-grained at resolution δ\delta with per-step volatility σδ\sigma\sqrt{\delta}. The depth-kk term of the signature has magnitude Sk=O ⁣((σδ)k/k!)\|S^k\| = O\!\left((\sigma\sqrt{\delta})^k / k!\right). The effective depth d(δ)d^*(\delta) at which the (d+1)(d^*+1)-th term falls below a tolerance ε\varepsilon relative to the depth-1 term is

d(δ)    log(1/ε)log(1/(σδ))1,d^*(\delta) \;\approx\; \frac{\log(1/\varepsilon)}{\log(1/(\sigma\sqrt{\delta}))} - 1,

which is decreasing in δ\delta for σδ<1\sigma\sqrt{\delta} < 1. Coarser resolution requires shallower signatures.

Proof. Each iterated integral ststkdXi1dXik\int_s^t \cdots \int_s^{t_k} \mathrm{d}X^{i_1}\cdots \mathrm{d}X^{i_k} over a window of n=T/δn = T/\delta steps has variance (σδ)2kCk(\sigma\sqrt{\delta})^{2k} \cdot C_k where CkC_k is a combinatorial constant of order 1/k!1/k! from the number of non-decreasing index tuples. The ratio of successive depth norms is thus Sk+1/Sk=O(σδ)\|S^{k+1}\|/\|S^k\| = O(\sigma\sqrt{\delta}). Setting Sd+1/S1=(σδ)d<ε\|S^{d^*+1}\|/\|S^1\| = (\sigma\sqrt{\delta})^{d^*} < \varepsilon and solving gives the stated formula. \square

At δ=1ms\delta = 1\,\mathrm{ms} with σδ104\sigma\sqrt{\delta} \approx 10^{-4}, d1d^* \approx 1 for ε=104\varepsilon = 10^{-4}: each window's signature is dominated by its depth-1 term. At δ=1hr\delta = 1\,\mathrm{hr} with σδ102\sigma\sqrt{\delta} \approx 10^{-2}, d2d^* \approx 2: depth-3 and higher terms are negligible, and the price path at hourly resolution is well described by its level-1 and level-2 signature components.
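The arithmetic is easy to check directly from the proof's relation (σδ)d=ε(\sigma\sqrt{\delta})^{d^*} = \varepsilon. A two-line sketch, using the order-of-magnitude per-step volatilities quoted above:

```python
import math

def effective_depth(per_step_vol: float, eps: float = 1e-4) -> float:
    """Solve (sigma * sqrt(delta)) ** d = eps for d: the depth beyond which
    signature terms fall below eps relative to the depth-1 term."""
    return math.log(1.0 / eps) / math.log(1.0 / per_step_vol)

print(effective_depth(1e-4))  # millisecond-scale per-step volatility -> d* = 1
print(effective_depth(1e-2))  # hourly-scale per-step volatility -> d* = 2
```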

9.5 The Common Thread

Proposition (Resolution as a universal model hyperparameter).

For any model class M\mathcal{M} trained on observations of the price process at resolution δ\delta, let m(δ)=argminmMDKL(PδPm)m^*(\delta) = \arg\min_{m \in \mathcal{M}} D_{\mathrm{KL}}(P^\delta \| P^m) be the model minimizing KL divergence to the true coarse-grained law PδP^\delta. Then δm(δ)\delta \mapsto m^*(\delta) is non-constant: no single mMm \in \mathcal{M} simultaneously minimizes KL divergence at two different resolutions δδ\delta \neq \delta' unless Pδ=PδP^\delta = P^{\delta'}.

Proof. By the resolution manifold (Definition 3), PδPδP^\delta \neq P^{\delta'} whenever the transition semigroup does not commute with Πδ\Pi_\delta, which holds for any process with resolution-dependent autocorrelation (established empirically in Section 6). The KL divergence vanishes if and only if the two distributions coincide; provided M\mathcal{M} is rich enough that m(δ)m^*(\delta') fits PδP^{\delta'} strictly better than it fits PδP^\delta, it follows that DKL(PδPm(δ))>0D_{\mathrm{KL}}(P^\delta \| P^{m^*(\delta')}) > 0, so no single model is optimal at both resolutions. The resolution δ\delta is therefore a genuine hyperparameter of the model selection problem, not a technical implementation detail. \square

The kernel length scale, SDE timestep, reservoir spectral radius, and signature truncation depth are all parameterizations of this same hyperparameter. A principled approach to price modeling must either restrict to a specific resolution band or explicitly model the resolution-dependence by treating δ\delta as a parameter of the statistical model rather than a fixed technical constant.
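The proposition can be illustrated with the textbook efficient-price-plus-noise model (the same mechanism behind Roll's bid-ask bounce [23]). This is simulated data standing in for the post's tick data, not the data itself: the best-fit iid Gaussian has a different per-step variance at each resolution, and the KL divergence between the two fitted models is strictly positive.

```python
import numpy as np

def kl_gauss(v_p, v_q):
    """KL( N(0, v_p) || N(0, v_q) ) in nats, for zero-mean Gaussians."""
    return 0.5 * (v_p / v_q - 1.0 + np.log(v_q / v_p))

rng = np.random.default_rng(1)
n, nu = 1_000_000, 2.0
# Efficient price: random walk. Observed price: walk + iid microstructure noise.
p = np.cumsum(rng.standard_normal(n)) + nu * rng.standard_normal(n)

def per_step_var(k):
    """Per-step variance of the best-fit iid Gaussian at coarse-graining k."""
    return np.diff(p[::k]).var() / k

v_fine, v_coarse = per_step_var(1), per_step_var(1000)
print(v_fine, v_coarse)           # roughly 9 vs roughly 1: m*(delta) is non-constant
print(kl_gauss(v_fine, v_coarse))  # strictly positive: the coarse model misfits the fine law
```

The noise term inflates the fine-resolution variance by 2ν22\nu^2 per step, and that inflation decays like 1/k1/k under aggregation, so no single Gaussian fits both ends.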

10. Synthesis

The four empirical investigations of Sections 4, 6, 7, and 8 all point in the same direction.

Variance ratio and Hurst. The variance ratio VR(k)VR(k) is below 1 at sub-second scales for both BTC and ETH, rejecting the GBM null with high statistical confidence. The Hurst exponent estimate satisfies H^<12\hat{H} < \frac{1}{2} at sub-second resolutions and H^12\hat{H} \approx \frac{1}{2} at hourly resolutions. The scale at which the Hurst exponent crosses 12\frac{1}{2} is approximately 10 to 60 minutes.
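The variance ratio statistic itself is only a few lines. A sketch on simulated data (not the BTC/ETH series): a random walk contaminated with iid microstructure noise mechanically produces VR(k)<1VR(k) < 1, while a clean walk gives VR(k)1VR(k) \approx 1.

```python
import numpy as np

def variance_ratio(p, k):
    """VR(k) = Var(k-period return) / (k * Var(1-period return)).
    Equals 1 for a random walk; below 1 under mean reversion."""
    r1 = np.diff(p)
    rk = p[k:] - p[:-k]   # overlapping k-period returns
    return rk.var() / (k * r1.var())

rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(500_000))            # pure random walk
noisy = walk + 1.5 * rng.standard_normal(500_000)         # + microstructure noise

print([round(variance_ratio(noisy, k), 3) for k in (2, 10, 100)])  # all below 1
print(round(variance_ratio(walk, 10), 3))                          # close to 1
```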

Autocorrelation structure. The signed-volume autocorrelation is negative at lag 1 ms for both assets, consistent with order-book microstructure mean reversion. The return autocorrelation at 1 s resolution is 0.048-0.048 (p<0.001p < 0.001), in quantitative agreement with the PolyBackTest analysis. At longer lags and coarser resolutions, the autocorrelations are not distinguishable from zero.

Wavelet decomposition. The wavelet variance profile is bottom-heavy: the bulk of the variance is at the finest decomposition levels (sub-100 ms). During the Q1 2025 BTC trending regime, the profile shifts toward coarser scales. This is consistent with the Hurst exponent evidence: trending behaviour is a coarse-scale phenomenon, mean reversion a fine-scale one.
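A minimal Haar wavelet-variance computation conveys the bottom-heavy profile. This sketch uses the decimated orthonormal transform in plain NumPy for brevity (a maximal-overlap transform, as in Percival and Walden [10], would be the production choice), on simulated returns with microstructure noise rather than the post's data:

```python
import numpy as np

def haar_wavelet_variance(x, levels=6):
    """Variance of Haar detail coefficients per dyadic level (level 1 = finest),
    via the decimated orthonormal transform."""
    out, a = [], x.astype(float)
    for _ in range(levels):
        a = a[: len(a) - len(a) % 2]
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail: fine-scale differences
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation carried up a level
        out.append(d.var())
    return np.array(out)

rng = np.random.default_rng(3)
n = 2 ** 17
prices = np.cumsum(rng.standard_normal(n)) + 2.0 * rng.standard_normal(n)
wv = haar_wavelet_variance(np.diff(prices))
print(wv / wv.sum())   # share of wavelet variance per level, finest first
```

The microstructure noise concentrates variance at the finest level; for pure random-walk returns the profile would be flat across levels.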

Information geometry. The transfer entropy crossover at 15 to 20 ms confirms the lead-lag reversal finding from the companion post. The Fisher-Rao metric on the statistical manifold changes with resolution: the market's information geometry at millisecond scales is genuinely non-flat, while at hourly scales it approximates the flat Gaussian family.
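Transfer entropy admits a compact plug-in estimator: quantile-bin each series, accumulate a joint histogram, and sum the conditional mutual information I(Xt+1;YtXt)I(X_{t+1}; Y_t \mid X_t). The sketch below is an illustration of the quantity, not the post's pipeline (millisecond-scale estimates need bias correction). On a synthetic pair in which yy leads xx by one step, the estimator recovers the asymmetry:

```python
import numpy as np

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of TE_{y->x} = I(X_{t+1}; Y_t | X_t), in nats,
    from quantile-binned series."""
    def disc(v):
        qs = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(v, qs)
    xf, xp, yp = disc(x[1:]), disc(x[:-1]), disc(y[:-1])
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (xf, xp, yp), 1.0)
    joint /= joint.sum()
    p_xp_yp = joint.sum(axis=0)        # P(X_t, Y_t)
    p_xf_xp = joint.sum(axis=2)        # P(X_{t+1}, X_t)
    p_xp = joint.sum(axis=(0, 2))      # P(X_t)
    te = 0.0
    for i in range(bins):
        for j in range(bins):
            for k in range(bins):
                p = joint[i, j, k]
                if p > 0:
                    te += p * np.log(p * p_xp[j] / (p_xf_xp[i, j] * p_xp_yp[j, k]))
    return te

rng = np.random.default_rng(4)
n = 200_000
y = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
x[1:] = 0.6 * y[:-1] + 0.8 * rng.standard_normal(n - 1)  # y leads x by one step

print(transfer_entropy(x, y))  # substantial: information flows y -> x
print(transfer_entropy(y, x))  # near zero: no flow x -> y
```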

Summary. No single model is adequate across all resolutions. The market at sub-second scales is mean-reverting, negatively autocorrelated, bottom-heavy in wavelet variance, and has ETH leading BTC in information flow. The market at hourly scales is approximately flat GBM, with no detectable autocorrelation, uniform wavelet variance, and BTC leading ETH in information flow. Any strategy, model, or pricing formula that ignores this resolution-dependence is making a false ontological commitment.

11. Conclusion

We began with the observation that the mid-price of a traded asset is the projection of a deterministic high-dimensional system onto an observable scalar, and the apparent stochasticity is generated by the temporal coarse-graining we perform when we bin trades into intervals of duration δ\delta.

The empirical findings of this post quantify exactly how the projected noise changes with δ\delta. At millisecond resolution, the noise has memory: it is negatively autocorrelated, mean-reverting, and has directional structure between assets. At hourly resolution, the memory is gone: the noise looks flat, Gaussian, and approximately independent. The transition happens over the range 10 ms to 60 minutes, and it is monotone.

The resolution-dependence arises directly from how a deterministic agent-interaction system looks when observed through a temporal filter. The individual order decisions that make up the price process are made by agents with information, objectives, and constraints; the randomness enters only when we aggregate many such decisions into a single log-return. At fine aggregation, the individual structure still shows through. At coarse aggregation, the central limit theorem has washed it away.
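The washing-out can be reproduced in a few lines. Here an MA(1) process with a negative coefficient stands in for the fine-scale returns (a toy assumption, not the order-flow dynamics themselves): lag-1 autocorrelation is strongly negative at the native resolution and shrinks toward zero as returns are aggregated.

```python
import numpy as np

rng = np.random.default_rng(5)
# Fine-scale returns with memory: MA(1) with a negative coefficient.
eps = rng.standard_normal(2_000_000)
r = eps[1:] - 0.4 * eps[:-1]

def lag1_autocorr(v):
    return float(np.corrcoef(v[:-1], v[1:])[0, 1])

for k in (1, 10, 100):                                     # aggregation factor
    rk = r[: len(r) // k * k].reshape(-1, k).sum(axis=1)   # coarse-grained returns
    print(k, round(lag1_autocorr(rk), 3))
```

Only the boundary term of the MA(1) autocovariance survives aggregation, so the autocorrelation of the kk-aggregated returns decays like 1/k1/k: memory is a fine-resolution property of this process, exactly as in the empirical sections above.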

What does this mean for practice? A model that specifies the resolution at which it operates and makes claims only within that band is doing something honest. A model that claims resolution-independence, that the same drift, diffusion coefficient, or kernel applies from 1 ms to 1 hr, is making an empirically false claim that this post has documented in detail.

The geometric language is not decorative. The curvature of the state-space metric, the Fisher-Rao geometry of the statistical manifold, the diffusion distance on the data manifold, and the wavelet variance decomposition of path-space all formalize the same underlying intuition: at fine timescales, the market lives on a non-flat manifold, and the tools of flat Euclidean geometry, which is to say most of standard financial mathematics, are the wrong tools. At coarse timescales, the manifold flattens, and the standard tools become adequate approximations. The art is knowing which timescale you are in.

References

  1. Lo, A. W. and MacKinlay, A. C. (1988). Stock market prices do not follow random walks: Evidence from a simple specification test. Review of Financial Studies, 1(1), 41-66.
  2. Taqqu, M. S., Teverovsky, V., and Willinger, W. (1995). Estimators for long-range dependence: An empirical study. Fractals, 3(4), 785-798.
  3. Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall. Spectral characterization of long memory.
  4. Doob, J. L. (1953). Stochastic Processes. Wiley. Chapter 11: characterization of Gaussian Markov processes.
  5. Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637-654.
  6. Avellaneda, M. and Stoikov, S. (2008). High-frequency trading in a limit order book. Quantitative Finance, 8(3), 217-224.
  7. Gatheral, J., Jaisson, T., and Rosenbaum, M. (2018). Volatility is rough. Quantitative Finance, 18(6), 933-949.
  8. Lyons, T. J. (1998). Differential equations driven by rough signals. Revista Matematica Iberoamericana, 14(2), 215-310.
  9. Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 674-693.
  10. Percival, D. B. and Walden, A. T. (2000). Wavelet Methods for Time Series Analysis. Cambridge University Press.
  11. Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1), 5-30.
  12. Amari, S. (1985). Differential-Geometrical Methods in Statistics. Springer. Fisher-Rao metric, Cramer-Rao as geodesic bound (Theorem 2.1), KL divergence as natural divergence to second order (Chapter 3).
  13. Amari, S. (2016). Information Geometry and Its Applications. Springer.
  14. Wiener, N. (1930). Generalized harmonic analysis. Acta Mathematica, 55, 117-258.
  15. Khinchin, A. (1934). Korrelationstheorie der stationaren stochastischen Prozesse. Mathematische Annalen, 109, 604-615.
  16. Li, X., Wong, T.-K. L., Chen, R. T. Q., and Duvenaud, D. (2020). Scalable gradients for stochastic differential equations. NeurIPS 2020.
  17. Kidger, P., Foster, J., Li, X., Oberhauser, H., and Lyons, T. (2021). Neural SDEs as infinite-dimensional GANs. NeurIPS 2021.
  18. Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks. GMD Technical Report 148.
  19. Maass, W., Natschlaeger, T., and Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531-2560.
  20. Chevyrev, I. and Kormilitzin, A. (2016). A primer on the signature method in machine learning. arXiv:1603.03788.
  21. Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2), 223-236.
  22. Lo, A. W. (1991). Long-term memory in stock market prices. Econometrica, 59(5), 1279-1313.
  23. Roll, R. (1984). A simple implicit measure of the effective bid-ask spread in an efficient market. Journal of Finance, 39(4), 1127-1139. Derives Cov(Δpt, Δpt-1) = -s²/4 for the bid-ask bounce; applied to NYSE stocks.
  24. Hasbrouck, J. (2007). Empirical Market Microstructure. Oxford University Press. Intraday variance ratio profiles for NYSE blue chips including IBM, GE, and Merck.
  25. Andersen, T. G., Bollerslev, T., Diebold, F. X., and Ebens, H. (2001). The distribution of realized stock return volatility. Journal of Financial Economics, 61, 43-76. Realized volatility for all 30 DJIA components; wavelet structure and regime shifts.
