
Selected research.

Notes from our work building autonomous research systems for markets — the questions we're chasing, and what we've found so far. We publish what we can.

2026 · Hyperliquid perps · 365 days of fills

Autonomously researching granular market dynamics on Hyperliquid

In almost every market, most participants lose, a few win a great deal, and it is very hard to see who is who, because nearly all of the trading is private. Hyperliquid is the rare exception. It is a large crypto perpetual-futures venue that runs on its own blockchain, and that chain publishes every transaction that takes place on it: every order, every fill, every liquidation, each one attributed on-chain to the account that made it. There is no privileged data feed and no asking permission. The entire order flow of a real, liquid market is simply a public ledger you can replay end to end. That makes it the ultimate place for autonomous research, a market you can study in full rather than through a keyhole. So we pointed our autonomous research system at our local archive, 365 days, \(\$5.95\text{T}\) of volume, roughly 825,000 addresses, every trade, and asked the oldest question in markets. Who makes money, and how? What follows is the long version: the shape of the population, the strategies that actually win, the ones that only look like they win, and the handful of conditions that separate them. The numbers are from one venue, but the structure is the structure of markets in general, visible here only because the tape is finally public.

HOW WE LOOKED

The system starts from the whole population, not a handful of wallets from the leaderboard. The fill archive is compacted into per-account, per-day aggregates, then accounts are clustered on behavior alone, on how they trade rather than how they did, so that profitability comes out as a finding instead of a definition. From there it walks individual wallets fill by fill to reconstruct what each one is really doing, and every claim has to clear an adversarial check: an open-position mark, so a wallet sitting on hidden losses can't pass as a winner; a stated null hypothesis to beat; an out-of-sample split. Several of the system's own tidiest stories were refuted by its follow-up checks, and those refutations are the part we trust most.
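The compaction step can be pictured with a small sketch. This is not the system's actual pipeline: the `Fill` record and all of its field names are hypothetical, and days are assumed to be UTC calendar days. It only shows the shape of the reduction, raw fills in, one row per account per day out, with realized PnL booked net of fees.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Fill:
    # Hypothetical record layout; the real archive schema is not described in the text.
    account: str       # on-chain address
    ts: float          # fill time, Unix seconds (UTC)
    notional: float    # absolute USD size of the fill
    is_maker: bool     # True if the fill came from a resting (passive) order
    closed_pnl: float  # realized PnL booked by this fill, before fees
    fee: float         # fee paid; negative means a rebate

def daily_aggregates(fills):
    """Compact raw fills into one row per (account, UTC day)."""
    rows = defaultdict(lambda: {"volume": 0.0, "maker_volume": 0.0,
                                "fills": 0, "net_pnl": 0.0})
    for f in fills:
        day = datetime.fromtimestamp(f.ts, tz=timezone.utc).date().isoformat()
        row = rows[(f.account, day)]
        row["volume"] += f.notional
        row["maker_volume"] += f.notional if f.is_maker else 0.0
        row["fills"] += 1
        row["net_pnl"] += f.closed_pnl - f.fee  # realized profit: closed PnL minus fees
    return dict(rows)

fills = [
    Fill("0xaaa", 0.0, 1_000.0, True, 0.0, -0.1),       # passive fill earning a rebate
    Fill("0xaaa", 3_600.0, 1_000.0, False, 5.0, 0.45),  # aggressive close, same day
    Fill("0xaaa", 90_000.0, 500.0, False, -2.0, 0.225), # next UTC day
]
agg = daily_aggregates(fills)
```

Behavioral features such as maker share (`maker_volume / volume`) or fills per active day can then be derived from these rows without ever touching the outcome column, which is the property the clustering relies on.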

One measurement choice governs everything below: the spine is realized profit, closed-trade PnL minus fees. Open-position PnL and funding are handled separately, and an address is an account, not a person. Summed across all 825,000 addresses, gross realized PnL is just \(+\$84\text{M}\) on \(\$5.95\text{T}\) of volume, about 0.14 bps from zero. It is not exactly zero only because a year is a finite slice of an ongoing market: positions already open at the start, and still open at the end, leave a tiny boundary residual rather than real edge. That near-zero total is what licenses the first finding.

A FEE FUNNEL WITH A THIN WINNING RIM

Of the 824,945 addresses that ever traded a perp, only 23.4% are profitable over their lifetime. The population's aggregate net result is \(-\$786\text{M}\), and because the market is essentially zero-sum before costs, that loss is almost exactly the fee bill:

\[ -\$786\text{M} \;\approx\; \underbrace{+\$84\text{M}}_{\text{gross PnL}} \;-\; \underbrace{\$870\text{M}}_{\text{fees}} \]

The crowd does not mainly lose to sharper traders. In net terms it loses the toll for being there. And what profit does exist is extraordinarily concentrated: the top 10 profitable wallets take 21.5% of all gross profit, the top 100 take 51.8%, the top 1,000 take 81.7%. Persistence is what separates the donors from the regulars: the 57,092 wallets with at least 20 active days and \(\$1\text{M}\) of lifetime volume lose only \(\$158\text{M}\) in aggregate, while the 768,000 small or short-lived accounts lose \(\$628\text{M}\). Most of the money extracted from this market comes from a long tail of transient accounts, not from its professionals.
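The concentration figures are top-k shares of a profit pool. A minimal sketch of that statistic follows, assuming "gross profit" means the sum of net PnL over profitable wallets only; the text does not spell out the exact denominator, so treat that as an assumption. The function name and toy numbers are illustrative.

```python
def top_k_profit_share(net_pnl_by_wallet, k):
    """Share of all positive wallet PnL captured by the k most profitable wallets."""
    profits = sorted((p for p in net_pnl_by_wallet if p > 0), reverse=True)
    total = sum(profits)
    if total == 0:
        return 0.0
    return sum(profits[:k]) / total

# Toy population: five winners, two losers (losers do not enter the profit pool).
pnl = [500.0, 300.0, 100.0, 60.0, 40.0, -200.0, -900.0]
share_top1 = top_k_profit_share(pnl, 1)  # 500 of a 1,000 pool
share_top3 = top_k_profit_share(pnl, 3)  # 900 of a 1,000 pool
```

Evaluating the same function at k = 10, 100, and 1,000 over the full wallet list would trace the cumulative-share curve.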

Splitting the zero-sum identity across the two groups makes the flow of funds explicit. The transient tail pays only \(\$110\text{M}\) of its \(\$628\text{M}\) loss in fees; the other \(\$518\text{M}\) is a straight transfer to the persistent traders. The persistent group's gross is therefore \(+\$602\text{M}\): the \(\$518\text{M}\) taken from the tail plus the \(\$84\text{M}\) boundary residual. It pays \(\$760\text{M}\) in fees and nets \(-\$158\text{M}\). Money flows from transients, to persistent traders, to the exchange. The house is the only group that wins at the level of a class.
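The split can be checked mechanically. The sketch below uses only the rounded figures quoted in the text (in $M) and the definition gross = net + fees; the variable names are ours.

```python
# All figures in $M, as quoted in the text (rounded).
GROSS_TOTAL = 84                    # population gross realized PnL
FEES_TOTAL = 870                    # population fee bill
TAIL_NET, TAIL_FEES = -628, 110     # transient tail
PERS_NET, PERS_FEES = -158, 760     # persistent traders

def gross(net, fees):
    """Gross realized PnL is net PnL with fees added back."""
    return net + fees

tail_gross = gross(TAIL_NET, TAIL_FEES)  # what the tail loses to other traders
pers_gross = gross(PERS_NET, PERS_FEES)  # what the persistent group earns before fees

# The two groups must reassemble the population totals.
assert tail_gross + pers_gross == GROSS_TOTAL
assert TAIL_FEES + PERS_FEES == FEES_TOTAL
assert TAIL_NET + PERS_NET == GROSS_TOTAL - FEES_TOTAL
```

On these numbers the persistent group's gross exceeds the tail's gross loss by exactly the population-wide residual, which is why the books close.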

Cumulative share of all realized profit by wallet rank. The top 100 wallets take 52%, the top 1,000 take 82%.
[Figure: flow diagram. Transient tail (768k wallets, net −$628M) → $518M gross → persistent traders (57k wallets, net −$158M) → $760M fees → exchange (+$870M in fees, the only winner). The tail also pays $110M in fees straight to the exchange.]
The money circuit at the fee floor. The transient tail loses $628M: $518M of it is a straight transfer to the persistent traders, and the other $110M goes to the exchange as fees. The persistent traders then pay $760M of fees of their own, which leaves the exchange as the only group that comes out ahead.
HOW A LOSING ACCOUNT DIES

Following 444,000 tail accounts through their active days shows the donor population is continuously regenerated, and that it dies in a specific way. The median active day loses a near-constant \(\$0.70\) regardless of account age, but the mean daily loss grows roughly twentyfold, from \(-\$42\) on the first active day to \(-\$854\) by the eleventh. Accounts that survive raise their stakes while the cohort thins fast: 23% never return after day one, half are gone by day five. Liquidation roughly doubles the death rate (30% of liquidation days are an account's last, versus 16% otherwise), and most strikingly, 46% of all lifetime losses among losing tail accounts are realized on the final active day. The typical donor death is not a slow bleed. It is a terminal escalation, the largest position being the last.
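Two of these lifecycle statistics are easy to state precisely. The sketch below is one possible reading, not the system's code: survival is the fraction of accounts with at least d active days, and the final-day share is final-day losses divided by lifetime net losses, pooled over net-losing accounts. Both definitions and all names are assumptions.

```python
def survival_by_day(active_day_counts, max_day):
    """Fraction of accounts that reach at least d active days, for d = 1..max_day."""
    n = len(active_day_counts)
    return [sum(1 for c in active_day_counts if c >= d) / n
            for d in range(1, max_day + 1)]

def final_day_loss_share(daily_pnl_by_account):
    """Among net-losing accounts: pooled final-day loss over pooled lifetime net loss."""
    final_loss, lifetime_loss = 0.0, 0.0
    for days in daily_pnl_by_account:      # one list of daily net PnL per account
        total = sum(days)
        if total < 0:                      # keep only accounts that lost overall
            lifetime_loss += -total
            final_loss += max(0.0, -days[-1])
    return final_loss / lifetime_loss if lifetime_loss else 0.0

# Toy cohort: five accounts with 1, 1, 3, 5, and 8 active days.
curve = survival_by_day([1, 1, 3, 5, 8], max_day=3)

# Toy PnL paths: two losers that escalate into their last day, one winner.
share = final_day_loss_share([[-1.0, -2.0, -7.0], [-3.0, -2.0, -5.0], [4.0, 1.0]])
```

A share well above what an even spread across active days would give is the signature of terminal escalation rather than slow bleed.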

The donor lifecycle: cohort attrition (bars) against mean daily loss (line) by day of account life. Survivors raise their stakes as the crowd thins out.
EIGHT WAYS PEOPLE TRADE, AND ONLY TWO THAT PAY

Clustering the 57,092 persistently active wallets on ten behavior-only features (volume, fill intensity, fill size, maker share, liquidation share, activity regularity, fee rate, coin concentration, builder-venue share, coin count, and nothing about outcomes) yields a clean read on which styles of trading actually earn:

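| Cluster | Wallets | Net PnL | % prof. | Med. vol. | Fills/day | Maker |
| --- | --- | --- | --- | --- | --- | --- |
| High-frequency firms | 6,806 | +$310M | 32% | $118M | 308 | 0.13 |
| Passive makers | 4,869 | +$41M | 41% | $5.0M | 57 | 0.74 |
| Diversified grinders | 8,875 | −$21M | 22% | $3.9M | 52 | 0.03 |
| Builder-venue users | 5,132 | −$24M | 32% | $4.1M | 80 | 0.30 |
| Pure takers | 5,243 | −$53M | 10% | $2.4M | 22 | 0.00 |
| Single-market specialists | 9,238 | −$55M | 19% | $4.2M | 24 | 0.05 |
| Generic actives | 14,715 | −$141M | 24% | $3.4M | 28 | 0.06 |
| Liquidation-prone | 2,214 | −$214M | 9% | $2.9M | 17 | 0.03 |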

Behavioral clusters sorted by aggregate net realized PnL. Only the top two are net positive. "% prof." is the share of wallets profitable over their lifetime; "Maker" is median maker share.

Cluster medians of the ten behavioral features, z-scored across clusters. Each style has its own fingerprint.
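For readers who want the fingerprint computation spelled out, here is a minimal sketch of z-scoring cluster medians across clusters. It uses only the two features whose medians are published in the table (fills per day and maker share), so the values will not match the full ten-feature figure; the use of the population standard deviation is our assumption.

```python
from statistics import mean, pstdev

# Cluster medians taken from the table: (fills per day, maker share).
MEDIANS = {
    "High-frequency firms":      {"fills_per_day": 308, "maker_share": 0.13},
    "Passive makers":            {"fills_per_day": 57,  "maker_share": 0.74},
    "Diversified grinders":      {"fills_per_day": 52,  "maker_share": 0.03},
    "Builder-venue users":       {"fills_per_day": 80,  "maker_share": 0.30},
    "Pure takers":               {"fills_per_day": 22,  "maker_share": 0.00},
    "Single-market specialists": {"fills_per_day": 24,  "maker_share": 0.05},
    "Generic actives":           {"fills_per_day": 28,  "maker_share": 0.06},
    "Liquidation-prone":         {"fills_per_day": 17,  "maker_share": 0.03},
}

def zscore_across_clusters(cluster_medians):
    """z-score each feature column across clusters: (x - mean) / population std."""
    clusters = list(cluster_medians)
    features = list(next(iter(cluster_medians.values())))
    out = {c: {} for c in clusters}
    for f in features:
        col = [cluster_medians[c][f] for c in clusters]
        mu, sd = mean(col), pstdev(col)
        for c in clusters:
            out[c][f] = (cluster_medians[c][f] - mu) / sd if sd else 0.0
    return out

z = zscore_across_clusters(MEDIANS)
most_passive = max(z, key=lambda c: z[c]["maker_share"])
most_intense = max(z, key=lambda c: z[c]["fills_per_day"])
```

Even on two features the winning clusters stand apart from the rest, each on a different axis.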

Three things survive scrutiny. First, only two of the eight styles are net positive: high-frequency firms and passive makers. Every taker-styled, concentrated, or casual profile loses in aggregate, worst of all the liquidation-prone cluster, which loses \(\$214\text{M}\) from just 2,214 wallets. Second, even the winning style is mostly losers: inside the high-frequency cluster only 32% of wallets are profitable and the median member loses \(\$42\text{k}\). The cluster's \(+\$310\text{M}\) is its top 50 wallets earning \(+\$1.64\text{B}\), offset by the other ~6,750 members losing about \(\$1.3\text{B}\). Trading like a high-frequency firm does not pay; being one of the best few dozen does. Third, the top winners' edges (around 156 bps of volume) are far too large for market-making rebates alone, which points at what they are really doing.

TWO WAYS TO WIN

Decomposing the top 50 winners by how balanced their daily flow is (a market maker buys and sells in near-balance every day; a position trader's days are one-sided) splits the elite cleanly:

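| Style | Wallets | Net PnL | Volume | Med. edge | Med. maker |
| --- | --- | --- | --- | --- | --- |
| Directional position-takers | 32 | +$1,095M | $38B | 385 bps | 0.15 |
| Mixed | 10 | +$296M | $61B | 52 bps | 0.37 |
| Market makers | 8 | +$249M | $371B | 7.8 bps | 0.81 |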

The top 50 winners of the high-frequency cluster, by daily flow balance. The two profitable modes are opposite corners of the same picture.

Two-thirds of the elite profit pool is directional: 32 wallets earning a median 385 bps of volume on one-sided days, with low maker share. They are high-frequency in fill count but position traders in economics. True spread-capture market making is the minority of elite profit (8 wallets, \(+\$249\text{M}\) at 7.8 bps) while supplying roughly ten times the directional group's volume (\(\$371\text{B}\) against \(\$38\text{B}\)). So the two ways to win are: be paid for presence, or hold a conviction longer than anyone else can. Everything in the strategy dossiers is one of these two with different machinery bolted on.
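A flow-balance split of this kind can be sketched as follows. The imbalance measure, the use of a median across days, and both thresholds are illustrative assumptions; the text does not give the exact statistic or cutoffs the system used.

```python
from statistics import median

def flow_imbalance(buy_notional, sell_notional):
    """|buys - sells| / (buys + sells): 0 is perfectly two-sided, 1 is fully one-sided."""
    total = buy_notional + sell_notional
    return abs(buy_notional - sell_notional) / total if total else 0.0

def classify_style(daily_flows, maker_cut=0.2, directional_cut=0.6):
    """Label a wallet from its median daily imbalance. Cutoffs are hypothetical."""
    m = median(flow_imbalance(b, s) for b, s in daily_flows)
    if m <= maker_cut:
        return "market maker"
    if m >= directional_cut:
        return "directional"
    return "mixed"

# Each tuple is one day's (buy notional, sell notional).
maker_days = [(100.0, 98.0), (50.0, 52.0), (80.0, 80.0)]
directional_days = [(100.0, 0.0), (0.0, 40.0), (90.0, 10.0)]
mixed_days = [(70.0, 30.0), (60.0, 40.0), (65.0, 35.0)]

labels = [classify_style(d) for d in (maker_days, directional_days, mixed_days)]
```

Taking the median across days, rather than netting flow over the whole year, matters here: a swing trader who ends the year flat still shows one-sided days.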

The top 50 winners: realized edge against how one-sided their daily flow is, marker size scaled by volume. The two profitable modes sit in opposite corners, balanced flow at a few bps on enormous volume, or one-sided flow at hundreds of bps.
THE STRATEGIES, UP CLOSE

Each wallet below passed an open-position mark, so the profits are real rather than deferred losses. Wallets are pseudonymized by profit rank.

The new-listing short specialist · +$140M

The single biggest winner on the exchange. 1.34M fills over 319 days, 89% aggressive, median order around \(\$2.5\text{k}\), which is execution slicing, not blocks. Its profit is a list of post-listing collapses: one 2025 listing alone is worth \(+\$52\text{M}\). The worst day of its life, \(-\$25\text{M}\), was the day after that listing. It shorted into the launch pump, took the pain, and made multiples back as the coin fell apart.

The hybrid maker-sniper · +$128M

71% of its 3.6M fills are passive, yet the profit splits the other way: \(+\$56\text{M}\) from \(\$7.0\text{B}\) of maker flow versus \(+\$72\text{M}\) from only \(\$1.35\text{B}\) of aggressive flow, about 533 bps per aggressive dollar. It quotes small two-sided clips all day and strikes directionally when its signal fires, holding a persistently short-tilted book. Its worst day in a full year of \(\$8.3\text{B}\) volume was \(-\$3.5\text{M}\), remarkable control of the left tail.

The canonical market maker · +$108M

The purest maker in the elite: 44M fills, active 365 of 365 days, daily buy/sell balance near zero on every day, and negative lifetime fees, because it is paid rebates rather than charged. The striking detail is that liquidation-flagged fills carry \(+\$51\text{M}\), nearly half its lifetime profit, on just \(\$2.3\text{B}\) of notional. For the best maker on the venue, absorbing forced selling at cascade prices is not a side perk of quoting. It is half the business.

The passive block swing whale · +$47M

The opposite shape. 23,000 fills, 92% passive, and a median order of \(\$1.1\text{M}\), giant resting clips filled piecewise. Its days are whole-day one-sided: multi-day swing positions in one major coin, built and unwound passively, ended flat. Nearly all of its profit is that single coin.

The small-cap campaign short · +$72M

Short across hundreds of low-cap coins at once. We first catalogued it as a distinct "dislocation fisher," then refuted our own reading at the tape level: it has no entry trigger, rests its orders at the touch rather than deep in the book, and simply sells weak coins that keep sliding. Two-thirds of its position dollars are held longer than a week, and over half its lifetime profit (\(+\$38.9\text{M}\)) was made on October 10 alone. The would-be eighth archetype dissolved into this one once we looked closely, which is exactly the kind of self-correction the adversarial checks are for.

The cascade-day event specialist · +$35M

Active only 40 days all year, and \(+\$33.9\text{M}\) of its \(+\$34.9\text{M}\) lifetime profit came on a single day, October 10, churning \(\$1.59\text{B}\) across 168 coins, flat by the end of the day. It buys dislocations and sells the recovery intraday, then goes dormant until the next event.

THREE ARCHETYPES, THREE CLOCKS

Marking each archetype's entries against the second-by-second tape shows they run on completely different clocks, and it overturns the intuitive story about the big directional winners. They have no short-horizon timing edge at all: their entries are coin-flips at every horizon out to a day, and on their heaviest days they are deeply underwater at first, because they short into pumps that keep pumping. The maker, by contrast, is paid within ten seconds of a passive fill and flat beyond a minute. So the edge of the campaign trader is not when it enters but where the campaign ends: new listings systematically collapse, weak coins bleed, and the operator sizes the thesis, withstands \(-\$25\text{M}\) days, and exits over weeks. The moat is conviction plus a balance sheet, not signal latency, which is exactly what should survive on a venue where every fill is public.

Two mechanisms pay them to wait. Funding carry: shorts in overheated perps collect funding while they hold, enough that the biggest winner's ETH book earned an estimated \(+\$13.5\text{M}\) in funding alone. And the risk engine: between 20% and 40% of the directional elite's lifetime profit is realized for them by the exchange force-closing their deeply-winning shorts at cascade bottoms, a better exit than any order they could have placed. No one needs to time the bottom. The venue's own machinery does it.

Entry markouts against horizon for the two campaign shorts, the small-cap short, and the mega maker. Three archetypes on three clocks: seconds, hours, weeks.
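An entry markout is the signed price move after a fill, measured at a fixed horizon. The sketch below is a minimal version under stated assumptions: the tape is a pair of sorted arrays (timestamps in seconds, prices), the reference price is the last print at or before the horizon, and results are in basis points of the fill price. The function and argument names are hypothetical.

```python
from bisect import bisect_right

def markout_bps(fill_ts, fill_px, side, horizon_s, tape_ts, tape_px):
    """Signed move from the fill price to the tape price at fill_ts + horizon_s, in bps.

    side is +1 for a buy and -1 for a sell, so a positive result means the
    price moved in the fill's favor. tape_ts must be sorted ascending.
    """
    i = bisect_right(tape_ts, fill_ts + horizon_s) - 1  # last print at or before the mark time
    if i < 0:
        raise ValueError("tape starts after the markout time")
    return side * (tape_px[i] - fill_px) / fill_px * 1e4

# Toy tape: a short that is briefly underwater, then pays at longer horizons.
tape_ts = [0, 10, 60, 3600]
tape_px = [100.0, 100.1, 99.5, 98.0]

m10 = markout_bps(0, 100.0, -1, 10, tape_ts, tape_px)      # marked at 100.1
m60 = markout_bps(0, 100.0, -1, 60, tape_ts, tape_px)      # marked at 99.5
m3600 = markout_bps(0, 100.0, -1, 3600, tape_ts, tape_px)  # marked at 98.0
```

Averaging this quantity over a wallet's entries at each horizon gives one curve per archetype.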
THE LIQUIDATION ECOSYSTEM HAS NO LIQUIDATORS

Forced liquidations are where the donors' losses are realized, so we expected a class of predators built to hunt them. There isn't one. In a 27-hour sample, 100% of the fills that absorb liquidated flow are passive resting orders. No aggressive "liquidation hunters" exist; forced selling simply sweeps the resting book, and the wallets on the other side are the same market makers from the winning cluster. Absorbing forced flow is a by-product of quoting, not a standalone strategy. Even the protocol's own backstop vaults, active all 365 days with tens of billions of volume, roughly break even. The backstop is a utility, not a profit center.

WEALTH TRANSFER IS CONCENTRATED IN TIME

The median day moves \(\$27\text{M}\) between wallets. October 10, 2025, the largest liquidation cascade in the sample, moved \(\$1.18\text{B}\) in a single day, which is 8.8% of the year's total, \(8.6\times\) the second-largest day and \(43\times\) the median. That day 60,441 wallets lost \(\$1.50\text{B}\) (median loser \(-\$125\)), and the top 100 winning wallets took 77% of the transfer. This is why every elite archetype's best day is the same date: a cascade is when the crowd's losses are realized all at once, and whoever is structurally positioned collects years of edge in hours. The top 20 days account for a quarter of the entire year's wealth transfer.

Daily cross-wallet wealth transfer, log scale. October 10 is a 43x-median outlier; the cluster in early 2026 is the winter volatility events.
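The time-concentration statistics reduce to a few ratios over a daily series. The sketch below assumes a day's transfer is the sum of winners' gains that day, which is one plausible definition and not confirmed by the text; the toy series is made up, with only its largest value and median chosen to echo the quoted scale (in $M).

```python
from statistics import median

def daily_transfer(pnl_by_wallet):
    """One day's cross-wallet transfer, taken here as the sum of winners' gains."""
    return sum(p for p in pnl_by_wallet if p > 0)

def concentration(transfers, top_n):
    """Share of the total in the top_n days, plus the largest day against two benchmarks."""
    ranked = sorted(transfers, reverse=True)
    return {
        "top_share": sum(ranked[:top_n]) / sum(ranked),
        "max_over_median": ranked[0] / median(ranked),
        "max_over_second": ranked[0] / ranked[1],
    }

# Illustrative week in $M: one cascade day among ordinary days.
days = [27.0, 20.0, 30.0, 25.0, 137.0, 1180.0, 27.0]
stats = concentration(days, top_n=2)
```

Run over a full year of daily transfers, `top_share` at `top_n=20` would be the "quarter of the year" figure.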

Skill on these days repeats. Wallets that won at least \(\$100\text{k}\) on the great cascade are 63% profitable across the other big event days, double the 31% base rate. But fortune does not: the single largest winner of October 10, up \(\$81\text{M}\) that day, was a 33-day-old account born the day before the cascade. Three months later it lost \(\$212\text{M}\) in one day and never traded again, finishing as the single biggest lifetime loser on the exchange. Event skill persists for the many; event fortunes mean-revert violently for the few who confuse one win with skill.

SKILL OR LUCK?

Splitting the year at its midpoint and ranking the 16,825 wallets active in both halves, first-half profit rank predicts second-half rank with Spearman \(\rho = 0.23\) (\(p \approx 10^{-200}\)), and the persistence is strongest exactly where it matters. The top-100 first-half winners stay 70% profitable in the second half, earning \(+\$393\text{M}\); the top decile stays 55% profitable against 20 to 28% for lower deciles. Losing persists too: the bottom decile posts a median \(-\$20\text{k}\) again in the second half. The behavioral classes are stable across both halves as well. This is structure, not the residue of one lucky regime. One honest caveat: requiring activity in both halves only measures persistence among survivors, and cannot count the first-half winners who simply stopped.
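The split-half test is a Spearman rank correlation between each wallet's first-half and second-half profit. A self-contained sketch follows, using average ranks for ties and Pearson correlation on the ranks; the five-wallet example is a toy and says nothing about the real sample.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy wallets: profit in each half of the year.
first_half = [10.0, -5.0, 3.0, 8.0, -1.0]
second_half = [7.0, -2.0, -4.0, 9.0, 1.0]
rho = spearman(first_half, second_half)
```

Because only wallets present in both halves enter the two lists, the statistic inherits the survivorship caveat stated above.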

Second-half profitability by first-half profit decile. Winning persists at the top, and losing persists at the bottom.
WHAT ISN'T THERE

Much of the value is in what the system looked for and could not find. Several folk strategies do not exist at scale on this venue. There is no standalone liquidator class (absorption is passive market making). There is no copy-trading crowd shadowing the giants, even though every fill is public in real time: testing all flow around the biggest winner's entries, the crowd follows price, not fills. There are no on-venue carry farmers (screening every wallet for the long-spot, short-perp signature turns up two small accounts). And there are no smooth tail-sellers quietly compounding toward a blowup: every abnormally smooth equity curve belongs to a high-frequency trader, where smoothness is the law of large numbers, not a hidden martingale. The system kept proposing each of these, then failing to find it.

WHAT IT TAKES TO MAKE MONEY

Strip away the taxonomy and four conditions describe essentially every durable winner. Almost no wallet meeting none of them is profitable.

Be paid for presence, not prediction. The maker class earns its ~8 bps by being on the book continuously, collecting spread, rebates, and above all the cascade discount. The edge is realized in seconds and needs no view at all.
Or hold a view longer than anyone else can. The directional elite has no timing edge, yet wins 300 to 400 bps by sizing structural theses, holding for weeks through brutal drawdowns, collecting funding while it waits, and letting the risk engine bottom-tick its exits.
Be there on the handful of days that matter. A quarter of the year's wealth transfer happens on twenty days, and every archetype's best day is the same day. Skill on those days repeats; fortunes won there without skill do not.
Survive your own success. The population's defining failure is escalating until the last day takes half of everything, and it runs at every scale, from the \(\$50\) donor to the wallet that won \(\$81\text{M}\) and gave back \(\$212\text{M}\) three months later.

Everything else we tested (copying the visible fills of the best wallets, hunting liquidations, farming carry, grinding high win-rate streaks) demonstrably does not exist as a profitable class. The market pays for liquidity, conviction, presence at the extremes, and discipline. It pays for nothing else. None of that is a crypto curiosity. It is the basic arithmetic of who wins in any market, and Hyperliquid is simply the first place we could watch all of it at once.

OPEN QUESTIONS
What signal do the directional winners trade? We can show they have no short-horizon timing edge and that they hold for weeks, but not yet what tells them which thesis to size.
Wallets are not operators. The two largest winners are confirmed as one operator, with \(\$177\text{M}\) of trades crossing directly between their accounts. Cross-venue and multi-wallet structure is invisible to single-address analysis, so the real concentration is higher than any number here.
SCOPE
Realized profit and loss only. Funding and open positions are excluded from the headline figures and estimated separately for any wallet we single out. An address is an account, not a person or a firm. Fee totals are floors, since builder and deployer fees aren't in the archive. Findings are conditional on one venue over one year. Pseudonymized, with live-edge parameters held back. A living document, not a finished paper.

Event Horizon Labs · an autonomous research run, June 2026

2026 · arXiv:2605.23007 · q-fin.TR

Can agentic loops optimize noisy rewards?

FunSearch and AlphaEvolve showed that an LLM in an evolutionary loop can rewrite code toward better solutions on deterministic targets — matrix multiplication, bin packing, compiler heuristics. Markets are the opposite: the reward is stochastic, non-stationary, and trivially easy to overfit. We put an LLM-driven evolutionary loop to that harder test — rewriting the code of a trading strategy, scoring each candidate against a held-out backtest, and keeping and recombining what survives — to ask whether an agentic loop is doing research or just p-hacking.

HOW IT WORKS

An LLM proposes a diff or a full rewrite to the parts of the strategy marked mutable; everything else — simulation, data, PnL accounting — stays fixed, so gains reflect real algorithmic change rather than evaluation artifacts. A structured population with quality-diversity selection and island migration keeps the search broad while it compounds on its best solutions, and mutation queries route across several frontier and efficient-tier models rather than betting on one.
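The loop structure described above can be sketched in a few lines. This is a toy, not the paper's engine: the LLM mutation is replaced by a random perturbation, the candidate is a single number rather than strategy code, the score is deterministic rather than a noisy backtest, and quality-diversity selection is omitted. All function names and parameters are hypothetical. What it does show is islands evolving independently with periodic ring migration of each island's best candidate.

```python
import random

def evolve(score, mutate, seed, islands=3, pop=4, generations=20,
           migrate_every=5, rng=None):
    """Island-model evolutionary search; returns the best candidate found."""
    rng = rng or random.Random(0)
    populations = [[seed] * pop for _ in range(islands)]

    def worst_index(members):
        return min(range(len(members)), key=lambda j: score(members[j]))

    for gen in range(1, generations + 1):
        for members in populations:
            parent = max(rng.sample(members, 2), key=score)  # 2-way tournament
            child = mutate(parent, rng)                      # stand-in for an LLM edit
            w = worst_index(members)
            if score(child) > score(members[w]):             # keep only improvements
                members[w] = child
        if gen % migrate_every == 0:
            bests = [max(m, key=score) for m in populations]
            for i, members in enumerate(populations):
                # Ring migration: each island receives its neighbor's best.
                members[worst_index(members)] = bests[i - 1]
    return max((c for m in populations for c in m), key=score)

def toy_score(x):
    return -(x - 3.0) ** 2       # deterministic toy objective, peak at x = 3

def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)

best = evolve(toy_score, toy_mutate, seed=0.0)
```

With a noisy score, the acceptance test `score(child) > score(worst)` is exactly where selection bias enters, which is why the held-out evaluation matters.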

WHAT WE FOUND
Gains that hold out of sample. Out-of-sample Sharpe improves by 0.6 to 1.8 points across runs; the best run reaches a test Sharpe of 5.65.
Not p-hacking. Against a conservative best-of-K null, the selected strategy sits roughly 45 standard deviations into the tail on held-out data — and the out-of-sample curve rises rather than decays, the opposite signature of overfitting.
Better forecasts, too. In the forecasting-only run, an evolved feature set roughly doubles out-of-sample R² over a three-feature baseline — though higher accuracy only becomes PnL once execution is re-tuned.
Model-agnostic. Best results trace their lineage through three to five distinct LLMs, so the engine can compound with each new model release instead of depending on a single model.
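The best-of-K comparison in the list above can be illustrated with a small Monte Carlo. This is a sketch under assumptions that are ours, not the paper's: the null strategies have i.i.d. standard normal returns with zero edge, Sharpe is per-period and unannualized, and the selected strategy is compared against the distribution of the maximum Sharpe among K such strategies. The names and parameter values are hypothetical.

```python
import random
from statistics import mean, pstdev

def sharpe(returns):
    """Per-period Sharpe ratio (mean over population std), unannualized."""
    sd = pstdev(returns)
    return mean(returns) / sd if sd else 0.0

def best_of_k_null(k, n_periods, n_trials, rng):
    """Sample the best Sharpe among k zero-edge strategies, n_trials times."""
    maxima = []
    for _ in range(n_trials):
        maxima.append(max(
            sharpe([rng.gauss(0.0, 1.0) for _ in range(n_periods)])
            for _ in range(k)))
    return maxima

def z_against_null(observed, null_samples):
    """Standard deviations by which the observed value exceeds the null mean."""
    return (observed - mean(null_samples)) / pstdev(null_samples)

rng = random.Random(1)
null = best_of_k_null(k=20, n_periods=50, n_trials=40, rng=rng)
z = z_against_null(observed=1.0, null_samples=null)
```

The key property is that the null mean is positive even though every null strategy has zero edge: picking the best of K is itself a source of apparent performance, and the test only credits what exceeds it.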
OPEN PROBLEMS
Metrics aren't PnL. Higher forecasting accuracy underperforms out-of-sample until execution hyperparameters are re-tuned.
Bigger search, more overfitting. The largest search space gives the best risk-adjusted result but the widest validation-to-test retention gap.
How far can it go? The objective isn't gradient-friendly, so there's no inner tuning loop yet — and the deeper question stands: how far can agentic loops push as reward functions get noisier?
SCOPE
Conditional on the backtest's market model — exchange-aggregated data that isn't directly tradable, full fill at the limit price, a specific fee and impact model. Results do not transfer out-of-the-box to live trading; live assessment is future work.

Kvasiuk, Li, Colegrove, Münchmeyer
University of Wisconsin–Madison & Event Horizon Labs
