Random Mapping of a Min Covering
Random Number Mapping: a quick stress-test for “too-good” backtests
Responsible play note: this is hobby stats and code. Lotteries are still luck-first.
If it stops being fun, it’s time to step away.
What problem are we poking?
You’ve got a covering set (a fixed pack of lines) and you backtest it on the historical draws.
Sometimes the backtest looks… spicy. Like: “Wait, why did this set hit 5-of-5 on that many draws?”
Before we start daydreaming, we should ask a rude but healthy question:
Is this backtest strong because the cover is genuinely good, or because it accidentally fits the quirks of the past?
This post shows a simple stress-test: randomly relabel the numbers (many times) and see how the same cover behaves.
The idea in one sentence
A permutation of 1..50 creates a “new” cover that is structurally the same (still a cover), just with the labels shuffled.
If we try a lot of these shuffles, we get a feel for:
- what “normal” looks like,
- how rare a given backtest score is,
- and how easy it is to cherry-pick a great-looking result when you run many trials.
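Here is a toy illustration of that relabelling step (a NumPy sketch, not the script itself, and the line is made up): applying a permutation of 1..50 to a line still gives a valid 5-number line.

```python
import numpy as np

rng = np.random.default_rng(123)
perm = rng.permutation(50) + 1        # perm[i] is the new label for number i + 1

line = np.array([3, 17, 24, 38, 50])  # one (made-up) cover line
remapped = np.sort(perm[line - 1])    # same line, new labels, still 5 distinct numbers in 1..50
print(remapped)
```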
What the script does
Inputs:
- hist_df.csv with st1..st5 (past draws)
- covering_50_5_4_33572.csv (your cover lines)
- optional: tot_df_dynamic_basic.parquet (to export the best found cover with extra features)
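Loading these inputs might look roughly like this (a sketch: only the st1..st5 columns are given above, so the layout of the cover file is an assumption).

```python
import pandas as pd

# Historical draws: one row per draw, columns st1..st5 (as listed above).
hist_df = pd.read_csv("hist_df.csv")
draws = hist_df[["st1", "st2", "st3", "st4", "st5"]].to_numpy()

# Cover lines: assumed here to be five number columns per row (actual column names may differ).
cover = pd.read_csv("covering_50_5_4_33572.csv").to_numpy()[:, :5]
```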
For each simulation:
1. Draw a random permutation perm of 0..49 (numbers 1..50).
2. Remap the cover lines through perm.
3. Compare the remapped cover against the real history and count:
- did we get at least one 5-hit line in each draw?
- did we get at least one 4-hit line in each draw?
- how many 4-hit / 5-hit lines on average?
We also run the original cover first (sim_id = -1), as the reference.
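A minimal, unoptimized sketch of that loop (reusing the `draws` and `cover` arrays from the loading snippet above; the real script differs in details such as output columns). Here a “4-hit” means a line matching exactly 4 of the 5 drawn numbers, consistent with the baseline below.

```python
import numpy as np

def score_cover(cover, draws):
    """Per-draw stats: fraction with >=1 five-hit line, >=1 four-hit line, mean #4-hit lines."""
    any5, any4, n4 = [], [], []
    for draw in draws:
        in_draw = np.zeros(51, dtype=bool)   # numbers live in 1..50
        in_draw[draw] = True
        hits = in_draw[cover].sum(axis=1)    # matched numbers per cover line
        any5.append((hits == 5).any())
        any4.append((hits == 4).any())       # "4-hit" = exactly 4 matches
        n4.append((hits == 4).sum())
    return np.mean(any5), np.mean(any4), np.mean(n4)

rng = np.random.default_rng(123)
results = []
for sim_id in range(-1, 1000):               # sim_id -1 = the original labels (reference run)
    if sim_id == -1:
        mapped = cover
    else:
        perm = rng.permutation(50) + 1       # random relabelling of 1..50
        mapped = perm[cover - 1]
    results.append((sim_id, *score_cover(mapped, draws)))
```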
Baselines (random tickets):
- 5-hit chance per draw (with M lines): p5_norm = M / C(50,5)
- 4-hit chance per draw uses the exact hypergeometric form:
  1 - C(N-K4, M) / C(N, M), where N = C(50,5) and K4 = C(5,4)*C(45,1) = 225 for 5/50 (the number of lines that match a given draw in exactly 4 numbers).
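For reference, both baselines can be computed like this (a sketch; the ratio of binomials is rewritten as a short product so we never build gigantic factorials):

```python
from math import comb

N = comb(50, 5)                   # 2,118,760 possible lines
M = 33572                         # lines in the cover
K4 = comb(5, 4) * comb(45, 1)     # 225 lines matching a given draw in exactly 4 numbers

p5_norm = M / N                   # chance a pack of M random lines contains the draw itself

# 1 - C(N-K4, M) / C(N, M), expanded as a product of K4 factors
p_no_4hit = 1.0
for i in range(K4):
    p_no_4hit *= (N - M - i) / (N - i)
p4_norm = 1 - p_no_4hit

print(f"p5_norm = {p5_norm:.6%}, p4_norm = {p4_norm:.6%}")
# roughly 1.584512% and 97.250882%, matching the Setup numbers below
```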
Outputs you get
- covering_random_mapping_100_sims.csv
  One row per sim with p(>=1 five-hit), p(>=1 four-hit), and lift values.
- best_number_mapping_by_5hit_lift.csv
  The permutation that produced the best 5-hit lift in this run.
- best_cover_by_5hit_lift.csv
  The remapped cover lines for that best permutation (plus tot_df features if found).
Results (auto-filled by the script)
Setup
- Sims: 1000 (seed=123)
- History draws: 912
- Cover size M: 33572
- Total combinations C(50,5): 2,118,760
- Random baseline p(>=1 five-hit): 1.584512%
- Random baseline p(>=1 four-hit): 97.250882%
Original cover (real labels)
- p(>=1 five-hit): 1.316% (lift vs random: 0.830)
- p(>=1 four-hit): 98.684% (lift vs random: 1.015)
- mean #4-hit lines per draw: 3.624 (lift vs random mean: 1.016)
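“Lift” here is simply the observed rate divided by the random baseline, which you can check from the numbers above:

```python
lift5 = 0.01316 / 0.01584512   # ≈ 0.830: the real labels do slightly worse than random on 5-hits
lift4 = 0.98684 / 0.97250882   # ≈ 1.015
```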
Best remapped cover (best sim in this run)
- best sim_id: 20
- p(>=1 five-hit): 3.180% (lift vs random: 2.007)
Where does the original sit among random remaps?
- Percentile of the original 5-hit lift among the 1000 remaps: 30.3
(If that percentile is high, it means the original labels look unusually good compared to typical shuffles. If it’s mid-pack, the “magic” was probably just normal variance.)
Lift distribution (remaps only)
- lift5 50% (median): 0.969
- lift5 90%: 1.315
- lift5 95%: 1.453
- lift5 99%: 1.730
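The percentile and the quantiles above come from the same array of per-sim lifts; a minimal sketch, reusing `results` and `p5_norm` from the earlier snippets (the script’s exact percentile convention may differ slightly):

```python
import numpy as np

remap_lift5 = np.array([p5 / p5_norm for sim_id, p5, p4, n4 in results if sim_id >= 0])
orig_lift5 = next(p5 / p5_norm for sim_id, p5, p4, n4 in results if sim_id == -1)

percentile = (remap_lift5 < orig_lift5).mean() * 100           # ~30.3 in the run above
q50, q90, q95, q99 = np.quantile(remap_lift5, [0.50, 0.90, 0.95, 0.99])
```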
A reality check (the part nobody wants to read)
If you run lots of permutations and keep the best one, you’re doing a search.
A search always finds something that looks “special”.
That’s not bad — it’s just how randomness behaves when you keep asking it questions.
So treat “best mapping found” as:
- a fun diagnostic,
- a warning about cherry-picking,
- and a reason to do honest backtests (time-splits, forward tests, fresh draws).
➡️ Download the script: mincovering_random_mapping_sim.py
If you want, the next steps are:
- a time-split version (train on older draws, score on newer draws),
- tracking whether the “best mapping” keeps its shine out-of-sample.
Just ask for it.