Lab Report — Percentile Ranking Across Columns (Bucket Test Results)
This report documents a simple question with a very “PyLab” twist:
If we rank EuroJackpot numbers (1–50) by their spacing/delay behavior across
st1..st5 and split that ranked list into buckets, do the next draw's numbers land in some buckets more often than we'd expect by pure randomness?
No promises. Just a method, a backtest, and clean interpretation.
1) What calculate_percentiles_multi_columns() does
The function takes a DataFrame and a list of columns that share the same value domain.
Classic examples:
- Main numbers: ['st1','st2','st3','st4','st5'] (values 1..50)
- Differences: ['D1','D2','D3','D4']
- Due/interval features: ['k1','k2','k3','k4','k5'] or ['K1','K2','K3','K4','K5']
The key idea
Instead of treating st1, st2, … as separate columns, we treat them as one merged value series:
- “Did value v appear in any of these columns on this draw?”
- If yes, record that draw position as a hit for v.
Then for each value v, we compute its gap sequence:
- gap from start → first hit
- gaps between consecutive hits
- gap from last hit → end of history (the current delay)
From the gap sequence we calculate:
- frequency proxy (co): how many intervals exist
- current delay (delay): the last gap
- percentiles of gaps (median, P75, P90, P95, P99)
- Pct_score: how “late” the current delay is, compared to that value’s own past gaps
Finally we blend frequency and “lateness” into one sortable score:
- Norm: normalized frequency share
- Prod = Norm * Pct_score
Higher Prod → higher rank.
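As a rough illustration, here is a minimal sketch of what such a function could look like, assuming pandas/NumPy. The real calculate_percentiles_multi_columns likely differs in detail (exact percentile set, exact Pct_score definition), so treat this as a reconstruction of the idea, not the original code:

```python
import numpy as np
import pandas as pd

def calculate_percentiles_multi_columns(df, columns):
    # Sketch only: merge the columns into one value series and score each value.
    merged = df[columns].to_numpy()                      # shape (n_draws, n_cols)
    n = len(df)
    rows = []
    for v in np.unique(merged):
        hits = np.flatnonzero((merged == v).any(axis=1))  # draws where v appeared
        # gap sequence: start -> first hit, between hits, last hit -> end
        edges = np.concatenate(([-1], hits, [n]))
        gaps = np.diff(edges)
        delay = gaps[-1]                                 # current delay (last gap)
        past = gaps[:-1] if len(gaps) > 1 else gaps
        rows.append({
            'value': v,
            'co': len(gaps),                             # frequency proxy
            'delay': delay,
            'P50': float(np.percentile(past, 50)),
            'P90': float(np.percentile(past, 90)),
            # Pct_score: fraction of past gaps the current delay meets or exceeds
            'Pct_score': float((past <= delay).mean()),
        })
    out = pd.DataFrame(rows)
    out['Norm'] = out['co'] / out['co'].sum()            # normalized frequency share
    out['Prod'] = out['Norm'] * out['Pct_score']
    return out.sort_values('Prod', ascending=False).reset_index(drop=True)
```

The sort at the end gives you the ranked list directly: top row = highest Prod = highest rank.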
2) Turning a ranked list into bucket patterns
After ranking, we have an ordered list of 50 numbers (top-ranked first).
We tested three ways to slice it:
A) 3 buckets (thirds)
For N=50 we used:
- G1: 16 numbers
- G2: 17 numbers
- G3: 17 numbers
B) 2 buckets (halves)
- H1: 25 numbers
- H2: 25 numbers
C) Middle 68% bucket
- Keep the middle 34 numbers
- Drop 8 from the top + 8 from the bottom (extremes of the ranking)
For each next draw, we count how many of the 5 winning numbers fall inside each bucket.
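The three slicing schemes and the per-draw hit count fit in a few lines. This is an illustrative reconstruction (bucket names mirror the summary labels), not the original code:

```python
def make_buckets(ranked):
    """Slice an ordered list of 50 numbers (top-ranked first) into the
    bucket schemes described above."""
    return {
        '3bucket_G1': set(ranked[:16]),
        '3bucket_G2': set(ranked[16:33]),
        '3bucket_G3': set(ranked[33:]),
        '2bucket_H1': set(ranked[:25]),
        '2bucket_H2': set(ranked[25:]),
        'mid68': set(ranked[8:42]),                  # drop 8 top + 8 bottom extremes
        'edge16': set(ranked[:8]) | set(ranked[42:]),
    }

def bucket_hits(buckets, draw):
    """Count how many of the 5 winning numbers fall inside each bucket."""
    winners = set(draw)
    return {name: len(members & winners) for name, members in buckets.items()}
```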
3) Backtest design (walk-forward, no peeking)
For each time step t:
- Build the ranking from draws 0..t
- Create the buckets from that ranking
- Look at draw t+1 and count bucket hits
- Move to t+1 and repeat
We started at draw index 50 so the ranking has enough history.
Total steps in this run: 864
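The walk-forward loop itself is short. Here is a sketch using the half split as the example scheme; rank_fn is a hypothetical stand-in for the ranking step from section 1:

```python
def walk_forward(draws, rank_fn, start=50):
    """Walk-forward backtest sketch. `draws` is a list of 5-number draws;
    `rank_fn` (hypothetical) builds the 50-number ranking from history only."""
    results = []
    for t in range(start, len(draws) - 1):
        ranked = rank_fn(draws[: t + 1])             # ranking from draws 0..t, no peeking
        h1, h2 = set(ranked[:25]), set(ranked[25:])  # halves as one example scheme
        nxt = set(draws[t + 1])                      # score against draw t+1
        results.append({'H1': len(h1 & nxt), 'H2': len(h2 & nxt)})
    return results
```

Because the ranking at step t only ever sees draws 0..t, draw t+1 is always out-of-sample.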
4) How to read the summary columns
The summary you see below is the “bucket performance” condensed over all 864 steps.
Here is what each column means:
- label: which bucket we’re scoring (examples: 2bucket_H2, mid68, 3bucket_G1).
- n_steps: how many prediction steps were tested (here: 864 next-draw evaluations).
- avg_hits: average count of next-draw numbers (out of 5) that landed in that bucket.
- expected_hits: what we would expect from a random draw if the bucket has size s out of 50: expected_hits = 5 * (s / 50). Example: a 25-number half → expected = 5 * (25/50) = 2.5.
- ratio_vs_expected: avg_hits / expected_hits. Above 1.00 means “a bit more than the random baseline”; below 1.00 means “a bit less than baseline”.
- pct_ge3: percentage of steps where the bucket captured 3 or more of the next draw’s 5 numbers.
- pct_ge4: percentage of steps where the bucket captured 4 or more numbers.
- pct_eq5: percentage of steps where the bucket captured all 5 numbers.
- pct_eq0: percentage of steps where the bucket captured zero numbers.
5) Your results (summary table)
| label | n_steps | avg_hits | expected_hits | ratio_vs_expected | pct_ge3 | pct_ge4 | pct_eq5 | pct_eq0 |
|---|---|---|---|---|---|---|---|---|
| 2bucket_H2 | 864 | 2.576389 | 2.5 | 1.030556 | 52.314815 | 19.212963 | 3.356481 | 1.736111 |
| 3bucket_G2 | 864 | 1.731481 | 1.7 | 1.018519 | 21.064815 | 4.745370 | 0.115741 | 9.837963 |
| 3bucket_G3 | 864 | 1.710648 | 1.7 | 1.006264 | 20.717593 | 4.282407 | 0.231481 | 10.532407 |
| mid68 | 864 | 3.414352 | 3.4 | 1.004221 | 83.333333 | 47.106481 | 14.583333 | 0.000000 |
| edge16 | 864 | 1.585648 | 1.6 | 0.991030 | 16.666667 | 3.587963 | 0.000000 | 14.583333 |
| 3bucket_G1 | 864 | 1.557870 | 1.6 | 0.973669 | 15.277778 | 3.472222 | 0.231481 | 13.888889 |
| 2bucket_H1 | 864 | 2.423611 | 2.5 | 0.969444 | 47.685185 | 15.509259 | 1.736111 | 3.356481 |
6) Interpretation (what this is really telling us)
6.1 The “big picture”
Most ratios are very close to 1.00 (baseline).
That’s a polite way of saying: the bucket trick is not creating a huge edge by itself.
That’s normal. The lottery is harsh like that.
Still, there are a few signals worth noticing.
6.2 Two-bucket split: H2 slightly wins over H1
- 2bucket_H2 ratio: 1.0306
- 2bucket_H1 ratio: 0.9694
This means that the half labeled H2 captured slightly more winning numbers than baseline, while H1 captured slightly fewer.
In plain terms:
- if you must choose one half to “favor”, this run suggests H2 is the better half.
Still, the gap is small: about 0.076 extra hits per draw on average (2.576 − 2.500).
Over 864 steps, that can show up as a stable nudge, or it can be a long, mild wave of randomness.
Practical takeaway:
Use the half-split as a light bias, not as a standalone rule.
6.3 Three-bucket split: the middle-ish buckets look slightly better
- G2 ratio: 1.0185
- G3 ratio: 1.0063
- G1 ratio: 0.9737
Same story: small differences, yet the direction is consistent with the two-bucket finding: one side (or middle bands) tends to catch a little more than the other.
Practical takeaway:
If you like thirds, focus on G2 first, then G3.
6.4 Middle 68%: great hit counts, but that’s baked in
mid68 has:
- avg_hits 3.41 out of 5
- pct_ge3 83.33%
- pct_ge4 47.11%
- pct_eq5 14.58%
That looks impressive until you remember: the bucket has 34 numbers out of 50.
Baseline for mid68 is already:
- expected_hits = 3.4
So the ratio is 1.004 → basically baseline.
So what is mid68 good for?
It behaves like a stability filter:
- it rarely misses everything (pct_eq0 = 0% in your run)
- it’s not selective enough to cut the search space hard (34/50 is still huge)
Practical takeaway:
Use mid68 as a soft guardrail (“avoid extreme-ranked edges”), not as a primary reducer.
6.5 Edge 16: slightly below baseline
edge16 ratio: 0.991
This says: the extreme-ranked numbers (top 8 + bottom 8) were hit a tiny bit less than baseline.
Again: tiny effect.
Practical takeaway:
If you want a simple rule that feels sane:
- avoid leaning too hard into the extreme ends of the ranked list,
- unless another feature strongly supports those values.
7) The “combinatorial” view (this is the part that sounds crazy)
One thing really worth pointing out: pct_eq5 is not just a cute metric. It has a direct reduced-combination meaning.
If our bucket contains s numbers out of 50, then the number of 5-number combinations inside that bucket is:
C(s,5) combinations
and the total universe of possible 5-number combos is:
C(50,5) = 2,118,760
So the “space coverage” fraction is:
C(s,5) / C(50,5)
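This is plain binomial-coefficient arithmetic; a tiny helper built on Python's math.comb reproduces the coverage fractions:

```python
from math import comb

def coverage_fraction(s, n=50, k=5):
    """Fraction of the C(n,k) combination universe covered by a bucket of size s."""
    return comb(s, k) / comb(n, k)

# coverage_fraction(34) -> C(34,5)/C(50,5) = 278256/2118760, about 13.13%
# coverage_fraction(25) -> C(25,5)/C(50,5) = 53130/2118760, about 2.51%
```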
7.1 Mid68 (34 numbers)
Bucket size: 34
Combinations inside: 278,256
Total combinations: 2,118,760
Coverage fraction: 278256 / 2118760 = 13.133%
In our backtest:
pct_eq5(mid68) = 14.583%, i.e. 126 hits out of 864 steps.
So yes, it’s fair to say:
A set that covers 13.13% of the combination universe ended up containing the true 5-number draw 14.58% of the time.
That’s a relative lift of about:
14.583 / 13.133 ≈ 1.11× (around +11%)
Sounds juicy… and it is worth mentioning.
The catch: with 864 trials, that lift is not yet “slam dunk” statistically. A quick binomial sanity check puts it around ~1.3 standard deviations above baseline. In human terms: interesting, not proven.
7.2 Two-bucket H2 (25 numbers)
Bucket size: 25
Combinations inside: 53,130
Total combinations: 2,118,760
Coverage fraction: 53130 / 2118760 = 2.508%
In our backtest:
pct_eq5(H2) = 3.356%, i.e. 29 hits out of 864.
Relative lift:
3.356 / 2.508 ≈ 1.34× (around +34%)
This is the more exciting story, because it’s a much tighter slice of the universe (2.5% of combos) that still captured 3.36% of true draws.
Same catch as above: the sample count is small (29 events), so the uncertainty is big. A quick sanity check puts it around ~1.6 standard deviations above baseline. That’s not “case closed”, but it’s not nothing either.
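The “~1.3 / ~1.6 standard deviations” figures come from a plain binomial check, which can be reproduced like this (a sanity-check sketch, not a full significance test):

```python
from math import comb, sqrt

def pct_eq5_zscore(hits, n_steps, bucket_size, n=50, k=5):
    """How many standard deviations the observed all-5 hit count sits above
    the coverage baseline p = C(s,k)/C(n,k), under a binomial model."""
    p = comb(bucket_size, k) / comb(n, k)
    expected = n_steps * p
    sd = sqrt(n_steps * p * (1 - p))
    return (hits - expected) / sd

# mid68:      pct_eq5_zscore(126, 864, 34) -> about 1.3
# 2bucket_H2: pct_eq5_zscore(29, 864, 25)  -> about 1.6
```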
8) What I would do with these results (in the PyLab style)
If you want one “best” option from this test:
1) Prefer the half bucket that wins (H2) as a mild bias layer
2) Treat mid68 as a gentle “don’t go crazy” filter
3) Combine the bucket rule with your real reducers:
- gap filters / valid-gap zones
- feature bands (sum/range/overlaps)
- grid-pattern flags
- covering sets + random mappings
A bucket rule alone won’t cut the space enough to matter.
A bucket rule stacked with 3–6 other independent-ish constraints can become useful.
9) Next step (to confirm it’s not just noise)
Two quick checks that usually reveal the truth fast:
- Rolling windows: run the same summary on the last 200, 400, and 600 steps.
  Does H2 stay on top, or does it flip back and forth?
- Swap ordering: define buckets from the bottom-up ranking too.
  If the “better half” always ends up being “the second half”, you may be seeing a stable bias in how the ranking orders values.
Paste a rolling-window summary next time and we’ll see if this nudge stays alive.
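If you want to run that rolling-window check yourself, a minimal helper might look like this (hypothetical; assumes you kept a per-step hit count list for one bucket):

```python
def rolling_avg_hits(hit_counts, windows=(200, 400, 600)):
    """Average hits over the last W steps of one bucket's per-step hit counts."""
    return {w: sum(hit_counts[-w:]) / min(w, len(hit_counts)) for w in windows}
```

If the averages drift a lot between windows, the “nudge” is probably noise; if they hold steady, it earns another look.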
Responsible play note
This is structure, not a guarantee.
Keep it small, keep it fun, and don’t let a good run mess with your limits.