EuroJackpot PyLab

Coding the lottery. Keeping it human.

Heavy-lift x-pattern filter (ranked search)

2025-12-26

Why this post exists

If you’ve ever stared at a lottery draw and thought “there must be some structure hiding in here…”, you’re not alone. The annoying part is that most “systems” jump straight to predicting numbers, and that’s where things get messy fast.

This post takes a different angle:

Don’t start with “the next numbers”.
Start with filters that carve the full space into a smaller region that still looks like it contains real historical draws.

That’s the whole mood.

Instead of betting on a single fragile idea, we try to find reasonable constraints — a kind of “shape” of the next draw — then we let that shape guide the rest of the workflow (ranking numbers, reducing combinations, building packs, etc.).

The cast of characters: tot_df and hist_df

We use two datasets side by side:

  • tot_df: the big space of valid combinations (or a large engineered subset of it).
    This is where “how much space did we cut?” is measured.

  • hist_df: the actual historical draws, engineered with the same feature columns.
    This is where “did we kill the history?” is measured.

The trick is: every time we filter one, we filter the other in lockstep with the same rule. Otherwise you end up comparing apples to spaceships.
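In pandas terms, the lockstep rule can be as small as one helper that takes a mask-building function and applies it to both frames. A minimal sketch (the helper name `filter_lockstep` is mine, not from the script):

```python
import pandas as pd

# Minimal sketch: apply the SAME boolean-mask rule to both frames,
# so the space and the history stay comparable after every cut.
def filter_lockstep(tot_df: pd.DataFrame, hist_df: pd.DataFrame, rule):
    # `rule` maps a DataFrame to a boolean Series over its rows.
    return tot_df[rule(tot_df)], hist_df[rule(hist_df)]

# e.g. keep only rows where the engineered column x7 equals 1:
# tot_f, hist_f = filter_lockstep(tot_df, hist_df, lambda df: df["x7"] == 1)
```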

What are “x-columns” anyway?

Your x1 … x20 columns are feature-like signals, derived either directly from the draw or produced by feature engineering. The details differ depending on how you built them, but from the search perspective we treat them as columns that usually take values like 0/1 (sometimes more, though this post focuses on binary patterns).

An example “pattern filter” looks like:

  • columns: x7, x9, x12, x17, x19
  • pattern: [1, 1, 1, 0, 0]

Meaning we keep only rows where:

  • x7==1 AND x9==1 AND x12==1 AND x17==0 AND x19==0

We apply this rule to both datasets:

  • filter the space (tot_df)
  • filter the history (hist_df)

Now we can measure whether this pattern is doing something interesting.
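Here is a hedged sketch of how such a pattern mask can be built and applied in pandas. The `pattern_mask` helper and the random demo frame are illustrative, not the script's actual code:

```python
import numpy as np
import pandas as pd

def pattern_mask(df: pd.DataFrame, cols, pattern) -> pd.Series:
    # AND together `col == value` for each (column, value) pair.
    mask = pd.Series(True, index=df.index)
    for col, val in zip(cols, pattern):
        mask &= df[col] == val
    return mask

cols = ["x7", "x9", "x12", "x17", "x19"]
pattern = [1, 1, 1, 0, 0]

# Demo on a random 0/1 frame standing in for tot_df / hist_df:
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 2, size=(1000, 20)),
                    columns=[f"x{i}" for i in range(1, 21)])
kept = demo[pattern_mask(demo, cols, pattern)]
print(f"{len(kept)} of {len(demo)} rows survive")  # ~1000 / 2**5 ≈ 31 expected
```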

Heavy lift (but measured the right way)

If I only tell you “this filter shrinks tot_df by 98%”, that sounds impressive… but it’s not enough.

Because a filter can shrink the space by 98% and also shrink the history by 99.9%. That’s not a “smart” filter — it’s just a meat grinder.

So we track two ratios:

  • df_ratio = len(tot_f) / len(tot_df)
  • df_n_ratio = len(hist_f) / len(hist_df)

And then we compare them:

  • ratio_of_ratios = df_n_ratio / df_ratio

Interpretation (roughly):

  • ratio_of_ratios > 1
    history survives better than the space does
    → good sign: the filter isn’t just deleting everything randomly

  • ratio_of_ratios < 1
    history collapses faster than the space
    → warning: it might be too harsh or too “unrealistic”

This isn’t proof of predictability. It’s a sanity check that the filter is not pure fantasy.
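In code, all three numbers are one-liners. A minimal sketch using the naming from above (the worked numbers in the comment come from the results section at the end of this post):

```python
def lift_ratios(tot_df, tot_f, hist_df, hist_f):
    df_ratio = len(tot_f) / len(tot_df)      # how much of the space survives
    df_n_ratio = len(hist_f) / len(hist_df)  # how much of the history survives
    return df_ratio, df_n_ratio, df_n_ratio / df_ratio  # last: ratio_of_ratios

# Worked with the top candidate from the results below:
# df_ratio   = 5850 / 1221759 ≈ 0.004788
# df_n_ratio =   73 /    3225 ≈ 0.022636
# ratio_of_ratios ≈ 4.7274  -> history survives ~4.7x better than the space
```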

“Due-ness”: the delay-percent idea (the polite version)

Next we ask a second question:

After applying the filter, does the filtered subset look “late” compared to its own past rhythm?

That’s where the delay-percent diagnostics come in.

The filtered subset produces a sequence of “hit” timestamps inside the historical timeline. From that we get a distribution of gaps (intervals), and then we ask:

  • how many draws since the last “hit” of this filtered subset?
  • where does that gap fall compared to the subset’s own gap history? (P90, P95, P99, etc.)

The output is a percentile-like score (pct_score):

  • low score: “this happened recently, not really due”
  • high score: “this hasn’t happened in a while relative to its own history”

Again: not magic. But it’s a useful ranking signal.
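One reasonable way to turn “draws since the last hit vs. the subset’s own gap history” into a 0–100 score looks like this. Note this is my sketch of the idea, not necessarily the exact formula the script uses:

```python
import numpy as np

def delay_pct_score(hit_positions, n_draws_total):
    # hit_positions: draw indices (within the historical timeline) where
    # the filtered subset "hit"; n_draws_total: total draws in the history.
    hits = np.sort(np.asarray(hit_positions))
    gaps = np.diff(hits)                  # intervals between consecutive hits
    if gaps.size == 0:
        return 0.0                        # not enough evidence to look "due"
    draws_since_last = n_draws_total - hits[-1]
    # Share of the subset's own historical gaps already exceeded by the
    # current wait, scaled to 0..100.
    return 100.0 * np.mean(gaps <= draws_since_last)
```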

Why we need a penalty (tiny subsets are liars)

Here’s the classic trap:

You find a pattern that occurred 8 times in 10+ years.
It might look insanely “due”. It might even have a great ratio_of_ratios.
But it’s also extremely easy for tiny samples to look good by luck.

So we add a support penalty — a factor that pushes the score toward zero when the filtered history subset has too few hits (hit_count is small).

Conceptually:

  • small hit_count → the pattern might be “cute” but fragile
  • big hit_count → the pattern has enough evidence to be taken seriously

In other words: we don’t let a unicorn run the whole lab.
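The exact penalty curve is a tuning choice; a saturating ratio is one plausible shape. The constant k below is an assumption, not a value from the script:

```python
def support_factor(hit_count: int, k: float = 30.0) -> float:
    # Saturating penalty: near 0 for a handful of hits, approaches 1 as
    # evidence accumulates. k is an assumed constant, not the script's value.
    return hit_count / (hit_count + k)

# support_factor(8)  ≈ 0.21  -> the 8-hit "unicorn" gets crushed
# support_factor(73) ≈ 0.71  -> a well-supported pattern mostly keeps its score
```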

The ranking score (simple on purpose)

We combine these ideas into one score:

  • lift: ratio_of_ratios
  • due: pct_score / 100
  • support penalty: support_factor(hit_count)

So the ranking score is:

```text
score = ratio_of_ratios * (pct_score/100) * support_factor(hit_count)
```

It’s not the only score you could use. It’s just a practical one that behaves in the direction we want:

  • reward strong “space cut”
  • reward patterns that preserve history relative to the cut
  • reward “late” subsets
  • punish tiny samples
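Putting the three factors together is a few lines. A sketch that reuses the assumed support_factor from above, with the top candidate’s numbers from the results section:

```python
def support_factor(hit_count: int, k: float = 30.0) -> float:
    return hit_count / (hit_count + k)   # same assumed penalty as above

def rank_score(ratio_of_ratios: float, pct_score: float, hit_count: int) -> float:
    # lift * due-ness * support penalty: the shape of the formula above.
    return ratio_of_ratios * (pct_score / 100.0) * support_factor(hit_count)

# Top candidate from the results section: lift 4.7274, pct_score 98.6486, 73 hits.
print(round(rank_score(4.7274, 98.6486, 73), 3))  # ≈ 3.305 with the assumed k=30
```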

What to do with the winner

Once you have a top pattern (columns + pattern values), you can:

  • Filter tot_df into a reduced pool region
  • Filter hist_df into the comparable historical subset
  • Use the filtered history to:
      • rank numbers (percentiles across st1..st5)
      • test additional features
      • build reduced packs for play
      • compare performance vs unfiltered baselines

It’s basically a “zoom lens”: you don’t claim to see the future — you claim you’re focusing on a region that seems to behave like real draws.
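As a taste of the “rank numbers” step: with the filtered history in hand, even a simple frequency ranking over st1..st5 takes a few lines. This is a simpler stand-in for the percentile ranking mentioned above, assuming st1..st5 hold the five main numbers:

```python
import pandas as pd

def number_frequencies(hist_f: pd.DataFrame) -> pd.Series:
    # Stack the five main-number columns into one long Series, then count
    # how often each number shows up in the filtered history.
    stacked = hist_f[["st1", "st2", "st3", "st4", "st5"]].stack()
    return stacked.value_counts(normalize=True).sort_values(ascending=False)

# ranked = number_frequencies(hist_f)   # highest-frequency numbers first
```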

Download the code

The script generates a ranked candidates table and a markdown snippet (tables + diagnostics). Fair warning: as written, it takes a long time to run.

Final reality check (because we’re not selling fairy tales)

This method is about structured reduction and ranking. It tries to be honest with two hard truths:

  • the lottery is designed to be random
  • humans (and models) love to hallucinate patterns

So the goal isn’t certainty. The goal is a workflow that gives you:

  • fewer random guesses
  • more measurable decisions
  • and a clean way to test whether a filter is “smart” or just “violent”

If nothing else, it’s a much better conversation with your data than picking birthdays and hoping the universe vibes with you.

Heavy-lift x-pattern filter for the next draw (ranked search)

Candidate

  • Columns: ['x5', 'x6', 'x9', 'x11', 'x19']
  • Pattern: [0, 0, 0, 1, 1]

Lockstep reduction

  • Space rows: 5850 / 1221759 (df_ratio=0.004788)
  • Hist rows: 73 / 3225 (df_n_ratio=0.022636)
  • ratio_of_ratios: 4.7274

Due summary on the filtered history

| label | hit_count | draws_since_last | median | P75 | P90 | P95 | P99 | max | pct_score |
|---|---|---|---|---|---|---|---|---|---|
| best_filter | 73 | 136 | 28 | 66 | 101 | 117 | 151 | 192 | 98.6486 |

Heavy-lift x-pattern filters (ranked search)

Best x-pattern candidates (ranked)

| k | cols | pattern | ratio_of_ratios | df_ratio | df_n_ratio | space_rows | hist_rows | pct_score | draws_since_last | hit_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [x5, x6, x9, x11, x19] | [0, 0, 0, 1, 1] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 98.6 | 136 | 73 |
| 5 | [x5, x13, x18, x19, x20] | [1, 0, 0, 1, 0] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 89.2 | 96 | 73 |
| 5 | [x2, x3, x13, x14, x17] | [0, 0, 1, 1, 0] | 5.11596 | 0.0048 | 0.0245 | 5850 | 79 | 78.1 | 62 | 79 |
| 6 | [x3, x9, x12, x15, x16, x18] | [0, 0, 0, 0, 0, 0] | 5.87729 | 0.0051 | 0.0298 | 6188 | 96 | 64.9 | 36 | 96 |
| 6 | [x9, x12, x15, x16, x18, x19] | [0, 0, 0, 0, 0, 0] | 5.69362 | 0.0051 | 0.0288 | 6188 | 93 | 67.0 | 36 | 93 |
| 6 | [x3, x8, x9, x15, x16, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 100.0 | 398 | 66 |
| 5 | [x3, x5, x11, x19, x20] | [1, 0, 1, 1, 0] | 4.49963 | 0.0043 | 0.0192 | 5220 | 62 | 93.7 | 154 | 62 |
| 5 | [x2, x10, x13, x15, x19] | [0, 1, 1, 0, 0] | 4.14457 | 0.0048 | 0.0198 | 5850 | 64 | 100.0 | 254 | 64 |
| 6 | [x5, x6, x9, x10, x16, x18] | [0, 0, 0, 0, 0, 0] | 4.59163 | 0.0051 | 0.0233 | 6188 | 75 | 85.5 | 76 | 75 |
| 5 | [x1, x5, x10, x11, x17] | [0, 1, 1, 0, 0] | 4.66265 | 0.0048 | 0.0223 | 5850 | 72 | 84.9 | 89 | 72 |
| 5 | [x1, x6, x10, x16, x20] | [1, 0, 0, 0, 0] | 3.74348 | 0.0087 | 0.0326 | 10626 | 105 | 95.3 | 83 | 105 |
| 5 | [x3, x5, x10, x11, x19] | [0, 1, 1, 0, 0] | 3.88793 | 0.0060 | 0.0233 | 7308 | 75 | 98.7 | 170 | 75 |
| 6 | [x1, x2, x5, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 3.97941 | 0.0070 | 0.0279 | 8568 | 90 | 91.2 | 108 | 90 |
| 5 | [x1, x3, x9, x13, x19] | [0, 1, 0, 1, 0] | 3.98962 | 0.0054 | 0.0214 | 6552 | 69 | 97.1 | 173 | 69 |
| 5 | [x1, x2, x7, x10, x13] | [1, 0, 0, 1, 0] | 3.93179 | 0.0054 | 0.0211 | 6552 | 68 | 98.6 | 154 | 68 |
| 6 | [x1, x2, x4, x8, x17, x18] | [0, 0, 0, 0, 0, 0] | 3.80255 | 0.0070 | 0.0267 | 8568 | 86 | 95.4 | 108 | 86 |
| 6 | [x2, x5, x6, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 4.22430 | 0.0051 | 0.0214 | 6188 | 69 | 90.7 | 108 | 69 |
| 5 | [x2, x9, x12, x19, x20] | [1, 0, 0, 0, 1] | 3.62650 | 0.0072 | 0.0260 | 8775 | 84 | 100.0 | 166 | 84 |
| 5 | [x4, x5, x6, x9, x19] | [1, 0, 0, 0, 1] | 4.46837 | 0.0048 | 0.0214 | 5850 | 69 | 85.7 | 72 | 69 |
| 6 | [x3, x4, x10, x12, x14, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 92.5 | 139 | 66 |

Notes

  • This ranks pattern-filters, not outcomes: it’s a ranking method, not a promise about the next draw.
  • ratio_of_ratios > 1 means the filter keeps history alive better than random shrinking.
  • pct_score close to 100 means the subset looks late relative to its own gaps.
  • support_factor pushes down tiny-hit patterns (think 7 or 9 historical hits) so we don’t get hypnotized by noise.