EuroJackpot PyLab

Coding the lottery. Keeping it human.

Heavy-lift x-pattern filter (ranked search)

2025-12-26

Why this post exists

If you’ve ever stared at a lottery draw and thought “there must be some structure hiding in here…”, you’re not alone. The annoying part is that most “systems” jump straight to predicting numbers, and that’s where things get messy fast.

This post takes a different angle:

Don’t start with “the next numbers”.
Start with filters that carve the full space into a smaller region that still looks like it contains real historical draws.

That’s the whole mood.

Instead of betting on a single fragile idea, we try to find reasonable constraints — a kind of “shape” of the next draw — then we let that shape guide the rest of the workflow (ranking numbers, reducing combinations, building packs, etc.).

The cast of characters: tot_df and hist_df

We use two datasets side by side:

  • tot_df: the big space of valid combinations (or a large engineered subset of it).
    This is where “how much space did we cut?” is measured.

  • hist_df: the actual historical draws, engineered with the same feature columns.
    This is where “did we kill the history?” is measured.

The trick is: every time we filter one, we filter the other in lockstep with the same rule. Otherwise you end up comparing apples to spaceships.
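In pandas terms, the lockstep rule can be as small as one helper that takes a mask-building function and applies it to both frames. A minimal sketch (the helper name `filter_lockstep` is mine, not from the script):

```python
import pandas as pd

# Minimal sketch: apply the SAME boolean-mask rule to both frames,
# so the space and the history stay comparable after every cut.
def filter_lockstep(tot_df: pd.DataFrame, hist_df: pd.DataFrame, rule):
    # `rule` maps a DataFrame to a boolean Series over its rows.
    return tot_df[rule(tot_df)], hist_df[rule(hist_df)]

# e.g. keep only rows where the engineered column x7 equals 1:
# tot_f, hist_f = filter_lockstep(tot_df, hist_df, lambda df: df["x7"] == 1)
```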

What are “x-columns” anyway?

Your x1 … x20 columns are feature-like signals, derived either directly from the draw or produced by feature engineering. The details differ depending on how you built them, but from the search perspective we treat them as columns that usually take values like 0/1 (sometimes more, though this post focuses on binary patterns).

An example “pattern filter” looks like:

  • columns: x7, x9, x12, x17, x19
  • pattern: [1, 1, 1, 0, 0]

Meaning we keep only rows where:

  • x7==1 AND x9==1 AND x12==1 AND x17==0 AND x19==0

We apply this rule to both datasets:

  • filter the space (tot_df)
  • filter the history (hist_df)

Now we can measure whether this pattern is doing something interesting.
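Here is a hedged sketch of how such a pattern mask can be built and applied in pandas. The `pattern_mask` helper and the random demo frame are illustrative, not the script's actual code:

```python
import numpy as np
import pandas as pd

def pattern_mask(df: pd.DataFrame, cols, pattern) -> pd.Series:
    # AND together `col == value` for each (column, value) pair.
    mask = pd.Series(True, index=df.index)
    for col, val in zip(cols, pattern):
        mask &= df[col] == val
    return mask

cols = ["x7", "x9", "x12", "x17", "x19"]
pattern = [1, 1, 1, 0, 0]

# Demo on a random 0/1 frame standing in for tot_df / hist_df:
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 2, size=(1000, 20)),
                    columns=[f"x{i}" for i in range(1, 21)])
kept = demo[pattern_mask(demo, cols, pattern)]
print(f"{len(kept)} of {len(demo)} rows survive")  # ~1000 / 2**5 ≈ 31 expected
```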

Heavy lift (but measured the right way)

If I only tell you “this filter shrinks tot_df by 98%”, that sounds impressive… but it’s not enough.

Because a filter can shrink the space by 98% and also shrink the history by 99.9%. That’s not a “smart” filter — it’s just a meat grinder.

So we track two ratios:

  • df_ratio = len(tot_f) / len(tot_df)
  • df_n_ratio = len(hist_f) / len(hist_df)

And then we compare them:

  • ratio_of_ratios = df_n_ratio / df_ratio

Interpretation (roughly):

  • ratio_of_ratios > 1
    history survives better than the space does
    → good sign: the filter isn’t just deleting everything randomly

  • ratio_of_ratios < 1
    history collapses faster than the space
    → warning: it might be too harsh or too “unrealistic”

This isn’t proof of predictability. It’s a sanity check that the filter is not pure fantasy.
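In code, all three numbers are one-liners. A minimal sketch using the naming from above (the worked numbers in the comment come from the results section at the end of this post):

```python
def lift_ratios(tot_df, tot_f, hist_df, hist_f):
    df_ratio = len(tot_f) / len(tot_df)      # how much of the space survives
    df_n_ratio = len(hist_f) / len(hist_df)  # how much of the history survives
    return df_ratio, df_n_ratio, df_n_ratio / df_ratio  # last: ratio_of_ratios

# Worked with the top candidate from the results below:
# df_ratio   = 5850 / 1221759 ≈ 0.004788
# df_n_ratio =   73 /    3225 ≈ 0.022636
# ratio_of_ratios ≈ 4.7274  -> history survives ~4.7x better than the space
```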

“Due-ness”: the delay-percent idea (the polite version)

Next we ask a second question:

After applying the filter, does the filtered subset look “late” compared to its own past rhythm?

That’s where the delay-percent diagnostics come in.

The filtered subset produces a sequence of “hit” timestamps inside the historical timeline. From that we get a distribution of gaps (intervals), and then we ask:

  • how many draws since the last “hit” of this filtered subset?
  • where does that gap fall compared to the subset’s own gap history? (P90, P95, P99, etc.)

The output is a percentile-like score (pct_score):

  • low score: “this happened recently, not really due”
  • high score: “this hasn’t happened in a while relative to its own history”

Again: not magic. But it’s a useful ranking signal.
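One reasonable way to turn “draws since the last hit vs. the subset’s own gap history” into a 0–100 score looks like this. Note this is my sketch of the idea, not necessarily the exact formula the script uses:

```python
import numpy as np

def delay_pct_score(hit_positions, n_draws_total):
    # hit_positions: draw indices (within the historical timeline) where
    # the filtered subset "hit"; n_draws_total: total draws in the history.
    hits = np.sort(np.asarray(hit_positions))
    gaps = np.diff(hits)                  # intervals between consecutive hits
    if gaps.size == 0:
        return 0.0                        # not enough evidence to look "due"
    draws_since_last = n_draws_total - hits[-1]
    # Share of the subset's own historical gaps already exceeded by the
    # current wait, scaled to 0..100.
    return 100.0 * np.mean(gaps <= draws_since_last)
```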

Why we need a penalty (tiny subsets are liars)

Here’s the classic trap:

You find a pattern that occurred 8 times in 10+ years.
It might look insanely “due”. It might even have a great ratio_of_ratios.
But it’s also extremely easy for tiny samples to look good by luck.

So we add a support penalty — a factor that pushes the score toward zero when the filtered history subset has too few hits (hit_count is small).

Conceptually:

  • small hit_count → the pattern might be “cute” but fragile
  • big hit_count → the pattern has enough evidence to be taken seriously

In other words: we don’t let a unicorn run the whole lab.
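The exact penalty curve is a tuning choice; a saturating ratio is one plausible shape. The constant k below is an assumption, not a value from the script:

```python
def support_factor(hit_count: int, k: float = 30.0) -> float:
    # Saturating penalty: near 0 for a handful of hits, approaches 1 as
    # evidence accumulates. k is an assumed constant, not the script's value.
    return hit_count / (hit_count + k)

# support_factor(8)  ≈ 0.21  -> the 8-hit "unicorn" gets crushed
# support_factor(73) ≈ 0.71  -> a well-supported pattern mostly keeps its score
```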

The ranking score (simple on purpose)

We combine these ideas into one score:

  • lift: ratio_of_ratios
  • due: pct_score / 100
  • support penalty: support_factor(hit_count)

So the ranking score is:

```text
score = ratio_of_ratios * (pct_score/100) * support_factor(hit_count)
```

It’s not the only score you could use. It’s just a practical one that behaves in the direction we want:

  • reward strong “space cut”
  • reward patterns that preserve history relative to the cut
  • reward “late” subsets
  • punish tiny samples
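Putting the three factors together is a few lines. A sketch that reuses the assumed support_factor from above, with the top candidate’s numbers from the results section:

```python
def support_factor(hit_count: int, k: float = 30.0) -> float:
    return hit_count / (hit_count + k)   # same assumed penalty as above

def rank_score(ratio_of_ratios: float, pct_score: float, hit_count: int) -> float:
    # lift * due-ness * support penalty: the shape of the formula above.
    return ratio_of_ratios * (pct_score / 100.0) * support_factor(hit_count)

# Top candidate from the results section: lift 4.7274, pct_score 98.6486, 73 hits.
print(round(rank_score(4.7274, 98.6486, 73), 3))  # ≈ 3.305 with the assumed k=30
```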

What to do with the winner

Once you have a top pattern (columns + pattern values), you can:

  • Filter tot_df into a reduced pool region
  • Filter hist_df into the comparable historical subset
  • Use the filtered history to:
      • rank numbers (percentiles across st1..st5)
      • test additional features
      • build reduced packs for play
      • compare performance vs unfiltered baselines

It’s basically a “zoom lens”: you don’t claim to see the future — you claim you’re focusing on a region that seems to behave like real draws.
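As a taste of the “rank numbers” step: with the filtered history in hand, even a simple frequency ranking over st1..st5 takes a few lines. This is a simpler stand-in for the percentile ranking mentioned above, assuming st1..st5 hold the five main numbers:

```python
import pandas as pd

def number_frequencies(hist_f: pd.DataFrame) -> pd.Series:
    # Stack the five main-number columns into one long Series, then count
    # how often each number shows up in the filtered history.
    stacked = hist_f[["st1", "st2", "st3", "st4", "st5"]].stack()
    return stacked.value_counts(normalize=True).sort_values(ascending=False)

# ranked = number_frequencies(hist_f)   # highest-frequency numbers first
```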

Download the code

The script generates a ranked candidates table and a markdown snippet (tables + diagnostics). Fair warning: as written, it takes a long time to run.

Final reality check (because we’re not selling fairy tales)

This method is about structured reduction and ranking. It tries to be honest with two hard truths:

  • the lottery is designed to be random
  • humans (and models) love to hallucinate patterns

So the goal isn’t certainty. The goal is a workflow that gives you:

  • fewer random guesses
  • more measurable decisions
  • and a clean way to test whether a filter is “smart” or just “violent”

If nothing else, it’s a much better conversation with your data than picking birthdays and hoping the universe vibes with you.

Heavy-lift x-pattern filter for the next draw (ranked search)

Candidate

  • Columns: ['x5', 'x6', 'x9', 'x11', 'x19']
  • Pattern: [0, 0, 0, 1, 1]

Lockstep reduction

  • Space rows: 5850 / 1221759 (df_ratio=0.004788)
  • Hist rows: 73 / 3225 (df_n_ratio=0.022636)
  • ratio_of_ratios: 4.7274

Due summary on the filtered history

| label | hit_count | draws_since_last | median | P75 | P90 | P95 | P99 | max | pct_score |
|---|---|---|---|---|---|---|---|---|---|
| best_filter | 73 | 136 | 28 | 66 | 101 | 117 | 151 | 192 | 98.6486 |

Heavy-lift x-pattern filters (ranked search)

Best x-pattern candidates (ranked)

| k | cols | pattern | ratio_of_ratios | df_ratio | df_n_ratio | space_rows | hist_rows | pct_score | draws_since_last | hit_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | [x5, x6, x9, x11, x19] | [0, 0, 0, 1, 1] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 98.6 | 136 | 73 |
| 5 | [x5, x13, x18, x19, x20] | [1, 0, 0, 1, 0] | 4.72741 | 0.0048 | 0.0226 | 5850 | 73 | 89.2 | 96 | 73 |
| 5 | [x2, x3, x13, x14, x17] | [0, 0, 1, 1, 0] | 5.11596 | 0.0048 | 0.0245 | 5850 | 79 | 78.1 | 62 | 79 |
| 6 | [x3, x9, x12, x15, x16, x18] | [0, 0, 0, 0, 0, 0] | 5.87729 | 0.0051 | 0.0298 | 6188 | 96 | 64.9 | 36 | 96 |
| 6 | [x9, x12, x15, x16, x18, x19] | [0, 0, 0, 0, 0, 0] | 5.69362 | 0.0051 | 0.0288 | 6188 | 93 | 67.0 | 36 | 93 |
| 6 | [x3, x8, x9, x15, x16, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 100.0 | 398 | 66 |
| 5 | [x3, x5, x11, x19, x20] | [1, 0, 1, 1, 0] | 4.49963 | 0.0043 | 0.0192 | 5220 | 62 | 93.7 | 154 | 62 |
| 5 | [x2, x10, x13, x15, x19] | [0, 1, 1, 0, 0] | 4.14457 | 0.0048 | 0.0198 | 5850 | 64 | 100.0 | 254 | 64 |
| 6 | [x5, x6, x9, x10, x16, x18] | [0, 0, 0, 0, 0, 0] | 4.59163 | 0.0051 | 0.0233 | 6188 | 75 | 85.5 | 76 | 75 |
| 5 | [x1, x5, x10, x11, x17] | [0, 1, 1, 0, 0] | 4.66265 | 0.0048 | 0.0223 | 5850 | 72 | 84.9 | 89 | 72 |
| 5 | [x1, x6, x10, x16, x20] | [1, 0, 0, 0, 0] | 3.74348 | 0.0087 | 0.0326 | 10626 | 105 | 95.3 | 83 | 105 |
| 5 | [x3, x5, x10, x11, x19] | [0, 1, 1, 0, 0] | 3.88793 | 0.0060 | 0.0233 | 7308 | 75 | 98.7 | 170 | 75 |
| 6 | [x1, x2, x5, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 3.97941 | 0.0070 | 0.0279 | 8568 | 90 | 91.2 | 108 | 90 |
| 5 | [x1, x3, x9, x13, x19] | [0, 1, 0, 1, 0] | 3.98962 | 0.0054 | 0.0214 | 6552 | 69 | 97.1 | 173 | 69 |
| 5 | [x1, x2, x7, x10, x13] | [1, 0, 0, 1, 0] | 3.93179 | 0.0054 | 0.0211 | 6552 | 68 | 98.6 | 154 | 68 |
| 6 | [x1, x2, x4, x8, x17, x18] | [0, 0, 0, 0, 0, 0] | 3.80255 | 0.0070 | 0.0267 | 8568 | 86 | 95.4 | 108 | 86 |
| 6 | [x2, x5, x6, x8, x9, x18] | [0, 0, 0, 0, 0, 0] | 4.22430 | 0.0051 | 0.0214 | 6188 | 69 | 90.7 | 108 | 69 |
| 5 | [x2, x9, x12, x19, x20] | [1, 0, 0, 0, 1] | 3.62650 | 0.0072 | 0.0260 | 8775 | 84 | 100.0 | 166 | 84 |
| 5 | [x4, x5, x6, x9, x19] | [1, 0, 0, 0, 1] | 4.46837 | 0.0048 | 0.0214 | 5850 | 69 | 85.7 | 72 | 69 |
| 6 | [x3, x4, x10, x12, x14, x19] | [1, 0, 0, 0, 0, 0] | 4.17768 | 0.0049 | 0.0205 | 5985 | 66 | 92.5 | 139 | 66 |

Notes

  • This ranks pattern-filters, not outcomes: it’s a ranking method, not a promise about the next draw.
  • ratio_of_ratios > 1 means the filter keeps history alive better than random shrinking.
  • pct_score close to 100 means the subset looks late relative to its own gaps.
  • support_factor pushes down tiny-hit patterns (think 7 or 9 historical hits) so we don’t get hypnotized by noise.