Skip to content

pre-compute password denylist Bloom filter to speed up server startup#48515

Merged
ahus1 merged 4 commits intokeycloak:mainfrom
Nordix:feature/password-denylist-bloom-precompute
May 7, 2026
Merged

pre-compute password denylist Bloom filter to speed up server startup#48515
ahus1 merged 4 commits intokeycloak:mainfrom
Nordix:feature/password-denylist-bloom-precompute

Conversation

@kfaseela
Copy link
Copy Markdown
Contributor

@kfaseela kfaseela commented Apr 27, 2026

Fixes #47356

Loading large password denylist files from plaintext on every server startup or reload rebuilds the Bloom filter from scratch, which can take several seconds for multi-million-line lists.

This PR introduces a pre-computed binary (.bloom) file that the server loads
instead of rebuilding from plaintext, significantly reducing startup and reload time.

Performance (manually verified): loading a 14M-entry denylist (~133 MB plaintext) dropped from ~2-3 seconds to ~2 ms after pre-computing the .bloom file. Given this improvement, background/async loading does not appear necessary at this time.

Changes

  • New kc.sh tools build-password-denylist command generates a .bloom file from a plaintext denylist.
  • The server detects the file type by extension: if the configured filename ends with .bloom, it is loaded as a pre-computed Bloom filter binary; otherwise it is read as plaintext.
  • Change detection and hot-reload work the same way for both file types - the configured file is watched for mtime/size changes.

Usage

# Pre-compute the Bloom filter
kc.sh tools build-password-denylist /path/to/denylist

This generates denylist.bloom in the same directory (Optionally the CLI can accept the custom .bloom file name as argument as well, if needed). Place both files in the password-blacklists folder and start the server as usual: kc.sh start --spi-password-policy-password-blacklist-blacklists-path=/path/to/password-denylists

Then in the Admin UI set the password blacklist policy value to /path/to/denylist/denylist.bloom

Admins with small denylists can continue using .txt files as before — no changes needed. The .bloom format is an opt-in performance optimization for large denylists.

Naming note

New code intentionally uses denylist instead of blacklist. Existing identifiers (class names, SPI config keys, folder names) are left unchanged to avoid a breaking change - a separate issue can be raised to track that cleanup.

@kfaseela kfaseela requested review from a team as code owners April 27, 2026 16:16
@kfaseela kfaseela force-pushed the feature/password-denylist-bloom-precompute branch from cf96091 to 58a49dc Compare April 27, 2026 16:25
@kfaseela kfaseela force-pushed the feature/password-denylist-bloom-precompute branch from 04a5e1c to 057864a Compare April 28, 2026 10:42
@ahus1 ahus1 requested a review from vmuzikar April 28, 2026 16:11
@ahus1
Copy link
Copy Markdown
Member

ahus1 commented Apr 28, 2026

@vmuzikar - This change adds a new "tools" command. I consider the naming well-picked, and I'm overall happy with the change.

If you agree with the naming of the command, I'll review the rest of the pull request will then finally approve it. Thanks!

@ahus1 ahus1 self-assigned this Apr 28, 2026
@kfaseela kfaseela force-pushed the feature/password-denylist-bloom-precompute branch 2 times, most recently from b59ccc6 to 5841b23 Compare May 4, 2026 13:28
@shawkins
Copy link
Copy Markdown
Contributor

shawkins commented May 4, 2026

On startup and reload, the server automatically picks up the .bloom file when present, falling back to plaintext if the file is missing or corrupt.

Does the server need to have some understanding of what denylist state corresponds to a given .bloom state? That is should it continue to use the .bloom file even if the .text file has changed?

To prevent this ambiguity, should the changes here require the user to instead configure the provider to use the .bloom file, rather than the .text file? Then simply fail if the .bloom file does not exist?

Change detection watches the .bloom file when present, so the server reloads correctly when the denylist is updated and re-precomputed.

It's probably out-of-scope for this pr / issue, but I'm wondering If not present, and if we have a good way to ensure that the .bloom file is up-to-date, should the server be creating a .bloom file in our temp directory for reuse?

New kc.sh tools build-password-denylist command generates a .bloom file alongside the plaintext denylist.

This is straight-forward with a single node, but for use in a cluster are we going to show or recommend using a shared volume / job to do this?

@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 4, 2026

On startup and reload, the server automatically picks up the .bloom file when present, falling back to plaintext if the file is missing or corrupt.

Does the server need to have some understanding of what denylist state corresponds to a given .bloom state? That is should it continue to use the .bloom file even if the .text file has changed?

To prevent this ambiguity, should the changes here require the user to instead configure the provider to use the .bloom file, rather than the .text file? Then simply fail if the .bloom file does not exist?

Change detection watches the .bloom file when present, so the server reloads correctly when the denylist is updated and re-precomputed.

It's probably out-of-scope for this pr / issue, but I'm wondering If not present, and if we have a good way to ensure that the .bloom file is up-to-date, should the server be creating a .bloom file in our temp directory for reuse?

New kc.sh tools build-password-denylist command generates a .bloom file alongside the plaintext denylist.

This is straight-forward with a single node, but for use in a cluster are we going to show or recommend using a shared volume / job to do this?

Thanks for the review @shawkins! Let me know if the below points for your three questions above makes sense.

  1. Stale .bloom detection
    Good catch - I can add a warning log at load time when the .txt is newer than the .bloom, so users are alerted to regenerate it. Would that be an agreeable solution?

  2. Auto-detect vs explicit config
    I intentionally kept the auto-detect approach to avoid a breaking change, existing users need zero reconfiguration. The .bloom file is purely an opt-in performance optimization. Users who don't generate one are unaffected. Requiring explicit config would expose this complexity to everyone, even those with small denylists who never need it. WDYT?

  3. Clustering
    The .bloom file follows the exact same deployment model as the existing plaintext file, there's no cluster-specific handling today either. Each node independently loads from its configured path. If operators already have a shared volume or image-baked file for the .txt, they simply do the same for .bloom. No new clustering complexity is introduced. Correct me if I'm wrong!

@shawkins
Copy link
Copy Markdown
Contributor

shawkins commented May 4, 2026

  1. Seems reasonable given your answer to the second point.

  2. I was thinking along the lines of a boolean option that would instruct the provider to expect .bloom files instead. But if you want to be able to support mixed scenarios, where some denylists have .bloom files while others are just the .txt files, then that doesn't work.

  3. You are correct that this is not really a new concern - let's see if this could be a production readiness team follow-up Ideally we'd address keycloak operator usage as well. Although without proper mount support we may not want to show how this yet. cc @ahus1 @vmuzikar

@tsaarni
Copy link
Copy Markdown
Contributor

tsaarni commented May 5, 2026

Would it be simpler if we just let the user configure any filename? If the file ends with .bloom, we load it as a serialized bloom filter. Otherwise, we read it as a text file. That way the user has no doubt about what file ended up being loaded, or monitored for updates. Realm admins don't need to look into Keycloak logs for warnings about discrepancies, and there's no need to compare modification dates.

It starts to feel to me that creating the association around basename (denylist.{txt,bloom}) just brings complications with no big gains over treating all files as independent and letting the user load two different file formats.

@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 5, 2026

Would it be simpler if we just let the user configure any filename? If the file ends with .bloom, we load it as a serialized bloom filter. Otherwise, we read it as a text file. That way the user has no doubt about what file ended up being loaded, or monitored for updates. Realm admins don't need to look into Keycloak logs for warnings about discrepancies, and there's no need to compare modification dates.

It starts to feel to me that creating the association around basename (denylist.{txt,bloom}) just brings complications with no big gains over treating all files as independent and letting the user load two different file formats.

@shawkins : This idea looks more straightforward and simpler to me.. WDYT?

@shawkins
Copy link
Copy Markdown
Contributor

shawkins commented May 5, 2026

@shawkins : This idea looks more straightforward and simpler to me.. WDYT?

It's good with me, but it seemed like a requirement here was not to require realm level changes.

@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 5, 2026

@shawkins : This idea looks more straightforward and simpler to me.. WDYT?

It's good with me, but it seemed like a requirement here was not to require realm level changes.

@shawkins the explicit file type approach matches my requirement perfectly, I was looking for flexibility to opt in, and that's exactly what this gives. Admins who need the performance benefit simply configure denylist.bloom, others continue with denylist.txt as before. If you are okay with the approach, I can update the PR accordingly. It feels like being explicit about what is needed, avoids a lot of confusions. People who continue to use the current approach anyways do not need any realm level changes. But yes, others need to specify the filename, but isn't that better than the ambiguity?

@shawkins
Copy link
Copy Markdown
Contributor

shawkins commented May 5, 2026

If you are okay with the approach, I can update the PR accordingly

No objection from me, I'd prefer to be explicit.

kfaseela added 4 commits May 6, 2026 13:01
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
@kfaseela kfaseela force-pushed the feature/password-denylist-bloom-precompute branch from 5841b23 to bf80737 Compare May 6, 2026 11:01
@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 6, 2026

If you are okay with the approach, I can update the PR accordingly

No objection from me, I'd prefer to be explicit.

@shawkins @tsaarni : I have made the changes as we have discussed so far, let me know if this looks okay now! -- dropped the basename association so .bloom and plaintext are fully independent files (server checks extension only, no cross-file watching), and added -o/--output to build-password-denylist CLI for a custom output path and bloom file name.

@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 6, 2026

The External links check has been continuously failing whenever it is run, and the error is unrelated to the current changes. I pushed this separate PR #48682 to add the problematic link to ignored links. @shawkins in case that makes sense, feel free to review

@vmuzikar
Copy link
Copy Markdown
Contributor

vmuzikar commented May 6, 2026

@kfaseela Thank you for the contribution.

From the CLI perspective, this looks good.

As for the container deployments concerns, this is also good as is from my perspective. We're not introducing any new concept here, deny list file was there even before, this just adds a new format. Any enhancements for container/operator use cases would be a follow-up.

+1 also for requiring the users to explicitly set the .bloom file in the Realm config.

@kfaseela
Copy link
Copy Markdown
Contributor Author

kfaseela commented May 7, 2026

@kfaseela Thank you for the contribution.

From the CLI perspective, this looks good.

As for the container deployments concerns, this is also good as is from my perspective. We're not introducing any new concept here, deny list file was there even before, this just adds a new format. Any enhancements for container/operator use cases would be a follow-up.

+1 also for requiring the users to explicitly set the .bloom file in the Realm config.

Thanks @vmuzikar ! Good to know that you are in agreement with the current implementation. Would wait for @shawkins and @ahus1 review now!

Copy link
Copy Markdown
Contributor

@shawkins shawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @kfaseela

@ahus1 ahus1 merged commit 26c2a9e into keycloak:main May 7, 2026
88 of 89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: Improve the performance of loading large password blacklists

5 participants