pre-compute password denylist Bloom filter to speed up server startup#48515
Conversation
cf96091 to
58a49dc
Compare
04a5e1c to
057864a
Compare
|
@vmuzikar - This change adds a new "tools" command. I consider the naming well-picked, and I'm overall happy with the change. If you agree with the naming of the command, I'll review the rest of the pull request will then finally approve it. Thanks! |
b59ccc6 to
5841b23
Compare
Does the server need to have some understanding of what denylist state corresponds to a given .bloom state? That is should it continue to use the .bloom file even if the .text file has changed? To prevent this ambiguity, should the changes here require the user to instead configure the provider to use the .bloom file, rather than the .text file? Then simply fail if the .bloom file does not exist?
It's probably out-of-scope for this pr / issue, but I'm wondering If not present, and if we have a good way to ensure that the .bloom file is up-to-date, should the server be creating a .bloom file in our temp directory for reuse?
This is straight-forward with a single node, but for use in a cluster are we going to show or recommend using a shared volume / job to do this? |
Thanks for the review @shawkins! Let me know if the below points for your three questions above makes sense.
|
|
|
Would it be simpler if we just let the user configure any filename? If the file ends with It starts to feel to me that creating the association around basename ( |
@shawkins : This idea looks more straightforward and simpler to me.. WDYT? |
It's good with me, but it seemed like a requirement here was not to require realm level changes. |
@shawkins the explicit file type approach matches my requirement perfectly, I was looking for flexibility to opt in, and that's exactly what this gives. Admins who need the performance benefit simply configure denylist.bloom, others continue with denylist.txt as before. If you are okay with the approach, I can update the PR accordingly. It feels like being explicit about what is needed, avoids a lot of confusions. People who continue to use the current approach anyways do not need any realm level changes. But yes, others need to specify the filename, but isn't that better than the ambiguity? |
No objection from me, I'd prefer to be explicit. |
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
Signed-off-by: Faseela K <faseela.k@est.tech>
5841b23 to
bf80737
Compare
@shawkins @tsaarni : I have made the changes as we have discussed so far, let me know if this looks okay now! -- dropped the basename association so .bloom and plaintext are fully independent files (server checks extension only, no cross-file watching), and added -o/--output to build-password-denylist CLI for a custom output path and bloom file name. |
|
@kfaseela Thank you for the contribution. From the CLI perspective, this looks good. As for the container deployments concerns, this is also good as is from my perspective. We're not introducing any new concept here, deny list file was there even before, this just adds a new format. Any enhancements for container/operator use cases would be a follow-up. +1 also for requiring the users to explicitly set the .bloom file in the Realm config. |
Thanks @vmuzikar ! Good to know that you are in agreement with the current implementation. Would wait for @shawkins and @ahus1 review now! |
Fixes #47356
Loading large password denylist files from plaintext on every server startup or reload rebuilds the Bloom filter from scratch, which can take several seconds for multi-million-line lists.
This PR introduces a pre-computed binary (.bloom) file that the server loads
instead of rebuilding from plaintext, significantly reducing startup and reload time.
Performance (manually verified): loading a 14M-entry denylist (~133 MB plaintext) dropped from ~2-3 seconds to ~2 ms after pre-computing the .bloom file. Given this improvement, background/async loading does not appear necessary at this time.
Changes
Usage
# Pre-compute the Bloom filter kc.sh tools build-password-denylist /path/to/denylistThis generates denylist.bloom in the same directory (Optionally the CLI can accept the custom .bloom file name as argument as well, if needed). Place both files in the password-blacklists folder and start the server as usual:
kc.sh start --spi-password-policy-password-blacklist-blacklists-path=/path/to/password-denylistsThen in the Admin UI set the password blacklist policy value to /path/to/denylist/denylist.bloom
Admins with small denylists can continue using .txt files as before — no changes needed. The .bloom format is an opt-in performance optimization for large denylists.
Naming note
New code intentionally uses
denylistinstead ofblacklist. Existing identifiers (class names, SPI config keys, folder names) are left unchanged to avoid a breaking change - a separate issue can be raised to track that cleanup.