fix(inkless:metrics): Fix double-counting of diskless topic and partition metrics #552

Merged
giuseppelillo merged 1 commit into main from jeqo/fix-diskless-controller-metrics
Mar 27, 2026
jeqo commented Mar 26, 2026

When a diskless topic was created, both the TopicDelta and ConfigDelta fired in the same metadata batch, causing disklessTopicCount and disklessPartitionCount to be incremented twice. Over time this caused DisklessTopicCount to drift above GlobalTopicCount.

The fix uses prevImage.configs() instead of newImage.configs() when processing TopicDeltas, so that all diskless config transitions are handled exclusively by the config change loop.
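
To illustrate the double increment, here is a minimal sketch of the two loops. It is a simplified model, not the actual ControllerMetadataMetricsPublisher code: the class `DisklessCountSketch`, the method `applyBatch`, the boolean-per-topic config map, and the topic name `orders` are all hypothetical stand-ins, and only the topic-creation case is modeled.

```java
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model; none of these names are the real Kafka/Inkless API.
final class DisklessCountSketch {

    // Stand-in for a configs image: topic name -> whether diskless is enabled.
    static boolean isDiskless(Map<String, Boolean> configs, String topic) {
        return configs.getOrDefault(topic, false);
    }

    // Applies one metadata batch to a counter starting at 0 and returns the
    // resulting diskless topic count.
    static int applyBatch(Map<String, Boolean> prevConfigs,
                          Map<String, Boolean> newConfigs,
                          List<String> createdTopics,
                          List<String> changedConfigTopics,
                          boolean topicLoopUsesPrevImage) {
        int disklessTopicCount = 0;

        // Topic delta loop: classify each created topic against one configs image.
        Map<String, Boolean> classifier =
                topicLoopUsesPrevImage ? prevConfigs : newConfigs;
        for (String topic : createdTopics) {
            if (isDiskless(classifier, topic)) {
                disklessTopicCount++;
            }
        }

        // Config change loop: count diskless transitions between the two images.
        for (String topic : changedConfigTopics) {
            boolean was = isDiskless(prevConfigs, topic);
            boolean now = isDiskless(newConfigs, topic);
            if (!was && now) {
                disklessTopicCount++;
            } else if (was && !now) {
                disklessTopicCount--;
            }
        }
        return disklessTopicCount;
    }

    public static void main(String[] args) {
        // A diskless topic is created: the topic and its config arrive in the same batch.
        Map<String, Boolean> prev = Map.of();
        Map<String, Boolean> next = Map.of("orders", true);
        List<String> created = List.of("orders");
        List<String> changed = List.of("orders");

        // Classifying with the new image counts the topic in both loops: prints 2.
        System.out.println(applyBatch(prev, next, created, changed, false));
        // Classifying with the previous image leaves it to the config loop: prints 1.
        System.out.println(applyBatch(prev, next, created, changed, true));
    }
}
```

In this model the previous image has no config entry for the new topic, so the topic loop never treats it as diskless and the config change loop is the only place the transition is counted.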

jeqo marked this pull request as ready for review March 26, 2026 14:52
jeqo requested a review from Copilot March 26, 2026 14:52
Copilot AI left a comment

Pull request overview

Fixes controller metrics drift where diskless topic/partition metrics were double-counted when topic creation and diskless config updates occurred in the same metadata batch, ensuring diskless counts remain consistent with global counts over time.

Changes:

  • Use prevImage.configs() (instead of newImage.configs()) when classifying diskless status during TopicDelta processing to avoid double-counting.
  • Add regression tests covering diskless topic creation, deletion, and combined partition+config changes in a single delta batch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| metadata/src/main/java/org/apache/kafka/controller/metrics/ControllerMetadataMetricsPublisher.java | Adjusts diskless classification during topic delta handling to prevent double increments when config deltas are also present. |
| metadata/src/test/java/org/apache/kafka/controller/metrics/ControllerMetadataMetricsPublisherTest.java | Adds targeted regression tests to validate diskless metrics correctness across delta scenarios. |

…tion metrics

When a diskless topic was created, both the TopicDelta and ConfigDelta
fired in the same metadata batch, causing disklessTopicCount and
disklessPartitionCount to be incremented twice. Over time this caused
DisklessTopicCount to drift above GlobalTopicCount.

The fix uses prevImage.configs() instead of newImage.configs() when
processing TopicDeltas, so that all diskless config transitions are
handled exclusively by the config change loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jeqo force-pushed the jeqo/fix-diskless-controller-metrics branch from cd0b9ab to 80f29c4 on March 26, 2026 15:35
giuseppelillo merged commit 87767a0 into main on Mar 27, 2026
4 checks passed
giuseppelillo deleted the jeqo/fix-diskless-controller-metrics branch on March 27, 2026 09:28
jeqo added a commit that referenced this pull request Mar 30, 2026
…tion metrics (#552)
