feat(inkless): POD-2395 Prune consolidated diskless offsets #587

Draft
viktorsomogyi wants to merge 10 commits into main from svv/ts-unification-delete-tiered-diskless

@viktorsomogyi
Contributor

Diskless logs that have already been consolidated to the remote
tier should be removed from the coordinator and the WAL. This commit
adds the functionality to do that:

  • control plane (in-memory and postgres) implementation
  • postgres routine
  • wiring into ReplicaManager

The pruning will be invoked as a periodic task in ReplicaManager
with a configurable cleanup period.
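
As a rough illustration of the periodic invocation described above, the sketch below runs a pruning task at a configurable period. It is not the PR's wiring: `PeriodicPruneScheduler`, `start`, and `periodMs` are hypothetical names, and the real task is scheduled from ReplicaManager rather than from a dedicated executor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not from the PR): runs a prune task at a fixed, configurable period. */
final class PeriodicPruneScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "diskless-prune-scheduler");
            t.setDaemon(true); // do not keep the JVM alive just for cleanup
            return t;
        });

    /** Schedules pruneTask every periodMs milliseconds, measured from the end of the previous run. */
    void start(Runnable pruneTask, long periodMs) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                pruneTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report it and carry on.
                System.err.println("Prune run failed: " + e);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Catching the exception inside the scheduled wrapper matters here: `scheduleWithFixedDelay` suppresses all subsequent executions once a run throws, which would silently stop cleanup.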

Copilot AI left a comment

Pull request overview

Adds support for pruning diskless WAL batch metadata once it has been consolidated to remote storage, wiring the pruning into the broker so diskless logs can advance their effective start offset and drop already-tiered batch metadata from the control plane.

Changes:

  • Introduces a Postgres routine + control-plane job/API for pruning batches below the highest tiered offset and updating logs.diskless_start_offset.
  • Adds a periodic broker-side pruner (ConsolidatedDisklessLogPruner) scheduled from ReplicaManager to invoke the control-plane pruning and update in-memory partition state.
  • Updates fetch-path log start offset handling and expands test coverage for pruning + fetch overlay behavior.

Reviewed changes

Copilot reviewed 19 out of 136 changed files in this pull request and generated 4 comments.

File Description
storage/src/main/java/org/apache/kafka/storage/internals/log/UnifiedLog.java Exposes highestOffsetInRemoteStorage() publicly for cross-module access.
storage/inkless/src/test/java/io/aiven/inkless/control_plane/postgres/PruneBatchesBelowHighestTieredOffsetV1Test.java New PG integration tests for pruning routine semantics.
storage/inkless/src/main/resources/db/migration/V12__Prune_diskless_batches.sql Adds V12 types + pruning function in Postgres.
storage/inkless/src/main/jooq/org/jooq/generated/UDTs.java jOOQ generated updates: adds prune UDT references and schema v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/ReleaseFileMergeWorkItemResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/ReleaseFileMergeWorkItemResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/ListOffsetsResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/ListOffsetsRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/InitDisklessLogResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/InitDisklessLogRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/InitDisklessLogProducerStateV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/FindBatchesResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/FindBatchesRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/FileMergeWorkItemResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/FileMergeWorkItemResponseFileV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/FileMergeWorkItemResponseBatchV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/EnforceRetentionResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/EnforceRetentionRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/DeleteRecordsResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/DeleteRecordsRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/CommitFileMergeWorkItemResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/CommitFileMergeWorkItemBatchV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/CommitBatchResponseV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/CommitBatchRequestV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/BatchMetadataV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/records/BatchInfoV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/ReleaseFileMergeWorkItemResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/ListOffsetsResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/ListOffsetsRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/InitDisklessLogResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/InitDisklessLogRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/InitDisklessLogProducerStateV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/FindBatchesResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/FindBatchesRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/FileMergeWorkItemResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/FileMergeWorkItemResponseFileV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/FileMergeWorkItemResponseBatchV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/EnforceRetentionResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/EnforceRetentionRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/DeleteRecordsResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/DeleteRecordsRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/CommitFileMergeWorkItemResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/CommitFileMergeWorkItemBatchV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/CommitBatchResponseV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/CommitBatchRequestV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/BatchMetadataV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/paths/BatchInfoV1Path.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/ListOffsetsResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/ListOffsetsRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/InitDisklessLogResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/InitDisklessLogRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/InitDisklessLogProducerStateV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/FindBatchesResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/FindBatchesRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/FileMergeWorkItemResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/FileMergeWorkItemResponseFileV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/FileMergeWorkItemResponseBatchV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/EnforceRetentionResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/EnforceRetentionRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/DeleteRecordsResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/DeleteRecordsRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/CommitFileMergeWorkItemResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/CommitFileMergeWorkItemBatchV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/CommitBatchResponseV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/CommitBatchRequestV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/BatchMetadataV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/udt/BatchInfoV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/ProducerStateRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/LogsRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/ListOffsetsV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/InitDisklessLogV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/GetFileMergeWorkItemV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/FindBatchesV2Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/FindBatchesV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/FilesRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/FileMergeWorkItemsRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/FileMergeWorkItemFilesRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/EnforceRetentionV2Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/EnforceRetentionV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/DeleteRecordsV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/CommitFileV1Record.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/records/BatchesRecord.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/ProducerState.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/Logs.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/ListOffsetsV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/InitDisklessLogV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/GetFileMergeWorkItemV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/FindBatchesV2.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/FindBatchesV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/Files.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/FileMergeWorkItems.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/FileMergeWorkItemFiles.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/EnforceRetentionV2.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/EnforceRetentionV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/DeleteRecordsV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/CommitFileV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/tables/Batches.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/Tables.java jOOQ generated updates: adds prune table-function wiring and schema v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/ReleaseFileMergeWorkItemV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/MarkFileToDeleteV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/DeleteTopicV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/DeleteFilesV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/DeleteBatchV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/CommitFileMergeWorkItemV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/routines/BatchTimestamp.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/Routines.java jOOQ generated updates: adds prune routine wiring and schema v12.
storage/inkless/src/main/jooq/org/jooq/generated/Keys.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/Indexes.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/ReleaseFileMergeWorkItemErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/ListOffsetsResponseErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/InitDisklessLogErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/FindBatchesResponseErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/FileStateT.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/FileReasonT.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/EnforceRetentionResponseErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/DeleteRecordsResponseErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/CommitFileMergeWorkItemErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/enums/CommitBatchResponseErrorV1.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/Domains.java jOOQ generated schema version bump to v12.
storage/inkless/src/main/jooq/org/jooq/generated/DefaultSchema.java jOOQ generated updates: adds prune objects into schema model and schema v12.
storage/inkless/src/main/java/io/aiven/inkless/delete/PruneDisklessLogsResponse.java New control-plane response record for pruning results.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/PruneDisklessLogsRequest.java New control-plane request record for pruning inputs.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/postgres/PruneDisklessLogsJob.java New PG job calling prune routine and mapping results.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/postgres/PostgresControlPlaneMetrics.java Adds metrics tracking for prune job latency.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/postgres/PostgresControlPlane.java Exposes pruneDisklessLogs via Postgres control plane.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/MetadataView.java Adds topic-id->name lookup and consolidating partition enumeration.
storage/inkless/src/main/java/io/aiven/inkless/control_plane/InMemoryControlPlane.java Adds new API method (currently unimplemented).
storage/inkless/src/main/java/io/aiven/inkless/control_plane/ControlPlane.java Adds new pruneDisklessLogs control-plane API.
core/src/test/scala/io/aiven/inkless/consolidation/DisklessLeaderEndPointTest.scala Adjusts/extends fetch overlay tests for logStartOffset + error behavior.
core/src/test/scala/io/aiven/inkless/consolidation/ConsolidatedDisklessLogPrunerTest.scala New unit tests for pruner request building and update behavior.
core/src/test/java/kafka/server/InklessConsolidatedDisklessTopicsTest.java Adds end-to-end assertions that control-plane WAL metadata is pruned post-tiering.
core/src/main/scala/kafka/server/ReplicaManager.scala Schedules periodic consolidated-diskless pruning task.
core/src/main/scala/kafka/server/metadata/InklessMetadataView.scala Implements new MetadataView APIs for topic name and consolidating partitions.
core/src/main/scala/kafka/cluster/Partition.scala Adds volatile diskless start offset state + setters/getters.
core/src/main/scala/io/aiven/inkless/consolidation/DisklessLeaderEndPoint.scala Overlays local partition logStartOffset into fetch response when appropriate.
core/src/main/scala/io/aiven/inkless/consolidation/ConsolidatedDisklessLogPruner.scala New broker-side pruner invoking control plane and updating partitions.

Comment on lines +28 to +40
IF arg_requests IS NOT NULL AND cardinality(arg_requests) > 0 THEN
    PERFORM 1
    FROM logs l
    WHERE EXISTS (
        SELECT 1
        FROM unnest(arg_requests) AS r
        WHERE r.topic_id = l.topic_id AND r.partition = l.partition
    )
    ORDER BY l.topic_id, l.partition -- ordering is important to prevent deadlocks
    FOR UPDATE;
END IF;
FOREACH l_request IN ARRAY arg_requests LOOP
    FOR l_file_id IN
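
The `ORDER BY ... FOR UPDATE` in the snippet above takes row locks in one consistent order so that two concurrent callers cannot each hold a lock the other is waiting for. The same idea, sketched outside the database under assumed names (`OrderedPartitionLocks`, `PartitionKey`, and `withLocks` are hypothetical and not part of the PR):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

final class OrderedPartitionLocks {
    /** Hypothetical key mirroring the (topic_id, partition) pair used in the SQL. */
    record PartitionKey(UUID topicId, int partition) {}

    // Any total order works, as long as every caller uses the same one.
    static final Comparator<PartitionKey> LOCK_ORDER =
        Comparator.comparing(PartitionKey::topicId).thenComparingInt(PartitionKey::partition);

    private final Map<PartitionKey, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Returns the distinct keys in the order in which their locks are acquired. */
    static List<PartitionKey> lockOrder(List<PartitionKey> keys) {
        return keys.stream().distinct().sorted(LOCK_ORDER).toList();
    }

    /** Acquires all locks in LOCK_ORDER, runs the action, then releases in reverse order. */
    void withLocks(List<PartitionKey> keys, Runnable action) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (PartitionKey key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            action.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```
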
Comment on lines +41 to +54
controlPlane.pruneDisklessLogs(requests).asScala.foreach { pruneDisklessLogsResponse =>
  inklessMetadataView.getTopicName(pruneDisklessLogsResponse.topicIdPartition.topicId).toScala match {
    case Some(topicName) =>
      val responseTopicIdPartition = new TopicIdPartition(pruneDisklessLogsResponse.topicIdPartition.topicId,
        pruneDisklessLogsResponse.topicIdPartition.partition, topicName)
      replicaManager.getPartitionOrError(responseTopicIdPartition.topicPartition) match {
        case Right(partition) => partition.setDisklessStartOffset(pruneDisklessLogsResponse.disklessStartOffset)
        case Left(error) => logger.warn("Couldn't update diskless start offset for {} due to: {}",
          responseTopicIdPartition.topicPartition,
          error.message
        )
      }
    case None =>
      logger.warn("Couldn't update diskless start offset of topic with ID {} due to missing name", pruneDisklessLogsResponse.topicIdPartition.topicId)
Comment on lines +737 to +740
@Override
public List<PruneDisklessLogsResponse> pruneDisklessLogs(List<PruneDisklessLogsRequest> pruneDisklessLogsRequests) {
    throw new UnsupportedOperationException();
}
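
The in-memory method above is still a stub. The sketch below shows one shape an in-memory prune could take. It rests on assumptions, none of which are confirmed by the PR: that batches are tracked per partition by base offset, that a batch is prunable only when its last offset is at or below the highest tiered offset, and that the new diskless start offset is the base offset of the first surviving batch (or one past the highest tiered offset when nothing survives). `InMemoryDisklessLog` and its methods are hypothetical names.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-partition state; not the PR's InMemoryControlPlane. */
final class InMemoryDisklessLog {
    // base offset -> last offset of each batch, kept sorted by base offset
    private final TreeMap<Long, Long> batches = new TreeMap<>();
    private long disklessStartOffset = 0L;

    void addBatch(long baseOffset, long lastOffset) {
        batches.put(baseOffset, lastOffset);
    }

    /** Drops batches fully covered by the tiered data and returns the new start offset. */
    long prune(long highestTieredOffset) {
        Iterator<Map.Entry<Long, Long>> it = batches.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> batch = it.next();
            if (batch.getValue() > highestTieredOffset) {
                break; // this batch still holds offsets that are not tiered yet
            }
            it.remove();
        }
        long candidate = batches.isEmpty() ? highestTieredOffset + 1 : batches.firstKey();
        // Assumed invariant: the start offset never moves backwards.
        disklessStartOffset = Math.max(disklessStartOffset, candidate);
        return disklessStartOffset;
    }

    int batchCount() {
        return batches.size();
    }
}
```

With batches covering offsets 0-9, 10-19, and 20-29, calling `prune(15)` removes only the first batch, because the second one still contains untiered offsets.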
Comment on lines +9 to +20
import org.jooq.DSLContext;
import org.jooq.generated.udt.PruneBatchesBelowHighestTieredOffsetResponseV1;
import org.jooq.generated.udt.records.PruneBatchesBelowHighestTieredOffsetRequestV1Record;
import org.jooq.generated.udt.records.PruneBatchesBelowHighestTieredOffsetResponseV1Record;

import java.time.Instant;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

import static org.jooq.generated.Tables.PRUNE_BATCHES_BELOW_HIGHEST_TIERED_OFFSET_V1;
