Benjamin Wagner

San Francisco Bay Area
3K followers 500+ connections

Experience

  • Firebolt

    San Francisco Bay Area

  • -

    San Francisco Bay Area

  • -

    Munich, Bavaria, Germany

  • -

    Munich, Bavaria, Germany

  • -

    Munich, Bavaria, Germany

  • -

    Munich, Bavaria, Germany

  • -

    Berlin Area, Germany

Education

  • Technical University of Munich

    -

    Master’s Thesis: “Incremental Fusion: Unifying Compiled and Vectorized Query Execution”

Publications

  • Assembling a Query Engine From Spare Parts

    CDMS @ VLDB '22, September 9, 2022, Sydney, Australia

    Building a new cloud data warehouse is a daunting challenge, requiring massive investments into both the query engine and surrounding cloud infrastructure. Given the mature space, it might seem like a Herculean task to enter the market as a small startup.
    At Firebolt we assembled a working, high-performance cloud data warehouse in less than 18 months. We achieved this by building our query engine on top of existing projects and then investing heavily into differentiating features. This paper presents our decision-making and learned lessons along the way.

  • Incremental Fusion: Unifying Compiled and Vectorized Query Execution

    ICDE'24, May 13-17, 2024, Utrecht, Netherlands

    Modern high-performance analytical query engines follow one of two execution paradigms. Vectorized engines implement an interpreter for relational algebra operators that operates on batches of tuples to maximize performance. Compiling engines, on the other hand, generate optimized and specialized code for every query. This paper unifies these two approaches. We present Incremental Fusion, a novel execution paradigm for modern, high-performance query engines. An Incremental Fusion engine performs operator-fusing code generation – with a twist: The compiling engine generates its own vectorized interpreter. The engine uses a finite set of building blocks below relational algebra for code generation. It can enumerate each building block and generate a vectorized primitive for it. The vectorized interpreter becomes a free byproduct of carefully choosing the right abstraction for code generation. This allows an Incremental Fusion engine to dynamically switch between vectorized interpretation and operator-fusing code generation. We demonstrate Incremental Fusion in our open-source prototype engine InkFuse.
    We measure InkFuse against the state-of-the-art vectorized and compiling engines DuckDB and Umbra. InkFuse is able to achieve competitive performance both for low-latency processing, and compute-intensive long-running queries.

  • Self-Tuning Query Scheduling for Analytical Workloads

    SIGMOD ’21, June 20–25, 2021, Virtual Event, China

    Most database systems delegate scheduling decisions to the operating system. While such an approach simplifies the overall database design, it also entails problems. Adaptive resource allocation becomes hard in the face of concurrent queries. Furthermore, incorporating domain knowledge to improve query scheduling is difficult. To mitigate these problems, many modern systems employ forms of task-based parallelism. The execution of a single query is broken up into small, independent chunks of work (tasks). Now, fine-grained scheduling decisions based on these tasks are the responsibility of the database system. Despite being commonplace, little work has focused on the opportunities arising from this execution model.

    In this paper, we show how task-based scheduling in database systems opens up new areas for optimization. We present a novel lock-free, self-tuning stride scheduler that optimizes query latencies for analytical workloads. By adaptively managing query priorities and task granularity, we provide high scheduling elasticity. By incorporating domain knowledge into the scheduling decisions, our system is able to cope with workloads that other systems struggle with. Even at high load, we retain near optimal latencies for short running queries. Compared to traditional database systems, our design often improves tail latencies by more than 10x.


Languages

  • German

    Native or bilingual proficiency

  • English

    Professional working proficiency

  • French

    Elementary proficiency
