Cristian Leo

New York, New York, United States
5K followers 500+ connections

About

I work on AI systems for cybersecurity, with a focus on LLM agents, evaluation…

Articles by Cristian

  • OpenAI o1 Is Thinking

    How Reinforcement Learning helps with complex reasoning by producing a long internal chain of thought. OpenAI is back…

Experience

  • Amazon Web Services (AWS)

    New York, New York, United States

  • -

    New York, New York, United States

  • -

    New York City Metropolitan Area

  • -

    New York City Metropolitan Area

Education

  • Columbia University in the City of New York

    4.1/4.0

    -

    Activities and Societies: Executive Member of Columbia Data Science Society; Associate of Applied Analytics Club; Associate of Business Management Club

    Relevant Coursework: Machine Learning for Finance, Cloud Computing (AWS), Managing Data, Storytelling with Data, Applied Analytics Frameworks and Methods, Research Design, Analytics and Leading Change, Applied Analytics in Organizational Context, Strategy & Analytics

  • -

  • -

    -

Licenses & Certifications

Volunteer Experience

Publications

  • SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents

    We present SIR-Bench, a benchmark of 794 test cases for evaluating autonomous security incident response agents that distinguishes genuine forensic investigation from alert parroting. Derived from 129 anonymized incident patterns with expert-validated ground truth, SIR-Bench measures not only whether agents reach correct triage decisions, but whether they discover novel evidence through active investigation. To construct SIR-Bench, we develop Once Upon A Threat (OUAT), a framework that replays real incident patterns in controlled cloud environments, producing authentic telemetry with measurable investigation outcomes. Our evaluation methodology introduces three complementary metrics: triage accuracy (M1), novel finding discovery (M2), and tool usage appropriateness (M3), assessed through an adversarial LLM-as-Judge that inverts the burden of proof -- requiring concrete forensic evidence to credit investigations. Evaluating our SIR agent on the benchmark demonstrates 97.1% true positive (TP) detection, 73.4% false positive (FP) rejection, and 5.67 novel key findings per case, establishing a baseline against which future investigation agents can be measured.
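The headline numbers above reduce to simple aggregates over per-case outcomes. A minimal sketch of that aggregation, assuming a hypothetical record schema (the paper's actual data model is not public):

```python
# Illustrative aggregation of per-case results into TP detection rate,
# FP rejection rate, and mean novel findings per case. The field names
# ("ground_truth", "flagged", "novel_findings") are assumptions for this
# sketch, not SIR-Bench's actual schema.

def aggregate(cases):
    tps = [c for c in cases if c["ground_truth"] == "malicious"]
    fps = [c for c in cases if c["ground_truth"] == "benign"]
    tp_rate = sum(c["flagged"] for c in tps) / len(tps)
    fp_rejection = sum(not c["flagged"] for c in fps) / len(fps)
    mean_findings = sum(c["novel_findings"] for c in cases) / len(cases)
    return tp_rate, fp_rejection, mean_findings
```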

  • Survey of Attention Mechanisms in Encoder-Only Language Models

    The introduction of the Transformer architecture in 2017 catalyzed a fundamental paradigm shift in natural language processing (Vaswani et al., 2017). While decoder-only autoregressive models have come to dominate generative AI, encoder-only bidirectional models historically establish and maintain state-of-the-art results in natural language understanding, dense information retrieval, and sequence classification. This paper presents an exhaustive survey of the self-attention mechanism within encoder-only models. We analyze its mathematical foundations, trace its architectural evolution from the original BERT (Devlin et al., 2018) through RoBERTa (Liu et al., 2019), DeBERTa (He et al., 2021), and ModernBERT (Warner et al., 2024), evaluate efficiency enhancements including sparse attention (Zaheer et al., 2020), linear kernelized attention, and hardware-aware FlashAttention (Dao et al., 2022), and dissect the ongoing theoretical debates surrounding attention-based interpretability. We further survey hybrid architectures that interleave state-space models with self-attention, and discuss the practical limits and deployment considerations of encoder attention in long-context regimes. To complement this survey, we open-source an interactive visualization tool for exploring these architectures at https://github.com/cristianleoo/attention-in-encoders.
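The survey's central object, bidirectional (unmasked) scaled dot-product self-attention, can be sketched in a few lines of NumPy. This is a generic illustration of the mechanism, not code from the paper or the linked visualization tool:

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Unmasked scaled dot-product self-attention, as used in encoders.

    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    Every token attends to every other token (no causal mask).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    return softmax(scores, axis=-1) @ V       # attention-weighted values
```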

  • Geometric Concept Spaces in Small Encoders: A Comparative Mechanistic Probing of ModernBERT and DeBERTa-v3

    Bidirectional transformer encoders have bifurcated into two optimization paradigms: topological precision via disentangled attention (DeBERTa-v3) and hardware-aware scaling via rotary positional embeddings (ModernBERT). This study presents an exhaustive geometric and mechanistic investigation of these architectures using 100,000 activation samples. Through linear probing, Centered Kernel Alignment (CKA), and intrinsic dimensionality estimation, we reveal a 16.5% performance gap in linear concept separability favoring DeBERTa-v3 (p < 0.001). We identify an extreme "Topological Collapse" in ModernBERT's final layers, where concept manifolds condense from 30 dimensions to 2. We quantify a fundamental stability-precision trade-off: ModernBERT's RoPE provides 4.3x higher local positional stability but induces severe semantic entanglement, while DeBERTa-v3 utilizes sparse, specialized sub-circuits to maintain precise orthogonal boundaries. Our findings provide a rigorous geometric explanation for the "token classification anomaly" in modern encoders.
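Of the probing tools named above, linear Centered Kernel Alignment has a particularly compact definition; a NumPy sketch (an illustration of the standard metric, not the study's code):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1), Y: (n_samples, d2). Returns a similarity in [0, 1];
    1 means the representations are identical up to rotation and scale.
    """
    # Center each feature so the implied Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is what makes it suitable for comparing layers across different architectures.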


Courses

  • Applied Analytics in Organizational Context

    APANPS 5100

  • Business Process Modeling

    -

  • Cloud Computing

    APANPS 5450

  • Financial Analysis

    -

  • Frameworks & Methods

    APANPS 5205

  • Managing Data

    APANPS 5400

  • Negotiating in English

    -

  • Persuading in English

    -

  • Research Design

    APANPS 5300

  • Social Media Management

    -

  • Storytelling with Data

    APANPS 5800

  • Strategy & Analytics

    APANPS 5600

Projects

  • Scrap Metal Directional Price Prediction

    -

    This project focuses on analyzing financial news sentiment and using machine learning models to predict stock prices based on the sentiment analysis. It integrates data from various sources, including financial news articles, stock prices, economic indicators, and weather data. The sentiment analysis is performed using two models: FinBERT and GPT (Generative Pre-trained Transformer). The machine learning model for price prediction employs the CatBoost algorithm.
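As an illustration of the directional setup (a hypothetical helper, not project code), the binary target can be derived from a price series like so:

```python
import numpy as np

# Directional labeling for price prediction: the target for day t is
# whether the price at t+1 closes above the price at t. This is a generic
# sketch; the project's actual feature and label pipeline is not shown.

def make_direction_labels(prices):
    """Return 1 where the next observation is higher, else 0."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] > prices[:-1]).astype(int)
```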

  • Quant AI

    -

    As you're undoubtedly aware, the vast volume of news and rumors that emerge daily can be overwhelming, rendering it practically impossible for an individual to thoroughly process each piece of information.
    To counter this challenge, we have integrated cutting-edge LLMs to develop an innovative application designed to assist users in comprehending market sentiment.
    Our application harnesses the power of user-specified sources, processing and analyzing vast amounts of data with exceptional accuracy.
    It leverages the capabilities of LSTM models, a type of recurrent neural network well-suited for sequence prediction problems, to predict market trends.
    This integration of LLMs and LSTM models provides a robust and comprehensive solution to keep up with the pace of real-time information flow, resulting in a powerful tool for understanding and predicting market sentiment.
    The ultimate goal is to empower our users to make informed decisions based on accurate, up-to-date, and predictive insights. (Initially built for Tribe AI Hackathon).

  • Prediction of Wild Blueberry Yield | Kaggle Competition | Top 1.5% Leaderboard

    -

    This data science project involves regression analysis using two models: LADRegression and LightGBM.
    The project includes data preprocessing, feature engineering using Principal Component Analysis (PCA) and Partial Least Squares (PLS) regression, hyperparameter tuning using grid search, and model evaluation.
    In particular, the project generates several predictions using different models, and then stacks them using Least Absolute Deviation (LAD) regression constrained to positive coefficients.
    The goal is to develop accurate regression models and optimize their performance on the given dataset.
    I created this notebook for a Kaggle competition, where I placed in the top 1.5% of the leaderboard.
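The positive-weight LAD stacking step can be illustrated with a toy two-model version that grid-searches a convex blending weight by mean absolute error (a sketch of the idea, not the notebook's code):

```python
import numpy as np

def stack_two_lad(p1, p2, y, steps=101):
    """Pick the weight w in [0, 1] for the blend w*p1 + (1-w)*p2 that
    minimises mean absolute error -- a tiny LAD-style meta-learner with
    non-negative weights. p1, p2: base-model predictions; y: targets.
    """
    best_w, best_mae = 0.0, np.inf
    for w in np.linspace(0.0, 1.0, steps):
        mae = np.abs(w * p1 + (1 - w) * p2 - y).mean()
        if mae < best_mae:
            best_w, best_mae = w, mae
    return best_w
```

In practice the full stacker would fit unnormalised non-negative weights over many base models with a linear-programming or quantile-regression solver; the grid search here just makes the objective concrete.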

  • The Tinder Of Food

    -

    This project is an interactive Flask web application that displays a map of New York City and lets users query it, along with a recommendation algorithm that matches suppliers to restaurants.
    The application uses a combination of Python, HTML, CSS, and JavaScript. The data is stored using Apache Spark and MongoDB.

  • LSTM to predict the number of weekly appointments for Columbia University

    -

    In this data science project I predicted the number of weekly appointments for Columbia University. The goal was to determine whether temporary staff needed to be hired based on the predicted appointment volume. I first performed feature engineering, analyzing changes in the characteristics of students over time; this helped me identify patterns and trends that could affect the number of appointments. I then trained a Long Short-Term Memory (LSTM) model on the historical appointment data to learn those patterns, and used it to predict the weekly appointment volume for the next two months. Columbia University can use this information to make informed decisions about hiring temporary staff and managing resources efficiently.
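The supervised dataset for such an LSTM is typically built with a sliding window over the weekly series; a generic sketch (the project's actual features are not shown here):

```python
import numpy as np

# Sliding-window construction for sequence forecasting: each sample is
# `window` consecutive past weeks and the target is the following week.
# Illustrative only; the project's real inputs include engineered
# student-characteristic features, not just the raw counts.

def make_windows(series, window=8):
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y   # shape (n, window, 1), as an LSTM expects
```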

  • Predicting Tesla Stock from Elon Musk's tweets

    -

    In this data science project I predicted the fluctuation of Tesla stock from Elon Musk's tweets.
    The goal was to predict whether the stock would decrease or increase based on the previous day's tweets.
    I first preprocessed the tweets column: removing URLs, stripping whitespace, removing non-alphabetic characters, stemming words, and creating a TF-IDF matrix.
    Secondly, I performed exploratory analysis to visualize the correlation between words and stock fluctuation, the top words used in the tweets, and so on.
    Then I performed feature engineering: I right-merged the Tesla stock dataset imported via the Yahoo Finance API, grouped by day while concatenating all tweets from the same day, and extracted further useful information such as the number of tweets per day, the average tweet length, and the number of emoji used.
    After that, I performed sentiment analysis using VADER, which provided useful sentiment scores.
    Finally, I performed data modeling: an LSTM model on the stock data and a neural network on the NLP features, combining their outputs with a second neural network into one final layer with a single output.
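The preprocessing steps described above (URL removal, whitespace stripping, dropping non-alphabetic characters) can be sketched with the standard library; stemming and the TF-IDF matrix would come from a library such as NLTK or scikit-learn. Illustrative, not the project's code:

```python
import re

def clean_tweet(text):
    """Basic tweet cleaning: strip URLs, keep letters only, normalise
    whitespace, and lowercase. Stemming/TF-IDF are applied afterwards."""
    text = re.sub(r"https?://\S+", " ", text)         # remove URLs
    text = re.sub(r"[^A-Za-z\s]", " ", text)          # keep letters only
    return re.sub(r"\s+", " ", text).strip().lower()  # collapse whitespace
```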

  • Using Deep Learning to predict disasters from Twitter

    -

    In this project, I use the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model to classify tweets as either disaster-related or not. I train the model using a combination of cross-entropy loss and mixup regularization, and use early stopping to prevent overfitting. Overall, this project demonstrates transfer learning and fine-tuning with BERT for natural language processing tasks.
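Mixup regularization, mentioned above, interpolates pairs of training examples and combines their losses with the same coefficient; a generic NumPy sketch of the idea (not the project's training loop, where it would operate on BERT embeddings):

```python
import numpy as np

def mixup(x1, x2, alpha=0.2, rng=None):
    """Draw lam ~ Beta(alpha, alpha) and interpolate two inputs.

    The training loss is then combined with the same coefficient:
        loss = lam * CE(pred, y1) + (1 - lam) * CE(pred, y2)
    """
    rng = rng if rng is not None else np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1 - lam) * x2, lam
```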

  • Building a Recommendation System with Machine Learning

    -

    The project predicts the popularity of recipes, recommending whether the system should surface a given recipe more or less often. It involves data validation, data cleaning, exploratory analysis, feature engineering, and data modeling with genetic tuning algorithms, SGD classification, XGBoost classification, random forest classification, and model stacking.

  • API application (12-Twenty & Google Cloud) - Columbia University

    -

    Created an application to automate extracting data from the University database, cleaning it, engineering features, and posting the results to Google Data Studio. The application uses two main API endpoints: 12-Twenty and Google Cloud.


Languages

  • English

    Full professional proficiency

  • Italian

    Native or bilingual proficiency

  • Spanish

    Full professional proficiency

  • Chinese

    Limited working proficiency
