Romal Thoppilan
United States
9K followers
500+ connections
About
Building general intelligence at Google DeepMind!
Previously,
Founding…
Activity
9K followers
-
Romal Thoppilan reposted this: An Anthropic researcher was eating a sandwich in a park when Claude Mythos Preview emailed him after escaping its test sandbox. The model developed what Anthropic calls a "moderately sophisticated" multi-step exploit to gain broad internet access from a sandboxed system. Then, unasked, it posted details about the exploit on hard-to-find but public-facing websites. Anthropic will not release it. Instead, they launched Project Glasswing, committing up to $100M to deploy Mythos for defense with Apple, Google, Microsoft, Amazon Web Services (AWS), and 8 other partners. What Mythos found when pointed at real code:
• Thousands of high- and critical-severity vulnerabilities
• A 27-year-old bug in OpenBSD, an OS known for its security
• A 17-year-old remote code execution in FreeBSD, found and exploited autonomously
• 181 Firefox exploits, where the previous best model managed 2
Links are in the comments.
-
Romal Thoppilan reposted this: Bring any idea to life with Gemini 3: our most intelligent model, designed to help you learn, build, and plan anything. We're first releasing Gemini 3 Pro, which is rolling out globally starting today. This is how we're pushing the frontier:
🔵 State-of-the-art reasoning: It understands prompts with incredible depth and nuance, delivering clear, direct answers without clichés or filler. As our most factual model, it's more reliable for complex questions in science and math.
🔵 World-leading multimodal understanding: Gemini 3 seamlessly comprehends text, images, video, audio, and code. It adapts to you, responding with whatever best suits your needs. Quickly turn text lessons into visual flashcards or ask Gemini to break down concepts from a long video.
🔵 Our best model for vibe and agentic coding: You can build dynamic, beautiful apps from a single prompt. We've also improved agentic code performance, supporting existing tools and our new agentic development platform, Google Antigravity.
We can't wait to see what you build. Here's how you can try it in the Gemini app, Google AI Studio, AI Mode in Search, and Google Cloud's Vertex AI for enterprises → https://goo.gle/Gemini-3
-
Romal Thoppilan reposted this: Introducing Gemini 2.5 Pro, the world's most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc). Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon! Read more: https://lnkd.in/gsR8YBRg
-
Romal Thoppilan reposted this: Had an insightful discussion with Hon'ble Minister Ashwini Vaishnaw about launching a strategic initiative to bring together Indian-origin AI researchers from around the world. Indians have made significant contributions to modern AI, from the early Transformers paper to teams behind leading models like OpenAI's GPT, Anthropic's Claude, Google's Gemini, and Meta's Llama. We're collaborating with the best minds to build foundation models that will power India's AI future. A key challenge in building foundation models for India lies in the lack of internet-scale data, unlike the US or China, combined with the country's immense linguistic diversity. To truly democratize these models, our approach will focus on the unique conversational style of Indians, which often involves heavy code-switching between languages and dialects. We plan to use synthetic data generation and reinforcement learning to train LLMs, and are committed to open-sourcing essential components, including frameworks and code, reinforcement learning data, and model weights for select models. We're looking to hire exceptional AI engineers to join us on this mission.
👨‍💻 FTE: ₹40L Base + ₹40L ESOPs
🧑‍🎓 Intern: ₹1L/month
📍 Location: Virtual
Know someone who'd be a perfect fit? Tag them! If you're interested, comment below, and we'll get in touch.
-
Romal Thoppilan reposted this: Last weekend, I dived into this book on training large-scale foundation models and was instantly impressed by its quality. Having been working on foundation model training at scale, the content of this book feels very close to heart. One key takeaway: distributed training is transitioning from an auxiliary skill into a core part of machine learning engineering as we enter the LLM era, and I can't agree more. If today's system design interviews are shaped by the internet era, it's only a matter of time before LLM scaling becomes the new standard in AI system design. Would recommend this book as a must-read for anyone working on foundation model training!
-
Romal Thoppilan reposted this: Our caching story at Character.AI as we blew through two orders of magnitude to scale our overall requests per second by 100x. https://lnkd.in/g5G6xGUS ("Character.AI's storybook ending with Memorystore for Redis Cluster" | Google Cloud Blog)
-
Romal Thoppilan reposted this: We just published a blog summarizing some of the most important inference tricks we use at Character.AI. These tricks allow us to serve >20k QPS, which is roughly 20% of Google Search. In short: small KV cache + inter-turn cache = cheap inference! https://lnkd.in/ehMqFqcE
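The "inter-turn cache" idea above (reuse attention KV state across conversation turns so a follow-up request only computes the newly appended tokens) can be sketched roughly like this. The keying scheme, class names, and the string placeholder for real KV tensors are assumptions for illustration, not Character.AI's actual implementation:

```python
import hashlib

class InterTurnCache:
    """Toy inter-turn cache: store KV state keyed by the conversation
    prefix, so the next turn reuses attention state for the shared
    prefix instead of recomputing the whole history."""

    def __init__(self):
        self._store = {}

    def _key(self, prefix_tokens):
        # Hash the token prefix into a compact cache key.
        return hashlib.sha256(repr(prefix_tokens).encode()).hexdigest()

    def get(self, prefix_tokens):
        return self._store.get(self._key(prefix_tokens))

    def put(self, prefix_tokens, kv_state):
        self._store[self._key(prefix_tokens)] = kv_state

cache = InterTurnCache()
turn1 = [101, 7, 42]                  # token ids of the first turn
cache.put(turn1, "kv-for-turn1")      # placeholder for real KV tensors
hit = cache.get(turn1)                # next turn: shared prefix hits
```

In a real serving stack the value would be the model's KV tensors (kept small by the architecture tricks the blog describes), and eviction policy matters as much as the lookup.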
-
Romal Thoppilan reposted this: We are honored to be recognized as Google Play's Best AI App of 2023! 🎉 Big shoutout to Google Play, our passionate community, incredible Character creators, and the entire C.AI team for making this possible. ❤️ 🔗 https://lnkd.in/etRFJ7Kq #GooglePlayBestOf #CharacterAI #CAI
-
Romal Thoppilan reposted this: We have officially launched ✨ Character.AI Group Chat ✨ With our latest feature, users can create meaningful connections, exchange ideas, and collaborate in real time, not just with humans but with their favorite AI Characters too. 😉 Read more here: https://lnkd.in/g_vy-H-a
-
Romal Thoppilan liked this: Happy to announce that I'm a Certified AI-Empowered SAFe Product Owner/Product Manager by SAFe by Scaled Agile, Inc. Scaling agility and artificial intelligence are now core pillars for efficient value delivery. This SAFe program focuses on the intersection of both fields to optimize product management. Key focus areas: 1. Leveraging AI within the PO/PM role. 2. Strategic alignment within the SAFe framework. 3. Maximizing value flow in scaled environments. I am grateful to ICW Group for continuously investing in our growth. #SAFe #AI #ProductManagement #Agile #POPM #ProductOwner #Innovation View my verified achievement from SAFe by Scaled Agile, Inc. (Certified AI-Empowered SAFe® Product Owner/Product Manager was issued by SAFe by Scaled Agile, Inc. to Roshni Chavady.)
-
Romal Thoppilan liked this: Most teams today are renting intelligence. Today, RadixArk is launching with $100M in seed funding to let enterprises and AI builders own it instead. RadixArk started with two foundational open-source products: SGLang (the open-source inference engine already serving trillions of tokens a day for Google, Microsoft, NVIDIA, AMD, xAI, and many others) and Miles (an open framework for large-scale reinforcement learning). Their plan is to build the full end-to-end infrastructure that enables every team to own, operate, and continuously improve their AI systems at scale. Frontier-grade AI infrastructure has, until now, lived inside a handful of companies. RadixArk is building the counterweight: open, high-performance, and accessible to everyone. We were early believers in Ying Sheng, Banghua Zhu, and the RadixArk founding team, and it was clear from the beginning that they were building for decades. We are proud to be on that journey with them. More on why we invested below.
-
Romal Thoppilan liked this: GDP.pdf was just accepted to the CVPR 2026 Workshop on Multimodal Reasoning. We partnered with hundreds of expert Surgers (ER physicians, construction engineers, corporate litigators) to build a benchmark that tests whether frontier models can handle the documents that run the global economy. Every frontier model scored under 15%. Paper and results below.
Paper: https://lnkd.in/e7f6xQaf
Dataset: https://lnkd.in/ePwvGmuR
Leaderboard: https://lnkd.in/ePn7RQzh
Blog: https://lnkd.in/eBHYBbhm
("GDP.pdf: Can $100B AI Models Master the Documents that Run the World?")
-
Romal Thoppilan liked this: Sarvam AI is India's answer to OpenAI. So I wanted to know: which university is actually producing the people building it? I pulled the data on over 160 Sarvam employees via the Crustdata API. BITS Pilani came out on top. By a lot. 14 employees went to BITS. The next closest is IIT Delhi with 9. IIT Madras has 7. Every other IIT - Kharagpur, Bombay, Kanpur - sits between 4 and 5. BITS is outranking every single IIT individually. That surprised me. IITs have the brand and the prestige. They dominate every "best engineering school in India" list. But when you look at who's actually in the room building foundational AI models, BITS is punching above its weight. Also worth noting: Shiv Nadar University shows up with 4 employees alongside the legacy giants. I think what this tells you is that building LLMs from scratch selects for a very specific kind of person: the ones who went deep on research, stayed curious, and kept building. BITS has always had that culture!
-
Romal Thoppilan liked this: GPT-5.5 by OpenAI is now live in the Arena, landing across multiple leaderboards. Here's how it ranks by modality:
- Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4
- Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6
- Text Arena: #7, Math: #3, Instruction Following: #8
- Expert Arena: #5
- Search Arena: #2
- Vision Arena: #5
Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Dive deeper into each category leaderboard at arena.ai/leaderboard. Congrats to OpenAI on the release!
-
Romal Thoppilan liked this: [DeepSeek V4 Pro summary] Building LLMs is looking more and more like building a car or an airplane! 😅 Here are a few interesting bits that stood out to me.
High-level takeaways:
* Coding is still significantly behind the frontier, e.g. on their internal R&D coding benchmark Pro has a pass rate of 67% vs 80% for Opus 4.6 Thinking (and the frontier has since moved to Mythos/Spud, which are new pretrains).
* Opus 4.5 is better than DSV4 Pro on highly complex, multi-turn prompts in Chinese writing.
* Model size is still an order of magnitude behind the frontier (O(10T) vs O(1T)). With 1.6T params (49B activated), OSS finally closed/crossed the gap (size-wise) with GPT-4, which finished training EOY '22 (!!)
* Agentic search consistently outperforms RAG - curious what this means for the future of vector DB companies. codex/cc have already embraced this paradigm and moved away from Cursor's RAG-based approach.
* Context length extended to 1M.
* Arch modifications yield 27% single-token inference FLOPs and 10% KV cache compared to DeepSeek V3.2!
* Pretrain dataset size = 33T tokens.
Training instabilities: they had loss spikes due to outliers in the MoE layers (the routing mechanism in particular). Two hacky solutions:
1. SwiGLU clamping (linear between [-10, 10], gate upper bound clipped at 10)
2. Anticipatory routing mode: when a loss spike is detected, they roll back and push routing indices out of sync by delta_t steps; lots of optimization went into getting this working.
Architecture:
* First two layers are HCA; after that, CSA and HCA interleaved. These make it possible to go to 1M context length.
* CSA (Compressed Sparse Attention): compress the KV cache along the sequence dim (m=4 tokens replaced with 1 smaller-dim entry), followed by DSA (the sparse attention they previously introduced). DSA does top-k (k=1024) key selection. Finally, they do MQA (a single key replicated across all query heads).
* HCA (Heavily Compressed Attention): similar to CSA, except that m' (128) >> m and the attention is global (MQA).
* mHC (manifold-constrained hyper connections): more expressive residual connections. "Manifold-constrained" means the spectral norm of the mapping matrix is bounded by 1 (the transformation is non-expansive).
* MTP (multi-token prediction), which also helps during inference with speculative execution.
* An additional branch of sliding-window attention, because a query in CSA/HCA can't attend to keys/values from its own compressed block (otherwise we'd get a non-causal transformer!) and local tokens matter a lot; this adds n_win uncompressed KV entries corresponding to the most recent n_win tokens.
* Attention sinks (allows the attention scores not to sum to 1, i.e. no longer a probability distribution).
* MoE: 384 routed experts (6 active) + 1 shared. Surprisingly, a hash routing strategy for the first 3 MoE layers?
I'm also surprised they've chosen such a homogeneous attention layout and didn't allocate more CSA layers to the deeper layers (which tend to have longer receptive fields). Likely for efficiency reasons, or lack of time for such ablations. More in comments!
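The SwiGLU clamping trick mentioned in the training-instabilities section can be sketched as below. The exact placement of the clamp (on the gate pre-activation) and the NumPy formulation are assumptions based on the one-line description, not DeepSeek's actual code:

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_clamped(x, w_gate, w_up, lo=-10.0, hi=10.0):
    """SwiGLU with the gate pre-activation clamped to [lo, hi]
    (linear inside the range), capping the outlier activations
    that were triggering loss spikes."""
    gate = np.clip(x @ w_gate, lo, hi)
    return silu(gate) * (x @ w_up)

# An outlier input no longer produces an unbounded gate value:
x = np.array([[1000.0]])
w_gate = np.array([[1.0]])
w_up = np.array([[0.001]])
out = swiglu_clamped(x, w_gate, w_up)  # gate clamped from 1000 to 10
```

The point of clamping the pre-activation rather than the output is that the function stays smooth and linear in the normal operating range, so well-behaved tokens are unaffected.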
-
Romal Thoppilan liked this: Industrial-scale science. Coming to a lab near you this summer.
-
Romal Thoppilan liked thisA legendary company in the making. AI pioneers Andrew Dai and Yinfei Yang are focusing their talent on the hardest and largest problem to date in multimodal reasoning. Read more below.Romal Thoppilan liked thisStriker is excited to co-lead the $55m Seed for Elorian AI, alongside our friends at Menlo Ventures and Altimeter, supporting AI legends Andrew Dai and Yinfei Yang in building the world's leading solution for multimodal reasoning. Brian Zhan and Max Gazor share more below on our thesis.AI Mastered Language. The Harder Problem Was Always Vision.AI Mastered Language. The Harder Problem Was Always Vision.Striker Venture Partners
Experience
Education
Explore more posts
-
Luke Simon
Meta • 2K followers
History is repeating itself. 10 years ago the recsys industry was migrating from linear model recsys to sparse neural network recsys. Back in 2016, many people in the field were skeptical that deep learning would deliver positive ROI for recsys, the fear being rooted in the perceived higher training and inference cost. I see people repeat the same fear, uncertainty, and doubt about this year's migration from sparse neural network recsys to LLM-native recsys. Those who do not lean into the LLM-recsys migration this year will fall behind.
78
1 Comment -
LangChain
512K followers
🧠💬 Memory in LLMs A practical guide showing how to implement conversational memory in LLMs using LangGraph, demonstrated through a therapy chatbot. Features code examples for basic retention, trimming, and summarization approaches. Learn to build memory-aware apps 👉 https://lnkd.in/gybcrV5v
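The trimming approach the guide mentions can be illustrated framework-free. This is a generic sketch of the idea (keep the system prompt plus the most recent turns so the context stays bounded), not LangGraph's actual API:

```python
def trim_history(messages, max_messages=8):
    """Keep the system prompt (if any) plus the most recent turns,
    so the context window stays bounded as the chat grows."""
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-(max_messages - len(system)):]

history = [{"role": "system", "content": "You are a therapist."}]
for i in range(10):
    history.append({"role": "user", "content": f"message {i}"})

trimmed = trim_history(history)
# System prompt survives; only the most recent turns remain.
```

The summarization variant the post also mentions would replace the dropped middle of the list with a single model-generated summary message instead of discarding it.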
967
21 Comments -
Yizhe Zhang
Apple • 4K followers
We (w/ Shansan Gong, Ruixiang ZHANG, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong) released a family of 7B diffusion language models, DiffuCoder, that specializes on code generation, with a focus on understanding and improving masked diffusion models. A core analysis of DiffuCoder is the autoregressiveness (AR-ness) score, a novel metric that quantifies the causal patterns in decoding, revealing how diffusion models break from strict left-to-right generation for more flexible, non-linear code planning. Recent advances in autoregressive (AR) models dominate code generation, but diffusion-based LLMs (dLLMs) like DiffuCoder offer a promising alternative, especially for complex programming tasks. DiffuCoder explores how these models decode differently—showing less global AR-ness in code tasks compared to math—and how temperature affects both token selection and generation order, unlike traditional AR models. We also introduce coupled-GRPO, a post-training RL method with a coupled-sampling scheme, to reduce performance drops during accelerated decoding, boosting parallelism and efficiency. We use a self-improvement pipeline that leverages AR-ness analysis, coupled-GRPO optimization, and evaluation on benchmarks like AceCode-89k to refine decoding strategies. This approach enables DiffuCoder to navigate diverse code generation pathways and enhance performance with modest computational overhead. Looking ahead, we aim to further leverage Reinforcement Learning to steer code generation through these decoding patterns, with the discrete nature of AR-ness scores providing a foundation for search-based strategies—ideal for the sparse rewards of optimizing complex code structures. Check out our full paper and code for a deeper dive! Paper: https://lnkd.in/gVWU3BDJ Code: https://lnkd.in/gmXTZ_6n Models: https://lnkd.in/gTcKCDr9 #MachineLearning #AI #CodeGeneration #DiffusionModels #NLP
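A local notion of the AR-ness score described above can be sketched as the fraction of decoding steps that unmask the leftmost remaining position. The paper's exact definition may differ; this is an illustrative simplification:

```python
def ar_ness(decode_order):
    """decode_order: sequence positions in the order they were unmasked.
    Returns 1.0 for strict left-to-right decoding, lower values for
    out-of-order (more diffusion-like) decoding."""
    remaining = set(decode_order)
    hits = 0
    for pos in decode_order:
        if pos == min(remaining):   # was this the leftmost masked slot?
            hits += 1
        remaining.remove(pos)
    return hits / len(decode_order)

left_to_right = ar_ness([0, 1, 2, 3])   # strictly causal decoding
reverse = ar_ness([3, 2, 1, 0])         # maximally out-of-order
```

A metric like this makes the post's claim measurable: logging the unmask order during sampling lets you compare AR-ness across tasks (e.g. code vs math) and temperatures.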
220
5 Comments -
Eitan Anzenberg, PhD
Eightfold • 3K followers
Our team just posted our latest paper, "Evaluating the Promise & Pitfalls of LLMs in Hiring Decisions," on arXiv! We found some exciting results:
• Benchmarked leading LLMs (GPT-4o, o3, Claude, Gemini, Llama, DeepSeek) against Eightfold's "Match Score" model on real-world data.
• Evaluated both performance (ROC AUC, PR AUC, F1) and fairness (impact ratio across gender, race, and intersectional groups).
• Eightfold's Match Score beat the best LLM on accuracy (ROC AUC 0.85 vs 0.77) and fairness (min race impact ratio 0.957 vs 0.809).
• Off-the-shelf LLMs still propagate measurable demographic bias without safeguards.
• The trade-off between accuracy and fairness is a false dichotomy: carefully engineered, domain-tuned models like Eightfold's can achieve both accurate hiring and fair outcomes.
https://lnkd.in/guQ2TAYp #machinelearning #ai #eightfold #arxiv #datascience #bias #fairness #ml #data #genai #llms
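The impact-ratio numbers quoted above (0.957 vs 0.809) follow the conventional definition: the minimum group selection rate divided by the maximum. A minimal sketch, with the input format as an assumption:

```python
def impact_ratio(selection_rates):
    """selection_rates: mapping from demographic group to the fraction
    of that group selected. The four-fifths rule flags ratios < 0.8
    as evidence of adverse impact."""
    rates = selection_rates.values()
    return min(rates) / max(rates)

rates = {"group_a": 0.50, "group_b": 0.40}
ratio = impact_ratio(rates)  # 0.40 / 0.50 = 0.8, right at the threshold
```

By this measure, the paper's 0.957 comfortably clears the four-fifths threshold while 0.809 only barely does.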
38
2 Comments -
Nishantha Ruwan
IWROBOTX Software Inc. • 2K followers
A new RL algorithm that fixes a hidden flaw in PPO The authors propose CE-GPPO (“Coordinating Entropy via Gradient-Preserving Policy Optimization”), a variant of PPO that restores gradient contributions from clipped actions in the policy update. They argue that traditional clipping discards useful gradient signals from low-probability tokens, which play an important role in controlling the agent’s entropy during training. By bounding those gradients in a controlled way, CE-GPPO maintains exploration–exploitation balance more stably than prior methods. They provide a theoretical analysis showing that CE-GPPO mitigates entropy instability, and empirically test it on mathematical reasoning benchmarks. Their results indicate consistent improvement over strong baselines across different model scales, demonstrating that preserving clipped gradients can lead to better performance in reinforcement learning for reasoning tasks. https://lnkd.in/gYZsJ-8f
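The core contrast (standard PPO clipping zeroes the gradient for tokens outside the trust region, while CE-GPPO keeps a bounded gradient flowing) can be illustrated numerically. The bounding rule below is a simplified stand-in chosen for illustration, not the paper's exact formulation:

```python
import numpy as np

def ppo_grad_coeff(ratio, adv, eps=0.2):
    """d(surrogate)/d(ratio) for standard clipped PPO: equal to the
    advantage where the clip is inactive, exactly zero where active."""
    active = ((adv > 0) & (ratio < 1 + eps)) | ((adv < 0) & (ratio > 1 - eps))
    return np.where(active, adv, 0.0)

def gp_grad_coeff(ratio, adv, eps=0.2, delta=0.05):
    """Gradient-preserving variant: where PPO would discard the
    gradient, pass through a small bounded one instead."""
    g = ppo_grad_coeff(ratio, adv, eps)
    return np.where(g == 0.0, np.clip(adv, -delta, delta), g)

ratio = np.array([1.5])   # well outside the clip range for adv > 0
adv = np.array([1.0])
standard = ppo_grad_coeff(ratio, adv)   # signal discarded
preserved = gp_grad_coeff(ratio, adv)   # bounded signal kept
```

Keeping a bounded rather than full gradient is what lets the clipped low-probability tokens keep nudging the policy's entropy without destabilizing the update.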
1
-
Ripudaman Singh
Harvey Nash • 28K followers
$1B+ bet on the next frontier of AI: Physical & Spatial Intelligence Today, Yann LeCun’s new startup, AMI - Advanced Machine Intelligence, announced a massive $1.03 billion funding round to challenge the dominance of Large Language Models (LLMs). Current AI is "trapped" in a digital box of text. To reach human-level autonomy, AI needs to understand the physical world. It needs to know that if you push an object, it falls; it needs to navigate a room without a map; it needs to plan complex tasks in real-time. From Digital to Physical: AI is moving into robotics, smart glasses (like Ray-Ban Meta), and industrial automation. From Prediction to Planning: Moving away from probabilistic guessing toward goal-oriented reasoning. The "Ami" (Friend) Approach: Building AI that is controllable, safe, and grounded in reality. The next frontier isn't just "generative"—it’s spatial and physical. We are moving from AI that talks to us, to AI that works alongside us in the physical world. Congrats to the AMI team on this milestone. The era of "World Models" is here. https://lnkd.in/gE99YfGJ #AI #Robotics #MachineLearning #YannLeCun #AMI #Innovation #SpatialIntelligence #TechNews
2
-
Julius Kusuma
Meta • 3K followers
We developed an open-source AI tool to design concrete mixes that are stronger, more sustainable, and ready to build with faster—speeding up construction while reducing environmental impact. https://lnkd.in/gPCk8tCM But the impact of this AI tool is not just hypothetical! Amrize used Meta’s AI-based technologies to design a new low-carbon mix, and successfully deployed it in an at-scale slab-on-grade application at Meta's new data center in Rosemont, MN. Compared to the legacy mix, this new AI-designed mix is: 🦁 Stronger ⏱ Faster 🍃 Lower carbon ⏱ ️The ideal set time All this was achieved without needing any new materials, nor special equipment. Best of all, the AI is open-sourced. https://lnkd.in/g2KA7KZW This work was featured in a Meta engineering blog article published today! https://lnkd.in/gBU9HY8H
98
9 Comments -
Resume Captain
121 followers
3 Career Lessons from a Senior Machine Learning Engineer Role at Waymo The Waymo senior ML engineer position focused on LLM and VLM visual reasoning showcases how cutting‑edge AI intersects with autonomous driving. Here are three universal takeaways anyone can apply: 1️⃣ Highlight Emerging Tech Mastery – When a role demands the newest models, frame your experience with buzzworthy tools as concrete projects. Show how you built, tuned, or deployed those models to solve real problems. 2️⃣ Bridge Research and Product Impact – Employers love innovators who can turn experiments into shipping features. Describe any prototype that moved from notebook to production and the measurable benefit it delivered. 3️⃣ Quantify Multimodal Success – Vision‑language work thrives on data. On your resume and in interviews, list the size of datasets you handled, accuracy gains, or latency improvements you achieved. Want more high‑impact autonomous‑vehicle positions and detailed insights? ➡️ Check the full curated list here: https://lnkd.in/g6jr4BDD #CareerAdvice #JobSearch #ResumeTips #AIJobs #AutonomousVehicles