Wojciech Galuba
London, England, United Kingdom
2K followers
500+ connections
Activity
-
Wojciech Galuba posted this: Anyone at AIE London? Let’s catch up!
-
Wojciech Galuba shared this: 👉 Join us at Cohere's Data & Evals team! We build RL gyms that simulate knowledge work across multiple industries. We then drop our AI agents into those gyms to evaluate and train them. Agents get smarter and gyms get more challenging - in a loop. Then we ship the agents and watch them make everyone more productive. We are looking for research engineers with a diverse mix of SWE, ML and Data skills to help us build this. Lots of exciting new problems across the stack. Hiring globally & remote friendly: https://lnkd.in/e_UA7ePk
-
Wojciech Galuba shared this: To all those impacted at FAIR and GenAI - thinking of you. Come and have a fresh start with us advancing the science of evals and posttrain data. Check out the opportunities below. DMs open - always happy to reconnect and help however I can! https://lnkd.in/eeaYwziT https://lnkd.in/e_UA7ePk https://lnkd.in/ePDZvXK4
-
Wojciech Galuba shared this: We are growing our Evals team at Cohere: https://lnkd.in/e_UA7ePk AI evals are absolutely core to understanding AI's impact across businesses and giving AI models a clear, measurable direction to follow. Building evals is an exciting creative process that combines multiple areas - machine learning, AI engineering, data annotation and data science - to drive decision making. Come join us!
-
Wojciech Galuba shared this: Super proud to share what we've been cooking with the amazing team at Cohere - a nimble chat model that's efficient, retrieves docs and gives citations, knows how to use tools and supports 10 languages: https://lnkd.in/eqTVyXyu Ready to use in your business! You can have a chat with it at: http://coral.cohere.com On top of that, thanks to Cohere For AI the model weights are out as well: https://lnkd.in/e-g8utqY Proud to be at a place that can contribute to the research community in such a big way! P.S.: We are hiring in our Data & Evals team, London, Toronto, NY or remote, DM me!
-
Wojciech Galuba shared this: Want to teach LLMs to search the Web? We are looking for a talented leader to head our annotation operations in London.
-
Wojciech Galuba reposted this: 🇬🇧 Cohere is hiring AI Data Trainers in London, I’ll let Coral do the sell! Apply here > https://lnkd.in/g-XjJCkF
-
Wojciech Galuba shared this: Thank you for the invite, Gathers! Looking forward to the event and hoping to see some of you there, LinkedIn community! Join our exciting Papers Club event to explore the world of Large Language Models (LLMs) and their application with tools! 🚀📚 In this session, our moderator Wojciech Galuba, an expert in LLMs and data-centric NLP, will guide the discussion on "Large Language Models using tools." Discover cutting-edge research papers, dive into open-source projects, and explore plugins, tools, and cool applications related to LLMs. Engage in breakout room discussions and participate in the Q&A session. Don't miss out on this enlightening experience! 🗓️✨ #LLM #LargeLanguageModels
-
Wojciech Galuba shared this: Join me at next week's talk, where I'll be sharing some insights and career tips on the role of an AI Research Engineer: https://lnkd.in/egbxdarE (AI Research Engineer - A Multi-disciplinary Role - Codementor Events)
-
Wojciech Galuba reacted to this: After almost two years, I've wrapped up my time at Cohere leading the human data team for code annotations. Someone at the farewell drinks asked me my biggest takeaway, and here it goes. Having managed 110 in-house annotators, I've learned that annotation works best when it's treated as a research input, not a cost center. Plug it into every stage of the model and product development loop. Empower the in-house team to find where the model and the product fail, and turn those failures into datasets that actually solve the problem. When the annotation team works as thought partners with the researchers in all teams, the leverage of human data expands end to end, touching synth, evals, SFT and RLHF, and product feedback loops. Realizing this also meant scaling the team with sharper eyes, not more bodies. That's the lesson I'm taking forward, and the one I want to keep leaning into. Huge thanks to the human data team: Wojciech Galuba, Claire Cheng, Dennis Padilla, Brenda Malacara, Max Nisbeth, Jennifer Tracey, Olivia Lasche, Shauna Nehra, also the amazing stakeholders I've loved working with: Ahmet Üstün, Matthias Gallé, Sophia Althammer, Tom Sherborne, Dennis Aumiller, Jay Alammar, and Jesse Willman. And mostly, I'm thinking about the annotation team I built across the UK and Canada. A genuinely unusual mix of backgrounds and skill sets, ranging from data science to software engineering, especially the ones who treated annotation as the craft it actually is. I learned so much from all of you. More soon on what's next!
-
Wojciech Galuba liked this: Last week we hosted Alec Barber from OpenAI for a discussion on evals, where we covered a lot of ground: - Eval platforms vs building your own harness - Who should write evals in your company - Training judge models and measuring regression, and more. Thanks to everyone who came and made it such a good discussion! Wojciech Galuba David Gelberg Roman Engeler Ingmar Klein Shreyas Pulle Raza Hassan Nestor Dubnevych Lukas Koebis Harry Coppock Felix Brockmeier Aidan Davies Daniel Woloch https://lnkd.in/eYR-UPy5
-
Wojciech Galuba reacted to this: As ICLR kicks off in Brazil, we're super excited to host an in-depth discussion on RL: IRL together with Project A and Aptura (formerly Mentis AI) here in London. Although Reinforcement Learning has led to foundational superhuman progress like AlphaGo and AlphaZero, there are still many questions remaining: how do you define RL, especially in the context of post-training? How is it applied to open-ended problems? How does it scale? ... We're in good company to learn from the discussion today, moderated by Armin Schöpf (CTO and cofounder of Aptura) with: Tim Rocktäschel - Professor at UCL, former Open-Endedness Lead at Google DeepMind; Wojciech Galuba - Data & Evals Lead at Cohere; Ian Osband - Research Scientist at Google DeepMind. cc: Daria Gneusheva, Zoe (Ziwen) Qin
-
Wojciech Galuba liked this: Went to RL:IRL hosted by Project A & AIEngine. Stacked lineup: Tim Rocktäschel (Meta) on open-endedness, Ian Osband (Google DeepMind) on RL fundamentals, Wojciech Galuba (Cohere) on envs and evals.
Key takeaways:
- Most modern RL is REINFORCE-style gradient learning: optimize toward reward until convergence. But that is just one slice of RL, and not that different from supervised learning like SFT.
- What actually defines RL: partial updates like SGD, delayed consequences, and the core tension of exploration vs exploitation. REINFORCE struggles here; it often collapses into local optima.
- The bigger issue is environments. Current RL envs are too compressed compared to reality: they limit the action space and generalization. Better envs lead to better agents.
- Finally, reward modeling: most rewards are static and proxy-based, so models learn to reward hack instead of actually learning.
Progress in RL is not just better algorithms - it is better environments and better reward signals too. Shoutout to Daria Gneusheva and Zoe (Ziwen) Qin for hosting this event, it was amazing!
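The REINFORCE-style update mentioned in the takeaways can be made concrete on a toy two-armed bandit. This is an illustrative sketch of the vanilla algorithm, not code from the talk; every name and number here is my own choice:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_bandit(rewards=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Vanilla REINFORCE on a 2-armed bandit.

    Nudges the logit of the sampled arm in proportion to the
    baseline-subtracted reward, so the policy drifts toward the
    higher-reward arm.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    baseline = 0.0  # running average reward; reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        a = 0 if rng.random() < probs[0] else 1  # sample from the policy
        r = rewards[a]
        baseline += 0.01 * (r - baseline)
        adv = r - baseline
        # gradient of log pi(a) w.r.t. logits: (1 - p_a) for the chosen arm
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * adv * grad
    return softmax(logits)

probs = reinforce_bandit()
# the policy concentrates on the higher-reward arm (index 1)
```

The local-optima point from the post shows up here too: with an unlucky baseline or learning rate, the policy can lock onto the worse arm early and stop exploring.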
-
Wojciech Galuba reacted to this: Big announcement: I'm releasing a new book! 📙📙📙 Over the years, I've given a lot of talks and advised many students on how to build a career in AI. It's a topic I'm genuinely passionate about because it goes beyond the technical side. It's a bit cheesy to say, but it does have a real impact on people's lives. Some of the most rewarding messages I get are from readers of my articles, books, and courses telling me how they helped them land a job in this industry. It's a great feeling! So when Packt reached out about writing a book on building a career in AI, it was an immediate yes. I co-authored it with Ali Arsanjani, PhD, Sadid Hasan, Andreas Horn, and Leonid K., bringing perspectives from Google, Microsoft, IBM, and Liquid AI. The Generative AI Career Masterplan is a practical guide for anyone navigating the Agentic AI era: students entering the job market, professionals transitioning into AI, or leaders driving AI transformation. What's inside:
→ Gen AI's impact on the job market and the core concepts you actually need to know
→ A breakdown of AI roles, leveling matrices, and how to build a roadmap that fits your background
→ The full Gen AI stack: RAG, vector databases, function calling, multi-agent systems, LLMOps
→ Hands-on coverage of LangChain, LlamaIndex, Haystack, and Hugging Face
→ The human skills that compound in an AI-augmented world: critical thinking, adaptability, creativity, ethical judgment
→ Personal branding frameworks (the 5-Layer LinkedIn Strategy, the Portfolio Showcase Framework) and a lifelong upskilling roadmap
→ Real-world use cases across finance, healthcare, software, and marketing
You can pre-order it now and it will be released on June 9, 2026. Link in the comments. Would love to hear what you think!
-
Wojciech Galuba liked this: Hosting a discussion on reinforcement learning on the 20th of April with our friends at AIEngine and Aptura (formerly Mentis AI). In conversation with: Tim Rocktäschel - Professor at UCL, former Open-Endedness Lead at Google DeepMind; Wojciech Galuba - Data & Evals Lead at Cohere, ex Meta/FAIR; Ian Osband - Research Scientist at Google DeepMind, ex OpenAI; Armin Schöpf - CTO and Co-Founder at Aptura. Link to register in the comments.
Experience
Education
Publications
Recommendations received
1 person has recommended Wojciech
Explore more posts
-
Sam McCormick
Senior data scientist with 5+… • 576 followers
Very proud to have worked on developing the first open source modelling suite for Marketing Mix Models (MMM), which we at Mutinex hope will lay the groundwork for a more democratic and transparent MMM landscape. The new validation suite enables practitioners to rigorously test and compare MMM models in a consistent, open framework. It's a step toward greater accountability, better standards, and shared progress in marketing science. We’re excited to share it with the community - contributions and feedback are more than welcome! https://lnkd.in/dmWgKUd2 https://lnkd.in/dWJPtr2m
10
2 Comments -
Francisco M. Tacoa, MSE, MBA
Doublefin • 1K followers
My LinkedIn feed has recently been buzzing with posts about “AI frontier models getting better and cheaper.” I would agree that recent frontier LLMs have better performance as measured by benchmarks like the GPQA Diamond score. This is driven by the AI industry’s pursuit of AGI... Now, even though frontier AI models are also getting cheaper on a dollars-per-million-tokens basis, there is growing evidence in the industry that those of us on the receiving end (those of us using LLMs to build AI-powered applications such as AI-powered coding apps, document summarizers, agentic systems, etc.) are actually paying more for the LLM API calls that enable those AI-powered apps than we did 1 or 2 years ago... Recent frontier AI models are using more and more "reasoning", which means that, despite a drop in cost per million tokens, they are re-running queries to double-check their answers, going out to the web to gather extra data, and even writing their own routines to do calculations, all before returning an answer to the user that could be as short as a sentence. They can provide better responses now, but can spend a lot more tokens in the process of doing so... This has resulted in a significant cost increase in a number of applications for those of us on the receiving end. Take a look at this: https://lnkd.in/gRf4vPvu
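The per-token vs per-answer distinction above is easy to make concrete. The prices and token counts below are made up purely for illustration, not taken from any provider's price list:

```python
def call_cost(price_per_mtok, tokens):
    """Dollar cost of one API call at a given price per million tokens."""
    return price_per_mtok * tokens / 1_000_000

# Hypothetical numbers: the newer model halves the per-token price, but a
# long reasoning trace burns 10x the tokens before the final answer.
old_model = call_cost(price_per_mtok=10.0, tokens=2_000)   # $0.02 per call
new_model = call_cost(price_per_mtok=5.0, tokens=20_000)   # $0.10 per call
# cheaper per token, 5x more expensive per answer
```

The same arithmetic explains why per-token price charts can look great while application bills climb.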
-
Sonia Israel
Nurau • 3K followers
Been tinkering with synthetic data for our LLM pipelines at Nurau, where one of our biggest challenges has always been making it feel more human: not just statistically plausible enough to avoid uncanny valleys, but psychologically and linguistically aligned with the real people using our tools. I recently stumbled upon PILOT, a framework for steering synthetic data generation in LLMs using structured psycholinguistic profiles. It kicks off by translating natural language personas like "anxious introvert" or "confident expert" into multidimensional profiles covering traits like emotional tone, readability, and lexical diversity. From there, it guides the model via three methods: plain persona prompts, schema-based steering (think rigid templates for consistency), or a hybrid that mixes both. The result? Schema steering slashed repetitive phrasing by up to 40% and boosted coherence scores, while the hybrid nailed a sweet spot for diverse but controlled responses. Expert evals yielded high marks on quality, though it does trade some conciseness for richer vocab (there ought to be a slider for pedantry 😅). In a world where synthetic data is dodging regulatory headaches and data scarcity (it's exploding in healthcare sims and financial modelling), this means more reliable training fodder that mirrors human norms, with correlations to gold-standard human judgments hitting 0.9. But here's the thing: while PILOT may shine on behavioural control, without human oversight we risk outputs that sound right but miss cultural nuances or ethical edges. Its implementation evidently still needs solid governance to build trust. At Nurau, we're exploring how this folds into continuous learning loops for production LLMs, making synthetic data a true partner in ethical AI scaling. You can check out the article here: https://lnkd.in/eeTqkQz2
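The persona-to-profile-to-schema pipeline described above can be sketched in a few lines. This is my own toy rendition of the idea, not PILOT's actual code; the profile fields and values are invented:

```python
# Hypothetical persona table: natural-language label -> psycholinguistic profile.
PROFILES = {
    "anxious introvert": {
        "emotional_tone": "tense",
        "readability": "simple",
        "lexical_diversity": "low",
    },
    "confident expert": {
        "emotional_tone": "assured",
        "readability": "dense",
        "lexical_diversity": "high",
    },
}

def schema_prompt(persona, task):
    """Schema-based steering: render the profile as rigid, explicit
    constraints instead of relying on a free-form persona description."""
    profile = PROFILES[persona]
    constraints = "\n".join(
        f"- {key.replace('_', ' ')}: {value}"
        for key, value in sorted(profile.items())
    )
    return (
        f"Write as a '{persona}' persona.\n"
        f"Follow this psycholinguistic schema exactly:\n{constraints}\n"
        f"Task: {task}"
    )
```

The "plain persona prompt" variant would send only the label; the hybrid would send both the label and the schema block.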
34
2 Comments -
Andrei Lopatenko
Govini • 26K followers
Document-centric tasks sit at the core of many enterprise, business, and government workflows. Search is important, but the real challenge is going beyond retrieval, enabling systems to reason across documents, verify facts, and handle multi-step information tasks. Great to see a new model from Databricks moving in this direction. I expect we’ll see many more models, including open-weight ones, designed specifically for document-driven workflows, an area with huge potential across enterprise and government use cases.
23
1 Comment -
Brad Hutchings
UC Irvine • 2K followers
#MeWriting The fundamental issue driving RAM availability and prices right now is an unwarranted belief that applying more compute to LLM training or inference makes them better. We are past any reasonable point of diminishing returns on scale-up or scale-out with LLMs. But we're pretending like those returns will be super-linear well into the future. For two years now, going back to the availability of Mistral 7B Instruct v0.2 and the PrivateGPT stack to run it, I have said that small LLMs feel as knowledgeable as the big ones, they just aren't as annoyingly loquacious. Gemma 4B as a daily-driver first stop, running on a meager Raspberry Pi 5 without a GPU, does what I need in 19/20 text cuing cases. Probably because I know what LLMs work well for and don't work well for. If I'm running that Pi 24/7 at full CPU inferencing power, it costs me $4/month at ridiculously high SoCal standard residential electricity rates. Cheaper now that I'm in Nevada and have solar during the day. A completion ("answer") spits out faster than I can read along. Do I need two pages generated in 5 seconds? Here is the point. When our demand for LLM compute pares back to what we actually need to run the algorithms and give useful results - when we are actually "efficient" - we won't need so much cloud infrastructure and centralized stores of RAM, GPU, storage, and the like. Prices of the pieces will collapse. Ironically, Grok helped me illustrate this concept using $1M worth of cloud compute hardware for a few microseconds.
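The $4/month figure is consistent with back-of-the-envelope numbers; the wattage and electricity rate below are my assumptions for illustration, not the author's:

```python
def monthly_cost_usd(watts, usd_per_kwh, hours=24 * 30):
    """Electricity cost of running a device continuously for a month."""
    return watts / 1000 * hours * usd_per_kwh

# Assumed figures: ~12 W for a Raspberry Pi 5 under sustained CPU load,
# ~$0.45/kWh for high SoCal residential rates.
cost = monthly_cost_usd(12, 0.45)  # roughly $4/month
```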
1
-
Carlos de Segovia
London Business School • 3K followers
Excited to see LMEval, a new open-source framework for accurately and efficiently comparing large models. Giskard benchmark's Phare (phare.giskard.ai) leverages LMEval for safety and security. #AI #LLM #OSS Elie Bursztein Matteo Dora Alex Combessie 🐢 Kurt Thomas Guillaume Sibout
4
-
Dan Porder
Pluck • 1K followers
I went down a bit of a rabbit hole on AI spreadsheet ingestion this month, as I continue improving our data pipelines at Valae. There’s a lot of research and products in this space that don’t match what we see inside companies every day. Papers like Microsoft Research’s SpreadsheetLLM achieve high metrics for table extraction, but there’s always a catch. The success is measured on spreadsheets where the data is already pretty clear. Real company spreadsheets aren’t so easy. Sometimes they have three tables crammed into one tab, column headers that are someone's initials, and cell B7 containing "check with Marco re: pricing (NOT final)". In my research, I also dug into the survey "Toward Real-World Table Agents" (Tian et al., 2025) which confirms what I found after reading dozens of these papers: almost all LLM table research uses clean academic datasets. The messy reality of business data is barely ever addressed. That gap is where my team at Valae lives. We're perfecting proprietary processes for turning the messiest possible spreadsheets into clean, machine-readable knowledge, automatically. Despite SpreadsheetLLM, despite LlamaIndex launching LlamaSheets, despite years of research… nothing on the market truly handles the gap between what a spreadsheet contains and what it means. Encoding is a solved-ish problem. Interpretation isn't. And that's the one that matters. https://lnkd.in/er7PdDBP
16
1 Comment -
Vijay Ramachandran
KLA • 3K followers
Regarding the AI disruption of SW freakout - I am interested in knowing if there is demand/supply based market analysis of this? If so, please do comment and let me know. The current repricing of SW seems to assume SW is demand constrained?? - i.e. the same amount of SW can be generated much cheaper, and thus software vendors will face pricing pressure and competition. But what if SW is supply constrained? Why not a scenario where 10 times as much software can be sold for 2 times as much revenue using the same number of software engineers? Considering the premium multiple that SW companies have commanded, and the high salaries of SW engineers, I would guess that the market is more likely to be supply constrained.
22
4 Comments -
Emmanuel Schizas
RegGenome • 2K followers
Max Ashton-Lelliott wrote a brilliant paper on the power of structured data as an input into LLMs and agentic AI. I've covered this elsewhere but it's worth re-iterating: without adequate structured inputs you can't really get these tools into compliance workflows - not while keeping the economics of it all reasonable. And I may be speaking out of turn, but this isn't even about RegGenome's data (the best) - it's just madness to hope that a) these models will somehow be super-efficient while trying to digest what are essentially image files, and b) regulators the world over will one day collectively start publishing everything in neatly packaged XML. This is where people like us come in.
18
3 Comments -
Mikael Rashid
noDevBuild • 1K followers
We’ve been experimenting with something interesting in one of our products recently. Instead of going down the usual Vector DB + embeddings route for retrieval, we’re exploring a page-indexing approach. The idea is simple: rather than embedding everything and relying purely on similarity search, we index information in a more structured way at the page level and retrieve it deterministically when needed. If you are someone working in this domain or building something interesting around PageIndexing, I'd love to have a chat and connect with you. My DMs are always open and you can also write to us at contact@nodevbuild.com Let's Build! Ashish Kumar Faiz Anwar
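One way to read "index at the page level and retrieve deterministically" is a plain inverted index over page tokens. This is a minimal sketch of that reading, with invented function names, not the product's implementation:

```python
import re
from collections import defaultdict

def build_page_index(pages):
    """Map each token to the set of page ids containing it.

    `pages` is {page_id: page_text}. Unlike embedding similarity,
    the same query always hits the same pages.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[token].add(page_id)
    return index

def retrieve(index, query):
    """Deterministic retrieval: rank pages by how many query tokens
    they contain, breaking ties alphabetically by page id."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    scores = defaultdict(int)
    for token in tokens:
        for page_id in index.get(token, ()):
            scores[page_id] += 1
    return [pid for pid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

pages = {"p1": "refund policy for orders", "p2": "shipping times and delays"}
idx = build_page_index(pages)
# retrieve(idx, "refund policy") ranks p1 first
```

A real system would add page-level structure (headings, summaries) on top, but even this much makes retrieval reproducible and debuggable in a way similarity search is not.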
15
-
Neha Sheth
Motorola Solutions • 2K followers
There’s a lot of momentum around agentic AI right now. 📌 In practice, many systems labeled “agents” are really workflow automation with LLMs and tool calling. That’s often the right solution - workflows are predictable, testable, and deliver value quickly. Problems show up when we deploy these systems expecting agent-level behavior. Workflows are optimized for correctness on known paths. Agents exist to recover when the path is wrong. If a system can’t recognize failure, update state, and re-plan its approach, autonomy won’t magically appear - no matter how many tools or prompt chains we add. I wrote a deeper breakdown (with diagrams + concrete examples) on Substack 👉 Curious how others are drawing this line in production systems.
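The recognize-failure / update-state / re-plan loop the post describes can be sketched as follows. This is a minimal illustration with invented names, not code from the linked Substack post:

```python
def run_agent(goal, plan, execute, detect_failure, replan, max_steps=10):
    """Minimal agent loop: execute a plan, detect failure, re-plan.

    A fixed workflow would run plan(goal) once and execute it blindly;
    what makes this an agent is the recover-and-replan branch.
    """
    steps = plan(goal)
    state = {"done": False, "history": []}
    for _ in range(max_steps):
        if not steps:
            break
        result = execute(steps.pop(0), state)
        state["history"].append(result)
        if detect_failure(result, state):
            # recognize failure, keep the updated state, try a new plan
            steps = replan(goal, state)
        if state["done"]:
            break
    return state
```

Stripping out `detect_failure` and `replan` leaves exactly the "workflow automation with LLMs and tool calling" the post describes; the autonomy lives entirely in that branch.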
29
4 Comments -
Shiv Trisal
Databricks • 8K followers
This is why I love working at Databricks. Innovation here starts from first principles (and it’s never easy). Our product and R&D teams build “Useful AI” capabilities that enable organizations to generate real alpha, not just add to agentic hype. Add KARL to the list!
40