Yun Jin
Fremont, California, United States
7K followers
500+ connections
Websites
- My MSDN blogs: http://blogs.msdn.com/yunjin
- HPC site including my blogs: http://windowshpc.net
About
Technologist at heart. Always learning, always building — compilers, distributed systems,…
Articles by Yun
- The 72-Hour Battle Behind Fireworks’ DeepSeek-V4 Release
  At Fireworks AI, we pride ourselves on providing Day-0 support for every SOTA open-source model release. For…
  75 reactions
- Stop Re-Deriving Physics: AI Cheat Sheet from DeepSeek’s Engram (Jan 19, 2026)
  Scaling laws have long been the North Star of Generative AI. Traditionally, we talk about scaling in terms of…
  59 reactions · 2 comments
- Why Data Centers Just Stole the Show at CES 2026 (Jan 7, 2026)
  For decades, #CES was the playground of game consoles, drones, and smart gadgets. But this year, the most consequential…
  45 reactions · 1 comment
- CUDA Tile and the Reflection of C = A X B (Dec 29, 2025)
  Having worked in parallelized computing for years, I’m thrilled to see NVIDIA’s recent launch of CUDA Tile programming…
  57 reactions
- The Tale of Two Infrastructures: Why RecSys and LLMs Are Engineering Worlds Apart (Dec 1, 2025)
  Before ChatGPT, my personal experience with AI systems was dominated by recommendation systems. After all, they…
  56 reactions · 1 comment
- Disaster Resilience in the AI era (Oct 21, 2025)
  Today, as a major AWS outage in us-east-1 took down a significant portion of the internet, the engineering team at…
  84 reactions · 1 comment
- LLM is an Art, Building Agents is Engineering (Aug 25, 2025)
  The rapid rise of large language models (LLMs) has captivated the world with their creativity and generative power…
  40 reactions · 1 comment
- Fine tuning LLM for quality, speed, and efficiency (Aug 4, 2025)
  As I explained in a previous post, large language models (LLMs)—even with their impressive raw intelligence—lack the…
  27 reactions · 3 comments
- The Co-Pilot's Co-Pilot: Why LLMs Need a New Application Paradigm (Jun 17, 2025)
  Large Language Models (LLMs) have rapidly become indispensable co-pilots in our digital lives, adept at tasks ranging…
  36 reactions · 3 comments
- Meta’s Taiji Traffic Management System: Steering Billions of Users Around the Globe for Scalability and Performance (Jun 13, 2025)
  In today's hyper-connected world, delivering a fast and reliable experience to billions of users requires a worldwide…
  32 reactions · 1 comment
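The "CUDA Tile and the Reflection of C = A X B" article above revisits the tiled formulation of matrix multiply. As a language-neutral illustration of that blocking idea (a sketch of mine, not code from the article; real CUDA Tile kernels map tiles onto thread blocks and shared memory), a minimal blocked C = A·B looks like:

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked C = A @ B: accumulate the product tile by tile.

    Illustrative only; the point is that each C tile is built from a
    sum of small A-tile x B-tile products, which is what GPU tiling
    frameworks schedule onto fast on-chip memory.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, tile):          # tile rows of C
        for j in range(0, n, tile):      # tile cols of C
            for p in range(0, k, tile):  # reduction dimension
                # NumPy slices clamp at array bounds, so ragged edge
                # tiles are handled automatically.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C
```

The result matches the unblocked product; the tile size only changes the order in which partial sums are accumulated.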
Activity
-
Yun Jin shared this:
"At Fireworks we value quality more than GTM speed. For the DeepSeek v4 release, we held back the release for 3 days to work with the community on resolving a fundamental problem. Here is our story. #ai #artificialintelligence #LLM #MachineLearning #genai #SOTA #OSS #Deepseek #Fireworks"
Linked article: The 72-Hour Battle Behind Fireworks’ DeepSeek-V4 Release (by Yun Jin)
-
Yun Jin shared this:
"I led the Apsara system development when Alibaba Cloud was founded; now I support engineering at Fireworks AI. I'm impressed by the talent, dedication, and strong execution of both teams. Now a well-established cloud company, AliCloud is innovating to become a frontier AI lab; as a newcomer, Fireworks democratizes cutting-edge AI for every developer with fine-tuning and highly efficient inference. The partnership is a perfect fit, and we will do awesome things together! #ai #artificialintelligence #genai #llm #qwen #fireworks"
Reshared post:
"📢 Official Announcement: Qwen Partners with Fireworks AI to Accelerate Access to Qwen Family Models
We are pleased to announce a strategic partnership between Qwen and Fireworks AI to deliver optimized, production-ready deployment of Qwen's closed-weights models via the Fireworks Platform. This collaboration empowers developers and enterprises to:
✅ Deploy Qwen models with lower latency and reduced fine-tuning and inference costs
✅ Leverage enterprise-grade reliability, security, and scalability
✅ Integrate seamlessly into modern AI workflows
🔹 Get started with Qwen on Fireworks: https://lnkd.in/gaq8WzXD
#Qwen #FireworksAI #OpenSourceAI #LLM #AIInfrastructure #ResponsibleAI #DeveloperCommunity"
-
Yun Jin shared this:
"Being recognized as one of the most promising private AI companies in the world is a massive milestone, but the consistency and fundamentals behind the ranking are what make Fireworks truly unique in a crowded space. Here are the ways Fireworks is redefining leadership in GenAI infra:
✨ Efficiency as a Superpower: Fireworks has reached a $4B valuation with just ~$330M in total funding. We’ve raised fewer rounds and significantly less capital than comparable AI infra players, proving that technical ingenuity can outperform raw spending.
📈 The Definition of "Young and Mighty": Founded only in 2022, Fireworks is among the youngest infra companies on the list. The trajectory is steeper and more robust.
🏆 Consistency in Excellence: From Forbes' Next Billion-Dollar Startups to the Cloud 100 and now the AI 50, our team has been consistently recognized by the industry’s most prestigious lists since our inception.
This recognition belongs to our incredible team, who took a bet on GenAI before its potential was realized. We are building the foundational layer for the next generation of AI agents, and we’re just getting started. 🚀
Want to help us build the fastest inference engine on the planet? We are hiring across research, engineering, and product (https://lnkd.in/g_Kg_fwX)! Check out the full Forbes AI 50 list here: https://lnkd.in/gatqNQtZ
#ForbesAI50 #GenAI #AIInference #FireworksAI #Startups #MachineLearning"
Reshared post:
"Forbes released its eighth annual AI 50 list, which spotlights some of the most prominent privately held AI companies. Congratulations to the teams at Lightspeed-backed Abridge, Anthropic, Cyera, Databricks, ElevenLabs, Fireworks AI, Glean, Mistral AI, Reflection, Skild AI, Suno, and Thinking Machines Lab on the recognition. Check out the full list in the comments section."
-
Yun Jin shared this:
"With the release of Kimi-K2.6, we are seeing OSS SOTA models that rival the quality of top-tier proprietary models like Opus 4.6. At Fireworks AI, we are proud to provide immediate Day-0 support for this powerhouse. Our partners and customers are highlighting their co-launch with Fireworks:
🚀 Moonshot: https://lnkd.in/guq6UxqY
🤖 Factory AI: https://lnkd.in/eY_example (https://lnkd.in/gC3mEceC)
✍️ Notion: https://lnkd.in/gmabgfZD
Why Fireworks? 🛠️ This isn’t just about being "first." Our team has made a long-term technical bet to make this possible:
✨ Hardware-Software Codesign: Optimizing every layer of the stack for maximum throughput and minimum latency.
🔬 Cutting-Edge Inference Research: Implementing the latest techniques to ensure SOTA models run at production scale.
🧠 Agentic Quality: Rigorous improvements in long-horizon reasoning and precise tool use to support the next generation of AI agents.
Join Us! 🎆 We are building the fastest, most reliable home for the world’s best models. If you have a passion for GenAI and want to work at the intersection of deep research and high-performance engineering, we are hiring.
#GenAI #ArtificialIntelligence #LLM #Inference #FireworksAI #MachineLearning"
Reshared post:
"Kimi K2.6 is live on Fireworks as a Day-0 launch partner! K2.5 was the base for standout models like Cursor’s Composer 2 and was the most popular model on our training platform. K2.6 on Fireworks raises the bar again.
→ Optimized across the stack, from custom speculators to heterogeneous hardware support across NVIDIA and AMD.
→ Day-0 serverless support is live, and coming soon to Fire Pass (stay tuned).
→ Unlock new use cases with capabilities for 12+ hour autonomous runs and 4,000+ tool calls.
→ $0.95 input / $4.00 output per 1M tokens
Get started today → https://lnkd.in/gqXXpZrg"
-
Yun Jin reposted this:
""The future is not one model that rules all. The future is millions of models, one per application, one per use case. You should have your own model."
Our CEO Lin Qiao joined Denis Yarats of Perplexity, Bryan Catanzaro of NVIDIA, and Stefan Weitz on the opening panel at HumanX to break down the AI stack: energy, chips, infrastructure, models, and applications. Each layer supports the next, and the pace of buildout across all five is accelerating.
The inference layer is where our vision is realized: every team plugging in their own data, building models tuned to their specific use case, and solving the quality-latency-cost problem required to run AI in production. Thanks to HumanX for having us!"
-
Yun Jin shared this:
"Another strong release for the open model ecosystem: Gemma 4 is now available on Fireworks AI. What stands out in this release:
🔹 Efficiency per parameter. Gemma 4 pushes capability without relying purely on size. The lineup spans from 2B to 31B (including dense and MoE variants), with a practical tradeoff curve between cost, latency, and quality.
🔹 Tool use. Strong support for function calling and JSON-style outputs makes these models easy to integrate into agentic systems.
🔹 Multimodality. Text, image, and video processing are built in.
At Fireworks, we’ve focused on making these models actually usable in production, with an optimized inference stack and flexible deployment options.
https://lnkd.in/gpMQcRE7
https://lnkd.in/g5KeAnbs
#Gemma #FireworksAI #LLM #AIEngineering #OpenModels #Inference"
Link: Fireworks AI - Fastest Inference for Generative AI
-
Yun Jin shared this:
"Thrilled to share that Qwen-3.6 Plus, the latest flagship model from the Qwen family, is now available exclusively on the Fireworks platform. This model showcases significant improvements in coding performance and features a hybrid architecture that integrates linear attention with a sparse mixture-of-experts. This design enables both high-performance inference and long context windows. Explore it here: https://lnkd.in/geTkzG8A #qwen #AI #FireworksAI #LLM"
Link: Fireworks AI - Fastest Inference for Generative AI
-
Yun Jin reposted this:
"Challenge: Autonomous AI agents are the future, but building them to be truly safe, secure, and efficient at scale remains a major hurdle.
Solution: NVIDIA NemoClaw + Fireworks. We’re excited to collaborate with NVIDIA on NVIDIA NemoClaw, an open-source stack that makes it simple to run always-on OpenClaw assistants more safely, with a single command.
*Why it Matters for Developers*
Simplifies Security: It safely runs always-on OpenClaw assistants with a single command.
Secure Runtime: Installs the NVIDIA OpenShell runtime, giving your agents and open models the access they need while enforcing policy-based security, network, and privacy guardrails.
We're proud to be a Day-0 inference provider for NVIDIA NemoClaw, bringing you the speed and efficiency you need. Deploy your agents on Fireworks today! https://lnkd.in/gFqHBPb2"
-
Yun Jin shared this:
"I'm so proud of our team's hard work to enable Cursor Composer2's training and inference! Besides the fastest inference, helping AI developers train customized models is a key initiative at Fireworks to enable Autonomous Intelligence."
Reshared post:
"🔥 Cursor Composer2 launched on Fireworks 🔥 This time it's not just inference but also RL powered by Fireworks AI. So much hard work and so many sleepless nights to get this gift out. Congrats to the Cursor team on launching this SOTA model, beating Opus 4.6 on terminal-bench! 👉 https://lnkd.in/e7DqxJev"
-
Yun Jin liked this:
"After nearly a decade at Facebook/Meta I've decided it is time for something new. Below is an excerpt from a post I shared internally at Meta, which I'm sharing here especially for my many former colleagues and collaborators.
___
THANK YOU
It has been an amazing honor to work with all of you at Meta! I’m off to my next adventure on May 18th. 10 years ago, my final interviewer for my first role at Facebook asked me to teach them something. I thought for a moment and then shared a practice I learned from my first career training student leaders as a campus minister. I shared that to be an effective public speaker, you should recognize that we are social beings: your emotions will be reflected on the faces of your audience, and their emotions will impact you as you speak. If you are nervous, the audience will feel nervous with you… which in turn will make you more nervous. If you are peaceful or joyful, the audience will reflect those emotions. Therefore, the secret to communicating with confidence and winning over your audience is to look at the people in the audience with sincere warmth and affection. In my faith tradition we describe this as looking at people with love.
This practice has served me well, not merely in public speaking but in all my roles at Facebook/Meta. As a product manager, choosing to love the billions of people who are current or prospective users shapes how I think about building products. Even as we do many great big initiatives at Meta, often the transformative impact comes from embracing advice from Mother Teresa: do small things with great love. In my work in connectivity and partner growth, the choice to look at people with love was reflected in our partnerships. Seeing partners with affection and wanting them to succeed has countless times resulted in good will reflected back on me and on Meta.
But the place where the practice of looking at people with love has been the easiest and most rewarding is in my daily collaboration with you, my coworkers and colleagues. So many of you work tirelessly to support Meta’s ongoing success and innovation. To all with whom I have had the privilege to work, sincerely, thank you."
-
Yun Jin liked this:
"“3X is for losers” — the most memorable quote from a very memorable first QBR as a leadership team. “3x 3x 2x 2x” is a rule of thumb that most companies hope to follow. But that rule just doesn’t work for a space as big as inference or a company as fast as Fireworks AI. Scaling a sales organization when you have the wind at your back is amazing. If you’re curious, independent, and operate with urgency, you’ll fit in great here. This team is having fun every day. Come join us! Roberto Barroso-Luque, Steven Wu, Brenna Shubin, Patty Li, Chan Manchanda, Rob Ferguson, Lavinia (Xiaolei) Liu, SHRM-SCP, Mark McAndrew, Tingting Bi, David D. Lee, George Hu"
-
Yun Jin liked this:
"We’ve been working closely with the Harvey team on the launch of the Legal Agent Benchmark, a product focused on evaluating how open-weight models perform on long-horizon, real-world legal tasks. A key focus of this work has been post-training and quality-latency optimization to improve how models perform under real-world deployment constraints. This reflects a broader shift: evaluation is starting to capture the complexity and constraints that determine whether models can actually perform real work, not just succeed on short, static tasks. Learn more on the Harvey blog."
Original post:
"Today we are launching Legal Agent Benchmark (LAB), an open-source benchmark for long-horizon legal agents. Unlike prior benchmarks that measure short-horizon reasoning, LAB is a client matter-centric benchmark built to mirror how legal work is actually delivered at a law firm. Each task gives an agent a partner-level request for work product and a synthetic client matter, and is graded against an expert rubric. In total, LAB spans over 1,200 tasks across 24 legal practice areas, graded by over 75,000 expert rubric criteria. We are working closely with the foundation model labs, inference providers, Neo labs, and AI companies to evaluate the best open- and closed-source legal agents. Soon, we’ll be releasing a leaderboard covering major closed- and open-source models. We believe law is built on trust, and for a benchmark to be truly trusted it needs to be open source. If you are building models or agents for coding, knowledge work, or legal, we would love to partner with you. Read more about LAB here: https://lnkd.in/g3S4zvne"
-
Yun Jin liked this:
"Thank you for the spotlight, The Official Top 100 Magazine!"
Original post:
"We are pleased to announce that Hitesh Jain will be featured in the Top 100 Innovators & Entrepreneurs magazine. Hitesh Jain, founder and CEO of CoralBricks AI, is advancing one of the most critical frontiers in artificial intelligence: memory. With deep expertise in product vision, system architecture, and model development, Hitesh is focused on solving a fundamental limitation in today’s AI systems: their inability to retain and build upon information over time. CoralBricks AI is developing a memory layer that enables intelligent applications and agents to operate with greater continuity, accuracy, and speed, transforming how decisions are made across extended interactions. Unlike traditional AI systems that are constrained by session limits and forget prior context, CoralBricks empowers agents to remember users, revisit past interactions, and carry complex tasks across days or even longer. Whether guiding a hedge fund through a trading decision or supporting enterprise workflows, the platform enables more adaptive, persistent, and intelligent experiences. Designed as scalable infrastructure, CoralBricks integrates seamlessly into environments like AWS and NVIDIA, allowing developers to deploy memory-enabled agents through a robust API. #entrepreneur #publishing #top100magazine #innovator #businessnews #inthenews #business #motivation #success #smallbusiness #entrepreneurship #entrepreneurlife #marketing #businessowner #inspiration #startup #digitalmarketing #branding #investment #smallbusinessowner Hitesh Jain"
-
Yun Jin liked this:
"My talk at AgentCon: “Unit Tests for Model Training: The Missing Layer in Compound AI.” Everyone’s talking about compound AI right now: smaller models, open weights, agents stitching things together. Lin Qiao and other industry leaders have been saying this for a while, and honestly, I’m very on board. But here’s the thing that kept bugging me: if training is now part of the product life cycle… where are the tests? What most teams are doing today:
→ Benchmarks (great, until your problem outgrows them)
→ LLM-as-judge (basically grading your own homework with extra steps)
→ Human evals (high quality, just painfully slow)
→ Vibes (quietly powering more decisions than we’d like to admit ✨)
None of these give you the tight, repeatable signal you’d expect from a unit test. That's the gap Eval Protocol (https://lnkd.in/gGeZXxcM) from Fireworks AI is built for: structured, composable evals that actually plug into the training loop. Honestly, the best part wasn’t the talk. It was the hallway chats. People building real systems, trading notes on evals, and figuring out what actually works… and what really doesn’t 💁🏻♀️"
-
Yun Jin liked this:
"We’re hiring across all GTM functions! #gtm #ai #fireworks I’m looking for technically curious, intellectual, and consultative AEs who operate with high agency. If you were an SE previously, that’s a plus! Reach out to Zoe Z. or Lavinia (Xiaolei) Liu, SHRM-SCP, or visit https://lnkd.in/gzGQw4ni for more info."
-
Yun Jin liked this:
"Excited to launch RadixArk officially today! I have spent the past half year working with RadixArk and the SGLang community, and it has been the most rewarding experience I have had. RadixArk and the SGLang community have a unique engineering culture. The code and the system have the final say. Feedback is direct because everyone trusts the intent. There is very little hierarchy around ideas, and good technical judgment matters more than title or seniority. With a high bar and fast feedback loops, people grow incredibly quickly. In many places, you spend most of your time looking at one company’s stack. Here, through the SGLang community, we get to see the forest, not just the trees: many labs, companies, hardware platforms, workloads, and real production systems. There is a lot of exciting work ahead across inference, training, RL, orchestration, kernels, multi-hardware, and many real-world systems problems in between. If you love coding, enjoy building real systems, and want to work on the full AI stack from inference to training, come join us at RadixArk. This is just the beginning."
Original post:
"Today, we are thrilled to officially launch RadixArk with $100M in seed funding at a $400M valuation. The round was led by Accel and co-led by Spark Capital. RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas. RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale. RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI. We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We’re also grateful to Accel, Spark Capital, NVentures (the venture capital arm of NVIDIA), Salience Capital, A&E Investments, HOF Capital, Walden Catalyst Ventures, AMD, LDV Partners, WTT Fubon Family, MediaTek, Vocal Ventures, Sky9 Capital, and our angel investors Igor Babuschkin, Lip-Bu Tan, Hock Tan, Sean Ma, John Schulman, Soumith Chintala, Lilian Weng, Olivier Pomel, Thomas Wolf, William (Liam) Fedus, Robert Nishihara, Eric Zelikman, Logan Kilpatrick, Divesh Makan, Mei Z., Puneet Kumar, Brian Cayne, Howie Xu, Aditya Grover, Yuandong Tian, Beidi Chen, Guodong Zhang, and Hao Zhang, among others. Thanks to Meghan Bobrowsky at The Wall Street Journal for the exclusive interview about our vision.
Join our ecosystem:
1. Star SGLang: https://lnkd.in/gWcU_tTX
2. Explore our open positions and join us: https://lnkd.in/g6gsb-Gd
Read the full story on WSJ: https://lnkd.in/gB34eEiE
Read the full blog: https://lnkd.in/gS6SAJeN
Get started at: https://www.radixark.com/
#AIInfrastructure #SGLang #RadixArk #OpenSource #SeriesSeed #Inference #reinforcelearning"
-
Yun Jin liked this:
"While I was in Bengaluru for GitHub Constellation, I sat down with Rahul Regulapati, founder of Galleri5 (now part of Collective Artists Network). They're building an AI-native cinematic platform, powered by Microsoft Foundry, that helps directors make films faster, better, and cheaper. What stuck with me: two directors can use the exact same AI models and produce something completely different, so each director's signature comes through. AI doesn't flatten creativity. It expands what's possible and lets directors A/B test shots the way engineers A/B test code. We also talked about what it takes to build as an AI founder right now. Speed of learning, and being comfortable with being uncomfortable, are everything."
Experience
Publications
-
Dynamo: Facebook’s Data Center-Wide Power Management System
ISCA 2016 (ACM/IEEE International Symposium on Computer Architecture)
A data center-wide power management system that monitors the entire power hierarchy and makes coordinated control decisions to safely and efficiently use provisioned data center power.
-
Watching videos from everywhere: a study of the PPTV mobile VoD system.
Internet Measurement Conference 2012
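The Dynamo publication above describes monitoring an entire power hierarchy and making coordinated control decisions against provisioned capacity. As a toy sketch of that coordinated-capping idea (my illustration under simplified assumptions, not the paper's actual controller), each level of the hierarchy can aggregate measured draw bottom-up and proportionally throttle the leaves beneath any node that exceeds its cap:

```python
from dataclasses import dataclass, field

@dataclass
class PowerNode:
    """One level of a power hierarchy (e.g. breaker, row, rack, server)."""
    name: str
    cap_watts: float
    draw_watts: float = 0.0            # measured draw at a leaf (server)
    children: list = field(default_factory=list)

    def measured(self):
        """Aggregate power bottom-up, as a monitoring agent would."""
        if self.children:
            return sum(c.measured() for c in self.children)
        return self.draw_watts

    def leaves(self):
        if not self.children:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()

    def enforce_caps(self):
        """Coordinated control: if a node exceeds its cap, scale every
        leaf beneath it down proportionally so the subtree fits, then
        recurse so lower-level caps are also respected."""
        total = self.measured()
        if total > self.cap_watts:
            scale = self.cap_watts / total
            for leaf in self.leaves():
                leaf.draw_watts *= scale
        for c in self.children:
            c.enforce_caps()
```

Top-down enforcement after bottom-up measurement is the key shape here: a parent can only make a safe decision once it sees the aggregate of its whole subtree.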
Patents
-
Graphical Programming Object Population User Interface Autogeneration
Filed US 20120227028
-
Scheduling by Growing and Shrinking Resource Allocation
Filed EU EP20080826472
A scheduler for computing resources may periodically analyze running jobs to determine if additional resources may be allocated to the job to help the job finish quicker and may also check if a minimum amount of resources is available to start a waiting job. A job may consist of many tasks that may be defined with parallel or serial relationships between the tasks. At various points during execution, the resource allocation of active jobs may be adjusted to add or remove resources in response to a priority system. A job may be started with a minimum amount of resources and the resources may be increased and decreased over the life of the job.
Other inventors
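The grow-and-shrink idea in the patent abstract above can be sketched in a few lines. This is my simplified illustration of one scheduling pass (start waiting jobs at their minimum, then grow running jobs with leftover capacity), not the claimed method itself; the `Job` fields and priority-by-list-order rule are assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    min_cores: int   # minimum allocation needed to start
    max_cores: int   # most the job can usefully consume
    cores: int = 0   # current allocation (0 = waiting)

def schedule_tick(jobs, total_cores):
    """One periodic pass of a toy grow-and-shrink scheduler.

    1) Start any waiting job whose minimum fits in free capacity.
    2) Hand remaining free cores to running jobs, earlier (higher
       priority) jobs first, up to each job's useful maximum.
    Returns the cores still free after the pass.
    """
    free = total_cores - sum(j.cores for j in jobs)
    # Step 1: start waiting jobs at their minimum allocation.
    for j in jobs:
        if j.cores == 0 and j.min_cores <= free:
            j.cores = j.min_cores
            free -= j.min_cores
    # Step 2: grow running jobs with whatever is left.
    for j in jobs:
        if j.cores > 0 and free > 0:
            grant = min(free, j.max_cores - j.cores)
            j.cores += grant
            free -= grant
    return free
```

A shrink step (reclaiming cores from low-priority jobs when a high-priority job is waiting) would follow the same pattern in reverse, repeated on every tick over the life of each job.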
Languages
- English
- Chinese
Other similar profiles
Explore more posts
-
Redis
294K followers
LiteLLM is an open-source proxy that connects your app to LLMs through one unified interface. Paired with Redis, it gives AI and ML teams a simple yet powerful way to unify access to LLMs, accelerate response times, and make AI apps real-time. Redis handles performance, memory, and data coordination that modern AI apps demand, while LiteLLM handles abstraction and routing. Our AI Product Marketing Manager Rini Vasan dives into: ▶️ What LiteLLM is ▶️ Why it works well with Redis ▶️ How LiteLLM and Redis work together Learn how to scale your LLM gateway with LiteLLM & Redis: https://lnkd.in/gqsTebF7
32
-
ADTmag
140 followers
John K. Waters explores how JFrog is tackling one of AI development’s biggest challenges: managing and securing AI-generated code. The company’s new platform, JFrog Fly, is designed for “agentic workflows,” automating repository setup, versioning, and integration with tools like GitHub Copilot and Anthropic’s Claude Code. Read how JFrog Fly aims to bridge the gap between AI coding and real-world deployment: https://lnkd.in/eJFJkjvD #AI #DevOps #SoftwareDevelopment #JFrog #AICoding
2
-
AITech365
4K followers
JFrog Powers AI Developer Workflows with MCP Server “The developer tool stack and product architecture has fundamentally changed in the AI era. With the launch of the JFrog MCP Server, we’re expanding the open integration capabilities of the JFrog Platform to seamlessly connect with LLMs and agentic tools,” said Yoav Landman, Co-Founder and CTO, JFrog. “This allows developers to natively integrate their MCP-enabled AI tools and coding agents with our Platform, enabling self-service AI across the entire development lifecycle, which helps increase productivity and build smarter, more secure applications faster.” Read More: https://lnkd.in/dWVcrTPf #AITech365 #ITandDevOps #JFrog #JFrogPlatform #LLMs #MCP #news #SoftwareSupplyChainPlatform
16
-
KubeFM
7K followers
Julia Blase, Product Manager at Chronosphere, discusses how observability needs to evolve alongside Kubernetes. She explains that Kubernetes brought dynamic, scalable infrastructure, and monitoring systems must follow suit. The future requires on-demand instrumentation and dynamic data collection that can scale up and down based on needs, helping teams optimize costs while getting the right insights at the right time. Watch the full interview: https://ku.bz/X6tgWrG0P
2
-
SogetiLabs
10K followers
How Large Language Models Are Making Software Greener AI is quietly driving sustainability. Large Language Models (LLMs) now go beyond generating text—they’re optimizing code. By detecting inefficiencies and refining logic, they cut CPU cycles, reduce energy use, and lower costs. Cleaner code means greener software. At scale, these improvements shrink carbon footprints, turning LLM-assisted refactoring into an environmental win. Explore how AI is shaping a future where innovation meets sustainability. https://lnkd.in/dfwy_Yhq #AI #SustainableTech #GreenSoftware #CodeOptimization #Innovation Md Siddiqur RAHMAN, Mouna Ben Mabrouk Ha Nhi NGO Ginel Dorleon Azade Fotouhi José Carlos Rosales Núñez Ines BEN KRAIEM
5
-
Opsi Systems
5K followers
We co-authored this new white paper with JBF Consulting exploring what changes when transportation optimization moves beyond CPU-based architectures to GPU-accelerated optimization. One insight stands out clearly: the historic trade-off between optimization runtime, scale, and solution quality isn’t a law of physics - it’s an architectural limitation. In the paper, we go deeper into why CPU-based solvers hit that ceiling, how GPU-accelerated optimization changes the equation, and what that means for real-world, highly constrained transport networks. 👉 Read the full white paper - see the link in the first comment. #SupplyChainTechnology #TransportationOptimization #LogisticsStrategy #GPU #TMS #Chora #Tramm
10
3 Comments -
CAI Stack
858 followers
𝗔𝗜 𝗱𝗶𝗱𝗻’𝘁 𝗯𝗿𝗲𝗮𝗸 𝘆𝗼𝘂𝗿 𝗙𝗶𝗻𝗢𝗽𝘀. 𝗜𝘁 𝗼𝘂𝘁𝗴𝗿𝗲𝘄 𝗶𝘁. Model training can run for hours or weeks. Inference costs change with user behavior. Experiments share GPU infrastructure across teams and timelines, making ownership and accountability unclear. GPU costs are 10–20x higher than standard compute, yet most cost tools only show totals - not intent, impact, or outcomes. Why Enterprise FinOps Needs an AI-Oriented Architecture explains why cost governance breaks under AI workloads and how AI-aware visibility, guardrails, and outcome-based tracking help enterprises regain control before budgets spiral. 𝘙𝘦𝘢𝘥 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘣𝘭𝘰𝘨 𝘣𝘦𝘭𝘰𝘸 𝘪𝘯 𝘵𝘩𝘦 𝘤𝘰𝘮𝘮𝘦𝘯𝘵𝘴 𝘵𝘰 𝘶𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥 𝘸𝘩𝘢𝘵 𝘍𝘪𝘯𝘖𝘱𝘴 𝘧𝘰𝘳 𝘈𝘐 𝘳𝘦𝘢𝘭𝘭𝘺 𝘳𝘦𝘲𝘶𝘪𝘳𝘦𝘴. #FinOps #AICostManagement #AIInfrastructure #CloudCosts #EnterpriseAI #CAIStack
7
1 Comment -
Hamza Ishaq
SkyBit Technologies • 6K followers
🧠 AI Infrastructure most markets underprice
Amazon isn’t renting the future — it’s building it:
⚙️ Trainium & Inferentia for AI training and inference
⚙️ Graviton powering cost-efficient cloud compute
⚙️ Bedrock enabling enterprise-grade generative AI at scale
This is AI leverage embedded directly into AWS 📊🤖
⸻
📢 Advertising powered by real purchase intent
Unlike social platforms guessing intent, Amazon knows it:
🛒 Amazon DSP
🛒 Retail Media Network
Every ad is tied to actual buying behavior, not just clicks — a data moat most ad platforms can’t replicate 🔒
⸻
🚚 Logistics as a Service
Amazon is quietly becoming the backbone of global commerce:
📦 FBA
📦 Buy with Prime
📦 Amazon Freight
Merchants don’t just sell on Amazon anymore — they run on Amazon 🌍
⸻
🛰️ Project Kuiper long-duration optionality
Satellite broadband isn’t about next-quarter revenue — it’s about:
🌐 Global connectivity
🌐 AWS edge expansion
🌐 Infrastructure dominance over decades
This is classic Amazon thinking ⏳
⸻
🏥 Health + data land grab
Early innings, but very on-brand:
🩺 One Medical
💊 Amazon Pharmacy
🤖 Care AI
Data + logistics + consumer trust = a powerful long-term health platform
⸻
🔑 The big picture
Amazon has fused: Compute + Commerce + Logistics + Data into one massive operating system for the global economy. That’s not a company. That’s infrastructure 💡
⸻
✨ Follow Hamza Ishaq for more informative and colorful insights on AI, Cloud Computing, Big Tech, and Future Technology
♻️ Repost to help investors and builders understand how real digital empires are quietly engineered behind the scenes 🧠📈
#Amazon #AMZN #AWS #ArtificialIntelligence #CloudComputing #BigTech #DigitalEconomy #FutureOfBusiness #TechInfrastructure #InvestingInsights
-
Tomohiro Furuya
1K followers
In a world being supercharged by AI, the exponential growth of logs and metrics must be processed and analyzed in an entirely new way. An ingestion-based license model on a Kafka-based platform is not a sustainable solution for the next era of exponentially growing log/metric volumes, from both a cost and a CPU-resource perspective. FFWD brings a game-changing solution holistically to this ever more complex environment, from the network layer all the way up to AI agents, built on their in-house developed AI-Native Pipeline framework, Anomaly Detection, and Correlation. If you operate a large, complex enterprise environment (data center, AIDC, service infrastructure, SOC, etc.) facing a mass of logs/metrics that drives ever-increasing license fees, server counts, and more, check out how FFWD may solve your problem! https://www.core0.io/
-
LynxOps.AI
291 followers
Shadow AI is lurking in the depths of your software development process 🕵️♂️ JFrog's new Shadow AI Detection feature is ready to expose it. This sly tool uncovers hidden and unauthorized AI models and API calls sneaking their way into your enterprise. It might sound like sci-fi, but it's happening now—AI's unchecked spread introduces potential security threats and compliance nightmares. 🟠 Why risk your company's integrity for an AI craze? 🟠 Is your team using AI models and APIs without proper scrutiny? 🟠 Can you afford to ignore the shadowy side of AI integration? Organizations need to wake up to the risks of clandestine AI growth. It's about transparency, control, and safety in technology deployment. Are you prepared to shed light on Shadow AI and protect your enterprise from unseen dangers? #AI #SecurityAwareness #SoftwareDevelopment #JFrog #DigitalRisk 🕵️♀️ 🔗https://lnkd.in/dDMdSqUX 👉 Post of the day: https://lnkd.in/dACBEQnZ 👈
-
Vijayakumar A
NeevCloud® • 8K followers
Jensen Huang dropped a number at #GTC 2026 that stopped me mid-scroll. "I believe computing demand has increased by 1 million times over the last two years." - Jensen Huang, GTC 2026. Not over a decade. Not someday. Two years. Then came the slide that reframed everything for me. The entire AI infrastructure thesis in one frame: ▸ AI Factories are the Industrial Infrastructure of the AI Era ▸ Inference is the Workload ▸ Tokens are the New Commodity ▸ Compute is Revenue. And the line Jensen said out loud: "It's now a factory to generate tokens." On the All-In podcast at GTC, Jensen laid out a thought experiment: if a $500K/year developer spent less than $250K on tokens by year end, he'd be "deeply alarmed." My read on that: it's not a pricing benchmark. It's a provocation. It's Jensen saying we are still dramatically underestimating how much value a token can unlock. Here's where I land on all of this. The cost of generating tokens is dropping fast. The value of what those tokens can do is rising exponentially. The gap between those two curves? That's where the real opportunity lives. But, and this is the part most people skip, the gap only gets captured if the infrastructure beneath it is built right. #Tokens don't generate themselves. #Inference doesn't optimize itself. And enterprises won't run mission-critical agentic workloads on infrastructure that can't guarantee latency, throughput, and cost-per-token at scale. Whoever builds that layer, reliably, efficiently, and at enterprise grade, owns the #AIstack: the inference delivery platform. At NeevCloud®, this is exactly what we're building toward on our AI SuperCloud platform. Because Jensen is right: the AI factory era is here. The question is who builds the pipes that make the factory run. What's your take: is token cost the most underloved problem in enterprise AI right now? Drop your thoughts below 👇 Note: I didn't attend GTC in person. Insights are from the official NVIDIA GTC keynote on YouTube and public coverage.
Screenshot credit: NVIDIA GTC 2026 official livestream. #NvidiaGTC #AIInference #TokenEconomy #AgenticAI #EnterpriseAI #AISuperCloud #NeevCloud
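Jensen's thought experiment can be sanity-checked with rough arithmetic. The price per million tokens below is an assumed placeholder for illustration, not a quoted rate from any provider:

```python
# Rough sanity check of the "$250K/year on tokens" thought experiment.
# ASSUMED placeholder price; real per-token pricing varies widely by model.
price_per_million_tokens = 5.00   # USD, hypothetical blended rate

annual_token_budget = 250_000     # USD, the figure from the thought experiment
tokens_per_year = annual_token_budget / price_per_million_tokens * 1_000_000

working_days = 250                # assumed working days per year
tokens_per_day = tokens_per_year / working_days

print(f"{tokens_per_year:,.0f} tokens/year")  # 50,000,000,000
print(f"{tokens_per_day:,.0f} tokens/day")    # 200,000,000
```

Under that assumed rate, the budget implies on the order of 200M tokens per working day per developer, which is what makes the claim read as a provocation about agentic usage rather than a pricing benchmark.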
-
Inhypes
1K followers
In 2025, one of the most underrated skills is learning JSON prompting. It changes how LLMs respond by giving them clear, structured instructions that read almost like machine code. Instead of writing broad, open-ended prompts, you break your request into simple key–value pairs—things like task, audience, or tone. Models like ChatGPT, Gemini, and Claude can then deliver sharper, more relevant results. This works because LLMs are trained on structured data—code, docs, APIs—so JSON feels natural to them. The result is output that’s cleaner and better aligned with what you want. Credits: heysehajsingh on X Follow @airesearches to stay updated on the world's most fascinating AI developments. ---------------------------------------------------- Learn AI in 3 Minutes a Day 👉 https://lnkd.in/eTNnQjaz ----------------------------------------------------
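The key-value structure the post describes might look like the sketch below. The field names (task, audience, tone, format) are illustrative examples, not a fixed schema any of these models requires:

```python
import json

# Illustrative JSON prompt broken into simple key-value pairs,
# as the post suggests, instead of one open-ended paragraph.
prompt = {
    "task": "Summarize the attached release notes",
    "audience": "non-technical product managers",
    "tone": "concise and neutral",
    "format": "3 bullet points",
}

# Serializing to a JSON string gives the model unambiguous,
# machine-readable instructions to include in the request.
print(json.dumps(prompt, indent=2))
```

The serialized string would then be pasted into (or sent as) the prompt; the structured keys make it easier for the model to address each constraint separately.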