James Fan

James Fan · 2026-03-29T13:31:24.822Z

I wrote a short post about the LiteLLM supply chain attack. TLDR: we got lucky because AI is both stupid and brilliant. https://lnkd.in/evD9ZM_b

New York, New York, United States
1K followers 500+ connections


Tomato.ai (Sold to Sanas)

The University of Texas at Austin

Personal Website

About

Proven track record of developing, applying & scaling cutting edge AI technology to…

Articles by James

Customer Service Holiday Blues

Sep 16, 2016

I know we just had Labor Day, but as all retailers in the US know, it’s time to get ready for the busiest time of the…

Activity

James Fan

1w
James Fan shared this
9 seconds. That's all it took for an AI agent in Cursor IDE to delete the entire production database of PocketOS, a SaaS for car rental businesses, including all volume-level backups. Tom's Hardware covered it last week.

The founder, Jer Crane, posted the agent's confession verbatim: "I guessed that deleting a staging volume via the API would be scoped to staging only. I didn't verify… I decided to do it on my own to 'fix' the credential mismatch, when I should have asked you first."

Read that line carefully. It's the literal definition of the failure mode every team running AI agents on production infra is about to hit: "I guessed → I didn't verify → I decided on my own → I should have asked." Four phrases. Each one is a missing layer in the agent stack.

System prompts can't enforce this. "Don't delete production" sits next to "be helpful" in the prompt, and the LLM weighs them. When "just fix it for them" wins, the volume's gone in 9 seconds.

The fix is structural: a workflow-enforcement layer between the agent and its tools. The user describes what should happen ("diagnose, don't modify"). That paragraph compiles to a graph. Tools outside the graph literally don't dispatch: the deterministic gate, not the LLM, decides.

I've been building exactly this. Open source, Apache-2.0, npm-installable. Demo below: same Claude model, same "just delete the volume and start over" user prompt that fried PocketOS, with `railway_delete_volume` blocked at the hook layer before it reaches the API.

→ Longer blog post: https://lnkd.in/gUcBF9YW
→ Code: https://lnkd.in/gYRSur5S
→ Watch the 90-second demo: https://lnkd.in/gU8VXSsH

Demo of BetterClaw AI agent workflow enforcement layer
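
To make the gating idea in the post concrete, here is a minimal Python sketch of a deterministic gate that sits between an agent and its tools. It is an illustration under stated assumptions, not the project's actual API: the class `WorkflowGate`, the exception `ToolBlocked`, the step names, and the tool `railway_list_volumes` are all hypothetical. Only the tool name `railway_delete_volume` comes from the post.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class ToolBlocked(Exception):
    """Raised when a call falls outside the workflow graph (hypothetical name)."""


@dataclass
class WorkflowGate:
    """Hypothetical gate: plain code, not the LLM, decides what dispatches."""
    allowed: Dict[str, Set[str]]   # step name -> tools permitted in that step
    edges: Dict[str, Set[str]]     # step name -> steps reachable from it
    step: str                      # current step in the workflow graph
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def advance(self, next_step: str) -> None:
        # Only transitions that exist as edges in the graph are accepted.
        if next_step not in self.edges.get(self.step, set()):
            raise ToolBlocked(f"no edge {self.step} -> {next_step}")
        self.step = next_step

    def dispatch(self, tool: str, **kwargs):
        # The allowlist check runs before the tool function is ever called.
        if tool not in self.allowed.get(self.step, set()):
            raise ToolBlocked(f"{tool} is not allowed in step {self.step!r}")
        return self.tools[tool](**kwargs)


# Stand-in tools; a real deployment would call the provider's API here.
deleted = []
gate = WorkflowGate(
    allowed={"diagnose": {"railway_list_volumes"}, "report": set()},
    edges={"diagnose": {"report"}},
    step="diagnose",
    tools={
        "railway_list_volumes": lambda: ["staging-vol", "prod-vol"],
        "railway_delete_volume": lambda volume: deleted.append(volume),
    },
)

print(gate.dispatch("railway_list_volumes"))
try:
    gate.dispatch("railway_delete_volume", volume="prod-vol")
except ToolBlocked as err:
    print("blocked:", err)
```

In this sketch the destructive tool is registered but never reachable from the "diagnose, don't modify" workflow, so the request fails at the gate regardless of what the model decides to attempt.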

James Fan

3w
James Fan shared this
I started a blog series on LangGraph Agent security. I'm writing it as I learn more about the security challenges for AI Agents.

1. LangGraph Agent Security: What I Wish Someone Had Told Me Before I Started — James Fan

James Fan

1mo
James Fan shared this
This is not an April Fools' joke! Apparently, Claude Code's source code has been leaked! I wrote a post on this: https://lnkd.in/eZzKkwRH
James Fan

1mo
James Fan shared this
I wrote a short post about the LiteLLM supply chain attack. TLDR: we got lucky because AI is both stupid and brilliant. https://lnkd.in/evD9ZM_b
2 Comments
James Fan reposted this

Ofer Ronen

4mo
VOTE IN THE COMMENTS 👇 I spent 7 years at Google and sold them two startups, but I’m still fascinated by one massive unsolved problem on YouTube and other video platforms like TikTok, Instagram, and LinkedIn: Intelligibility. Even with the best content, thick accents can create a "cognitive load" that causes viewers to click away. I’m proposing a new feature: Accent Softening. Imagine a toggle that lets listeners choose their preferred accent for any video, making global content instantly more accessible without relying solely on subtitles. I need your feedback to help shape this capability. Can you take 10 seconds to vote on which "accent barrier" affects you most? Please use the link in the comments to vote. #YouTube #TikTok #Instagram #AI #ProductDesign #Accessibility

17 Comments
James Fan reposted this

Ofer Ronen

6mo
Tomato.ai vs Krisp: Which Accent Softening Tool Wins? 🚀 Miscommunication costs call centers millions. We ran a blind study comparing Tomato.ai and Krisp across 932 real customer service utterances. The results weren’t close. Highlights: ✅ Tomato.ai was preferred 4.75× more overall ✅ 4.96× better on intelligibility ✅ 3.29× better on accentedness Agents sounded clearer, more natural, and easier to understand. That means higher CSAT, faster calls, and happier teams. 📘 For the full analysis, see the link in the comments. #VoiceAI #CallCenter #CustomerExperience #SpeechTech #BPO #AI #AccentSoftening

1 Comment
James Fan reposted this

Ofer Ronen

6mo
🎤 Now available: The Tomato.ai Accent Softening API. Real-time voice clarity is no longer optional for voice platforms. Our new Accent Softening API makes speech instantly easier to understand across global teams, games, and creator communities, without losing identity or emotion. 🧠 For voice platforms: Deliver smoother conversations and higher comprehension in real time with a simple API call. 🎮 For gaming: Keep global squads in sync. Fast, clear voice callouts mean fewer mistakes and better team coordination. 🎥 For creator platforms: Help audiences understand every word. Make streams, podcasts, and live interactions sound natural for everyone, everywhere. Built for developers who want low latency, high security, and easy integration. No retraining, no setup headaches, just streaming clarity at scale. 👉 Learn more, see the link in the comments. #AI #VoiceTech #SpeechTechnology #GamingAudio #CreatorTools #APIs #AccentSoftening #OnDeviceAI #CloudAI #TomatoAI #VoicePlatforms #Developers

8 Comments
James Fan reposted this

Ofer Ronen

6mo
🚀 Now live: Local Accent AI. You can now choose where Tomato.ai runs the accent AI, on-device or in the cloud. 🖥️ Local Accent AI delivers: - Lower latency for smoother back-and-forth conversations - Private processing that keeps voice data on the computer - Simple deployment in regulated or high-security environments ☁️ Cloud Accent AI remains ideal for mixed fleets and quick rollout, with centralized updates and elastic scaling. Whether your priority is privacy and speed or simplicity and scale, you can pick the mode that fits your team best. See a link to our announcement below. #AI #SpeechTechnology #DataPrivacy #OnDeviceAI #CloudAI #TomatoAI #VoiceTech #AccentAI

1 Comment
James Fan reposted this

Ofer Ronen

1y
🚀 We’re hiring at Tomato.ai! We're on a mission to make communication clearer, more inclusive, and more human — starting with real-time accent softening for global call center agents. If you're passionate about building high-performance APIs, working with real-time audio streaming, and shaping the future of voice technology, we want to hear from you. Join us as a Senior Backend Engineer and help us scale cutting-edge AI that empowers millions of voices to be heard and understood. 💼 Rust + Python 🌐 GCP infrastructure 🎧 Real-time audio 🌍 Remote-first team Learn more and apply here 👉 https://tomato.ai/careers/ #Hiring #AI #VoiceTech #BackendEngineering #PythonJobs #RustLang #StreamingAPIs #GCP #RemoteJobs #TechForGood #TomatoAI

Careers - tomato.ai

4 Comments

James Fan liked this

Parag Jhaveri

4d
American Alliance of Orthopaedic Executives (AAOE) 𝟮𝟬𝟮𝟲: 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗲 𝗙𝘂𝘁𝘂𝗿𝗲 𝗼𝗳 𝗢𝗿𝘁𝗵𝗼𝗽𝗮𝗲𝗱𝗶𝗰𝘀 𝗮𝗻𝗱 𝗧𝘂𝗿𝗻𝗶𝗻𝗴 𝗜𝗱𝗲𝗮𝘀 𝗶𝗻𝘁𝗼 𝗔𝗰𝘁𝗶𝗼𝗻. Across sessions and conversations at VoiceCare AI booth, one theme kept surfacing: making operations actually work in the real world. Not in theory, not in demos - on busy clinic days, with full schedules, staffing gaps, and patients expecting seamless care. The most important being getting paid accurately, efficiently, and with less friction. In today’s environment, strong RCM isn’t a back-office function; it’s a frontline driver of practice health. A few themes that kept coming up: • Simplicity matters more than ever - fewer systems and tighter workflows mean fewer billing errors, cleaner claims, and faster reimbursement • The front door sets the tone for revenue - eligibility, prior auth, and patient financial clarity upfront are still where the biggest gains (or losses) happen • AI is gaining traction in RCM - but expectations are high; it has to reduce denials, automate follow-ups, and adapt to the real complexity of orthopedic coding and payer rules What stood out most was the alignment: regardless of size or geography, practices are all working toward the same goal - stronger cash flow, less administrative burden, and a better financial experience for patients. Grateful for the honest conversations and shared insights around what’s actually moving the needle in RCM today. 𝗪𝗵𝗲𝗻 𝗝𝗼𝘆 𝗗𝗶𝗮𝗹𝘀. 𝗬𝗼𝘂𝗿 𝗦𝘁𝗮𝗳𝗳 𝗦𝗺𝗶𝗹𝗲𝘀 Tote bags - a BIG HIT! Huge thanks to Brian Welch for jumping in with both feet on day one. We’re lucky to have you! Looking forward to continuing the conversation beyond American Alliance of Orthopaedic Executives (AAOE). What a show! Thank you team for an amazing show! Brian Welch Amber Day Roy Surges #AI #RCM #Frontdoor #AAOE #Orthopedics #2026 #Louisville #Kentucky

1 Comment
James Fan liked this

VoiceCare AI

4d
American Alliance of Orthopaedic Executives (AAOE) 𝟮𝟬𝟮𝟲: 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗲 𝗙𝘂𝘁𝘂𝗿𝗲 𝗼𝗳 𝗢𝗿𝘁𝗵𝗼𝗽𝗮𝗲𝗱𝗶𝗰𝘀 𝗮𝗻𝗱 𝗧𝘂𝗿𝗻𝗶𝗻𝗴 𝗜𝗱𝗲𝗮𝘀 𝗶𝗻𝘁𝗼 𝗔𝗰𝘁𝗶𝗼𝗻. Across sessions and conversations at VoiceCare AI booth, one theme kept surfacing: making operations actually work in the real world. Not in theory, not in demos - on busy clinic days, with full schedules, staffing gaps, and patients expecting seamless care. The most important being getting paid accurately, efficiently, and with less friction. In today’s environment, strong RCM isn’t a back-office function; it’s a frontline driver of practice health. A few themes that kept coming up: • Simplicity matters more than ever - fewer systems and tighter workflows mean fewer billing errors, cleaner claims, and faster reimbursement • The front door sets the tone for revenue - eligibility, prior auth, and patient financial clarity upfront are still where the biggest gains (or losses) happen • AI is gaining traction in RCM - but expectations are high; it has to reduce denials, automate follow-ups, and adapt to the real complexity of orthopedic coding and payer rules What stood out most was the alignment: regardless of size or geography, practices are all working toward the same goal - stronger cash flow, less administrative burden, and a better financial experience for patients. Grateful for the honest conversations and shared insights around what’s actually moving the needle in RCM today. 𝗪𝗵𝗲𝗻 𝗝𝗼𝘆 𝗗𝗶𝗮𝗹𝘀. 𝗬𝗼𝘂𝗿 𝗦𝘁𝗮𝗳𝗳 𝗦𝗺𝗶𝗹𝗲𝘀 Tote bags - a BIG HIT! Huge thanks to Brian Welch for jumping in with both feet on day one. We’re lucky to have you! Looking forward to continuing the conversation beyond American Alliance of Orthopaedic Executives (AAOE). What a show! Thank you team for an amazing show! Brian Welch Amber Day Roy Surges #AI #RCM #Frontdoor #AAOE #Orthopedics #2026 #Louisville #Kentucky

James Fan liked this

Daniel Hoske

1w
It’s not every day you see your company's name on NASDAQ Tower in Times Square 🎉 Cresta just crossed $100M in ARR, and I couldn't be prouder to be part of the team that made it happen. This milestone is a testament to the incredible customers who trust us every day, and to every Crestan who shows up and raises the bar. Our team is growing fast, and we're looking for the best to join us. If you want to build something that matters, check out our open roles 👇 https://cresta.com/careers

3 Comments
James Fan liked this

Shimon Whiteson

1w
Major personal news: After 6 years, I am leaving Waymo to lead a new multi-agent learning team at DeepMind. Waymo has been great to me so this is a bittersweet transition. I started out as a self-driving skeptic but working at Waymo and seeing its dramatic growth turned me into a true believer. I am as bullish on Waymo as ever. At the same time, I'm ready for a new challenge and hugely excited by the opportunity to build something amazing at DeepMind. This is an incredible time to be working at the frontier of AI capabilities. A giant thank you to my team at Waymo for all their hard work these past years and to Dragomir Anguelov and the rest of Waymo senior leadership for taking a gamble on a plucky startup six years ago. It's been a life-changing experience.
12 Comments
James Fan liked this

John Beh

1w
I'm happy to share that I've joined ASAPP. At ASAPP, I’ll be working on real-time voice AI for enterprise call-center conversations, focusing on applied speech research, model optimization, quality evaluation, and production voice systems. I'm excited for this new chapter and looking forward to contributing to ASAPP's voice AI platform.

1 Comment
James Fan liked this

Antony Passemard

1w
while meeting customers in Chicago.. #PassTheCrestaTest

5 Comments
James Fan liked this

VoiceCare AI

1w
At VoiceCare AI, we are deploying the Autonomous Workforce for High-Precision RCM. From navigating the "Phone and Portal Maze" to automating complex workflows like Prior Authorizations and Claims & Denials, our AI agent, Joy, handles the tasks that typically lead to administrative burnout. We are looking for a world-class AI Growth Hacker to join us in scaling this revolution. This is your opportunity to join a team backed by industry giants like Mayo Clinic. You won't just be growing a product; you’ll be helping RCM Enterprises scale their portfolios 10x while keeping back-office costs flat or reducing them. You'll be the architect behind our expansion into Dental Groups, Medical Practices, and Health Systems. If you are obsessed with Agentic AI and want to build an autonomous workforce that decouples healthcare's growth from manual labor costs, we want to hear from you. 👉 We are hiring! Apply here: https://lnkd.in/g9ef8cfv #Hiring #GrowthHacker #HealthTech #AgenticAI #VoiceCareAI #RCM #Specialty #HealthIT #Innovation #RevenueCycle #DentalTech #AIRCM Parag Jhaveri Akshay Kore Amber Day

James Fan liked this

Lovelace

1w
Lovelace is officially out of stealth. Today, we're introducing Elemental, an enterprise context engine platform built for speed, scale, and accuracy in high-stakes environments – finance, national security, supply chain, and beyond. Here's the problem: as enterprises deploy AI agents into increasingly complex and dynamic environments, the lack of reliable context has become a critical barrier to adoption. AI agents are powerful, but without verifiable context, they can't be trusted when the stakes are high. Elemental fixes that. Elemental is a context engine builder that dramatically increases the investigative power of AI agents by 1000x on complex queries. By unifying data ingestion, entity resolution, and graph construction into a single pipeline, and enriching it with real-time intelligence, Elemental gives AI agents the contextual awareness needed to form fast, high-confidence conclusions in rapidly changing conditions. Elemental creates secure, enterprise-specific context engines that transform fragmented enterprise data at scale into data structures that agents can navigate and query within milliseconds with verifiable citations – delivering deep-research insights at the speed and cost profile of a simple query. Enterprises using Elemental also have access to the ground-breaking Lovelace YottaGraph, a proprietary context engine scaling to trillions of global data points, enabling real-time conclusions about the state of the world at any given moment. The YottaGraph augments an enterprise’s context engine to deliver unmatched real world insight, knowledge, and decision-making support, tailored to each client’s needs. Thank you for being part of this moment. We're just getting started. https://lnkd.in/e2A9GJdH

Lovelace Emerges from Stealth with Industry-Defining Context Engine Builder for Mission-Critical AI

16 Comments


Experience

Tomato.ai (Sold to Sanas)

New York City Metropolitan Area

Education

The University of Texas at Austin

2001 - 2006

Publications

Building Joint Spaces for Relation Extraction

IJCAI 2016 Jul 2016
In this paper, we present a novel approach for relation extraction using only term
pairs as the input without textual features. We aim to build a single joint space for each
relation which is then used to produce relation specific term embeddings. The proposed method fits particularly well for domains in which similar arguments are often associated with similar relations. It can also handle the situation when the labeled data is limited. The proposed method is evaluated both theoretically with a proof for the closed-form solution and experimentally with promising results on both DBpedia and medical relations.

Poker-cnn: A pattern learning strategy for making draws and bets in poker games using convolutional networks

AAAI-2016 Feb 2016
Poker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single player video poker, two-player Limit Texas Hold’em, and finally two-player 2-7 triple draw poker. We show that our model can quickly learn patterns in these very different poker games while it improves from zero knowledge to a competitive player against human experts. The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a Convolutional Neural Network (CNN) based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained, and our system is competitive against human expert players.

Medical Relation Extraction with Manifold Models

ACL 2014
In this paper, we present a manifold model for medical relation extraction. Our model
is built upon a medical corpus containing 80M sentences (11 gigabyte text) and designed
to accurately and efficiently detect the key medical relations that can facilitate
clinical decision making. Our approach integrates domain specific parsing and typing
systems, and can utilize labeled as well as unlabeled examples. To provide users
with more flexibility, we also take label weight into consideration. Effectiveness
of our model is demonstrated both theoretically with a proof to show that the solution
is a closed-form solution and experimentally with positive results in experiments.

Advances in automated knowledge base construction

SIGMOD Records Journal Mar 2013
Recent years have seen significant advances on the creation of large-scale knowledge bases (KBs). Extracting knowledge from Web pages, and integrating it into a coherent KB is a task that spans the areas of natural language processing, information extraction, information integration, databases, search and machine learning. Some of the latest developments in the field were presented at the AKBC-WEKEX workshop on knowledge extraction at the NAACL-HLT 2012 conference. This workshop included 23 accepted papers, and 11 keynotes by senior researchers. The workshop had speakers from all major search engine providers, government institutions, and the leading universities in the field. In this survey, we summarize the papers, the keynotes, and the discussions at this workshop.

Analysis of Watson's strategies for playing Jeopardy!

Journal of Artificial Intelligence Research 2013
Major advances in Question Answering technology were needed for IBM Watson
to play Jeopardy! at championship level – the show requires rapid-fire answers to challenging
natural language questions, broad general knowledge, high precision, and accurate
confidence estimates. In addition, Jeopardy! features four types of decision making carrying
great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3)
selecting the next square when in control of the board; (4) deciding whether to attempt
to answer, i.e., “buzz in.” Using sophisticated strategies for these decisions, that properly
account for the game state and future event probabilities, can significantly boost a player’s
overall chances to win, when compared with simple “rule of thumb” strategies.
This article presents our approach to developing Watson’s game-playing strategies,
comprising development of a faithful simulation model, and then using learning and Monte Carlo
methods within the simulator to optimize Watson’s strategic decision-making. After
giving a detailed description of each of our game-strategy algorithms, we then focus in
particular on validating the accuracy of the simulator’s predictions, and documenting performance
improvements using our methods. Quantitative performance benefits are shown
with respect to both simple heuristic strategies, and actual human contestant performance
in historical episodes. We further extend our analysis of human play to derive a number of
valuable and counterintuitive examples illustrating how human contestants may improve
their performance on the show.

A comparison of hard filters and soft evidence for answer typing in Watson

International Semantic Web Conference 2012
Questions often explicitly request a particular type of answer. One
popular approach to answering natural language questions involves filtering candidate
answers based on precompiled lists of instances of common answer types
(e.g., countries, animals, foods, etc.). Such a strategy is poorly suited to an open
domain in which there is an extremely broad range of types of answers, and the
most frequently occurring types cover only a small fraction of all answers. In
this paper we present an alternative approach called TyCor, that employs soft filtering
of candidates using multiple strategies and sources. We find that TyCor
significantly outperforms a single-source, single-strategy hard filtering approach,
demonstrating both that multi-source multi-strategy outperforms a single source,
single strategy, and that its fault tolerance yields significantly better performance
than a hard filter.

Automatic knowledge extraction from documents

IBM Journal of Research and Development 2012
Access to a large amount of knowledge is critical for success at
answering open-domain questions for DeepQA systems such as
IBM Watson. Formal representation of knowledge has the
advantage of being easy to reason with, but acquisition of structured
knowledge in open domains from unstructured data is often difficult
and expensive. Our central hypothesis is that shallow syntactic
knowledge and its implied semantics can be easily acquired and can
be used in many areas of a question-answering system. We take a
two-stage approach to extract the syntactic knowledge and implied
semantics. First, shallow knowledge from large collections of
documents is automatically extracted. Second, additional
semantics are inferred from aggregate statistics of the
automatically extracted shallow knowledge. In this paper, we
describe in detail what kind of shallow knowledge is extracted,
how it is automatically done from a large corpus, and how
additional semantics are inferred from aggregate statistics. We
also briefly discuss the various ways extracted knowledge is used
throughout the IBM DeepQA system.

Finding needles in the haystack: Search and candidate generation

IBM Journal of Research and Development 2012
A key phase in the DeepQA architecture is Hypothesis Generation, in
which candidate system responses are generated for downstream
scoring and ranking. In the IBM Watson system, these hypotheses
are potential answers to Jeopardy! questions and are generated by
two components: search and candidate generation. The search
component retrieves content relevant to a given question from
Watson’s knowledge resources. The candidate generation component
identifies potential answers to the question from the retrieved
content. In this paper, we present strategies developed to use
characteristics of Watson’s different knowledge sources and to
formulate effective search queries against those sources. We further
discuss a suite of candidate generation strategies that use various
kinds of metadata, such as document titles or anchor texts in
hyperlinked documents. We demonstrate that a combination of these
strategies brings the correct answer into the candidate answer
pool for 87.17% of all the questions in a blind test set, facilitating
high end-to-end question-answering performance.

Learning to rank for robust question answering

Proceedings of the 21st ACM international conference on Information and knowledge management 2012
This paper aims to solve the problem of improving the ranking of
answer candidates for factoid based questions in a state-of-the-art
Question Answering system. We first provide an extensive comparison
of 5 ranking algorithms on two datasets – from the Jeopardy
quiz show and a medical domain. We then show the effectiveness
of a cascading approach, where the ranking produced by one
ranker is used as input to the next stage. The cascading approach
shows sizeable gains on both datasets. We finally evaluate several
rank aggregation techniques to combine these algorithms, and find
that Supervised Kemeny aggregation is a robust technique that always
beats the baseline ranking approach used by Watson for the
Jeopardy competition. We further corroborate our results on TREC
Question Answering datasets.

Textual resource acquisition and engineering

IBM Journal of Research and Development 2012
A key requirement for high-performing question-answering (QA)
systems is access to high-quality reference corpora from which
answers to questions can be hypothesized and evaluated. However,
the topic of source acquisition and engineering has received very
little attention so far. This is because most existing systems were
developed under organized evaluation efforts that included reference
corpora as part of the task specification. The task of answering
Jeopardy! questions, on the other hand, does not come with such a
well-circumscribed set of relevant resources. Therefore, it became
part of the IBM Watson effort to develop a set of well-defined
procedures to acquire high-quality resources that can effectively
support a high-performing QA system. To this end, we developed
three procedures, i.e., source acquisition, source transformation, and
source expansion. Source acquisition is an iterative development
process of acquiring new collections to cover salient topics deemed
to be gaps in existing resources based on principled error analysis.
Source transformation refers to the process in which information
is extracted from existing sources, either as a whole or in part, and is
represented in a form that the system can most easily use. Finally,
source expansion attempts to increase the coverage in the content of
each known topic by adding new information as well as lexical
and syntactic variations of existing information extracted from
external large collections. In this paper, we discuss the methodology
that we developed for IBM Watson for performing acquisition,
transformation, and expansion of textual resources. We demonstrate
the effectiveness of each technique through its impact on candidate
recall and on end-to-end QA performance.


Projects

IBM Watson Question Answering System

Jan 2007


Languages

Chinese (native or bilingual proficiency)
English (native or bilingual proficiency)





