Duo Ding

Menlo Park, California, United States
4K followers 500+ connections

Cresta

Carnegie Mellon University

About

Managing Research/Engineering teams at AI Startup:
LLM training: post-training [MoE…

Experience

Cresta

Sunnyvale, California, United States
-

San Francisco Bay Area
-

Sunnyvale, California, United States
-

Cupertino, California, United States
-

Cupertino, CA
-
-
-

Boston, Massachusetts, United States
-
-

Shanghai Jiao Tong University

Education

Carnegie Mellon University

-

Activities and Societies: Co-Director, CMU Summit New Venture Competition, April 2012. President of the Chinese Student and Scholar Association (CSSA) in Carnegie Mellon University, 2012-2013. Organizing Committee, 2012 LTI Student Research Symposium, School of Computer Science, Carnegie Mellon University, August 2012.
-

2007 - 2011

Publications

Beyond Audio and Video Retrieval: Topic Oriented Multimedia Summarization

In Proc. of the International Journal of Multimedia Information Retrieval, 2012. Dec 2012
Consumer-grade video is becoming abundant on the Internet, and it is now easier than ever to download multimedia material of any kind and quality. With cell phones now featuring video recording capability along with broadband connectivity, multimedia material can be recorded and distributed across the world just as easily as text could just a couple of years ago. The easy availability of vast amounts of text gave a huge boost to the Natural Language Processing (NLP) research community, which was critical in order to organize the amount of information that was suddenly available. The above-mentioned multimedia material is set to do the same for multi-modal audio and video analysis and generation, and in this paper we will argue that natural language can play a big role in organizing this information. We see this as a first step towards systems that will be able to discriminate visually similar, but semantically different videos, compare two videos and provide textual output or summarize a large number of videos at once. In this paper, we introduce our approach of solving the TOMS problem. We extract various visual concept features, environmental sounds and ASR transcription features from a given video, and develop a template-based natural language generation system to produce a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.

Informedia E-Lamp@TRECVID 2012 Multimedia Event Detection and Recounting (MED and MER)

In Proceedings of the 2012 National Institute of Standards and Technology (NIST) TREC Video Retrieval Evaluation Workshop, Gaithersburg, MD, USA. Nov 2012
We report on our system used in the TRECVID 2012 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. For MED, generally, it consists of three main steps: extracting features, training detectors and fusion. In the feature extraction part, we extract many low-level, high-level features and text features. Those features are then represented in three different ways, which are spatial bag-of-words with standard tiling, spatial bag-of-words with feature and event specific tiling and the Gaussian Mixture Model Super Vector. In the detector training and fusion, two classifiers and three fusion methods are employed. The results from both the official sources and our internal evaluations show good performance of our system. Our MER system takes some of the features and detection results from the MED system, from which the recount is then generated.

Event-based Video Retrieval Using Audio.

In Proceedings of the 13th Annual Conference of the International Speech Communication Association (INTERSPEECH-2012), Portland, Oregon, USA. Sep 2012
Multimedia Event Detection (MED) is an annual task in the NIST TRECVID evaluation, and requires participants to build indexing and retrieval systems for locating videos in which certain predefined events are shown. Typical systems focus heavily on the use of visual data. Audio data, however, also contains rich information that can be effectively used for video retrieval, and MED could benefit from the attention of researchers in audio analysis. We present several systems for performing MED using only audio data, report the results of each system on the TRECVID MED 2011 development dataset, and compare the strengths and weaknesses of each approach.

Beyond Audio and Video Retrieval: Towards Multimedia Summarization.

In Proceedings of the 2012 ACM International Conference on Multimedia Retrieval (ICMR-2012), Hong Kong. (Best Paper Nomination) June 8, 2012
Given the deluge of multimedia content that is becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this paper we review previous work on audio and video processing, and define the task of Topic-Oriented Multimedia Summarization (TOMS) using natural language generation: given a set of automatically extracted features from a video (such as visual concepts and ASR transcripts) a TOMS system will automatically generate a paragraph of natural language (“a recounting”), which summarizes the important information in a video belonging to a certain topic area, and provides explanations for why a video was matched and retrieved. We see this as a first step towards systems that will be able to discriminate visually similar, but semantically different videos, compare two videos and provide textual output or summarize a large number of videos at once. In this paper, we introduce our approach of solving the TOMS problem. We extract visual concept features and ASR transcription features from a given video, and develop a template-based natural language generation system to produce a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.

Generating Natural Language Summaries for Multimedia.

In Proceedings of the 7th International Natural Language Generation Conference (INLG-2012), Demo Session, Starved Rock, IL, USA. May 30, 2012
In this paper we introduce an automatic system that generates textual summaries of Internet-style video clips by first identifying suitable high-level descriptive features that have been detected in the video (e.g. visual concepts, recognized speech, actions, objects, persons, etc.). Then a natural language generator is constructed using SimpleNLG to compile the high-level features into a textual form. The generated summary contains information from both visual and acoustic sources, intending to give a general review and summary of the video. To reduce the complexity of the task, we restrict ourselves to work with videos that show a limited number of “events”. In this demo paper, we describe the design of the system and present example outputs generated by the video summarization system.

Integrate Multilingual Web Search Results using Cross-Lingual Topic Models

In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP-2011), Workshop: Cross Lingual Information Access. Chiang Mai, Thailand. November 9, 2011

With the thriving of the Internet, web users today have access to resources around the world in more than 200 different languages. How to effectively manage multilingual web search results has emerged as an essential problem. In this paper, we introduce the ongoing work of leveraging a Cross-Lingual Topic Model (CLTM) to integrate the multilingual search results. The CLTM detects the underlying topics of different language results and uses the topic distribution of each result to cluster them into topic-based classes. In CLTM, we unify distributions in topic level by direct translation, thus distinguishing from other multilingual topic models, which mainly concern the parallelism at document or sentence level (Mimno 2009; Ni, 2009). Experimental results suggest that our CLTM clustering method is effective and outperforms the 6 compared clustering approaches.
Tulsa: Web Search for Writing Assistance

The 34th Annual International ACM SIGIR Conference June 5, 2011
Searching the web while authoring has become a common behavior for many users. Some search the web to research content, while others, especially those writing in a foreign language, search to learn if their usage is appropriate. Can we unify the experiences of search and writing to make authoring more productive? That’s the central question of project Tulsa, which puts the web at writers’ fingertips in a novel writing assistance experience based on implicit web search and natural language techniques. It provides assistance at three levels: word, phrase and paragraph. Tulsa offers web-mined, contextual reference information and suggestions for completing or revising words and phrases. Paragraph analysis is also provided which can detect outlier usage of language in larger chunks of text. Tulsa bases its suggestions and rankings on the Web as Corpus (WaC) through search engine queries, combined with a Support Vector Machine (SVM) trained on N-gram language features of a web-scale language model.


Courses

Advanced Algebra
Algorithm Analysis and Design
Algorithms for Natural Language Processing
Applied Machine Learning
Artificial Intelligence
Compiler Principles
Computational Models of Discourse Analysis
Computer Network
Computer Organization and Architecture
Data Structure
Digital Logic and Analog Circuit
Directed Research 2012
Directed Research 2013
Graph Theory and Combinatoric
Innovation of Science and Technology
Language Technologies Institute Colloquium 2012
Language Technologies Institute Colloquium 2013
Language and Statistics
Modern Calculus and Analysis
Neural Network Theory and Application
Object-Oriented Analysis and Design
Operating System
Physics
Principles of Database System
Programming
Research Design and Writing
Research Seminar in Machine Learning and Policy
Scientific and Engineering Computing
Self-Paced Lab: Rich Interaction in Virtual World
Set Theory and Mathematical Logic
Software Engineering for Information Systems I
Software Engineering for Information Systems II
Speech Recognition and Understanding
The Theory of Computability
Theory of Western Economics

Languages

English

Professional working proficiency
Chinese

Native or bilingual proficiency
