Research Papers

RESEARCH - the core of our existence.

Sail Labs actively contributes to research projects, and our results are presented at major scientific conferences. All the papers listed below are available online and can be requested using our Research Paper Request Form.


Title: Next Generation Data Fusion Open Source Intelligence (OSINT) System Based on MPEG7
Venue: 2008 IEEE International Conference on Technologies for Homeland Security, May 12-13, 2008, Boston, U.S.A.
Abstract: We describe the Sail Labs Media Mining System, which is capable of processing vast amounts of data typically gathered from open sources in unstructured form. The data are processed by a set of components and the output is produced in MPEG7 format. The origin and kind of input may be as diverse as a set of satellite receivers monitoring TV stations or textual input from web pages or RSS feeds. A sequence of processing steps analyzing the audio, video and textual content of the input is carried out. The resulting output is made available for search and retrieval, analysis and visualization on a next generation Media Mining Server. Access to the system is web-based; the system can serve as a search platform across open, closed or secured networks. Data may also be extracted and exported and thus be made available in air-gapped networks. The Media Mining System can be used as a tool for situational awareness, information sharing and risk assessment.

Title: Development of a Modern Greek Broadcast-News Corpus and Speech Recognition System
Venue: 16th Nordic Conference of Computational Linguistics, 25-26 May 2007, Tartu, Estonia
Abstract: We report on the creation of a Modern Greek broadcast-news corpus as a prerequisite for building a large-vocabulary continuous-speech recognition system. We discuss lexical modeling with respect to pronunciation generation and examine the effects of the lexicon size on word accuracies. Peculiarities of Modern Greek as a highly inflectional language and the challenges they pose for speech recognition are discussed.

Title: Archiving Meets Automatic Speech Recognition: Curse or Blessing?
Venue: 24. Tonmeistertagung - VDT International Convention, Leipzig, Germany, November 2006
Abstract: This paper examines general concepts behind automatic speech and language processing technologies set against the requirements of audio archives. It is argued that current technologies in automatic speech recognition, text analysis and speaker recognition may be a good starting point for indexing speech from digitized archive audio material and creating low-level descriptors for basic text mining. Together with semantic annotations created the traditional way, the additional information may be the key to an extended archival mining approach.

Title: A Maximum Entropy Semantic Parser Using Word Classes
Venue: 7th International Conference on Spoken Language Processing, ICSLP 2002, Denver, Colorado
Abstract: This paper describes the parser that is used in the Sail Labs Conversational System, which is a spoken dialog system. This parser is a fully statistical, semantic parser. The probability model of the parser is based on the principle of maximum entropy. The maximum entropy framework allows the available information to be combined in a fully automatic way, but the training of maximum entropy models is time-consuming. Since the parser needs to be retrained when its vocabulary changes, a straightforward application of this model cannot realistically be used in a dialog system. To solve this problem, words can be grouped into classes, and the classes can be used instead of the words for the training of the parser. At runtime, words can be added to the classes at no cost.
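The word-class idea in this abstract can be illustrated with a minimal Python sketch. It is not the parser from the paper: the `WordClassMap` helper, the `<CITY>` label and the example words are all hypothetical, and no maximum entropy model is trained here. It only shows why adding a word to an existing class needs no retraining, assuming the parser is trained on class labels rather than on the words themselves.

```python
class WordClassMap:
    """Maps words to class labels (hypothetical helper, not from the paper)."""

    def __init__(self):
        self._class_of = {}

    def add_word(self, word, word_class):
        # Adding a word only updates a lookup table; no model is retrained.
        self._class_of[word.lower()] = word_class

    def to_classes(self, tokens):
        # Replace each known word by its class label; unknown words pass through.
        return [self._class_of.get(t.lower(), t) for t in tokens]


classes = WordClassMap()
for city in ("Vienna", "Boston"):
    classes.add_word(city, "<CITY>")

# A parser trained on class labels would only ever see "<CITY>",
# never the individual city names.
before = classes.to_classes("fly to Denver".split())

# Runtime vocabulary extension: "Denver" joins an existing class.
classes.add_word("Denver", "<CITY>")
after = classes.to_classes("fly to Denver".split())
```

In this sketch `before` still contains the raw token `Denver`, while `after` contains `<CITY>`, which is the form a class-trained model would already know how to handle.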

Title: German Broadcast News Transcription
Venue: 7th International Conference on Spoken Language Processing, ICSLP 2002, Denver, Colorado
Abstract: We describe a newly created broadcast news (BN) corpus based on programs of seven different German and Austrian TV stations and the development of a German BN transcription system based on this corpus. We report on a series of experiments addressing the fact that German is less suited than English for word-based trigram language models. Furthermore, we investigate various phoneme sets and examine how a transregional standard (Bavarian dialect spoken in southern Germany and Austria) and standard German (Hochdeutsch) differ in word error rate.

Title: Fitting German into N-Gram Language Models
Venue: Fifth International Conference on TEXT, SPEECH and DIALOGUE, TSD 2002, Brno, Czech Republic
Abstract: We report on a series of experiments addressing the fact that German is less suited than English for word-based n-gram language models. Several systems were trained at different vocabulary sizes and various sets of lexical units. They were evaluated against a newly created corpus of German and Austrian broadcast news.

Title: Automatic Language Identification in Broadcast News
Venue: World Congress on Computational Intelligence, IJCNN 2002, Honolulu, USA, May 2002
Abstract: We present experiments on automatic language identification in the broadcast news domain. Because of the inherent diversity of news broadcasts, speech is extracted from the raw audio data by means of phone-level decoding using broad classes of phonemes. Training and testing were performed on recordings of German, English, Spanish, and French news shows from a variety of European TV channels. Each language is characterized by a Gaussian mixture model created solely from the corresponding acoustic features. The overall average error rate on speech segments is 16.32%. The current system disregards (almost) any kind of linguistic information; as a result, it is easily extensible to new languages.
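The decision rule implied by one Gaussian mixture model per language can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the system from the paper: the 2-D feature vectors, the mixture parameters in `MODELS` and the function names are made up, the mixtures use diagonal covariances, and no training is shown. A segment is assigned to the language whose model gives its frames the highest total log-likelihood.

```python
import math


def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at feature vector x.
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    # gmm is a list of (weight, mean, variance) components.
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        # Log-sum-exp over the components for numerical stability.
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_language(frames, models):
    # Pick the language whose mixture scores the segment highest.
    return max(models, key=lambda lang: gmm_log_likelihood(frames, models[lang]))


# Toy models with made-up parameters (assumption, for illustration only).
MODELS = {
    "german": [(0.5, (0.0, 0.0), (1.0, 1.0)), (0.5, (1.0, 1.0), (1.0, 1.0))],
    "english": [(0.5, (4.0, 4.0), (1.0, 1.0)), (0.5, (5.0, 5.0), (1.0, 1.0))],
}

segment = [(0.2, -0.1), (0.9, 1.2), (0.4, 0.6)]
guess = identify_language(segment, MODELS)
```

Because each model depends only on acoustic features of its own language, supporting a further language in this sketch amounts to adding one more entry to `MODELS`.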

Title: Creating a European English Broadcast News Transcription Corpus and System
Venue: 7th European Conference on Speech Communication and Technology, EuroSpeech 2001, Aalborg, Denmark
Abstract: This paper describes the SAIL LABS Media Mining Indexer, a system for indexing television broadcasts in real time. We discuss the development of a European English broadcast news corpus according to DARPA Hub-4 conventions. This corpus is suitable for measuring the performance of system components, such as speaker identification and speech recognition. We further report evaluation results on our multi-purpose test set, and outline the integration of real-time indexing into a spoken document retrieval system.

Title: Multimedia Archiving with Real-time Speech and Language Technologies
Venue: International Conference on Information, Communications & Signal Processing, ICICS 2001, Singapore
Abstract: In this work we present an approach to augment the functionality of Content.StaR, a multimedia content-management system by Siemens, with state-of-the-art speech and text-processing technologies as provided by the SAIL LABS Media Mining Indexer. The content logging process is automated and runs in real time, thereby improving the efficiency of multimedia archiving considerably.

