The SPECOM 2016 conference proceedings (LNAI 9811) are now available online. You can access the online version here.

Program at a glance


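23 Aug
16.00 Registration
18.30 - 20.00 Welcome reception

24 August
8.00 Registration
8.30 Opening ceremony
9.00 Keynote lecture of Ralf Schlueter
10.00 Coffee break
10.30 Speech recognition and understanding
12.30 Lunch
14.00 Poster session I
16.00 Coffee break
16.30 Speech synthesis

25 August
9.00 Keynote lecture of Attila Vékony
10.00 Coffee break
10.30 Multimodal human-machine interaction
12.30 Lunch
14.00 Poster session
16.00 Coffee break
16.30 Interactive collaborative robotics; Speech signal processing
19.30 - 21.30 Gala dinner on the Danube

26 August
9.00 Keynote lecture of Nick Campbell
10.00 Coffee break
10.30 Natural language processing
12.30 Lunch
14.00 Poster session II
16.00 Coffee break
16.30 Speaker and language recognition
18.30 Closing ceremony

27 Aug
9.00 Budapest tour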



The detailed program can be downloaded here 

Detailed Technical Program

Tuesday, August 23rd

16:00-18:00 Registration
18:30-20:00 Welcome Reception

Wednesday, August 24th

08:00-08:30 Registration
08:30-09:00 Opening ceremony

09:00-10:00 Keynote speech: Automatic Speech Recognition based on Neural Networks 
Ralf Schlueter, RWTH Aachen University, Germany
Chair: Géza Németh, Budapest University of Technology and Economics, Hungary

10:00-10:30 Coffee break

10:30-12:30 Speech recognition and understanding 
Chair: Alexey Karpov, SPIIRAS, Russia

10:30-10:50 Adaptation of DNN Acoustic Models using KL-divergence Regularization and Multi-Task Training 
László Tóth and Gábor Gosztolya
10:50-11:10 Improving Automatic Speech Recognition Containing Additive Noise Using Deep Denoising Autoencoders of LSTM Networks
Marvin Coto, John Goddard and Fabiola Martinez
11:10-11:30 Knowledge Transfer for Utterance Classification in Low-Resource Languages 
Andrei Smirnov and Valentin Mendelev
11:30-11:50 Designing Syllable Models for an HMM based Speech Recognition System
Kseniya Proenca, Kris Demuynck and Dirk Van Compernolle
11:50-12:10 In-document Adaptation for a Human Guided Automatic Transcription Service
André Mansikkaniemi, Mikko Kurimo and Krister Lindén
12:10-12:30 Automatic Summarization of Highly Spontaneous Speech 
András Beke and György Szaszák

12:30-14:00 Lunch

14:00-16:00 SPECOM Poster session I
Chair: Ralf Schlueter, RWTH Aachen University, Germany

P1: Exploring GMM-derived Features for Unsupervised Adaptation of Deep Neural Network Acoustic Models
Natalia Tomashenko, Yuri Khokhlov, Anthony Larcher and Yannick Estève
P2: DNN-based Acoustic Modeling for Russian Speech Recognition Using Kaldi
Irina Kipyatkova and Alexey Karpov
P3: Improving the Quality of Automatic Speech Recognition in Trucks
Maxim Korenevsky, Ivan Medennikov and Vadim Shchemelinin
P4: Feature Space VTS with Phase Term Modeling
Maxim Korenevsky and Aleksei Romanenko
P5: LSTM-based Language Models for Spontaneous Speech Recognition
Ivan Medennikov and Anna Bulusheva
P6: Speaker-dependent bottleneck features for Egyptian Arabic speech recognition
Aleksei Romanenko and Valentin Mendelev
P7: Advances in STC Russian Spontaneous Speech Recognition System
Ivan Medennikov and Alexey Prudnikov
P8: Combining Atom Decomposition of the F0 Track and HMM-based Phonological Phrase Modelling for Robust Stress Detection in Speech
György Szaszák, Máté Ákos Tündik, Branislav Gerazov and Aleksandar Gjoreski
P9: Improving Recognition of Dysarthric Speech Using Severity Based Tempo Adaptation
Chitralekha Bhat, Bhavik Vachhani and Sunil Kumar Kopparapu
P10: Comparison of Retrieval Approaches and Blind Relevance Feedback Methods within the Czech Speech Information Retrieval
Lucie Skorkovska
P11: A Phonetic Segmentation Procedure Based on Hidden Markov Models
Edvin Pakoci, Branislav Popović, Nikša Jakovljević, Darko Pekar and Fathy Yassa
P12: Stress, arousal, and stress detector trained on acted speech database
Róbert Sabo, Milan Rusko, Andrej Ridzik and Jakub Rajčani
P13: Improvements to Prosodic Variation in Long Short-Term Memory based Intonation Models Using Random Forest
Bálint Pál Tóth, Balázs Szórádi and Géza Németh
P14: Fusing various audio feature sets for detection of Parkinson's disease from sustained voice and speech recordings
Evaldas Vaiciukynas, Antanas Verikas, Adas Gelzinis, Marija Bacauskiene, Kestutis Vaskevicius, Virgilijus Uloza, Evaldas Padervinskis and Jolita Ciceliene
P15: Investigation of Speech Signal Parameters Reflecting the Truth of Transmitted Information
Victor Budkov, Irina Vatamaniuk, Vladimir Basov and Daniyar Volf
P16: Trade-off between speed and accuracy for Noise Variance Minimization (NVM) pitch estimation algorithm
Andrey Barabanov and Aleksandr Melnikov
P17: Study on the improvement of intelligibility for elderly speech using formant frequency shift method
Yuto Tanaka, Mitsunori Mizumachi and Yoshihisa Nakatoh
P18: Quality Assessment of two Fullband Audio Codecs Supporting Real-Time Communication
Michael Maruschke, Oliver Jokisch, Martin Meszaros, Franziska Trojahn and Mario Hoffmann
P19: A Deep Neural Networks (DNN) Based models for a Computer Aided Pronunciation Learning System
Mohamed Elaraby, Mustafa Abdallah, Sherif Abdou and Mohsen Rashwan (in absentia)
P20: Evaluation of Response Times on a Touch Screen using Stereo Panned Speech Command Auditory Feedback
Hunor Nagy and György Wersényi
P21: Speech Enhancement with Microphone Array Using a Multi Beam Adaptive Noise Suppressor
Mikhail Stolbov and Alexander Lavrentyev
P22: Microphone Array Directivity Improvement in Low-Frequency Domain for Speech Processing
Sergei Aleinik and Mikhail Stolbov
P23: Optimization of Zelinski post-filtering calculation
Sergei Aleinik
P24: Assessment of the relation between low-frequency features and velum opening by using real articulatory data
Alexander Sepulveda-Sepulveda and German Castellanos-Dominguez
P25: Evaluation of the speech quality during rehabilitation after surgical treatment of the cancer of oral cavity and oropharynx based on a comparison of the Fourier spectra
Evgeny Kostyuchenko, Roman Mescheryakov, Dariya Ignatieva, Alexander Pyatkov, Evgeny Choynzonov and Lidiya Batatskaya

16:00-16:30 Coffee break

16:30-18:30 Speech synthesis
Chair: Géza Németh, Budapest University of Technology and Economics, Hungary

16:30-16:50 Ensemble Deep Neural Network based Waveform-Driven Stress Model for Speech Synthesis 
Bálint Pál Tóth, Kornél István Kiss, György Szaszák and Géza Németh
16:50-17:10 DNN-Based Duration Modeling for Synthesizing Short Sentences 
Péter Nagy and Géza Németh
17:10-17:30 Experiments with One-Class Classifier as a Predictor of Spectral Discontinuities in Unit Concatenation 
Daniel Tihelka, Martin Grůber and Markéta Jůzová
17:30-17:50 Phonetic Aspects of High Level of Naturalness in Speech Synthesis 
Vera Evdokimova, Pavel Skrelin, Andrey Barabanov and Karina Evgrafova
17:50-18:10 An agonist-antagonist pitch production model 
Branislav Gerazov and Philip N. Garner
18:10-18:30 An UMP (Universal Melodic Portraits) Model of Pitch Contours Stylization for Analysis and Synthesis of Intonation
Boris Lobanov

Thursday, August 25th

09:00-10:00 Keynote speech: Speech Recognition Challenges in the Car Navigation Industry 
Attila Vékony, NNG Software Developing and Commercial Llc., Hungary
Chair: Andrey Ronzhin, SPIIRAS, Russia

10:00-10:30 Coffee break

10:30-12:30 Multimodal human-machine interaction
Chair: Milos Zelezny, University of West Bohemia, Czech Republic

10:30-10:50 Toward Sign Language Motion Capture Dataset Building
Zdeněk Krňoul, Pavel Jedlička, Jakub Kanis and Milos Zelezny
10:50-11:10 Selecting Keypoint Detector and Descriptor Combination for Augmented Reality Application 
Lukáš Bureš and Luděk Müller
11:10-11:30 Human-Robot Interaction using Brain-Computer Interface
Lev Stankevich and Konstantin Sonkin
11:30-11:50 Attention Training Game with Aldebaran Robotics NAO and Brain-Computer Interface 
Evgeny Shandarov, Stepan Gomilko and Alina Zimina
11:50-12:10 HAVRUS Corpus: High-speed Recordings of Audio-Visual Russian Speech 
Vasilisa Verkhodanova, Alexander Ronzhin, Irina Kipyatkova, Denis Ivanko, Alexey Karpov and Milos Zelezny
12:10-12:30 Speech Recognition combining MFCCs and Image Features (Skype) 
Stamatis Karlos, Nikos Fazakis, Katerina Karanikola, Sotiris Kotsiantis and Kyriakos Sgarbas

12:30-14:00 Lunch

14:00-16:00 ICR Poster session
Chair: Eugene Larkin, Tula State University, Russia

P1: Decentralized Approach to Control of Robot Groups During Execution of the Task Flow
Igor Kalyaev, Anatoly Kalyaev and Iakov Korovin
P2: A Recovery Method for the Robotic Decentralized Control System with Performance Redundancy
Iakov Korovin, Eduard Melnik and Anna Klimenko
P3: Control Algorithms for Heterogeneous Vehicle Groups Control in Obstructed 2-D Environments
Viacheslav Pshikhopov, Mikhail Medvedev, Anatoly Gaiduk and Aleksandr Kolesnikov
P4: Method of Spheres for Solving 3D Formation Task in a Group of Quadrotors
Donat Ivanov, Sergey Kapustyan and Igor Kalyaev
P5: Multi-Robot Exploration and Mapping Based on the Subdefinite Models
Valery Karpov, Alexander Migalev, Anton Moscowsky, Maxim Rovbo and Vitaly Vorobiev
P6: Simulation of Commands Execution by Mobile Robot
Eugene Larkin, Alexey Ivutin, Vladislav Kotov and Alexander Privalov
P7: The Effectiveness of Rescuing Casualties when Using Robotic Systems
Anna Motienko, Igor Dorozhko, Anatoly Tarasov and Oleg Basov
P8: Distributed Information System for Collaborative Robots and IoT Devices
Siarhei Herasiuta, Uladzislau Sychou and Ryhor Prakapovich
P9: Positioning Method Basing on External Reference Points for Surgical Robots
Ekaterina Sinyavskaya, Elena Shestova, Mikhail Medvedev and Evgenij Kosenko
P10: Hardware-Software Solution for Three-Dimensional Model Control in Volumetric Display Testing Unit for Visualization and Dispatching Applications
Alexander Bolshakov, Arthur Sgibnev, Tatiana Chistyakova, Viktor Glazkov and Dmitry Lachugin
P11: Educational Marine Robotics in SMTU
Mikhail Chemodanov, Vladimir Ryzhov, Nickolay Semenov, Kirill Rozhdestvensky and Igor Kozhemyakin
P12: Designing Simulation Model of Humanoid Robot to Study Servo Control System
Alexander Denisov, Viktor Budkov and Daniil Mikhalchenko
P13: Speech Dialog as a Part of Interactive "Human-Machine" Systems
Rodmonga Potapova
P14: Human-Machine Speech-Based Interfaces with Augmented Reality and Interactive Systems for Controlling Mobile Cranes
Maciej J. Majewski and Wojciech Kacalak
P15: Preprocessing Data for Facial Gestures Classifier on the Basis of the Neural Network Analysis of Biopotentials Muscle Signals
Raisa Budko and Irina Starchenko
P16: Mimic Recognition and Reproduction in Bilateral Human-Robot Speech Communication
Arkady S. Yuschenko, Sergey Vorotnikov, Dmitry Konyshev and Andrey Zhonin
P17: Interactive Collaborative Robotics and Natural Language Interface Based on Multi-Agent Recursive Cognitive Architectures
Murat Anchokov, Zalimkhan Nagoev, Vladimir Denisenko, Boris Tazhev and Zaurbek Sundukov
P18: An Analysis of Visual Faces Datasets
Ivan Gruber, Miroslav Hlaváč, Marek Hrúz, Miloš Železný and Alexey Karpov
P19: Voice Dialogue with a Collaborative Robot Driven by Multimodal Semantics
Alexander Kharlamov and Konstantin Ermishin
P20: Human-Smartphone Interaction for Dangerous Situation Detection & Recommendation Generation while Driving
Alexander Smirnov, Alexey Kashevnik and Igor Lashkov
P21: Conceptual Model of Cyberphysical Environment Based on Collaborative Work of Distributed Means and Mobile Robots
Anton Saveliev, Oleg Basov and Andrey Ronzhin
P22: The Humanoid Robot Assistant for a Preschool Children
Evgeny Shandarov, Alina Zimina, Dmitry Rimer, Evgenia Sokolova and Olga Shandarova

16:00-16:30 Coffee break

16:30-18:30 Interactive collaborative robotics
Chair: Roman Meshcheryakov, TUSUR, Russia

16:30-16:50 Development of Wireless Charging Robot for Indoor Environment based on Probabilistic Roadmap 
Yi-Shiun Wu, Chi-Wei Chen and Hooman Samani
16:50-17:10 Mechanical Leg Design of the Anthropomorphic Robot Antares 
Nikita Pavluk, Victor Budkov, Andrey Kodyakov and Andrey Ronzhin
17:10-17:30 YuMi, come and play with me! A Collaborative Robot for piecing together a Tangram Puzzle 
David Kirschner, Rosemarie Velik, Saeed Yahyanejad, Mathias Brandstötter and Michael Hofbaur
17:30-17:50 A Control Strategy for a Lower Limb Exoskeleton with a Toe Joint 
Sergei Savin, Sergey Jatsun and Andrey Yatsun
17:50-18:10 Robot Soccer Team for RoboCup Humanoid KidSize League 
Evgeny Shandarov, Stepan Gomilko, Darya Zhulaeva, Dmitry Rimer, Dmitry Yakushin and Roman Meshcheryakov
18:10-18:30 Smart M3-Based Robot Interaction Scenario for Coalition Work 
Alexander Smirnov, Alexey Kashevnik, Sergey Mikhailov, Mikhail Mironov and Mikhail Petrov

16:30-18:30 Speech signal processing
Chair: László Tóth, University of Szeged, Hungary

16:30-16:50 Robust Speech Analysis Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition in Noisy Environments 
Surasak Boonkla, Masashi Unoki and Stanislav S. Makhanov
16:50-17:10 An Algorithm for Phase Manipulation in a Speech Signal 
Darko Pekar, Siniša Suzić, Robert Mak, Meir Friedlander and Milan Sečujski
17:10-17:30 Detecting Laughter and Filler Events by Time Series Smoothing with Genetic Algorithms 
Gábor Gosztolya
17:30-17:50 Bio-Inspired Sparse Representation of Speech and Audio Using Psychoacoustic Adaptive Matching Pursuit 
Alexey Petrovsky, Vadzim Herasimovich and Alexander Petrovsky
17:50-18:10 Statistical analysis of acoustical parameters in the voice of children with juvenile dysphonia 
Miklós Gábriel Tulics, Ferenc Kazinczi and Klára Vicsi
18:10-18:30 Precise estimation of harmonic parameter trend and modification of a speech signal
Andrey Barabanov, Evgenij Vikulov and Valentin Magerkin

19:30-21:30 Gala dinner on the Danube

Friday, August 26th

09:00-10:00 Keynote speech: Machine Processing of Dialogue States; Speculations on Conversational Entropy 
Nick Campbell, Trinity College Dublin, Ireland
Chair: Rodmonga Potapova, MSLU, Russia

10:00-10:30 Coffee break

10:30-12:30 Natural language processing
Chair: Rodmonga Potapova, MSLU, Russia

10:30-10:50 Text Classification in the Domain of Applied Linguistics as Part of a Pre-editing Module for Machine Translation Systems
Ksenia Oskina
10:50-11:10 Backchanneling via Twitter Data for Conversational Dialogue Systems 
Michimasa Inaba and Kenichi Takahashi
11:10-11:30 Measuring prosodic entrainment in Italian collaborative game-based dialogues 
Michelina Savino, Loredana Lapertosa, Alessandro Caffò and Mario Refice
11:30-11:50 A Preliminary Exploration of Group Social Engagement Level Recognition in Multiparty Casual Conversation 
Yuyun Huang, Emer Gilmartin, Benjamin R. Cowan and Nick Campbell
11:50-12:10 Interaction Quality as a Human-Human Task-Oriented Conversation Performance
Anastasiia Spirina, Olesia Vaskovskaia, Maxim Sidorov and Alexander Schmitt
12:10-12:30 A comparison of acoustic features of speech of typically developing children and children with autism spectrum disorders 
Elena Lyakso, Olga Frolova and Aleksey Grigorev

12:30-14:00 Lunch

14:00-16:00 SPECOM Poster session II
Chair: Nick Campbell, Trinity College Dublin, Ireland

P1: Polybasic Attribution of Social Network Discourse
Rodmonga Potapova and Vsevolod Potapov
P2: Detecting Filled Pauses and Lengthenings in Russian Spontaneous Speech using SVM
Vasilisa Verkhodanova and Vladimir Shapranov
P3: Multimodal Perception of Aggressive Behavior
Rodmonga Potapova and Liliya Komalova
P4: Designing High-Coverage Multi-Level Text Corpus for Non-Professional-Voice Conservation
Markéta Jůzová, Daniel Tihelka and Jindřich Matoušek
P5: A Linguistic Interpretation of the Atom Decomposition of Fundamental Frequency Contour for American English
Tijana Delić, Branislav Gerazov, Branislav Popović and Milan Sečujski
P6: Emotional speech of 3-years old children: norm-risk-deprivation
Olga Frolova and Elena Lyakso
P7: Profiling a Set of Personality Traits of a Text's Author: a Corpus-Based Approach
Tatiana Litvinova, Olga Zagorovskaya, Olga Litvinova and Pavel Seredin
P8: Unsupervised trained functional discourse parser for e-learning materials scaffolding
Varvara Krayvanova and Svetlana Duka
P9: Low Inter-Annotator Agreement in Sentence Boundary Detection and Personality
Anton Stepikhov and Anastassia Loukina
P10: Modeling Imperative Utterances in Russian Spoken Dialogue: Verb-Central Quantitative Approach
Olga Blinova
P11: An Exploratory Study on Sociolinguistic Variation of Spoken Russian
Natalia Bogdanova-Beglarian, Tatiana Sherstinova, Olga Blinova and Gregory Martynenko
P12: Speech Acts Annotation of Everyday Conversations in the ORD corpus of Spoken Russian
Tatiana Sherstinova
P13: Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer
Milan Sečujski, Branislav Gerazov, Tamás Gábor Csapó, Vlado Delić, Philip Garner, Aleksandar Gjoreski, David Guennec, Zoran Ivanovski, Aleksandar Melov, Géza Németh, Ana Stojković and György Szaszák
P14: Sociolinguistic Extension of the ORD Corpus of Russian Everyday Speech
Natalia Bogdanova-Beglarian, Tatiana Sherstinova, Olga Blinova, Olga Ermolova, Ekaterina Baeva, Gregory Martynenko and Anastasia Ryko
P15: Detecting state of aggression in sentences using CNN
Denis Gordeev
P16: Tonal Specification of Perceptually Prominent Non-Nuclear Pitch Accents in Russian
Nina Volskaya and Tatiana Kachkovskaia
P17: Lexical Stress in Punjabi and its Representation in PLS
Swaran Lata, Swati Arora and Simerjeet Kaur
P18: Comparative analysis of classifiers for automatic language recognition in spontaneous speech
Konstantin Simonchik, Sergey Novoselov and Galina Lavrentyeva
P19: Semi-automatic Speaker Verification System Based on Analysis of Formant, Durational and Pitch Characteristics
Elena Bulgakova and Aleksei Sholohov
P20: Scores Calibration in Speaker Recognition Systems
Andrey Shulipa, Sergey Novoselov and Yuri Matveev
P21: Speech Features Evaluation for Small Set Automatic Speaker Verification Using GMM-UBM System
Ivan Rakhmanenko and Roman Meshcheryakov
P22: Approaches for Out-of-Domain Adaptation to Improve Speaker Recognition Performance
Andrey Shulipa, Sergey Novoselov and Aleksandr Melnikov
P23: Prosody Analysis of Malay Language Storytelling Corpus
Izzad Ramli, Noraini Seman, Norizah Ardi and Nursuriati Jamil
P24: Finding speaker position under difficult acoustic conditions
Evgeniy Shuranov, Alexander Lavrentyev, Alexey Kozlyaev and Valeriya Volkovaya
P25: Scenarios of Multimodal Information Navigation Services for Users in Cyberphysical Environment
Irina Vatamaniuk, Dmitriy Levonevskiy, Anton Saveliev and Alexander Denisov

16:00-16:30 Coffee break

16:30-18:30 Speaker and language recognition
Chair: Iosif Mporas, University of Hertfordshire, UK

16:30-16:50 Investigation of Segmentation in i-Vector based Speaker Diarization of Telephone Speech 
Zbynek Zajic, Marie Kunesova and Vlasta Radova
16:50-17:10 Improving Robustness of Speaker Verification by Fusion of Prompted Text-Dependent & Text-Independent Operation Modalities
Iosif Mporas, Saeid Safavi and Reza Sotudeh
17:10-17:30 Convolutional Neural Network in the Task of Speaker Change Detection 
Marek Hruz and Marie Kunesova
17:30-17:50 Online Biometric Identification With Face Analysis in Web Applications 
Gerasimos Arvanitis, Konstantinos Moustakas and Nikos Fakotakis
17:50-18:10 Language Identification using Time Delay Neural Network D-Vector on Short Utterances
Maxim Tkachenko, Alexander Yamshinin, Nikolay Luibimov, Mikhail Kotov and Marina Nastasenko
18:10-18:30 On Individual Polyinformativity of Speech and Voice Regarding Speaker's Auditive Attribution (Forensic Phonetic Aspect) 
Rodmonga Potapova and Vsevolod Potapov

18:30-18:40 Closing ceremony 

Saturday, August 27th

09:00-15:00 Budapest tour