INTERSPEECH2021

An Introduction to Automatic Differentiation with Weighted Finite-State Automata

2:59:31

Neural target speech extraction

3:00:29

Concept to Code: Semi-Supervised End-To-End Approaches For Speech Recognition

2:23:05

Intonation Transcription and Modelling in Research and Speech Technology Applications

2:50:58

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

1:18:38

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

45:23

SpeechBrain: Unifying Speech Technologies and Deep Learning With an Open Source Toolkit

33:47

Speech Recognition with Next-Generation Kaldi (K2, Lhotse, Icefall)

3:04:22

Language Modeling and Artificial Intelligence

1:02:31

Learning speech models from multi-modal data

1:00:15

Towards automatic speech recognition for people with atypical speech

1:04:17

Opening ceremony

1:15:02

Child Language Acquisition studied with Wearables

58:22

Uncovering the acoustic cues of COVID-19 infection

1:03:19

Ethical and Technological Challenges of Conversational AI

51:15

Adaptive listening to everyday soundscapes

1:02:36

ISCA Medalist: Forty years of speech and language processing: from Bayes decision rule to deep l...

58:26

Web Interface for estimating articulatory movements in speech production from acoustics and text...

3:19

WittyKiddy: Multilingual Spoken Language Learning for Kids - (3 minutes introduction)

3:45

NeMo (Inverse) Text Normalization: From Development To Production - (longer introduction)

12:58

Automatic Radiology Report Editing through Voice - (3 minutes introduction)

2:37

NeMo (Inverse) Text Normalization: From Development To Production - (3 minutes introduction)

3:17

Save your Voice: Voice Banking and TTS for Anyone - (3 minutes introduction)

3:19

Interactive and real-time acoustic measurement tools for speech data acquisition and presentatio...

2:28

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech - (Oral presentation)

1:23

F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement - (Or...

16:09

Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility O...

1:28

Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia...

1:29

Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for...

1:14

A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting ...

1:20

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and ...

1:22

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition - (Oral pr...

1:21

Adversarial Data Augmentation for Disordered Speech Recognition - (Oral presentation)

1:21

INTERSPEECH 2021 Acoustic Echo Cancellation Challenge - (Oral presentation)

18:53

Handling acoustic variation in dysarthric speech recognition systems through model combination -...

1:08

Acoustic Echo Cancellation using Deep Complex Neural Network with Nonlinear Magnitude Compressio...

16:31

Investigating the Utility of Multimodal Conversational Technology and Audiovisual Analytic Measu...

1:21

Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human liste...

1:32

Factorization-Aware Training of Transformers for Natural Language Understanding On the Edge - (3...

3:22

END-to-END Cross-Lingual Spoken Language Understanding Model with Multilingual Pretraining - (3 ...

2:52

Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models - (...

3:19

Synthesis of expressive speaking styles with limited training data in a multi-speaker, prosody-c...

2:08

Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis - (3 minutes int...

3:17

SponSpeech: Adaptive Text to Speech for Spontaneous Style - (3 minutes introduction)

3:24

Presentation matters: Evaluating speaker identification tasks - (longer introduction)

13:34

Towards Multi-Scale Style Control for Expressive Speech Synthesis - (3 minutes introduction)

3:20

Expressive Text-to-Speech using Style Tag - (3 minutes introduction)

3:19

Controllable Context-Aware Conversational Speech Synthesis - (3 minutes introduction)

3:23

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressi...

3:06

An Integrated Framework for Two-pass Personalized Voice Trigger - (3 minutes introduction)

3:19

Automatic Error Correction for Speaker Embedding Learning with Noisy Labels - (3 minutes introdu...

3:13

Presentation matters: Evaluating speaker identification tasks - (3 minutes introduction)

3:21

Chronological Self-Training for Real-Time Speaker Diarization - (3 minutes introduction)

3:01

Multi-Channel Speaker Verification for Single and Multi-talker Speech - (3 minutes introduction)...

3:18

Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition - (3 minutes...

3:21

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker R...

3:29

Collaborative Training of Acoustic Encoders for Speech Recognition - (3 minutes introduction)

3:21

Graph-based Label Propagation for Semi-Supervised Speaker Identification - (3 minutes introducti...

3:08

Weakly Supervised Construction of ASR Systems from Massive Video Data - (longer introduction)

10:46

PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation - (3 minutes introd...

3:09

Tied & Reduced RNN-T Decoder - (3 minutes introduction)

3:09

Extremely Low Footprint End-to-End ASR System for Smart Device - (3 minutes introduction)

3:14

Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices - (longe...

15:13

Weakly Supervised Construction of ASR Systems from Massive Video Data - (3 minutes introduction)...

2:35

Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices - (3 min...

3:17

Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables - (...

3:20

Speech Emotion Recognition with Multi-task Learning - (3 minutes introduction)

3:18

Metric Learning Based Feature Representation With Gated Fusion Model For Speech Emotion Recognit...

2:58

Audio-Visual Speech Emotion Recognition by Disentangling Emotion and Identity Attributes - (3 mi...

3:20

Improvement of Automatic English Pronunciation Assessment with Small Number of Utterances Using ...

3:11

NeMo Inverse Text Normalization: From Development To Production - (3 minutes introduction)

2:49

"You don't understand me!": Comparing ASR results for L1 and L2 speakers of Swedish - (3 minutes...

3:21

Deep feature transfer learning for automatic pronunciation assessment - (3 minutes introduction)...

3:01

Lexical Density Analysis of Word Productions in Japanese English Using Acoustic Word Embeddings ...

3:29

Explore Wav2vec 2.0 for Mispronunciation Detection - (3 minutes introduction)

3:14

Toward Genre Adapted Closed Captioning - (Oral presentation)

18:39

End-to-End Speaker-Attributed ASR with Transformer - (3 minutes introduction)

3:03

Weakly-supervised word-level pronunciation error detection in non-native English speech - (longe...

2:52

EML Online Speech Activity Detection for Fearless Steps Challenge Phase-III - (Oral presentation...

18:50

Spoken Term Detection and Relevance Score Estimation using Dot-Product of Pronunciation Embeddin...

18:54

Voice Activity Detection With Teacher-Student Domain Emulation - (Oral presentation)

21:01

Semantic sentence similarity: size does not always matter - (Oral presentation)

18:22

Speech Activity Detection Based on Multilingual Speech Recognition System - (Oral presentation)...

20:08

The Application of Learnable STRF Kernels to the 2021 Fearless Steps Phase-03 SAD Challenge - (O...

19:56

Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challen...

18:45

Combining Hybrid and End-to-end Approaches for the OpenASR20 Challenge - (Oral presentation)

3:18

The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20 - (Oral presenta...

3:13

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems - (lon...

10:54

Systems for Low-Resource Speech Recognition Tasks in Open Automatic Speech Recognition and Formo...

3:35

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems - (3 m...

3:19

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks - (3 minutes in...

3:10

Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Re...

3:22

Pairing Weak with Strong: Twin Models for Defending against Adversarial Attack on Speaker Verifi...

12:29

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-...

3:25

Voting for the right answer: Adversarial defense for speaker verification - (3 minutes introduct...

3:02

Pairing Weak with Strong: Twin Models for Defending against Adversarial Attack on Speaker Verifi...

2:59

A Comparative Study on Recent Neural Spoofing Countermeasures for Synthetic Speech Detection - (...

3:09

Cross-database replay detection in terminal-dependent speaker verification - (3 minutes introduc...

2:56

An Initial Investigation for Detecting Partially Spoofed Audio - (3 minutes introduction)

3:07

Keyword Transformer: A Self-Attention Model for Keyword Spotting - (3 minutes introduction)

3:16
