Faculty
Dr. V. Ramasubramanian, Professor
IIIT Bangalore Home-page: https://www.iiitb.ac.in/faculty/v-ramasubramanian
Research students (ongoing)
Ms. Vandana Ladwani, 2015-, (Ph.D.), "Hopfield auto-associative memory formulations for information storage, retrieval and sequence decoding".
Vikram R. Lakkavalli, 2017-, (Ph.D.), "Analysis-by-synthesis (AbS) frameworks for ASR".
Ms. Dhanya Eledath, 2018-, (Ph.D.), "Few-Shot Learning for E2E Automatic Speech Recognition".
Ms. Jhelum Das, 2018-, (Ph.D.), "Auto-associative memory formulations for multi-modal learning".
Tirthankar Banerjee, 2023-, (Ph.D.) "Few-shot Learning for Cross-domain E2E ASR".
Research students (completed)
K. E. Manjunath, 2020, Ph.D. Thesis, "Study of Multilingual Phone Recognition Using Indian Languages", Prof. Ravi Ravindran Gold Medal for Best Ph.D. Thesis (2020).
Shreekantha Nadig, 2021, MS Thesis, "Multi-task Learning in End-to-end Attention-based Automatic Speech Recognition".
K. M. Srinivasa Raghavan, 2022, MS Thesis, "Unsupervised Representation Learning for Low-resource Automatic Speech Recognition".
Tirthankar Banerjee, 2022, MS Thesis, "Application of Few-Shot Learning for Speech Recognition".
Faculty Profile
Dr. V. Ramasubramanian, Professor (v.ramasubramanian@iiitb.ac.in), Ph.D. (TIFR Bombay).
Ramasubramanian obtained his B.S. degree from the University of Madras in 1981, his B.E. degree from the Indian Institute of Science, Bangalore, in 1984, and his Ph.D. degree from the Tata Institute of Fundamental Research (TIFR), Bombay, in 1992. He has been engaged in research in speech processing and related areas for nearly three decades, and has over 70 research publications in these areas in peer-reviewed international journals and conferences.
He has worked in various institutions and universities: TIFR, Bombay as Research Scholar, Fellow and Reader (1984-99); the University of Valencia, Spain as Visiting Scientist (1991-92); Advanced Telecommunications Research (ATR) Laboratories, Kyoto, Japan as Invited Researcher (1996-97); the Indian Institute of Science (IISc), Bangalore as Research Associate (2000-04); Siemens Corporate Research & Technology as Senior Member Technical Staff (2005-13) and Head of Professional Speech Processing - India (2006-09); and PES University, South Campus, Bangalore as Professor (2013-2017). He has been with IIIT Bangalore as Professor since Feb 2017.
His research interests are: Automatic speech recognition, machine learning, deep learning, few-shot learning, associative memory formulations.
Student Profiles
K. E. Manjunath, Ph.D. [2015 – 2020]
I did my Ph.D. at IIIT Bangalore from Jan 2015 to July 2020, working on "Study of Multilingual Phone Recognition using Indian Languages". The major contributions of my thesis are:
1. Development of a "common multilingual phone-set" derived from IPA-based transcriptions of six Indian languages.
2. Development and analysis of Multilingual Phone Recognition Systems (Multi-PRS) for Indian languages.
3. Prediction of Articulatory Features (AFs) using DNNs and a multi-task learning framework.
4. Use of multilingual AFs to improve the performance of Multi-PRS systems.
5. Applications of multilingual phone recognition to code-switching.
6. Comparison of the traditional approach (LID followed by monolingual phone recognition) with the "common multilingual phone-set" based approach for multilingual phone recognition.
Vandana M. Ladwani, Ph.D. (part-time) [2015 - ]
Vandana is a part-time Ph.D. student at the Speech Lab working under the supervision of Dr. V. Ramasubramanian. Her research interests include neural networks, associative memory models and multimodal learning. She has been working on associative memory formulations since 2015, initially on bipolar Hopfield network based associative models for information storage and retrieval of speech, audio, text and multimodal data. More recently she has explored M-ary Hopfield neural networks for sequence storage and retrieval, and has developed models for connectionist temporal sequence decoding with a focus on movie-scene decoding. She is currently adapting these models for continuous speech decoding.
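The core storage-and-retrieval idea behind the bipolar Hopfield models mentioned above can be illustrated with a minimal sketch: Hebbian outer-product storage of a few ±1 patterns, and iterative retrieval from a corrupted cue. This is a toy illustration of the classical mechanism, not the models developed in this work; the sizes and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store a few bipolar (+1/-1) patterns with the Hebbian outer-product rule.
N, P = 64, 3
patterns = rng.choice([-1, 1], size=(P, N))
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0)  # no self-connections

def retrieve(state, steps=20):
    """Synchronous Hopfield updates until a fixed point (or step limit)."""
    for _ in range(steps):
        new = np.sign(W @ state)
        new[new == 0] = 1          # break ties consistently
        if np.array_equal(new, state):
            break
        state = new
    return state

# Corrupt 8 of the 64 bits of the first pattern and try to recover it.
cue = patterns[0].copy()
flip = rng.choice(N, size=8, replace=False)
cue[flip] *= -1
recovered = retrieve(cue)
print("bits still wrong:", int(np.sum(recovered != patterns[0])))
```

With only three stored patterns (well under the ~0.14N capacity of this network), retrieval from such a cue typically converges back to the stored pattern; content addressability comes from the cue itself selecting the attractor.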
Vikram. R. Lakkavalli, Ph.D. (part-time) [2017 - ]
Since 2017, I have been working on the analysis-by-synthesis (AbS) paradigm to explore the role of production in the perception mechanism. In this direction, a generic AbS framework for automatic speech recognition (ASR) has been proposed. Initial results show promise in reducing errors on a phone recognition task, and experiments have shown that the comparison loop in AbS plays a key role in this setup. We further aim to identify a good intermediate space for comparison. The results have been published in the IEEE SPCOM 2022 paper "AbS for ASR: A new computational approach". This work also aims to understand how data-driven paradigms such as the autoencoder framework can be coaxed into the AbS paradigm for ASR.
Jhelum Das, Ph.D. (full-time) [2018 - ]
I have a keen interest in understanding human brain activity. As a Data Science student with a Computer Science background, I have chosen a research topic in machine learning inspired by human cognition: the hippocampal integration of information gathered through different modes of learning (audio, image, text, etc.), much as a child learns from his or her natural environment. My research topic, "Multimodal Learning based on auto-associative memory formulations", builds on Hopfield neural networks, which are simple to understand and a proven tool for storing and retrieving content-addressable information. I am investigating whether such networks are robust enough to segment, label, store and retrieve continuous captioned speech (bimodal data), which is treated as unsupervised in our problem, with the speech and the caption serving as labels for each other. In the second part of my work, I will address how the proposed system can store and retrieve the original captioned speech data when given perturbed versions (clusters) of it. This is typically an ASR problem, but I also want to investigate the scope of the proposed system for general multimodal data.
Dhanya Eledath, Ph.D. (full-time) [2019 - ]
I am a 4th-year Ph.D. student at IIIT Bangalore (Jan 2019 to present). My research is in the area of end-to-end automatic speech recognition (E2E-ASR). I am working on few-shot learning (FSL), an emerging deep learning paradigm, and its adaptation to cross-lingual and multilingual E2E-ASR. In my thesis, we adapt a model-based FSL framework, Matching Networks (MN), to E2E ASR. Matching Networks belong to the class of 'embedding learning' FSL frameworks, in which the network 'learns to compare' the few-shot labelled samples (support set) and a test sample (query) in an embedded space. Our work is a first-of-its-kind attempt at adapting metric-learning FSL to ASR: we propose Connectionist Temporal Classification (CTC) loss based end-to-end training of the matching networks and associated CTC-based decoding of continuous speech. I have completed my comprehensive examination and state-of-the-art presentation, and am currently pursuing a 6-month internship at Samsung R&D Institute India - Bangalore in E2E-ASR.
Shreekantha Nadig, MS [2016 – 2021]
Shreekantha graduated with a thesis titled "Multi-task learning in end-to-end attention-based automatic speech recognition", which studied how external knowledge can be incorporated into purely data-driven Encoder-Attention-Decoder architectures for ASR under a multi-task setting. His thesis studied how a model can leverage information from multiple targets (phoneme and grapheme) to improve performance on each of them, and how alignment as an additional loss can help the model converge faster and learn a robust alignment model with better performance. See Shreekantha's personal web page for more details: https://sknadig.dev/
K. M. Srinivasa Raghavan, MS [2017-2022]
Srinivasa Raghavan was awarded his MS by Research degree in July 2022 for his thesis "Unsupervised representation learning for low-resource automatic speech recognition (ASR)". The thesis explored two directions for low-resource ASR scenarios: first, a two-stage approach of VQ-VAE based discrete representation learning on unsupervised data, followed by supervised ASR training on the limited labelled data using the learnt latent representations; second, a hybrid multi-task approach in which both unsupervised and supervised datasets are used within the same architecture to train acoustic models in a low-resource setting.
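The discrete-representation step at the heart of a VQ-VAE can be illustrated by its core operation: each continuous encoder output frame is replaced by its nearest codebook vector, yielding a discrete token sequence. The NumPy sketch below uses random vectors as stand-ins for a trained encoder and a learned codebook; all sizes are illustrative, not those used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: T encoder frames of dimension D, codebook of K vectors.
T, D, K = 10, 16, 8
encoder_out = rng.normal(size=(T, D))   # continuous latents from an encoder
codebook = rng.normal(size=(K, D))      # learnable embedding table in a real VQ-VAE

# Nearest-neighbour lookup: distance from every frame to every code vector.
dists = np.linalg.norm(encoder_out[:, None, :] - codebook[None, :, :], axis=-1)
codes = dists.argmin(axis=1)            # discrete token id per frame
quantized = codebook[codes]             # what the decoder / downstream ASR sees

print(codes.shape, quantized.shape)     # one token id per frame, same-shape latents
```

In the actual model, the codebook is trained jointly with the encoder and decoder (with a straight-through estimator for the non-differentiable argmin), and the resulting discrete codes are what make unsupervised pretraining useful for the downstream low-resource ASR stage.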
During his time at IIIT Bangalore (2017-2022), he co-authored publications with the speech group in the areas of active learning, semi-supervised learning, multilingual phone recognition and code-switching for low-resource Indian-language ASR.
Tirthankar Banerjee, MS [2020-2022], Ph.D. (Part-time) [2023 - ]
Tirthankar’s MS thesis is on “Application of Few-Shot Learning for Speech Recognition”. Few-shot learning (FSL) is an emerging area of machine learning (ML) in which the approach to learning is human-like: the model generalizes well even though little data is available for the target task. This thesis, in a first-of-its-kind attempt, applies the learn-to-measure (L2M) approach in FSL to automatic speech recognition (ASR) by adapting a specific architecture called Matching Networks (MN). An MN-Encoder-CTC framework was developed for ASR and tested on English and Kannada utterances.
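The "learn to measure" idea behind Matching Networks can be sketched as attention over a labelled support set: a query embedding is classified by a similarity-weighted vote over the support embeddings. The toy NumPy sketch below uses random vectors in place of a learned encoder and a fixed similarity temperature; it illustrates only the matching mechanism, not the MN-Encoder-CTC system itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy support set: 3 classes x 2 shots, 8-dim embeddings (in MN, a trained
# encoder would produce these from speech; here they are random stand-ins).
D = 8
support = rng.normal(size=(6, D))
labels = np.array([0, 0, 1, 1, 2, 2])

# Construct a query near a class-1 support example.
query = support[2] + 0.05 * rng.normal(size=D)

# Attention over the support set: softmax of (temperature-scaled) cosine
# similarities, then a similarity-weighted vote per class label.
sims = np.array([cosine(query, s) for s in support])
attn = np.exp(10.0 * sims) / np.exp(10.0 * sims).sum()
class_scores = np.array([attn[labels == c].sum() for c in range(3)])
print("predicted class:", class_scores.argmax())
```

In the thesis, this per-query classification is extended to continuous speech by training the matching mechanism end-to-end with a CTC loss and decoding frame sequences with CTC, rather than classifying isolated queries.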
Subsequent to completing his MS by Research in Dec 2022, Tirthankar is continuing at IIIT Bangalore as a Ph.D. Research Scholar, with his Ph.D. thesis topic focused on “Few-shot Learning for Cross-Domain E2E ASR”.