
Professor

v.ramasubramanian@iiitb.ac.in

Education: Ph.D. (TIFR, Bombay)

Ramasubramanian obtained his B.S. degree from the University of Madras in 1981, his B.E. degree from the Indian Institute of Science, Bangalore in 1984, and his Ph.D. degree from the Tata Institute of Fundamental Research (TIFR), Bombay in 1992. He has been engaged in research in speech processing and related areas for nearly three decades, with over 70 research publications in peer-reviewed international journals and conferences.

He has worked in various institutions and universities: TIFR, Bombay (1984-99) as Research Scholar, Fellow and Reader; the University of Valencia, Spain as Visiting Scientist (1991-92); Advanced Telecommunications Research (ATR) Laboratories, Kyoto, Japan as Invited Researcher (1996-97); the Indian Institute of Science (IISc), Bangalore as Research Associate (2000-04); Siemens Corporate Research & Technology (2005-13) as Senior Member Technical Staff and as Head of Professional Speech Processing - India (2006-09); and PES University, South Campus, Bangalore as Professor (2013-2017). He has been with IIIT Bangalore as Professor since Feb 2017.

His research interests are: Automatic speech recognition, machine learning, deep learning, few-shot learning, self-supervised learning, associative memory formulations.

Book / Monograph

  1. V. Ramasubramanian and Harish Doddala, “Ultra low bit-rate speech coding”, Springer-Brief, Springer Verlag NY, 2015.

Journals

  1. K. K. Paliwal and V. Ramasubramanian, “Effect of ordering the codebook on the efficiency of the partial distance search algorithm for vector quantization”, IEEE Transactions on Communications, COM- 37:538–540, May 1989.
  2. V. Ramasubramanian and K. K. Paliwal, “Fast K-d tree algorithms for nearest-neighbor search with application to vector quantization encoding”, IEEE Transactions on Acoustics, Speech and Signal Processing, 40(3):518–531, Mar 1992.
  3. V. Ramasubramanian and K. K. Paliwal, “Fast vector quantization encoding based on K-d tree backtracking search algorithm”, Digital Signal Processing, 7(3):163–187, Jul 1997.
  4. V. Ramasubramanian and K. K. Paliwal, “Fast nearest-neighbor search based on Voronoi projections and its application to vector quantization encoding”, IEEE Transactions on Speech and Audio Processing, 7(2):221–226, Mar 1998.
  5. K. K. Paliwal and V. Ramasubramanian, “Comments on “Modified K-means algorithm for vector quantizer design””, IEEE Transactions on Image Processing, 9(11):1964–1967, Nov 2000.
  6. Manjunath K E, Dinesh Babu Jayagopi, K. Sreenivasa Rao, and V Ramasubramanian, “Development and analysis of multilingual phone recognition systems using Indian languages”, International Journal of Speech Technology, (Springer), pp. 1-12, https://doi.org/10.1007/s10772-018-09589-z, Jan. 2019.

Conferences

  1. V. Ramasubramanian and K. K. Paliwal, “An efficient approximation-elimination algorithm for fast nearest-neighbor search”, In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’92), pages I–89–I–92, San Francisco, California, Mar 1992.
  2. V. Ramasubramanian, A. K. V. Sai Jayram, and T. V. Sreenivas, “Language identification using parallel sub-word recognition – an ergodic HMM equivalence”, In Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH ’03), pp. 1357–1360, Geneva, Switzerland, Sep 2003.
  3. V. Ramasubramanian, P. Srinivas and T. V. Sreenivas, “Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units”, In Proc. 9th European Conference on Speech Communication and Technology (INTERSPEECH - EUROSPEECH ’05), pp. 1361–1364, Lisbon, Portugal, Sep. 2005.
  4. V. Ramasubramanian and D. Harish, “A unified unit-selection framework for ultra low bit-rate speech coding”, In Proc. INTERSPEECH-2006, pp. 217–220, Pittsburgh, Sept 2006.
  5. V. Ramasubramanian and D. Harish, “An optimal unit-selection algorithm for ultra low bit-rate speech coding”, In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’07), pp. IV-541–544, Hawaii, 2007.
  6. V. Ramasubramanian, Kaustubh Kulkarni and Bernhard Kaemmerer, “Acoustic modeling by phoneme templates and one-pass DP decoding for continuous speech recognition”, In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’08), pp. 4105–4108, Las Vegas, Mar 2008.
  7. D. Harish and V. Ramasubramanian, “Comparison of segment quantizers: VQ, MQ, VLSQ and Unit-selection algorithms for ultra low bit-rate speech coding”, In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’08), pp. 4773–4776, Las Vegas, Mar 2008.
  8. Srikanth Cherla, Kaustubh Kulkarni, Amit Kale, V. Ramasubramanian, “Towards Fast, View Invariant Human Action Recognition”, In IEEE Workshop for Human Communicative Behavior Analysis at CVPR 2008, Anchorage, Alaska, Aug 2008.
  9. V. Ramasubramanian and D. Harish, “Low complexity near-optimal unit-selection algorithm for ultra low bit-rate speech coding based on N-best lattice and Viterbi decoding”, In Proc. INTERSPEECH-2008, pp. 44, Brisbane, Sep 2008.
  10. Kaustubh Kulkarni, Srikanth Cherla, Amit Kale, V. Ramasubramanian, “A Framework for Indexing Human Actions in Video”, In 1st International Workshop on Machine Learning for Vision-Based Motion Analysis at ECCV 2008, Marseille, France, Oct 2008.
  11. V. Ramasubramanian and D. Harish, “Ultra low bit-rate speech coding based on unit-selection with joint spectral-residual quantization: No transmission of any residual information”, In Proc. INTERSPEECH-2009, pp. 2615-2618, Brighton, UK, Sep 2009.
  12. Srikanth Cherla and V. Ramasubramanian, “Audio analytics by template modeling and 1-pass DP based decoding”, In Proc. INTERSPEECH-2010, pp. 2230-2233, Chiba, Japan, Sep 2010.
  13. V. Ramasubramanian, R. Karthik, S. Thiyagarajan and Srikanth Cherla, “Continuous audio analytics by HMM and Viterbi decoding”, In Proc. ICASSP ’11, pp. 2396-2399, Prague, Czech Republic, May 2011.
  14. V. Ramasubramanian, S. Thiyagarajan, G. Pradnya, Heiko Claussen, Justinian Rosca, “Two-class verifier framework for audio indexing”, In Proc. ICASSP ’13, Vancouver, Canada, 2013.
  15. Akshay Khatwani, R. Komala Pawar, N. Sushma, L. Sudha, S. Adithya and V. Ramasubramanian, “Spoken document retrieval: Sub-sequence DTW framework and variants”, In 3rd International Conference on Mining Intelligence and Knowledge Exploration (MIKE 2015), Dec 9 - 11, 2015, Published as Springer LNAI proceedings.
  16. S. Adithya, Sunil Rao, C. Mahima, S. Vishnu, Mythri Thippareddy and V. Ramasubramanian, “Template Based Techniques for Automatic Segmentation of TTS Unit Database”, In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’16), Shanghai, March 2016.
  17. Mythri Thippareddy, Noor Fathima, D. N. Krishna, Sricharan and V. Ramasubramanian, “Phonetically conditioned prosody transplantation for TTS: 2-stage phone-level unit-selection framework”, In Proc. Speech Prosody ’16, Boston, June 2016.
  18. Y. Vaishnavi, R. Shreyas, S. Suhas, U. N. Surya, Vandana M. Ladwani, V. Ramasubramanian, “Associative memory framework for speech recognition: Adaptation of Hopfield network”, Proc. IEEE INDICON '16, Bangalore, India, 2016.
  19. Vandana M. Ladwani, Y. Vaishnavi, R. Shreyas, B.R. Vinay Kumar, N. Harish, S. Yogesh, P. Shivaganga, V. Ramasubramanian, “Hopfield Net Framework for Audio Search”, In Proc. NCC-2017, IIT-M, Chennai, India, 2017.
  20. Anusha Kamat, Abhishek Krishnamurthy, D. N. Krishna, V. Ramasubramanian, “Prosodic differential for narrow-focus word-stress in speech synthesis”, In Proc. NCC-2017, IIT-M, Chennai, India, 2017.
  21. Vandana M. Ladwani, Y. Vaishnavi, and V. Ramasubramanian, “Hopfield auto-associative memory network for content-based text-retrieval”, Proc. ICANN-2017, 26th International Conference on Artificial Neural Networks, Alghero, Italy, Sep 11-14, 2017.
  22. M. Chellapriyadharshini, Anoop Toffy, V Ramasubramanian, “Semi-supervised and active learning scenarios: Efficient acoustic model refinement for a low resource Indian language,” Proc. Interspeech ’18, Hyderabad, Sep 2018.
  23. Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi, and V Ramasubramanian, “Indian languages ASR: A multilingual phone recognition framework with IPA based common phone-set, predicted articulatory features and feature fusion”, Proc. Interspeech '18, Hyderabad, Sep 2018.
  24. Rachna Shriwas, Prasun Joshi, Vandana M. Ladwani, and V. Ramasubramanian, “Multi-modal associative storage and retrieval using Hopfield auto-associative memory network”, ICANN-2019, 28th International Conference on Artificial Neural Networks, Munich, Germany, Sep 17-19, 2019
  25. Kaajal Gupta, Tilak Purohit, Anzar Zulfiqar, Pushpa Ramu, V. Ramasubramanian, “Detection of emotional states of OCD patients in an exposure-response prevention therapy scenario”, in the ‘Speech, Music and Mind 2019 (SMM-2019) Workshop - Detecting and Influencing Mental States with Audio’, Vienna, Austria, Satellite Workshop of Interspeech-2019 (Graz), Sep 2019.
  26. T. Vijaya Kumar, R. Shunmuga Sundar, Tilak Purohit, V. Ramasubramanian, “End-to-end audio-scene classification from raw audio: Multi time-frequency resolution CNN architecture for efficient representation learning”, International Conference on Signal Processing and Communications (SPCOM-2020), July 2020, IISc, Bangalore (BEST PAPER AWARD).
  27. Shreekantha Nadig, Sumit Chakraborty, Anuj Shah, Chaitanay Sharma, V. Ramasubramanian, Sachit Rao, "Jointly learning to align and transcribe using attention-based alignment and uncertainty-to-weigh losses", International Conference on Signal Processing and Communications (SPCOM-2020), July 2020, IISc, Bangalore (Best Student Paper Award Honorable Mention).
  28. Abhijith Madan, Ayush Khopkar, Shreekantha Nadig, K. M. Srinivasa Raghavan, Dhanya Eledath, V. Ramasubramanian, "Semi-supervised learning for acoustic model retraining: Handling speech data with noisy transcript", International Conference on Signal Processing and Communications (SPCOM-2020), July 2020, IISc, Bangalore.
  29. Shreekantha Nadig, V. Ramasubramanian, Sachit Rao, "Multi-target hybrid CTC-Attentional Decoder for joint phoneme-grapheme recognition", International Conference on Signal Processing and Communications (SPCOM-2020), July 2020, IISc, Bangalore.
  30. Tilak Purohit and V. Ramasubramanian, “Component-specific temporal decomposition: application to enhanced speech coding and co-articulation analysis”, International Conference on Signal Processing and Communications (SPCOM-2020), July 2020, IISc, Bangalore.
  31. Manjunath K E, D. B. Jayagopi, K. S. Rao, and V. Ramasubramanian, “Articulatory Feature based Methods for Performance Improvement of Multilingual Phone Recognition Systems using Indian Languages,” Sadhana (Springer), Indian Academy of Science, 45:190, 2020, doi:10.1007/s12046-020-01428-9.
  32. Manjunath K E, K. M. Srinivasaraghavan, K. S. Rao, D. B. Jayagopi, and V. Ramasubramanian, “Approaches for Multilingual Phone Recognition in Code-Switched and Non-Code-Switched Scenarios using Indian Languages,” ACM Transactions on Asian and Low-Resource Language Information Processing (ACM TALLIP), Vol. 20, Issue 4, pp 1-19, 2021, doi: 10.1145/3437256.
  33. Tirthankar Banerjee [MS2020016], Narasimha Rao Thurlapati [SMT2018013], V. Pavithra [SMT2018015], S. Mahalakshmi [SMT2018010], Dhanya Eledath [PH2018022] and V. Ramasubramanian, “Few-shot learning for frame-wise phoneme recognition: Adaptation of matching networks”, in 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, Aug 2021.
  34. Dhanya Eledath [PH2018022], P. Inbarajan [SMT2017005], Anurag Biradar [SMT2018004], Sathwick Mahadeva [SMT2018022], V. Ramasubramanian, “End-to-end speech recognition from raw speech: Multi time-frequency resolution CNN architecture for efficient representation learning”, in 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, Aug 2021.
  35. Dhanya Eledath [PH2018022], V. Pavithra [SMT2018015], Narasimha Rao Thurlapati [SMT2018013], Tirthankar Banerjee [MS2020016], and V. Ramasubramanian, “Few-shot learning for cross-lingual end-to-end speech recognition”, in Workshop on Machine Learning in Speech and Language Processing 2021 (MLSLP 2021), Satellite Workshop of Interspeech 2021, Brno, Czech Republic, Sep 2021.
  36. Vandana M. Ladwani [PH2015013] and V. Ramasubramanian, “M-ary Hopfield Neural Network Based Associative Memory Formulation: Limit-Cycle Based Sequence Storage and Retrieval”, in Proc. ICANN 2021 (Virtual), Springer Nature Switzerland AG 2021, I. Farkas et al. (Eds.): ICANN 2021, LNCS 12894, pp. 1–13, 2021. https://doi.org/10.1007/978-3-030-86380-7_34
  37. Tirthankar Banerjee, Dhanya Eledath, and V Ramasubramanian, “Few shot learning for cross-lingual isolated word recognition,” in The First International Conference on AI-ML-Systems (AIML Systems 2021), Bangalore, India, Oct 2021.
  38. Vandana M. Ladwani [PH2015013] and V. Ramasubramanian, “Harnessing Energy of M-ary Hopfield Neural Network for Connectionist Temporal Sequence Decoding”, 8th International Conference on Mining Intelligence & Knowledge Exploration (MIKE 2021), 1-3 November 2021, Hammamet, Tunisia (2nd BEST PAPER AWARD).
  39. Dhanya Eledath, Narasimha Rao Thurlapati, V Pavithra, Tirthankar Banerjee and V. Ramasubramanian, “Few-shot learning for end-to-end speech recognition: architectural variants for support set generation and optimization,” Proc. EUSIPCO 2022, Serbia, Sep 2022.
  40. Vandana M. Ladwani and V. Ramasubramanian, “M-ary Hopfield neural network for storage and retrieval of variable length sequences: Multi-limit cycle approach”, IEEE SSCI 2022, Singapore, 2022.

Speech Processing (DS / NC 822) Jan-Apr 2018

Automatic Speech Recognition (ASR) (DS / NE 821) Jan-Apr 2017, (DS / NC 824) Aug-Dec 2018, (DS / SP 823) Jan-Apr 2019, Jan-Apr 2020

Deep Learning for Automatic Speech Recognition (DL-ASR) (DS / NC 871) Aug-Dec 2017, (DS / SP 826) Aug-Dec 2019, (AI 826) Aug-Dec 2020

Linear Algebra (GEN 504) Aug-Dec 2017

Machine Learning (CS / DS 612) Jan-Apr 2018

Maths for Machine Learning (GEN 611) Jan-Apr 2018, (GEN 512) Aug-Dec 2018, (GEN 512) Aug-Dec 2019, Nov-Dec 2020

Few-Shot Learning (AI 831) Jan-Apr 2022, Jan-Apr 2023

Few-Shot Learning - 2 (AI 834) Aug-Dec 2022

Self-supervised Learning (AI 835) Aug-Dec 2023

Research (MS/PhD Students, Topics, Publications) 

My research spans topics in automatic speech recognition (ASR) and related areas, which form the thesis topics of my MS/Ph.D. students (2015 - now) listed below.

Research students (completed)
  1. K. E. Manjunath, 2020, Ph.D. Thesis, "Study of Multilingual Phone Recognition Using Indian Languages", Prof. Ravi Ravindran Gold Medal for Best Ph.D. Thesis (2020).
  2. Shreekantha Nadig, 2021, MS Thesis, "Multi-task Learning in End-to-end Attention-based Automatic Speech Recognition".
  3. K. M. Srinivasa Raghavan, 2022, MS Thesis, "Unsupervised Representation Learning for Low-resource Automatic Speech Recognition".
  4. Tirthankar Banerjee, 2022, MS Thesis, "Application of Few-Shot Learning for Speech Recognition".
Research students (ongoing)
  1. Ms. Vandana Ladwani, 2015-now, (Ph.D.), "Hopfield auto-associative memory formulations for information storage, retrieval and sequence decoding".
  2. Vikram R. Lakkavalli, 2017-now, (Ph.D.), "Analysis-by-synthesis (AbS) frameworks for ASR".
  3. Ms. Dhanya Eledath, 2018-now, (Ph.D.), "Few-Shot Learning for E2E Automatic Speech Recognition".
  4. Ms. Jhelum Das, 2018-now, (Ph.D.), "Auto-associative memory formulations for multi-modal learning".
  5. Tirthankar Banerjee, 2023-now, (Ph.D.), "Cross-domain Few-shot Learning for ASR".

The following short profiles of my MS/Ph.D. research students give details of the above research topics.

K. E. Manjunath, Ph.D. [2015 – 2020]

I did my Ph.D. at IIIT Bangalore from Jan 2015 to July 2020, working on the "Study of Multilingual Phone Recognition using Indian Languages". The major contributions of my Ph.D. thesis are: 1. development of a "common multilingual phone-set" derived using IPA-based transcription of six Indian languages; 2. development and analysis of Multilingual Phone Recognition Systems (Multi-PRS) for Indian languages; 3. prediction of Articulatory Features (AFs) using DNNs and a multi-task learning framework; 4. use of multilingual AFs to improve the performance of Multi-PRS systems; 5. applications of multilingual phone recognition in code-switching; 6. comparison of the traditional approach (LID followed by monolingual phone recognition) with the "common multilingual phone-set" based approach to multilingual phone recognition.

Vandana M. Ladwani, Ph.D. (part-time) [2015 - ]

Vandana is a part-time Ph.D. student at the Speech Lab, working under the supervision of Dr. V. Ramasubramanian. Her research interests include neural networks, associative memory models and multimodal learning. She has been working on associative memory formulations since 2015, building bipolar Hopfield network based associative models for storage and retrieval of speech, audio, text and multimodal data. More recently, she has explored M-ary Hopfield neural networks for sequence storage and retrieval and developed models for connectionist temporal sequence decoding, with a focus on movie-scene decoding. Currently she is adapting these models for continuous speech decoding.
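
For readers unfamiliar with the formulation, here is a minimal sketch of the bipolar Hopfield associative memory underlying this line of work: Hebbian (outer-product) storage of bipolar patterns and iterative thresholded recall from a noisy cue. This is an illustration only, not code from the thesis; the function names and toy patterns are invented, and the M-ary and limit-cycle extensions studied here are not shown.

```python
import numpy as np

def store(patterns):
    """Hebbian (outer-product) storage for a bipolar Hopfield network.
    patterns: (P, N) array with entries in {-1, +1}."""
    P, N = patterns.shape
    W = (patterns.T @ patterns) / N      # superposition of outer products
    np.fill_diagonal(W, 0.0)             # no self-connections
    return W

def recall(W, probe, n_iters=50):
    """Iterative retrieval: threshold updates until a fixed point."""
    s = probe.astype(float).copy()
    for _ in range(n_iters):
        s_new = np.where(W @ s >= 0, 1.0, -1.0)
        if np.array_equal(s_new, s):
            break
        s = s_new
    return s

# Toy usage: store two random bipolar patterns, retrieve from a noisy cue.
rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(2, 64))
W = store(patterns)
noisy = patterns[0].copy()
noisy[:8] *= -1                          # corrupt 8 of 64 bits
print(np.array_equal(recall(W, noisy), patterns[0]))  # likely True
```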

Vikram. R. Lakkavalli, Ph.D. (part-time) [2017 - ]

Since 2017, I have been working on the analysis-by-synthesis (AbS) paradigm to explore the role of production in the perception mechanism. In this direction, a generic framework is proposed for automatic speech recognition (ASR) in the AbS paradigm. Initial results show promise in reducing the error of a phone recognition task using AbS, and experiments have shown that the comparison loop in AbS plays a key role in this setup. We further aim to identify a good intermediate space for comparison. The results have been published in the IEEE SPCOM 2022 paper "AbS for ASR: A New Computational Perspective". This work also intends to understand how data-driven paradigms such as the autoencoder framework can be coaxed into realizing the AbS paradigm for ASR.
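
The comparison loop described above can be caricatured as: hypothesize a unit, synthesize a realization of it, and score the synthesis against the input in a chosen comparison space, keeping the best-scoring unit. The sketch below is a toy illustration under these assumptions, with stand-in `synthesize` and `embed` functions; it is not the computational framework of the SPCOM 2022 paper.

```python
import numpy as np

def abs_decode(x, units, synthesize, embed):
    """Toy analysis-by-synthesis loop: return the unit whose synthesized
    realization is closest to the input x in the comparison space."""
    costs = {u: np.linalg.norm(embed(x) - embed(synthesize(u))) for u in units}
    return min(costs, key=costs.get)

# Dummy usage: 'synthesis' is a fixed template per unit, and the comparison
# space is the signal itself. Real systems use learned synthesizers/embeddings.
rng = np.random.default_rng(0)
templates = {u: rng.normal(size=128) for u in ["aa", "iy", "uw"]}
x = templates["iy"] + 0.1 * rng.normal(size=128)   # noisy realization of "iy"
print(abs_decode(x, templates, lambda u: templates[u], lambda v: v))  # "iy"
```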

Jhelum Das, Ph.D. (full-time) [2018 - ]

I have a keen interest in understanding human brain activity. Being a Data Science student with a Computer Science background, I have chosen as my research topic a machine learning technique influenced by human cognition: the hippocampal integration of information gathered via different modes of learning (audio, image, text, etc.), much as a child learns from his or her natural environment. My research topic, "Multimodal Learning based on auto-associative memory formulations", is based on Hopfield neural networks, which are simple to understand and a proven tool for storing and retrieving content-addressable information. I am investigating whether such a network is robust enough to segment, label, store and retrieve continuous captioned speech (bimodal data), which we treat as unsupervised data, with the speech and the caption acting as labels for each other. In the second part of my work, I will study how, given perturbed versions (clusters) of the unsupervised captioned speech data, the proposed system can store and retrieve the original captioned speech data. This is typically an ASR problem, but I also want to investigate the scope of the proposed system for general multimodal data.

Dhanya Eledath, Ph.D. (full-time) [2019 - ]

I am a 4th-year Ph.D. student at IIIT Bangalore (Jan 2019 to present). My research is in the area of end-to-end automatic speech recognition (E2E-ASR). I am working on a new deep learning paradigm called few-shot learning (FSL) and its adaptation to cross-lingual and multilingual E2E-ASR. In my thesis, we adapt a model-based FSL framework, Matching Networks (MN), to E2E-ASR. Matching Networks belong to the class of 'embedding learning' FSL frameworks, where the network 'learns to compare' the few labelled samples (support set) and a test sample (query) in an embedded space. Our work is a first-of-its-kind attempt at adapting metric-learning FSL to ASR, and we address this problem by proposing a Connectionist Temporal Classification (CTC) loss based end-to-end training of the matching networks and associated CTC-based decoding of continuous speech. I have completed my comprehensive examination and state-of-the-art presentation, and I am currently pursuing a 6-month internship at Samsung R&D Institute India - Bangalore in E2E-ASR.
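
As an illustration of the 'learns to compare' step, here is a minimal sketch of Matching-Networks-style attention: each query frame embedding attends over support-set embeddings via cosine similarity, and labels are read out as an attention-weighted sum. This is a caricature with invented names, not the thesis framework; in the thesis the resulting per-frame posteriors would feed CTC training and decoding.

```python
import numpy as np

def mn_posteriors(query, support, support_labels, n_classes):
    """Matching-Networks-style read-out: cosine attention of each query
    frame over the support set, followed by a label-weighted sum.
    query: (T, D) frame embeddings; support: (S, D); support_labels: (S,)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    sims = q @ s.T                                   # (T, S) cosine scores
    attn = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)  # softmax
    one_hot = np.eye(n_classes)[support_labels]      # (S, C)
    return attn @ one_hot                            # (T, C) frame posteriors

# Toy usage: 3 phone classes, 6 support embeddings, 4 query frames.
rng = np.random.default_rng(0)
support = rng.normal(size=(6, 16))
labels = np.array([0, 0, 1, 1, 2, 2])
query = support[2:3] + 0.05 * rng.normal(size=(4, 16))  # near a class-1 item
print(mn_posteriors(query, support, labels, 3).argmax(axis=1))  # mostly 1
```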

Shreekantha Nadig, MS [2016 – 2021]

Shreekantha graduated with a thesis titled "Multi-task learning in end-to-end attention-based automatic speech recognition", which studied how external knowledge can be incorporated into purely data-driven Encoder-Attention-Decoder ASR architectures under a multi-task setting. His thesis studied how a model can leverage information from multiple targets (phoneme and grapheme) to improve performance on each of them, and how alignment as an additional loss can help the model converge faster and learn a robust alignment model with better performance. See Shreekantha's personal web page for more details: https://sknadig.dev/
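
One of the associated papers (SPCOM 2020) combines such losses via "uncertainty-to-weigh losses". A generic sketch of that weighting scheme, in the style of Kendall et al.'s multi-task uncertainty weighting and with invented names (not code from the thesis), follows.

```python
import numpy as np

def weighted_multitask_loss(losses, log_vars):
    """Uncertainty-based multi-task weighting: total = sum_i exp(-s_i) * L_i + s_i,
    where s_i = log(sigma_i^2) is a learned per-task parameter. Tasks whose
    learned uncertainty is low receive higher effective weight.
    losses, log_vars: arrays of shape (n_tasks,)."""
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

# Toy usage: e.g. phoneme-CTC, grapheme-CTC and alignment losses with equal
# initial uncertainty; in training, log_vars would be optimized jointly.
losses = np.array([2.3, 1.9, 0.7])
log_vars = np.zeros(3)
print(weighted_multitask_loss(losses, log_vars))  # 4.9
```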

K. M. Srinivasa Raghavan, MS [2017-2022]

Srinivasa Raghavan was awarded his MS by Research degree in July 2022 for his thesis "Unsupervised representation learning for low-resource automatic speech recognition (ASR)". The thesis explored two directions for low-resource ASR scenarios: first, a two-stage approach of VQ-VAE based discrete representation learning on unsupervised data, followed by supervised ASR training on the limited labelled data using the learnt latent representations; second, a hybrid multi-task approach where unsupervised and supervised datasets are used within the same architecture to train acoustic models under a low-resource setting.
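
For context, the core of VQ-VAE's discrete representation learning is a nearest-codebook quantization of encoder outputs. The snippet below is a generic illustration of just that step, with invented names; training details (straight-through gradients, codebook and commitment losses) are omitted and none of this is taken from the thesis.

```python
import numpy as np

def vq_quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.
    z_e: (T, D) encoder outputs; codebook: (K, D) learned code vectors.
    Returns discrete indices (T,) and quantized vectors (T, D)."""
    # Squared Euclidean distance of every frame to every code: (T, K)
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)      # one discrete 'token' per frame
    return idx, codebook[idx]   # quantized representation for downstream ASR

# Toy usage: 5 frames, 8-dim features, a 4-entry codebook.
rng = np.random.default_rng(0)
codes = rng.normal(size=(4, 8))
z = codes[[1, 1, 3, 0, 2]] + 0.01 * rng.normal(size=(5, 8))
print(vq_quantize(z, codes)[0])   # -> [1 1 3 0 2]
```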

During his stint at IIIT Bangalore (2017-2022) with the speech group, he co-authored publications in the areas of active learning, semi-supervised learning, multilingual phone recognition and code-switching for low-resource Indian-language ASR.

Tirthankar Banerjee, MS [2020-2022]

Tirthankar’s MS thesis is on “Application of Few-Shot Learning for Speech Recognition”. Few-shot learning (FSL) is an emerging area in machine learning (ML) in which, much like human learning, the aim is to generalize well even when little data is available for the target task. This thesis work, in a first-of-its-kind attempt, applies the learn-to-measure (L2M) approach of FSL to automatic speech recognition (ASR) by adapting a specific architecture called Matching Networks (MN). An MN-Encoder-CTC framework was developed for ASR and tested on English and Kannada utterances.

Publications of MS/Ph.D. students

  1. Manjunath K E, “Multilingual Phone Recognition in Indian Languages”, Springer Briefs in Speech Technology, 2021, eBook ISBN 978-3-030-80741-2, doi: 10.1007/978-3-030-80741-2
  2. Manjunath K E, K. M. S. Raghavan, K. S. Rao, D. B. Jayagopi, and V. Ramasubramanian, “Approaches for Multilingual Phone Recognition in Code-Switched and Non-Code-Switched Scenarios using Indian Languages,” ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Vol. 20(4), pp. 1-19, 2021. doi:  10.1145/3437256
  3. Manjunath K E, D. B. Jayagopi, K. S. Rao, and V. Ramasubramanian, “Articulatory Feature based Methods for Performance Improvement of Multilingual Phone Recognition Systems using Indian Languages,” Sadhana (Springer), Vol. 45(1), 2020. doi: 10.1007/s12046-020-01428-9
  4. Manjunath K E, D. B. Jayagopi, K. S. Rao, and V. Ramasubramanian, “Development and analysis of multilingual phone recognition systems using Indian languages,” International Journal of Speech Technology, (Springer), pp. 1-12, 2019. doi: 10.1007/s10772-018-09589-z
  5. Manjunath K E, K. M. S. Raghavan, K. S. Rao, D. B. Jayagopi, and V. Ramasubramanian, “Multilingual Phone Recognition: Comparison of Traditional versus Common Multilingual Phone-set Approaches and Applications in Code-Switching,” International Symposium on Signal Processing and Intelligent Recognition Systems, Dec. 2019. doi: 10.1007/978-981-15-4828-4_7
  6. Manjunath K E, K. S. Rao, D. B. Jayagopi, and V. Ramasubramanian, “Indian languages ASR: A multilingual phone recognition framework with IPA based common phone-set, predicted articulatory features and feature fusion,” INTERSPEECH, Sept. 2018, pp. 1016-1020. doi: 10.21437/Interspeech.2018-2529
  7. Manjunath K E, “Study of Multilingual Phone Recognition using Indian Languages,” 4th Doctoral Consortium at INTERSPEECH, 2018.
  8. Manjunath K E, K. S. Rao and D. B. Jayagopi, “Development of multilingual phone recognition system for Indian languages,” IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Aug. 2017, pp. 1-6. doi: 10.1109/SPICES.2017.8091271
  9. Y. Vaishnavi, R. Shreyas, S. Suhas, U. N. Surya, V. M. Ladwani, and V. Ramasubramanian, “Associative memory framework for speech recognition: Adaptation of Hopfield network”, In Proceedings of INDICON-2016, IEEE Annual India Conference, pages 1-6, 2016.
  10. Vandana M. Ladwani, Y. Vaishnavi, R. Shreyas, B.R. Vinay Kumar, N. Harisha, S. Yogesh, P. Shivaganga, V. Ramasubramanian, “Hopfield net framework for audio search”, In Proceedings of NCC-2017, IIT-Madras, Chennai, India, 2017.
  11. Vandana M. Ladwani, Y. Vaishnavi, and V. Ramasubramanian, “Hopfield auto-associative memory network for content-based text-retrieval”, In Proceedings of ICANN-2017, 26th International Conference on Artificial Neural Networks, Alghero, Italy, Sep 2017
  12. Rachna Shriwas, Prasun Joshi, Vandana M. Ladwani, and V. Ramasubramanian, “Multi-modal associative storage and retrieval using Hopfield auto-associative memory network”, In Proceedings of ICANN-2019, 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019
  13. Vandana M. Ladwani, and V. Ramasubramanian, “M-ary Hopfield Neural Network Based Associative Memory Formulation: Limit-Cycle Based Sequence Storage and Retrieval”, In Proceedings of ICANN-2021, 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021
  14. Vandana M. Ladwani and V. Ramasubramanian, “Harnessing Energy of M-ary Hopfield Neural Network for Connectionist Temporal Sequence Decoding”, In Proceedings of MIKE 2021, Hammamet, Tunisia, 1-3, November 2021 [2nd Best paper award].
  15. Vandana M. Ladwani and V. Ramasubramanian, “M-ary Hopfield Neural Network for storage and retrieval of variable length sequences: Multi-limit cycle approach”, In Proceedings of the IEEE Symposium on Computational Intelligence for Human-Like Intelligence (IEEE CIHLI) 2022, Singapore, 4-7 December 2022.
  16. Vikram. R. Lakkavalli, "AbS for ASR: A New Computational Perspective", IEEE International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 2022, doi: 10.1109/SPCOM55316.2022.9840830.
  17. Dhanya Eledath, Narasimha Rao Thurlapati, V Pavithra, Tirthankar Banerjee and V. Ramasubramanian, “Few-shot learning for end-to-end speech recognition: architectural variants for support set generation and optimization,” In EUSIPCO 2022, Belgrade, Serbia, Aug 2022.
  18. Dhanya Eledath, V Pavithra, Narasimha Rao Thurlapati, Tirthankar Banerjee, and V Ramasubramanian, “Few-shot learning for cross-lingual end-to-end speech recognition,” in Workshop on Machine Learning in Speech and Language Processing 2021 (MLSLP 2021), Satellite Workshop of Interspeech 2021, Brno, Czech Republic, Sep 2021.
  19. Dhanya Eledath, P. Inbarajan, Anurag Biradar, Sathwick Mahadeva and V. Ramasubramanian, “End-to-end speech recognition from raw speech: Multi time-frequency resolution CNN architecture for efficient representation learning,” in 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, Aug 2021.
  20. Abhijith Madan, Ayush Khopkar, Shreekantha Nadig, K. M. Srinivasa Raghavan, Dhanya Eledath, and V. Ramasubramanian, “Semi-supervised learning for acoustic model retraining: Handling speech data with noisy transcript,” in 2020 International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, July 2020.
  21. Shreekantha Nadig, V. Ramasubramanian, Sachit Rao, “Multi-target hybrid CTC-Attentional Decoder for joint phoneme-grapheme recognition”, International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 2020.
  22. Shreekantha Nadig, Sumit Chakraborty, Anuj Shah, Chaitanay Sharma, V. Ramasubramanian, Sachit Rao, “Jointly learning to align and transcribe using attention-based alignment and uncertainty-to-weigh losses”, International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, 2020 (Best Student Paper Award – Honorable Mention).
  23. K. M. Srinivasa Raghavan, S. Kumar, “Hybrid Unsupervised and Supervised Multitask Learning For Speech Recognition in Low Resource Languages”, Workshop on Machine Learning in Speech and Language Processing 2021 (MLSLP-2021), Satellite Workshop of Interspeech-2021, Brno, Czechia, Sep. 2021 (https://homepages.inf.ed.ac.uk/htang2/sigml/mlslp2021/MLSLP2021_paper_8.pdf).
  24. M. Chellapriyadharshini, A. Toffy, K. M. Srinivasa Raghavan, V. Ramasubramanian, “Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language”, Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 Sep. 2018, pp. 1041-1045, doi: https://doi.org/10.21437/Interspeech.2018-2486
  25. Tirthankar Banerjee, Narasimha Rao Thurlapati, V Pavithra, S Mahalakshmi, Dhanya Eledath, and V Ramasubramanian, “Few-shot learning for frame-wise phoneme recognition: Adaptation of matching networks”, In 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, pages 516-520, August 2021. doi:https://doi.org/10.23919/eusipco54536.2021.9616234.
  26. Tirthankar Banerjee, Dhanya Eledath, and V Ramasubramanian, “Few shot learning for cross-lingual isolated word recognition”, In The First International Conference on AI-ML-Systems, Bangalore, India, September 2021. doi:https://doi.org/10.1145/3486001.3486235.
Tutorials (@ ICASSP, Toulouse, France)
  • V. Ramasubramanian and Amitav Das. Text-dependent speaker-recognition: A survey and state-of-the-art. Tutorial at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '06), Toulouse, France, May 2006.
Tutorials (@ WiSSAP, other venues, India)
  1. V. Ramasubramanian. Automatic speech recognition and understanding: An overview. Winter School on Speech and Audio Processing (WiSSAP '06), Indian Institute of Science (IISc), Bangalore, Jan 2006.
  2. V. Ramasubramanian. Low and ultra low bit-rate speech coding. Winter School on Speech and Audio Processing (WiSSAP '07), Indian Institute of Science (IISc), Bangalore, Jan 2007.
  3. V. Ramasubramanian. Articulatory speech synthesis: Basics and some recent advances. Winter School on Speech and Audio Processing (WiSSAP '08), Indian Institute of Technology - Madras (IIT-M), Chennai, Jan 2008.
  4. V. Ramasubramanian. Speaker recognition: An overview. Winter School on Speech and Audio Processing (WiSSAP '09), Indian Institute of Technology - Kanpur (IIT-K), Jan 2009.
  5. V. Ramasubramanian. Computational auditory scene analysis (CASA): An introduction. Winter School on Speech and Audio Processing (WiSSAP '12), Indian Institute of Science (IISc), Bangalore, Jan 2012.
  6. V. Ramasubramanian. Perception-production link: Classical perspectives. Winter School on Speech and Audio Processing (WiSSAP '15), DAIICT, Gandhinagar, Ahmedabad, Jan 2015.
  7. V. Ramasubramanian. Computational Models of Prosody for TTS. Winter School on Speech and Audio Processing (WiSSAP '16), SSNCE, Chennai, Jan 2016.
  8. V. Ramasubramanian. Voice Biometrics. Second International Conference on Cognitive Computing and Information Processing (CCIP - 2016), SJCE, Mysore, August 2016.
Invited talks
  1. V. Ramasubramanian, "Methods, performance and trends in speaker identification and verification: A review and critique". In Workshop on speaker-verification and identification, Indian Statistical Institute, Kolkata, Jan. 2005.
  2. V. Ramasubramanian, "Ultra low bit-rate speech coding: Segment vocoders'", Center for Artificial Intelligence and Robotics (CAIR), DRDO, Bangalore, Sep. 2005.
  3. V. Ramasubramanian, "Speech synthesis: Some perspectives", Key-note talk, National Workshop on Speech Synthesis, All India Institute of Speech and Hearing, Mysore, Jan 12-13, 2006.
  4. V. Ramasubramanian, "Ultra low bit-rate speech coding: Segment vocoders", International Institute of Information Technology Bangalore (IIITB), Bangalore, March 2006.
  5. V. Ramasubramanian, "Speech and speaker recognition", National workshop on Advanced Signal Processing Technologies, BNM Institute of Technology, Bangalore, Apr 2006.
  6. V. Ramasubramanian, "Spoken language systems", National workshop on artificial intelligence, Center for Development for Advanced Computing (C-DAC), Mumbai, July 2006
  7. V. Ramasubramanian, "Speech research at Siemens corporate technology - India", Bharti Centre for Communications Inauguration Workshop, Indian Institute of Technology - Bombay, Mumbai, Jan 2009.
  8. V. Ramasubramanian, "Forensic automatic speaker recognition", Post-graduate diploma course, All India Institute of Speech and Hearing (AIISH), Mysore, March 2009.
  9. V. Ramasubramanian, "Speaker-recognition: An overview", Valedictory Talk, Summer Course on Image Processing and Pattern Recognition, PES School of Engineering, Bangalore, July 2009.
  10. V. Ramasubramanian, "Ultra low bit-rate speech coding", Center for Artificial Intelligence and Robotics (CAIR), DRDO, Bangalore, Jan 2012.
  11. V. Ramasubramanian, "Ultra low bit-rate speech coding: An overview and recent results", IEEE International Conference on Signal Processing and Communication (SPCOM), Bangalore, 2012.
  12. V. Ramasubramanian, "Audio analytics", Symposium on Multi-media Information Archival and Retrieval Systems, PESIT North Campus, Nov 2013.
  13. V. Ramasubramanian, "Audio analytics", Internship Program in Technology Supported Education (IPTSE), CMU-MSRIT Winter School, MSRIT, Bangalore, Dec 2013.
  14. V. Ramasubramanian, "Concatenative articulatory synthesis", Workshop on text-to-speech synthesis, DAIICT, Gandhinagar, June 2014.
  15. V. Ramasubramanian, "Importance of results from speech and hearing sciences on speech technology research", Chief Guest lecture, International seminar on `Research in Hearing Sciences', AIISH, Mysore, Dec 2015.
  16. V. Ramasubramanian, "Audio analytics", Workshop on Recent Trends in Video Analytics, MSRIT, Bangalore, Jan 2016.
  17. V. Ramasubramanian, "Prosody Transplantation for TTS using Phone-Level and Segment-Level Unit Selection", Summer School on Advances in Speech and Audio Processing (ASAP) 2016, DAIICT, Gandhinagar, July 2016.
  18. V. Ramasubramanian, "Fujisaki Parameter Estimation: A `Direct Search' Formulation", Summer School on Advances in Speech and Audio Processing (ASAP) 2016, DAIICT, Gandhinagar, July 2016.
Other professional activities
  • Thesis examiner: M.S. and Ph.D. (IIT-Madras) 2008, 2009, 2012, 2013, 2016, 2018; M.S. and M.Tech (IISc, Bangalore) 2005, 2008, 2009; Ph.D. (AIISH, Mysore).
  • Invited reviewer for journals and conferences:
  • IEEE Transactions on Audio, Speech and Language Processing (ASLP)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
  • IEEE Transactions on Image Processing (IP)
  • IEEE Signal Processing Letters (SPL)
  • Journal of the Acoustical Society of America (JASA)
  • Speech Communication
  • EURASIP Journal on Audio, Speech, and Music Processing (ASMP)
  • Pattern Recognition Letters
  • Journal of Circuits, Systems and Signal Processing
  • Sadhana (Indian Academy of Sciences)
  • ICASSP, Interspeech, EUSIPCO, IEEE SPCOM, IEEE ICC, IEEE CONECCT, AIML Systems Conference.
  • Organizing Committee Member and Tutorials Committee Chair, Interspeech 2018, Hyderabad, India; organized 8 tutorials as part of Interspeech 2018.