- K. Zhou, B. Sisman, R. Rana, B. W. Schuller and H. Li, “Emotion Intensity and its Control for Emotional Voice Conversion,” in IEEE Transactions on Affective Computing, pp. 1-18, 2022.
- R. Liu, B. Sisman, G. Gao and H. Li, “Decoding Knowledge Transfer for Neural Text-to-Speech Training,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
- B. Sisman, J. Yamagishi, S. King and H. Li, “An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132-157, 2021, doi: 10.1109/TASLP.2020.3038524.
- K. Zhou, B. Sisman, R. Liu and H. Li, “Emotional Voice Conversion: Theory, Databases and ESD,” Speech Communication, 2021.
- R. Liu, B. Sisman, F. Bao, J. Yang, G. Gao and H. Li, “Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 274-285, 2021, doi: 10.1109/TASLP.2020.3040523.
- R. Liu, B. Sisman, Y. Lin and H. Li, “FastTalker: A Neural Text-to-Speech Architecture with Shallow and Group Autoregression,” Neural Networks, vol. 141, pp. 306-314, 2021.
- R. Liu, B. Sisman, G. Gao and H. Li, “Expressive TTS Training With Frame and Style Reconstruction Loss,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1806-1818, 2021, doi: 10.1109/TASLP.2021.3076369.
- R. Liu, B. Sisman, F. Bao, G. Gao and H. Li, “Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS,” in IEEE Signal Processing Letters, vol. 27, pp. 1470-1474, 2020, doi: 10.1109/LSP.2020.3016564.
- B. Sisman, M. Zhang and H. Li, “Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 6, pp. 1085-1097, June 2019, doi: 10.1109/TASLP.2019.2910637.
- R. Liu, B. Sisman, B. Schuller, G. Gao and H. Li, “Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning,” Proc. INTERSPEECH 2022, pp. 5493-5497.
- Z. Du, B. Sisman, K. Zhou and H. Li, “Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion,” Proc. INTERSPEECH 2022, pp. 2603-2607.
- P. Lam, H. Zhang, N. Chen and B. Sisman, “EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models,” Proc. INTERSPEECH 2022, pp. 823-827.
- J. Lu, B. Sisman, R. Liu et al., “VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over,” ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 8032-8036.
- Z. Du, B. Sisman, K. Zhou and H. Li, “Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer,” 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.
- S. Nikonorov, B. Sisman, M. Zhang and H. Li, “DeepA: A Deep Neural Analyzer for Speech and Singing Vocoding,” 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021.
- K. Zhou, B. Sisman, R. Liu and H. Li, “Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-Stage Sequence-to-Sequence Training,” Proc. INTERSPEECH 2021.
- R. Liu, B. Sisman and H. Li, “Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability,” Proc. INTERSPEECH 2021.
- K. Zhou, B. Sisman, R. Liu and H. Li, “Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset,” ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 920-924, doi: 10.1109/ICASSP39728.2021.9413391.
- R. Liu, B. Sisman and H. Li, “Graphspeech: Syntax-Aware Graph Attention Network for Neural Speech Synthesis,” ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6059-6063, doi: 10.1109/ICASSP39728.2021.9413513.
- K. Zhou, B. Sisman and H. Li, “Vaw-Gan For Disentanglement And Recomposition Of Emotional Elements In Speech,” 2021 IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 415-422, doi: 10.1109/SLT48900.2021.9383526.
- B. Sisman, J. Li, F. Bao, G. Gao and H. Li, “Teacher-Student Training For Robust Tacotron-Based TTS,” ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 6274-6278, doi: 10.1109/ICASSP40776.2020.9054681.
- Z. Du, K. Zhou, B. Sisman and H. Li, “Spectrum and Prosody Conversion for Cross-Lingual Voice Conversion with CycleGAN,” 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020, pp. 507-513.
- R. Liu, B. Sisman, F. Bao et al., “WaveTTS: Tacotron-Based TTS with Joint Time-Frequency Domain Loss,” Odyssey 2020: The Speaker and Language Recognition Workshop, 2020, pp. 245-251.
- M. Zhang, B. Sisman, L. Zhao et al., “DeepConversion: Voice Conversion with Limited Parallel Training Data,” Speech Communication, vol. 122, pp. 31-43, 2020.
- B. Sisman and H. Li, “Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data,” Odyssey 2020: The Speaker and Language Recognition Workshop, 2020, pp. 238-244.
- K. Zhou, B. Sisman, M. Zhang and H. Li, “Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion,” Proc. INTERSPEECH 2020.
- K. Zhou, B. Sisman and H. Li, “Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-parallel Data,” Odyssey 2020: The Speaker and Language Recognition Workshop, 2020.
- B. Sisman, M. Zhang, M. Dong and H. Li, “On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion,” 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 144-151, doi: 10.1109/ASRU46091.2019.9003939.
- B. Sisman, K. Vijayan, M. Dong and H. Li, “SINGAN: Singing Voice Conversion with Generative Adversarial Networks,” 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 112-118, doi: 10.1109/APSIPAASC47483.2019.9023162.
- A. Tjandra, B. Sisman, M. Zhang, S. Sakti, H. Li and S. Nakamura, “VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019,” Proc. INTERSPEECH 2019.
- B. Sisman, M. Zhang, S. Sakti, H. Li and S. Nakamura, “Adaptive WaveNet Vocoder for Residual Compensation in GAN-Based Voice Conversion,” 2018 IEEE Spoken Language Technology Workshop (SLT), 2018, pp. 282-289, doi: 10.1109/SLT.2018.8639507.
- B. Sisman and H. Li, “Limited Data Voice Conversion from Sparse Representation to GANs and WaveNet,” APSIPA ASC 2018 PhD Forum, Honolulu, Hawaii, United States, 2018. [Best Presentation Award]
- B. Sisman, M. Zhang and H. Li, “A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder,” Proc. INTERSPEECH 2018, India.
- B. Sisman and H. Li, “Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion,” Proc. INTERSPEECH 2018, pp. 52-56.
- B. Sisman, G. Lee and H. Li, “Phonetically Aware Exemplar-Based Prosody Transformation,” Odyssey 2018: The Speaker and Language Recognition Workshop, France, 2018.
- M. Zhang, B. Sisman, S. S. Rallabandi, H. Li and L. Zhao, “Error Reduction Network for DBLSTM-Based Voice Conversion,” APSIPA ASC 2018, Honolulu, Hawaii, United States, 2018.
- J. Xiao, S. Yang, M. Zhang, B. Sisman, D. Huang, L. Xie, M. Dong and H. Li, “The I2R-NWPU-NUS Text-to-Speech System for Blizzard Challenge 2018,” INTERSPEECH Blizzard Challenge 2018 Workshop, 2018.
- B. Sisman, H. Li and K. C. Tan, “Transformation of Prosody in Voice Conversion,” APSIPA ASC 2017, Kuala Lumpur, Malaysia, 2017.
- B. Sisman, H. Li and K. C. Tan, “Sparse Representation of Phonetic Features for Voice Conversion with and without Parallel Data,” 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 2017.