ICPR 2012 Tutorial: Music Information Research

ICPR 2012 Tutorial AM-05

Music Information Research: Signal Processing, Machine Learning, Nonparametric Bayes, User Interfaces, Information Retrieval, Singing, Web, and Crowdsourcing

Lecturers: Masataka Goto and Kazuyoshi Yoshii
National Institute of Advanced Industrial Science and Technology (AIST), Japan

ICPR 2012 (the 21st International Conference on Pattern Recognition)
November 11th, 2012
Tsukuba International Congress Center
Tsukuba Science City, JAPAN

Abstract:

This tutorial is intended for an audience interested in music itself, music technologies, or the application of ICPR-related technologies to music domains. Audience members who are not familiar with music information research are welcome, and researchers working on music technologies are likely to find something new to study.

First, the tutorial serves as a showcase of music information research. The audience can enjoy and study many state-of-the-art demonstrations of music information research based on signal processing and machine learning. This tutorial highlights timely topics such as active music listening interfaces, singing information processing systems, web-related music technologies, crowdsourcing, and consumer-generated media (CGM).

Second, this tutorial explains the music technologies behind the demonstrations. The audience can learn how to analyze and understand musical audio signals, process singing voices, and model polyphonic sound mixtures. As a new approach to advanced music modeling, this tutorial introduces unsupervised music understanding based on nonparametric Bayesian models.
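
As a rough illustration of what "nonparametric" means here, the sketch below draws cluster labels from a Chinese restaurant process, a standard building block of Bayesian nonparametrics in which the number of clusters is not fixed in advance but grows with the data. This is only a generic example and not the lecturers' models (the multipitch and chord progression models in the references are considerably more elaborate). The function name sample_crp and the parameters alpha and seed are assumptions chosen for illustration.

```python
import random
from typing import List


def sample_crp(num_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Draw cluster labels for num_items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make it
    more likely that an item opens a new cluster.
    """
    rng = random.Random(seed)
    counts: List[int] = []  # counts[k] = number of items already in cluster k
    labels: List[int] = []
    for _ in range(num_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    for n in (10, 100, 1000):
        labels = sample_crp(n, alpha=1.0)
        print(n, "items ->", len(set(labels)), "clusters")
```

Running the script prints how many clusters appear for increasing amounts of data. The count is random but tends to grow slowly as more items arrive, which is the behavior that lets such models avoid fixing, for example, the number of sound sources or chord patterns beforehand.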

Third, this tutorial provides a practical guide to getting started in music information research. The audience can try available research tools for music feature extraction and machine learning, as well as music editors. Music databases and corpora are then introduced. As a pointer toward research topics, this tutorial also discusses open problems and grand challenges that audience members are encouraged to tackle.

In the future, music technologies, together with image, video, and speech technologies, are expected to contribute toward all-around media content technologies.

Course Description:

Introduction

Active Music Listening Interfaces Based on Signal Processing and Machine Learning

    Toward music listening in the future
    Music playback (browsing)
    Music touch-up (customization)
    Music discovery (retrieval and recommendation)
    Augmented Music-Understanding Interfaces

Singing Information Processing Systems

    Systems for listening to singing voices
    Systems for music information retrieval based on singing voices
    Systems for singing synthesis and a singing robot
    Open problems

Web-related Music Technologies, Crowdsourcing, and Consumer-Generated Media (CGM)

    Songle: A crowdsourcing-based web service for Active Music Listening
    DanceReProducer: An automatic mashup music video generation system
    MusicCommentator: Generating comments synchronized with musical audio signals

Unsupervised Music Understanding Based on Nonparametric Bayesian Models

    Introduction of the paradigm of Bayesian nonparametrics
    Nonparametric Bayesian acoustic and language models
    Multipitch analysis and chord progression analysis

Practical Guide to Getting Started in Music Information Research

    Tools available for music information research
    Music databases and corpora
    Grand challenges

Conclusion and Discussion

Relevant References:

[1] M. Goto: Active Music Listening Interfaces Based on Signal Processing, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), pp.1441-1444, 2007.

[2] M. Goto: A Chorus-Section Detection Method for Musical Audio Signals and Its Application to a Music Listening Station, IEEE Transactions on Audio, Speech, and Language Processing, Vol.14, No.5, pp.1783-1794, 2006.

[3] M. Goto: Music Listening in the Future: Augmented Music-Understanding Interfaces and Crowd Music Listening, Proceedings of the AES 42nd International Conference on Semantic Audio, pp.21-30, 2011. (Invited Paper)

[4] M. Goto, T. Saitou, T. Nakano, and H. Fujihara: Singing Information Processing Based on Singing Voice Modeling, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp.5506-5509, 2010.

[5] M. Goto, J. Ogata, K. Yoshii, H. Fujihara, M. Mauch, and T. Nakano: PodCastle and Songle: Crowdsourcing-Based Web Services for Retrieval and Browsing of Speech and Music Content, Proceedings of the First International Workshop on Crowdsourcing Web Search (CrowdSearch 2012), pp.36-41, 2012.

[6] M. Goto, K. Yoshii, H. Fujihara, M. Mauch, and T. Nakano: Songle: A Web Service for Active Music Listening Improved by User Contributions, Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), pp.311-316, 2011.

[7] K. Yoshii and M. Goto: Unsupervised Music Understanding Based on Nonparametric Bayesian Models, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp.5353-5356, 2012.

[8] K. Yoshii and M. Goto: A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation, IEEE Transactions on Audio, Speech, and Language Processing, Vol.20, No.3, pp.717-730, 2012.

[9] K. Yoshii and M. Goto: A Vocabulary-Free Infinity-Gram Model for Nonparametric Bayesian Chord Progression Analysis, Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), pp.645-650, 2011.

About the Lecturers:

Masataka Goto (http://staff.aist.go.jp/m.goto/) received the Doctor of Engineering degree from Waseda University in 1998. He is currently a Prime Senior Researcher and the Leader of the Media Interaction Group at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. In 1992 he was one of the first to start work on automatic music understanding, and has since been at the forefront of research in music technologies and music interfaces based on those technologies. Since 1998 he has also worked on speech recognition interfaces, and since 2006 he has overseen the development of web services based on content analysis and crowdsourcing (http://songle.jp and http://en.podcastle.jp). He serves concurrently as a Visiting Professor at the Institute of Statistical Mathematics, an Associate Professor (Cooperative Graduate School Program) in the Graduate School of Systems and Information Engineering, University of Tsukuba, and a Project Manager of the Exploratory IT Human Resources Project (MITOH Program) run by the Information Technology Promotion Agency (IPA).

Over the past 20 years, Masataka Goto has published more than 190 papers in refereed journals and international conferences and has received 29 awards, including several best paper awards, best presentation awards, and the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Young Scientists' Prize). He has served as a committee member of over 80 scientific societies and conferences. He was the Chair of the IPSJ (Information Processing Society of Japan) Special Interest Group on Music and Computer (SIGMUS) in 2007 and 2008 and the General Chair of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009). In 2011, as Research Director, he began a 5-year research project on music technologies (OngaCREST Project) funded by the Japan Science and Technology Agency (CREST, JST).

Kazuyoshi Yoshii (http://staff.aist.go.jp/k.yoshii/) received the PhD degree in Informatics from Kyoto University, Japan in 2008. He is currently a Research Scientist at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. He is well known in the field of music information processing for his knowledge of machine learning and Bayesian inference. He is also one of the pioneers who have applied the paradigm of Bayesian nonparametrics to music information processing. In particular, he is the first researcher to develop a nonparametric Bayesian multipitch analyzer.

He is the first author of more than 15 refereed conference papers and 3 refereed journal papers in IEEE Transactions on Audio, Speech, and Language Processing. He has received 15 awards, including the IPSJ Yamashita SIG Research Award and the Best-in-Class Award of MIREX 2005. His current research interests include probabilistic music modeling and blind source separation based on Bayesian nonparametrics. He is a member of the IEEE, the Information Processing Society of Japan (IPSJ), and the Institute of Electronics, Information and Communication Engineers (IEICE).

last update: June 1, 2012