ICML 2013 Tutorial: Music Information Research Based on Machine Learning

ICML 2013 Tutorial: Afternoon (14:00-17:30)

Download Tutorial Slides (Recorded Video)

Music Information Research Based on Machine Learning

Lecturers: Masataka Goto and Kazuyoshi Yoshii
National Institute of Advanced Industrial Science and Technology (AIST), Japan

ICML 2013 (The 30th International Conference on Machine Learning)
June 16th, 2013
Atlanta, USA

Abstract:

Music information research has been gaining a lot of attention since 2000, when the general public started listening to music on computers in daily life. It is widely recognized as an important research field, and new researchers are continually joining it worldwide. Academically, one reason so many researchers are drawn to the field is that its essential problem remains unresolved: understanding complex musical audio signals, which convey content by forming a temporal structure in which multiple sounds are interrelated. There are also appealing issues that have not been touched yet, and the field is a treasure trove of research topics that could be tackled with state-of-the-art machine learning techniques.

This tutorial is intended for an audience interested in the application of machine learning techniques to such music domains. Audience members who are not familiar with music information research are welcome, and researchers working on music technologies are likely to find something new to study.

First, the tutorial serves as a showcase of music information research. The audience can enjoy and study many state-of-the-art demonstrations of music information research based on signal processing and machine learning. This tutorial highlights timely topics such as active music listening interfaces, singing information processing systems, web-related music technologies, crowdsourcing, and consumer-generated media (CGM).

Second, this tutorial explains the music technologies behind the demonstrations. The audience can learn how to analyze and understand musical audio signals, process singing voices, and model polyphonic sound mixtures. As a new approach to advanced music modeling, this tutorial introduces unsupervised music understanding based on nonparametric Bayesian models.

Third, this tutorial provides a practical guide to getting started in music information research. The audience can try available research tools for music feature extraction and machine learning, as well as music editors. Music databases and corpora are then introduced. As a hint toward research topics, this tutorial also discusses open problems and grand challenges that audience members are encouraged to tackle.

In the future, music technologies, together with image, video, and speech technologies, are expected to contribute toward all-around media content technologies based on machine learning.

Course Description (3 hours):

Introduction
Active Music Listening Interfaces Based on Signal Processing and Machine Learning

Toward music listening in the future
Music playback (browsing), music customization, and music discovery (retrieval and recommendation)

Singing Information Processing Systems

Systems for listening to singing voices
Systems for music information retrieval based on singing voices
Systems for singing synthesis and a singing robot

Web-related Music Technologies, Crowdsourcing, and Consumer-Generated Media (CGM)

Songle: A crowdsourcing-based web service for Active Music Listening
DanceReProducer: An automatic mashup music video generation system
MusicCommentator: Generating comments synchronized with musical audio signals

(Break [30 minutes])

Unsupervised Music Understanding Based on Nonparametric Bayesian Models

Introduction to Bayesian nonparametrics
Nonparametric Bayesian acoustic models (variants of harmonic clustering and NMF)
Nonparametric Bayesian language models (variants of n-gram models and PCFG)
Multipitch analysis and chord progression analysis

Practical Guide to Getting Started in Music Information Research

Auto tagging
Music recommendation
Music databases and corpora
Tools available for music information research
Grand challenges

Conclusion and Discussion

About Lecturers:

Masataka Goto (http://staff.aist.go.jp/m.goto/) received the Doctor of Engineering degree from Waseda University in 1998. He is currently a Prime Senior Researcher and the Leader of the Media Interaction Group at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. In 1992 he was one of the first to start work on automatic music understanding, and has since been at the forefront of research in music technologies and music interfaces based on those technologies. Since 1998 he has also worked on speech recognition interfaces, and since 2006 he has overseen the development of web services based on content analysis and crowdsourcing (http://songle.jp and http://en.podcastle.jp). He serves concurrently as a Visiting Professor at the Institute of Statistical Mathematics, an Associate Professor (Cooperative Graduate School Program) in the Graduate School of Systems and Information Engineering, University of Tsukuba, and a Project Manager of the Exploratory IT Human Resources Project (MITOH Program) run by the Information Technology Promotion Agency (IPA).

Over the past 21 years, Masataka Goto has published more than 190 papers in refereed journals and international conferences and has received 33 awards, including several best paper awards, best presentation awards, and the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Young Scientists' Prize). He has served as a committee member of over 80 scientific societies and conferences, was the Chair of the IPSJ (Information Processing Society of Japan) Special Interest Group on Music and Computer (SIGMUS) in 2007 and 2008, and was the General Chair of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009). In 2011, as the Research Director, he began a 5-year research project on music technologies (OngaCREST Project) funded by the Japan Science and Technology Agency (CREST, JST).

Google Scholar h-index = 36: http://scholar.google.com/citations?user=4JJCMq8AAAAJ&hl=en

Kazuyoshi Yoshii (http://staff.aist.go.jp/k.yoshii/) received the PhD degree in Informatics from Kyoto University, Japan in 2008. He is currently a Senior Researcher at the National Institute of Advanced Industrial Science and Technology (AIST), Japan. He is well known in the field of music information processing for his knowledge of machine learning and Bayesian inference. He is also one of the pioneers who have applied the paradigm of Bayesian nonparametrics to music information processing. In particular, he is the first researcher to develop a nonparametric Bayesian multipitch analyzer.

He is the first author of more than 15 refereed conference papers and 3 refereed journal papers in IEEE Transactions on Audio, Speech, and Language Processing. He has received 15 awards, including the IPSJ Yamashita SIG Research Award and the Best-in-Class Award of MIREX 2005. His current research interests include probabilistic music modeling and blind source separation based on Bayesian nonparametrics. He is a member of the IEEE, the Information Processing Society of Japan (IPSJ), and the Institute of Electronics, Information and Communication Engineers (IEICE).

Google Scholar h-index = 11: http://scholar.google.com/citations?user=QaNTClUAAAAJ&hl=en

Recent Keynote/Invited Talks and Tutorials by Lecturers:

Keynote talk "Music Listening in the Future" (Masataka Goto) in the 128th AES Convention in London (AES London 2010) of the Audio Engineering Society (AES), London, UK, May 2010.

AES Press Release: AES London 2010 "Distinguished Guests Prepare For AES London: Sir George Martin CBE & Masataka Goto To Attend AES London Convention"
(Report on Pro Sound News Europe)
Keynote talk "PodCastle and Songle: Crowdsourcing-Based Web Services for Spoken Content Retrieval and Active Music Listening" (Masataka Goto) in the International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), Nara, Japan, October 2012.
Keynote talk "PodCastle and Songle: Web Services for Retrieval and Browsing of Speech and Music Content on the Basis of Automatic Content Analysis and Crowdsourcing" (Masataka Goto) in the International Workshop on Search Computing of CHORUS+ Network of Audio-Visual Media Search, Brussels, Belgium, September 2012.

Video archive
Keynote talk "Augmented Music-Understanding Interfaces: Toward Music Listening in the Future" (Masataka Goto) in AdMIRe 2009 (International Workshop on Advances in Music Information Research 2009) of IEEE ISM 2009 (International Symposium on Multimedia 2009), San Diego, California, USA, December 2009.
Keynote talk "Active Music Listening Interfaces Based on Music-Understanding Technologies" (Masataka Goto) in TELECOM ParisTech Workshop on Music Signal Processing, Paris, France, June 2008.

Invited talk "Music Listening in the Future: Augmented Music-Understanding Interfaces and Crowd Music Listening" (Masataka Goto) in the AES 42nd International Conference on Semantic Audio of the Audio Engineering Society (AES), July 2011.
Invited talk "Singing Information Processing: Concept and Applications" (Masataka Goto) in the 160th Meeting of the Acoustical Society of America (ASA), November 2010.
Invited talk "PodCastle: A Spoken Document Retrieval Service Improved by User Contributions" (Masataka Goto) in the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC 24), November 2010.
Invited talk "Toward Music Listening Interfaces in the Future" (Masataka Goto) in the Microsoft Research Asia Faculty Summit 2010, October 2010.
Invited talk "Singing Information Processing Systems" (Masataka Goto) in InterSinging 2010, October 2010.

Tutorial "Music Information Research: Signal Processing, Machine Learning, Nonparametric Bayes, Interface, Retrieval, Singing, and Crowdsourcing" (Masataka Goto and Kazuyoshi Yoshii) in the 21st International Conference on Pattern Recognition (ICPR 2012), November 2012.
Tutorial "Music Information Retrieval" (Markus Schedl and Masataka Goto) in the Second ACM International Conference on Multimedia Retrieval (ICMR 2012), June 2012.

References by Lecturers:

Other relevant references are shown in the tutorial slides.

[1] M. Goto: Active Music Listening Interfaces Based on Signal Processing, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), pp.1441-1444, 2007.

[2] M. Goto: A Chorus-Section Detection Method for Musical Audio Signals and Its Application to a Music Listening Station, IEEE Transactions on Audio, Speech, and Language Processing, Vol.14, No.5, pp.1783-1794, 2006.

[3] M. Goto: Music Listening in the Future: Augmented Music-Understanding Interfaces and Crowd Music Listening, Proceedings of the AES 42nd International Conference on Semantic Audio, pp.21-30, 2011. (Invited Paper)

[4] M. Goto, T. Saitou, T. Nakano, and H. Fujihara: Singing Information Processing Based on Singing Voice Modeling, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), pp.5506-5509, 2010.

[5] M. Goto, T. Nakano, S. Kajita, Y. Matsusaka, S. Nakaoka, and K. Yokoi: VocaListener and VocaWatcher: Imitating a Human Singer by Using Signal Processing, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp.5393-5396, 2012.

[6] M. Goto, J. Ogata, K. Yoshii, H. Fujihara, M. Mauch, and T. Nakano: PodCastle and Songle: Crowdsourcing-Based Web Services for Retrieval and Browsing of Speech and Music Content, Proceedings of the First International Workshop on Crowdsourcing Web Search (CrowdSearch 2012), pp.36-41, 2012.

[7] M. Goto, K. Yoshii, H. Fujihara, M. Mauch, and T. Nakano: Songle: A Web Service for Active Music Listening Improved by User Contributions, Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), pp.311-316, 2011.

[8] M. Goto: Grand Challenges in Music Information Research, In M. Müller, M. Goto, and M. Schedl, editors, Dagstuhl Follow-Ups: Multimodal Music Processing, pp.217-225, Dagstuhl Publishing, 2012.

[9] K. Yoshii and M. Goto: Unsupervised Music Understanding Based on Nonparametric Bayesian Models, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp.5353-5356, 2012.

[10] K. Yoshii and M. Goto: A Nonparametric Bayesian Multipitch Analyzer Based on Infinite Latent Harmonic Allocation, IEEE Transactions on Audio, Speech, and Language Processing, Vol.20, No.3, pp.717-730, 2012.

[11] K. Yoshii and M. Goto: A Vocabulary-Free Infinity-Gram Model for Nonparametric Bayesian Chord Progression Analysis, Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), pp.645-650, 2011.

[12] K. Yoshii and M. Goto: Infinite Latent Harmonic Allocation: A Nonparametric Bayesian Approach to Multipitch Analysis, Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), pp.309-314, 2010.

[13] K. Yoshii, M. Goto, et al.: An Efficient Hybrid Music Recommender System Using an Incrementally Trainable Probabilistic Generative Model, IEEE Transactions on Audio, Speech, and Language Processing, Vol.16, No.2, pp.435-447, 2008.

[14] K. Yoshii, M. Goto, and H. G. Okuno: Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates with Harmonic Structure Suppression, IEEE Transactions on Audio, Speech, and Language Processing, Vol.15, No.1, pp.333-345, 2007.

last update: June 16, 2013