Examination Committee
Prof Bertram SHI, ECE/HKUST (Chairperson)
Prof Pascale FUNG, ECE/HKUST (Thesis Supervisor)
Prof Tim CHENG, ECE/HKUST
Abstract
Query-by-humming is a content-based music retrieval method that retrieves melodies using a user's humming as the query. It allows users to find a melody using only its tune, without any knowledge of its metadata or even its lyrics. The general paradigm of query-by-humming systems is to first transcribe the notes of the music signals of both the query and the melodies in the database. The notes of the query are then compared against those of the database melodies, and the melody most similar to the query is retrieved.
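To make this note-based matching step concrete, the following sketch ranks database melodies by edit distance over pitch intervals of transcribed notes. It is a minimal illustration under stated assumptions, not the retrieval method of the thesis: the MIDI note representation, the interval-based key invariance, and all function names are assumptions.

# Minimal sketch of note-based matching: both the query and each database
# melody are assumed to be transcribed into MIDI note-number sequences.
# Comparing pitch *intervals* makes the match key-invariant; edit distance
# tolerates insertion/deletion/substitution errors from transcription.
# All names here are illustrative, not from the thesis.

def intervals(notes):
    """Convert absolute MIDI pitches to successive pitch intervals."""
    return [b - a for a, b in zip(notes, notes[1:])]

def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def rank_melodies(query_notes, database):
    """Return database melodies sorted by distance to the query."""
    q = intervals(query_notes)
    scored = [(edit_distance(q, intervals(m)), name)
              for name, m in database.items()]
    return sorted(scored)

# Example: a hummed query sung a whole tone higher than melody_a still
# matches it exactly, because only intervals are compared.
db = {"melody_a": [60, 62, 64, 65, 67], "melody_b": [60, 60, 67, 67, 69]}
print(rank_melodies([62, 64, 66, 67, 69], db))  # melody_a ranks first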
In this thesis, we use deep learning for humming transcription and show that it outperforms other signal processing and supervised learning approaches. Using a database of monophonic melodies, we train a hybrid Convolutional Neural Network (CNN) and Hidden Markov Model (HMM) system, in which a 3-state HMM models each note and the CNN estimates the posterior probabilities of the HMM states. We also train a CNN that processes raw humming audio directly, without prior feature engineering. Candidate melodies are then retrieved using a note-based retrieval method.
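As an illustration of the hybrid architecture, the sketch below (written in PyTorch, assumed here purely for illustration) shows a small 1-D CNN mapping frames of raw audio to posterior probabilities over HMM states, with three states per note class. The layer sizes, frame length, and number of note classes are illustrative assumptions, not the thesis configuration; in hybrid decoding, the posteriors would be combined with log state priors to form scaled likelihoods for Viterbi decoding.

import math
import torch
import torch.nn as nn

class RawAudioCNN(nn.Module):
    """Illustrative 1-D CNN over raw audio frames.

    Emits per-frame log-posteriors over HMM states: 3 states per note
    class, so the output dimension is 3 * n_notes. All sizes here are
    assumptions, not the architecture used in the thesis.
    """
    def __init__(self, frame_len=1024, n_notes=61):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.classifier = nn.Linear(64 * 8, 3 * n_notes)

    def forward(self, x):                     # x: (batch, 1, frame_len)
        h = self.features(x).flatten(1)
        return torch.log_softmax(self.classifier(h), dim=-1)

model = RawAudioCNN()
frames = torch.randn(16, 1, 1024)             # a batch of raw-audio frames
log_post = model(frames)                      # (16, 183) log P(state | frame)

# In hybrid CNN-HMM decoding, posteriors are converted to scaled
# likelihoods by subtracting log state priors before Viterbi decoding.
# A uniform prior is assumed here for simplicity.
log_priors = torch.full((3 * 61,), -math.log(3 * 61))
scaled_loglik = log_post - log_priors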
We evaluate our transcription system and the overall query-by-humming system on standard datasets and compare the results against other algorithms. On a standard test set, the CNN model that processes raw audio directly achieves an F-measure of 54%, marginally better than the feature-based model and 11% higher than a simple HMM-GMM system. It also outperforms other state-of-the-art singing transcription systems. The complete query-by-humming system achieves a Mean Reciprocal Rank (MRR) of 0.92 on the standard MIREX dataset, an improvement over other note-based query-by-humming systems.
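For reference, Mean Reciprocal Rank is the average over queries of the reciprocal of the rank at which the correct melody appears in the returned list. The short sketch below computes it for a hypothetical set of query results; the ranks are invented purely to illustrate the metric and are not results from the thesis.

def mrr(ranks):
    """ranks[i] is the 1-based rank of the correct melody for query i."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three queries hit at rank 1 and one at rank 2, giving (1+1+0.5+1)/4.
print(mrr([1, 1, 2, 1]))  # 0.875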