An Exploration into the Application of Convolutional Neural Networks for Instrument Classification on Monophonic Note Samples
| Content Provider | Semantic Scholar |
|---|---|
| Author | Owers, James |
| Copyright Year | 2016 |
| Abstract | We show that instruments can be accurately recognised in monophonic, single-note samples using minimal temporal information. To classify each note sample, our method uses a convolutional neural network to predict the instrument for each frame of a Constant-Q Transform, and simply takes the mean of these predictions. In contrast, state-of-the-art methods use hand-engineered features to encode the temporal dependencies within a note. By restricting the data it was trained on, we demonstrate that our model has the capability to generalise classification over the pitch range of instruments: it can classify the instrument for notes with a pitch it has never before seen for that instrument. To our knowledge, we are the first to do this explicitly. The high accuracy of our method (superhuman performance) suggests that there is enough information within short audio time frames to classify notes, and that CNNs can automatically learn features to leverage this information. Based on this observation, this work provides a solid basis from which to analyse the improvement made by temporal models with a similar structure, such as Recurrent Neural Networks. It also serves as a foundation for our future work, providing an architecture which we can extend to classify polyphonic, multi-instrument signals, or operate on lower-level, raw waveform data. |
| File Format | PDF, HTM/HTML |
| Alternate Webpage(s) | https://jamesowers.github.io/files/thesis.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
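The aggregation step described in the abstract (classify each CQT frame with a CNN, then average the per-frame predictions to label the note) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function name `classify_note`, the toy probability values, and the use of plain NumPy in place of a trained CNN are all assumptions for demonstration.

```python
import numpy as np

def classify_note(frame_probs: np.ndarray) -> int:
    """Aggregate frame-level predictions into one note-level label.

    frame_probs: array of shape (n_frames, n_classes), where each row is
    a (hypothetical) CNN's probability distribution over instruments for
    one Constant-Q Transform frame.
    """
    mean_probs = frame_probs.mean(axis=0)   # average over frames
    return int(np.argmax(mean_probs))       # most probable instrument

# Toy example: 4 frames, 3 instrument classes (values are illustrative).
frames = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
])
print(classify_note(frames))  # class 0 has the highest mean probability
```

Averaging probabilities rather than taking a majority vote over frame-level argmaxes lets confident frames outweigh ambiguous ones, which is consistent with the abstract's claim that short time frames individually carry enough information for classification.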