Environmental audio scene and activity recognition plays an important role in context-aware computing, which is becoming increasingly important for user interaction with mobile devices. It is an emerging research area, and independent studies are under way in many different fields. In multimedia retrieval, the MPEG-7 Audio description standard addresses how to represent environmental sounds for indexing and retrieval. In environmental protection, sound maps are created with a focus on animal sounds and on long-term changes in the map. In healthcare, patients' indoor activities are monitored through sound in order to detect emergencies and abnormalities. Sound recognition is also used in music transcription to identify the instruments being played. Earlier sound recognition research focused on human speech; now that this field has matured, it is being extended to all the non-speech sounds we encounter.
Although much research has been carried out across diverse areas, most of it was driven by application needs and used conventional methods from speech recognition and general machine learning. We reviewed this research and tried to identify good methods for each stage of a sound recognition system. Another issue in sound recognition is its diversity and personal nature. Environmental sounds differ from location to location and from person to person. A person who deviates from their daily routine will encounter new sounds. For this reason, although a well-designed sound database can help in comparing algorithm performance, we need diverse real-world sounds to build a practical system. We proposed a crowdsourcing framework to collect this data and an instance-based classifier that improves incrementally as more data arrives, without full retraining.
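To illustrate what "improves incrementally without full retraining" can mean for an instance-based classifier, the following is a minimal sketch using a k-nearest-neighbour rule. It is not the classifier proposed in this work: the class name `IncrementalKNN`, the Euclidean distance, the two-dimensional feature vectors, and the scene labels are all assumptions made for the example.

```python
import math
from collections import Counter


class IncrementalKNN:
    """Hypothetical instance-based classifier: learning is just storing examples."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # stored (feature_vector, label) pairs

    def add(self, features, label):
        # Adding a labelled instance is the whole "training" step,
        # so new crowd-sourced data never triggers a full retraining pass.
        self.samples.append((tuple(features), label))

    def predict(self, features):
        if not self.samples:
            raise ValueError("no stored instances")
        # Rank stored instances by Euclidean distance to the query (assumed metric).
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], features))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]


# Illustrative usage with made-up feature vectors and labels.
clf = IncrementalKNN(k=1)
clf.add([0.0, 0.0], "office")
clf.add([1.0, 1.0], "street")
first = clf.predict([0.1, 0.2])

clf.add([5.0, 5.0], "cafe")  # a new sound class arrives later
second = clf.predict([4.8, 5.1])
```

The design trade-off in this kind of sketch is that adding data is cheap while prediction cost grows with the number of stored instances, so a practical system would likely need indexing or pruning of the stored examples.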
We used the bag-of-words approach, which is used for document classification and for object recognition in images. We treated an environmental audio scene as a set of audio events, analogous to the words in a document. Sequence matching approach originated f...