TY - GEN
T1 - Vroom!
T2 - 5th ACM SIGIR Conference on Human Information Interaction and Retrieval, CHIIR 2020
AU - Zhang, Yichi
AU - Hu, Junbo
AU - Zhang, Yiting
AU - Pardo, Bryan
AU - Duan, Zhiyao
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/3/14
Y1 - 2020/3/14
N2 - Traditional search through collections of audio recordings compares a text-based query to text metadata associated with each audio file and does not address the actual content of the audio. Text descriptions do not describe all aspects of the audio content in detail. Query by vocal imitation (QBV) is a kind of query by example that lets users imitate the content of the audio they seek, providing an alternative search method to traditional text search. Prior work proposed several neural networks for QBV, such as TL-IMINET; however, previous systems have been neither deployed in an actual search engine nor evaluated by real users. We have developed a state-of-the-art QBV system (Vroom!) and a baseline query-by-text search engine (TextSearch). We deployed both systems in an experimental framework to perform user experiments with Amazon Mechanical Turk (AMT) workers. Results showed that Vroom! received significantly higher search satisfaction ratings than TextSearch did for sound categories that were difficult for subjects to describe by text. Results also showed a better overall ease-of-use rating for Vroom! than for TextSearch on the sound library used in our experiments. These findings suggest that QBV, as a complementary search approach to existing text-based search, can improve both search results and user experience.
AB - Traditional search through collections of audio recordings compares a text-based query to text metadata associated with each audio file and does not address the actual content of the audio. Text descriptions do not describe all aspects of the audio content in detail. Query by vocal imitation (QBV) is a kind of query by example that lets users imitate the content of the audio they seek, providing an alternative search method to traditional text search. Prior work proposed several neural networks for QBV, such as TL-IMINET; however, previous systems have been neither deployed in an actual search engine nor evaluated by real users. We have developed a state-of-the-art QBV system (Vroom!) and a baseline query-by-text search engine (TextSearch). We deployed both systems in an experimental framework to perform user experiments with Amazon Mechanical Turk (AMT) workers. Results showed that Vroom! received significantly higher search satisfaction ratings than TextSearch did for sound categories that were difficult for subjects to describe by text. Results also showed a better overall ease-of-use rating for Vroom! than for TextSearch on the sound library used in our experiments. These findings suggest that QBV, as a complementary search approach to existing text-based search, can improve both search results and user experience.
KW - Siamese style convolutional recurrent neural networks
KW - Sound search
KW - Subjective evaluation
KW - Text description
KW - Vocal imitation
UR - http://www.scopus.com/inward/record.url?scp=85082451822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082451822&partnerID=8YFLogxK
U2 - 10.1145/3343413.3377963
DO - 10.1145/3343413.3377963
M3 - Conference contribution
AN - SCOPUS:85082451822
T3 - CHIIR 2020 - Proceedings of the 2020 Conference on Human Information Interaction and Retrieval
SP - 23
EP - 32
BT - CHIIR 2020 - Proceedings of the 2020 Conference on Human Information Interaction and Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 14 March 2020 through 18 March 2020
ER -