Represent the Degree of Mimicry between Prosodic Behaviour of Speech Between Two or More People

Arran, Matthew and Assier, Raphael and Benham, Graham and Deka, Bozena and Dempsey, Liam and Dubrovina, E. and Fadai, Nabil and Feier, Roxana and House, Thomas and Lambert, Anna and Lee, Jane and Maestri, Joseph and Miyajima, Naoko and O'Kiely, D. and Please, C.P. and Radivojevic, T.R. and Riley, E. and Rowley, W. and Wilmott, Z. (2015) Represent the Degree of Mimicry between Prosodic Behaviour of Speech Between Two or More People. [Study Group Report]

Abstract

ExpertoCrede wants people to be better understood and aims to improve the way we communicate with one another. The study group focused on finding a way to analyse a conversation in order to extract information on the level of rapport between the two speakers. Since recordings of conversations may contain sensitive personal data, the study group emphasised methods that could be run locally on a smartphone and avoided techniques that were computationally expensive or needed large datasets.

Firstly, the group considered the nature of turn-taking in conversations where there was rapport between the participants. This was done by manually labelling conversations from the BBC Listening Project [1]. The data suggested an element of memorylessness in turn-taking within friendly conversations, and therefore Markov chains were used to model conversations. Each person speaking was treated as a state in the Markov chain, and the probability of the speaker switching was estimated from the data. The group also considered the possibility that these probabilities change over the course of longer conversations, and suggested some preliminary ODE models for this.
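
A minimal sketch of how switching probabilities can be estimated for a first-order Markov chain, assuming the manual labelling is recorded as one speaker label per fixed-length interval (for example one second). The abstract does not say how the labels were sampled, so the interval scheme, the example labels and the function names `transition_matrix`, `switch_probability` and `mean_turn_length` are all hypothetical. Each probability is the maximum-likelihood estimate: the number of observed transitions out of a state to a given target, divided by all transitions out of that state.

```python
from collections import Counter, defaultdict


def transition_matrix(labels):
    """Maximum-likelihood transition probabilities of a first-order Markov
    chain, estimated from per-interval speaker labels such as ['A', 'A', 'B']."""
    counts = defaultdict(Counter)
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for speaker, row in counts.items():
        total = sum(row.values())
        matrix[speaker] = {target: n / total for target, n in row.items()}
    return matrix


def switch_probability(matrix, speaker):
    """Probability that the floor passes from `speaker` to someone else
    at the next interval."""
    return 1.0 - matrix[speaker].get(speaker, 0.0)


def mean_turn_length(matrix, speaker):
    """Expected number of consecutive intervals held by `speaker`. Under the
    memoryless (Markov) assumption turn lengths are geometric with mean 1/p."""
    return 1.0 / switch_probability(matrix, speaker)


# Hypothetical labels, one per fixed-length interval.
labels = list("AAABBAAAABBB")
m = transition_matrix(labels)
print(switch_probability(m, "A"))  # about 0.286 (2 switches out of 7 transitions)
print(switch_probability(m, "B"))  # 0.25
print(mean_turn_length(m, "A"))    # about 3.5 intervals
```

A speaker who appears only as the final label has no outgoing transitions and therefore no row in the matrix. A time-varying version, in the spirit of the ODE models mentioned above, would replace the constant per-speaker probability with a function of elapsed time.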

The group sought a way to distinguish between the two speakers and looked for a simple speaker identification algorithm that would be easy to implement on a mobile phone. Such a tool is a prerequisite for all subsequent analysis of the sound data. Two algorithms were considered, both of which provided a reasonable level of information. The first, a low-frequency classification approach, took advantage of the natural difference in pitch between two speakers (especially in a male-female conversation). The second applied a Gaussian mixture model to the extracted Mel-frequency cepstral coefficients (MFCCs).
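
One simple way to realise the pitch-based idea is sketched below. It is not the group's code and does not cover the Gaussian mixture / MFCC approach. It assumes a per-frame fundamental-frequency (F0) track in Hz has already been extracted (for example with Praat), with unvoiced frames marked as 0 or None. A one-dimensional two-means clustering splits the voiced frames into a lower-pitched and a higher-pitched group, and each frame is then labelled by which side of the midpoint it falls on. The function names and the example values are hypothetical.

```python
def pitch_threshold(f0_values, iterations=50):
    """Split voiced F0 estimates (Hz) into a low and a high group using 1-D
    two-means clustering; return the midpoint between the two group means."""
    voiced = [f for f in f0_values if f]  # drop unvoiced frames (0 or None)
    if not voiced:
        raise ValueError("no voiced frames")
    lo, hi = min(voiced), max(voiced)
    for _ in range(iterations):
        threshold = (lo + hi) / 2
        low = [f for f in voiced if f < threshold]
        high = [f for f in voiced if f >= threshold]
        if not low or not high:
            break  # only one pitch level present
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # group means stopped changing: converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def label_frames(f0_values, threshold):
    """Assign each frame to the lower- or higher-pitched speaker;
    unvoiced frames are labelled None."""
    return [None if not f else ("low" if f < threshold else "high")
            for f in f0_values]


# Hypothetical F0 track (Hz); 0 marks an unvoiced frame.
f0 = [110, 115, 120, 0, 210, 220, 205, 118]
t = pitch_threshold(f0)
print(label_frames(f0, t))
# ['low', 'low', 'low', None, 'high', 'high', 'high', 'low']
```

This only works when the two speakers' pitch ranges are well separated, which is why the male-female case is the easy one. Overlapping ranges are where a richer feature set such as MFCCs becomes necessary.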

Based on existing literature, conversational rapport was expected to correspond to mimicry in prosodic features of speech. To analyse this rapport, tools were developed to extract pitch, volume and speech rate from audio files, using Praat, a standard tool in academia, and custom-written Matlab code. These tools were broadly successful in extracting these features. However, in the time available no consistently significant correlations or trends were found in natural high-rapport conversations from the BBC Listening Project, either over the course of a conversation or between the last few seconds of one speaker's speech fragment and the first few seconds of the next speaker's. Further work would involve following up on some potential correlations in such conversations and, in particular, comparing them with low-rapport conversations.
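
The boundary comparison described above can be sketched as follows, assuming each speech fragment is already available as a list of per-frame feature values (pitch, volume or speech rate) at a fixed frame rate, so that "the last few seconds" becomes the last `k` frames. Only boundaries in one direction (from a chosen `leader` to a chosen `follower`) are paired, so that the two speakers' different baseline levels are not mixed into one correlation. The data layout and the names `boundary_pairs` and `pearson` are hypothetical, and the report does not state which statistic the group used.

```python
from math import sqrt


def boundary_pairs(turns, leader, follower, k):
    """Pair the mean of the last k frames of each `leader` fragment with the
    mean of the first k frames of the `follower` fragment that follows it.

    `turns` is a chronological list of (speaker, [per-frame values])."""
    xs, ys = [], []
    for (spk_a, a), (spk_b, b) in zip(turns, turns[1:]):
        if spk_a != leader or spk_b != follower:
            continue
        if len(a) < k or len(b) < k:
            continue  # fragment too short for a k-frame window
        xs.append(sum(a[-k:]) / k)
        ys.append(sum(b[:k]) / k)
    return xs, ys


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence has zero variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-frame pitch values (Hz) for four consecutive fragments.
turns = [("A", [100, 110, 120]), ("B", [200, 210, 220]),
         ("A", [130, 140, 150]), ("B", [230, 240, 250])]
xs, ys = boundary_pairs(turns, leader="A", follower="B", k=2)
print(xs, ys)  # [115.0, 145.0] [205.0, 235.0]
```

With only a handful of A-to-B boundaries per conversation, any single coefficient is noisy, so a meaningful test would compare its distribution across high-rapport and low-rapport recordings.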

Item Type:	Study Group Report
Problem Sectors:	Data processing
Study Groups:	UK Study Groups > ESGI 107 (Manchester, UK, Mar 23-27, 2015); European Study Group with Industry > ESGI 107 (Manchester, UK, Mar 23-27, 2015)
Company Name:	ExpertoCrede
ID Code:	704
Deposited By:	Bogdan Toader
Deposited On:	15 Jan 2017 22:24
Last Modified:	15 Jan 2017 22:28