Teach your robots how to show empathy

Speech technology development

Speech is the primary form of communication in human-human interaction. Through their speech, humans can express their mental state, their positive or negative feelings towards a listener, and their reactions to incidents in their lives, just to name a few.

Robotic technology development

Robotic technology development has enabled robots to be deployed in human-centred environments, where they interact with people, share opinions and take part in activities. This field is called Human-Robot Interaction (HRI), and its focus is on making robots learn to react to users in a social and engaging way. Such social robots are being used for various applications like education, passenger guidance and healthcare. Artificial speech produced using synthesisers is one of the main modalities by which robots communicate with humans, and with the wide range of robot applications in human-centred environments, speech plays a key role in enabling robots to interact socially. Verbal messages rendered only to convey explicit information or instructions are not sufficient in such interactions; robots should be able to engage users socially, and developing resources for this interaction via speech is the focus of this article.

Empathetic social robots

There are many robots being developed nowadays that interact with humans, just as humans interact with each other. Many of us are familiar with the robot Sophia, which interacts quite naturally with humans and was even granted citizenship of Saudi Arabia.

Experience at CARES

There are also many talking robots being developed by researchers in different countries. I have been fortunate to work with the Centre for Automation and Robotic Engineering Science (CARES) at the University of Auckland, New Zealand. This robotics research group was established in 2010 as an interdisciplinary group, bringing all the robotics research happening at the university under one umbrella. The Centre aims to develop robotic technology for the wellbeing of society. In Figure 1, a Healthcare Robot (Healthbot) developed at the Centre can be seen interacting with an elderly person. These robots are developed to support the healthcare staff in old-age homes. They help by reminding people to take their medications, detecting falls, and talking to patients as a companion. Improving the way these robots talk to patients is my research focus.

Figure 1: Healthbot interacting with user. 

(Image from CARES, University of Auckland)

Research outline

Currently, the robots talk monotonously, without any emotion. They just provide the required instructions in a robotic voice. But this is not how human nurses and doctors talk to patients: they use speech to express emotions and as a means of being empathetic towards patients. However, we cannot assume that humans would like robots that speak empathetically to them. As robots are very different from humans and cannot genuinely “feel” for the human user, a robot expressing empathy may be perceived as fake, and hence unacceptable to human users. To gather evidence on this question, we ran a large-scale human perception experiment with 120 participants. The setup for the test is shown in Figure 2.

Figure 2: Human perception experiment setup.

Experiment: Empathy and Robots

For this experiment, the participants listened to a robot speaking to a patient. The robot spoke using two voices: in the first, the robot spoke monotonously, without any emotion, while in the second, it spoke with empathy to the patient. Both voices were produced by a voice actor speaking in the two styles. The participants were asked to rate which of the voices they preferred. 95% of the participants reported that they preferred the empathetic voice over the robotic one. This is a very strong result, and it gave us good evidence that the future of speaking robots lies in robots that express empathy and emotions to human users. More details of this experiment can be found in [1].

Now, the big task is to actually develop robotic voices that sound empathetic. This task has two parts: first, to model the dialogues spoken by the robot so that they sound empathetic, and second, to model the prosody (tone, duration etc.) of the robot’s speech to match that of an empathetic human. For this we have developed a large database of human speech spoken by trained professional voice actors, who delivered many dialogues emotionally, as a doctor would speak to a patient. These dialogues were recorded to form a large speech corpus called the JL-corpus. This database is currently being analysed to understand the patterns in emotional speech that make it sound empathetic and different from monotonous robotic speech. The database and the preliminary analysis done on it are discussed in detail in [2].

From the analysis of the empathetic voices in this database, the features of the speech signal that are most affected by the expression of empathy were identified. We decided to focus on the pitch (fundamental frequency) contour, speech rate (number of syllables per second) and intensity (loudness in dB) as the features that need to be modelled to develop an empathetic voice. Among these, the pitch contour was modelled using a parametric approach (the Fujisaki model for pitch contours), and its transformation towards that of empathetic speech was carried out using machine-learning-based regression. The speech rate and intensity were modelled by rules framed from the analysis of the database discussed above. A simplified block diagram is shown in Figure 3. Once the pitch contour had been transformed and the rules for speech rate and intensity applied, the speech signal was re-synthesised to produce empathetic speech.
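To give a flavour of the kind of signal analysis involved, here is a minimal, self-contained sketch of measuring two of the features mentioned above, pitch (fundamental frequency) and intensity in dB, from one short frame of a signal. This is an illustration only, using a simple autocorrelation pitch estimator on a synthetic tone; it is not the Fujisaki-model pipeline used in the actual research.

```python
import math

SAMPLE_RATE = 16000  # samples per second (a common rate for speech)

def make_tone(freq_hz, duration_s, amplitude=0.5):
    """Generate a pure sine tone as a list of float samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def estimate_f0(frame, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (pitch) of one frame by picking
    the autocorrelation peak within a plausible voice-pitch lag range."""
    lag_min = int(SAMPLE_RATE / fmax)
    lag_max = min(int(SAMPLE_RATE / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return SAMPLE_RATE / best_lag if best_lag else 0.0

def intensity_db(frame):
    """Root-mean-square intensity of a frame, in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

# A 50 ms frame of a 220 Hz tone stands in for one voiced speech frame.
frame = make_tone(220.0, 0.05)
f0 = estimate_f0(frame)
db = intensity_db(frame)
print(f"F0 ~ {f0:.0f} Hz, intensity ~ {db:.1f} dB")
```

In a real system, these measurements would be made frame by frame over a whole utterance to trace the pitch and intensity contours, which are then the inputs to the transformation and re-synthesis steps described above.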


Figure 3: Block diagram of end-to-end empathetic speech synthesis system.

This empathetic speech was then used as the voice for the same healthcare robot discussed above. Human participants were asked whether they preferred the newly synthesised empathetic voice or the usual robotic voice (which had no empathy). Note that in this case both the empathetic voice and the robotic voice were synthesised, whereas in the first experiment the voices were not artificially synthesised but acted out by a voice professional. The experiment setup was similar to the one in Figure 2. In this experiment, 83% of the participants (the experiment was done with 58 participants) reported that they preferred the synthesised empathetic voice over the robotic voice. This is again a strong result showing that people prefer robotic voices to be empathetic in a healthcare application.
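The article does not state how this preference result was analysed statistically, but as an illustration of why 83% of 58 participants counts as strong evidence, one common check is a two-sided binomial sign test against chance (50/50 preference). The test choice here is ours, not necessarily the researchers':

```python
from math import comb

def sign_test_p(successes, n):
    """Two-sided exact binomial sign test against chance (p = 0.5):
    the probability of a split at least this extreme if every
    participant had picked a voice at random."""
    k = max(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

n = 58
preferred_empathetic = round(0.83 * n)  # 83% of 58 participants -> 48 people
p = sign_test_p(preferred_empathetic, n)
print(f"{preferred_empathetic}/{n} preferred empathetic, p = {p:.2e}")
```

The p-value comes out far below the usual 0.05 threshold, so a 48-to-10 split is extremely unlikely to arise from participants choosing at random.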

Step into the world of speech synthesis

Speech technology development is an exciting research field with many opportunities for engineers. The skills required are signal processing, machine learning and statistical analysis. Other areas are also needed in such technology development, like computer networks (for the speech technology to interact with other devices) and of course the mechanical aspects of the robots that use the voices. It is not necessary to be proficient in all these fields, but an understanding of what a speech signal is, and of the various techniques for processing it, will be key to getting into this field. The machine learning techniques used to build models keep changing as technology develops. In speech technology development, building language resources is also important. So this is an area where your language is your strength, and a good knowledge of your mother tongue or other languages will be a great advantage in developing speech technology for those languages.

If you have completed a Bachelor’s degree in Electrical, Electronics, Computer or Biomedical engineering or a related field, you can always consider doing research in speech technology development. If you are interested in learning more and getting a basic understanding of speech signal processing, a Master’s degree in signal processing is highly recommended. Doing your Master’s research project in speech signal processing would be a good step towards understanding the basics of research in the field. If you go on to a PhD, there are many universities in India and other countries doing a lot of speech technology development. Research to develop speech technology for Indian languages and Indian English is very popular these days because of the large number of technology users in India, so there are many options for PhD scholarships and funding in the field. Even without a PhD, there are options to join companies that develop speech technology, such as Amazon Alexa, Apple Siri, Google Assistant and Microsoft Research. Given the big market for Indian languages, some of these companies also have research centres in India, which are good places to start working on your own languages. This is a field where your own language is your strength, and the possibility of building a career around that strength is very exciting and beneficial for your own community of language users.


Empathy in human-robot interaction is a fairly new research area, so the experiments reported here are only the first steps towards developing robots that are empathetic to human users. The main conclusions from these experiments are that people prefer an empathetic voice for healthcare applications, and that it is possible to synthesise an empathetic voice by correctly modelling the features of the speech signal. Moreover, human participants could perceive empathy in the robot’s voice even though it was artificially synthesised.

Speaking robots are the future of social robots that interact with humans. A good-quality voice is also needed by interactive technologies like Apple Siri, Amazon Alexa and the many others that talk to us nowadays. My research is aimed at developing empathetic, good-quality voices that encourage users to interact with robots and other talking devices. This research is in progress, and the aim is to develop more human-like voices that will improve users’ comfort in interacting with robots and thereby improve their acceptance. Please refer to websites [3] and [4] for more updates on the latest research happening in the field of speech technology and robotics.


[1] James, J., Watson, C. I. and MacDonald, B. (2018). Artificial Empathy in Social Robots: An Analysis of Emotions in Speech. In International Symposium on Robot and Human Interactive Communication (RO-MAN), pages 632-637.

[2] James, J., Tian, L. and Watson, C. I. (2018). An Open Source Emotional Speech Corpus for Human Robot Interaction Applications. In INTERSPEECH, pages 2768-2772.



About the Author

Jesin James

Lecturer at the University of Auckland


Jesin is a Lecturer at the University of Auckland, New Zealand. Her research areas are speech technology development for human-robot interaction and for under-resourced languages. Jesin’s PhD (from the University of Auckland) focused on developing empathetic voices for healthcare robots. As part of her current research and her Master’s research, she develops speech technology for under-resourced languages such as Malayalam, Māori and New Zealand English. Jesin is also passionate about engineering education and works on developing effective teaching-learning methods for student learning and engagement. She strongly believes that if you can combine your passion with your career, you will never feel that you are working; engineering offers the possibility to combine your passion with technology and make a difference. Jesin enjoys languages, music, travelling and interacting with people – and being a lecturer in Electrical, Computer, and Software Engineering working on speech technology for languages ties everything up nicely.