Imagine holding a meeting about a new product release, after which AI analyzes the discussion and creates a personalized list of action items for each participant. Or talking with your doctor about a diagnosis and then having an algorithm deliver a summary of your treatment plan based on the conversation. Tools like these can be a big boost, given that people typically recall less than 20% of the ideas presented in a conversation just five minutes later. In healthcare, for instance, research shows that patients forget between 40% and 80% of what their doctors tell them soon after a visit.
You might think AI is ready to serve as the secretary for your next important meeting. After all, Alexa, Siri, and other voice assistants can already schedule meetings, respond to requests, and set reminders. Impressive as today’s voice assistants and speech recognition software may be, however, developing AI that can track a discussion among multiple people and understand its content and meaning presents a whole new level of challenge.
Free-flowing conversations involving multiple people are much messier than a single person’s command spoken directly to a voice assistant. In an exchange with Alexa, there is usually only one speaker for the AI to track, and the system gets instant feedback when it misinterprets something. In natural human conversation, different accents, interruptions, overlapping speech, false starts, and filler words like “umm” and “okay” all make it harder for an algorithm to follow the discussion correctly. These speech habits, along with our tendency to bounce from topic to topic, also make it significantly more difficult for an AI to understand a conversation and summarize it appropriately.
Read the full article at Fast Company.
This article was produced by Footnote in partnership with Abridge.