I woke up in a cold sweat the other day. I literally had an AV Commissioning Nightmare. My friend and I were discussing integrating Alexa into conference spaces the day before. It’s entirely possible to start a conference call by saying, “Alexa, call my bridge.” However, as my friend figured out, if someone says “Alexa, turn off the lights” during the conference, several Alexa-enabled offices on the call may go dark (and then you would hear a menacing chuckle from the light-turner-offer…mua-ha-ha-ha-ha…). I think it is a very exciting prospect, but my nightmare was based on testing a system like that. How can you make sure it works for all users?!
Living in the New York area, we pride ourselves on the incredible diversity of our workforce. We have people from different generations, countries, genders, accents, speaking levels, etc. I appreciate that a voice-recognition device gets to know its owner over time and can be trained to better understand that owner’s voice. However, in a conference space, there may be hundreds of owners. What’s poor lil’ Alexa going to do?
We have to worry about voice levels. Will Alexa be able to hear you from your seat? Over the HVAC? Over the din of a meeting?
We have to worry about accents. It’s cute when Alexa can’t figure out that my five-year-old is asking her to tell a joke in our home, due to his five-year-old “accent.” It will not be cute when a visiting executive can’t show her laptop because the system can’t make out her accented English. Depending on the level of meeting, any hiccups like that could conceivably lead to discriminatory lawsuits. (The funny thing is that people are considering implementing voice control out of fear of discriminatory lawsuits. I guess the question becomes who’s more lawyer-happy: people representing those with mobility and sight constraints, or people representing those who don’t speak American good.)
We have to worry about terminology. I rarely see agreement on source names across drawings, device labels and control interfaces. The drawing may say Computer 1, the cable might say Front Laptop and the touch panel might show Table Laptop 2. Imagine the potential confusion when you have to come up with the source name without clues.
“Alexa, show ‘my computer’ on the front display.”
“I’m sorry, I don’t know what ‘My Computer’ is.”
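One way to blunt the terminology problem is a normalization layer that maps every label a user might say (or read off a drawing, a cable or a touch panel) to a single canonical source ID. This is just a minimal sketch; the alias table, the names and the `resolve_source` function are all invented for illustration, not any real control-system API:

```python
# Hypothetical alias table: every label a user might say maps to one
# canonical source ID that the control system actually routes.
SOURCE_ALIASES = {
    "computer 1": "SRC_LAPTOP_FRONT",      # name on the drawing
    "front laptop": "SRC_LAPTOP_FRONT",    # name on the cable label
    "table laptop 2": "SRC_LAPTOP_FRONT",  # name on the touch panel
    "my computer": "SRC_LAPTOP_FRONT",     # name the presenter will say
}

def resolve_source(spoken_name: str) -> str:
    """Return the canonical source ID for whatever name the user spoke."""
    key = spoken_name.strip().lower()
    if key not in SOURCE_ALIASES:
        raise KeyError(f"I'm sorry, I don't know what {spoken_name!r} is.")
    return SOURCE_ALIASES[key]

print(resolve_source("My Computer"))  # prints SRC_LAPTOP_FRONT
```

The point of the design is that the voice front end never needs to agree with the drawings or the labels; only the alias table does, and that table is populated during commissioning.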
As if AI bots coming up with their own languages and androids being granted citizenship despite being extraordinarily easy to convince that killing off the human race is a fine idea weren’t enough, there is the very real possibility of users in one room controlling both local and remote systems during a call.
People listening to podcasts constantly get their phones inadvertently hijacked when the host asks their own phone a question with “Hey, Siri.” The phone doesn’t know that it’s the podcast “Hey-Siri-ing” and not the owner; Siri is just trying to help out when anyone says the wake phrase. Along the same lines as what my friend does with the light controls during bridge calls, could you imagine giving a presentation during a high-pressure meeting, asking Alexa to switch sources, and having every office on the call inadvertently hijacked and switched to its local sources?!
Lastly, how would you commission something like this to ensure it is fit for use for all users? Do you bring in a sampling of talkers? Does the commissioning specialist try their hand at impersonating various accents? Personally, whenever I try an English accent, it rapidly devolves into a Chinese-Irish conflation. Ain’t nobody got time for that. How could we test it?
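One practical answer is to treat commissioning like a regression test: collect recorded utterances from a diverse speaker panel, play each one at the system, and check whether it triggers the intended command, with a pass-rate threshold written into the acceptance criteria. The sketch below assumes all of this — `recognize()` is a canned stand-in for the real voice assistant, and the clip names and threshold are invented:

```python
# Hypothetical commissioning harness: each test case pairs a recorded
# utterance with the command it should trigger. recognize() is a canned
# stand-in; a real test would play the clip into the room and capture
# which command (if any) the voice system actually executed.
def recognize(audio_clip: str) -> str:
    canned = {
        "exec_accented_english.wav": "show laptop on front display",
        "ny_native.wav": "show laptop on front display",
        "five_year_old.wav": "tell me a joke",
    }
    return canned.get(audio_clip, "")

def pass_rate(cases: list[tuple[str, str]]) -> float:
    """Fraction of (clip, expected_command) pairs recognized correctly."""
    hits = sum(1 for clip, expected in cases if recognize(clip) == expected)
    return hits / len(cases)

cases = [
    ("exec_accented_english.wav", "show laptop on front display"),
    ("ny_native.wav", "show laptop on front display"),
    ("five_year_old.wav", "tell me a joke"),
]
rate = pass_rate(cases)
print(f"pass rate: {rate:.0%}")
assert rate >= 0.95, "system not fit for use across the speaker panel"
```

The interesting part isn’t the code, it’s the panel: the test only means something if the recordings actually cover the generations, accents and speaking levels the room will see, and if they’re played back over the real HVAC din rather than into a quiet lab.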
People played with the idea of gesture recognition for presentation spaces, but the thought of CEOs flailing their arms unsuccessfully trying to advance a slide was comically terrifying. Voice control is slightly less comically terrifying, but still nightmare-worthy. Instead of physically flailing around trying to make it work, I could see users getting into a cussin’ match with Alexa, if she doesn’t understand them. “No, Alexa! Not that computer, you nearsighted scrap pile. Yo’ Mama was the tape recorder Arnold Schwarzenegger used in ‘True Lies!’”
I guess the moral of the story is that voice control would be a nice addition to a more conventional control system. It could provide access to controls for the immobile, the blind and, if we’re being honest, the lazy. However, making it the primary method of system control, at least for the time being, is risky, especially for public conference spaces. Relegating Alexa to a secondary control method would certainly help me sleep better at night.