The Halting Problem

In a previous post, I wrote about adding read-aloud support to a website we maintain. I mentioned that it was pretty straightforward to do, and that it was even easy to get the browser to choose a voice that matches the language of the text being spoken. We did, however, hit one puzzling and difficult-to-resolve issue: when given a large piece of text to read aloud, a browser would sometimes simply stop speaking partway through. No errors appeared in the JavaScript console, and querying the speechSynthesis API indicated it was “speaking,” but it wasn’t.

Searching the web turned up many reports of the problem, going back over a decade (!). Some came with workarounds that commenters confirmed at the time and that others later reported as no longer working. Eventually, we confirmed that the issue we were seeing matched an open Chromium bug, which indicates the problem is associated with choosing a Google-provided voice for speaking.

To avoid the bug entirely, we initially switched our code from setting lang on a SpeechSynthesisUtterance to setting an explicitly chosen voice, and we treated any voice whose name starts with “Google” as ineligible. That avoids the problem, but it can severely limit the availability of read-aloud support. Chrome on my main development machine, which runs Ubuntu, has only Google-provided voices, so with this change it offered me no read-aloud support at all: I had to use Firefox or a machine running a different OS to do my own local dev testing.
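As a rough illustration of that first approach (a sketch, not our production code), the voice selection could look something like the following. The helper name pickNonGoogleVoice and the fallback from an exact language match to a base-language match are assumptions made for the example.

```javascript
// Hypothetical helper: choose a voice for `lang`, treating any voice whose
// name starts with "Google" as ineligible. Returns null if nothing is left.
function pickNonGoogleVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  const base = wanted.split(/[-_]/)[0];
  const eligible = voices.filter((v) => !v.name.startsWith("Google"));
  return (
    // Prefer an exact match such as "en-US"...
    eligible.find((v) => v.lang.toLowerCase() === wanted) ??
    // ...then fall back to any voice for the same base language ("en").
    eligible.find((v) => v.lang.toLowerCase().split(/[-_]/)[0] === base) ??
    null
  );
}

// Browser-only usage; skipped where speechSynthesis does not exist.
// Note that getVoices() can return an empty list until the browser has
// loaded its voices (it fires "voiceschanged" when the list changes).
if (typeof speechSynthesis !== "undefined") {
  const voice = pickNonGoogleVoice(speechSynthesis.getVoices(), "en-US");
  console.log(voice ? voice.name : "no eligible voice");
}
```

When the helper returns null, the page has no voice it is willing to use, which is exactly the situation Chrome on Ubuntu ended up in.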

Avoiding the issue was only ever meant to be a quick first step, so we could enable the read-aloud function in production while we continued to look for a reliable way to work around the problem. Given how old the issue appears to be, and the lack of activity on the open bug, it didn’t seem likely we’d get a fix soon. One thought we had was to break up the text into sentences and create a separate SpeechSynthesisUtterance for each sentence. That feels a little hard and error-prone though.
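To give a sense of what the sentence-splitting idea would involve, here is a minimal sketch. It is not something we shipped: the function names are hypothetical, and the splitting rule (break after runs of ".", "!" or "?") is a deliberately naive assumption.

```javascript
// Hypothetical, naive splitter: a "sentence" is a run of text up to and
// including any trailing ".", "!" or "?" characters.
function splitIntoSentences(text) {
  const parts = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  return parts.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Queue one utterance per sentence. `synth` and `Utterance` are passed in so
// the sketch can be exercised outside a browser; in a page they would be
// window.speechSynthesis and SpeechSynthesisUtterance.
function speakBySentence(text, voice, synth, Utterance) {
  for (const sentence of splitIntoSentences(text)) {
    const utterance = new Utterance(sentence);
    if (voice) utterance.voice = voice;
    synth.speak(utterance); // speak() appends to the queue, so order is kept
  }
}

console.log(splitIntoSentences("Hello world. How are you? Fine!"));
console.log(splitIntoSentences("Dr. Smith arrived."));
```

The second call shows the error-prone part: the abbreviation "Dr." is treated as a sentence of its own, and handling cases like that well across many languages is where the real work would be.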

In reviewing the workarounds noted in reports of this issue, I had seen multiple mentions of working around it by calling resume() on the API periodically. Later reports said that this had stopped working, but that calling pause() immediately before resume() did work. I had done a quick try of this workaround back when we were first investigating the issue, and it did not seem to work. However, when faced with “OK now we really need to figure out how to work around this,” and the next alternative being more complicated than I wanted to tackle, I decided to give it another try, and found that this pause()/resume() combination does indeed keep Google voices speaking large blocks of text.

Our code for speaking now looks something like this:

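// txt, langVoiceMap, and currentLang are defined elsewhere in our code.
const utterance = new SpeechSynthesisUtterance(txt);
utterance.voice = langVoiceMap.get(currentLang);
let intervalID;

// Once speech starts, nudge the synthesizer every 14 seconds.
utterance.onstart = (event) => {
  intervalID = setInterval(() => {
    if (!speechSynthesis.speaking) {
      // Nothing is being spoken any more, so stop the timer.
      clearInterval(intervalID);
    } else {
      // pause() followed immediately by resume() keeps the voice going.
      speechSynthesis.pause();
      speechSynthesis.resume();
    }
  }, 14000);
};

// Stop the timer when the utterance finishes.
utterance.onend = (event) => {
  clearInterval(intervalID);
};

window.speechSynthesis.speak(utterance);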

This code adds a start event handler that is called when the browser starts to speak the utterance. That handler sets up a function to be called every 14 seconds: the function cancels itself if it finds that the browser is no longer speaking, and otherwise calls pause() and then resume(). The code also adds an end event handler that cancels the recurring function call. With these event handlers, Google voices no longer halt mid-speech when given long text to speak.
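If the same keep-alive is needed in more than one place, it could be wrapped in a small helper. The version below is a sketch under a few assumptions: the name attachKeepAlive is made up, the synthesizer is passed in as a parameter, and it also clears the timer on the utterance's error event. As I read the Web Speech API specification, an utterance interrupted by cancel() gets an error event rather than end, so this stops the timer right away instead of waiting for the next 14-second check.

```javascript
// Hypothetical helper: apply the pause()/resume() keep-alive to an utterance.
// `synth` would be window.speechSynthesis in a page. Returns a function that
// stops the timer, in case the caller wants to stop it manually.
function attachKeepAlive(utterance, synth, intervalMs = 14000) {
  let intervalID;
  const stop = () => {
    clearInterval(intervalID);
    intervalID = undefined;
  };

  utterance.addEventListener("start", () => {
    stop(); // make sure only one timer is ever running
    intervalID = setInterval(() => {
      if (!synth.speaking) {
        stop();
        return;
      }
      synth.pause();
      synth.resume();
    }, intervalMs);
  });

  // Clear the timer whether the utterance finishes normally or errors out.
  utterance.addEventListener("end", stop);
  utterance.addEventListener("error", stop);
  return stop;
}
```

Usage would then be along the lines of calling attachKeepAlive(utterance, window.speechSynthesis) before window.speechSynthesis.speak(utterance).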

Have you used the Speech Synthesis API and run into this problem? How did you work around it?