The CSS Speech Module

4 min read · Jul 24, 2018

You may be thinking that this is an idea for a future CSS module, but the specification was actually finished six years ago.

But why have you never heard about it?
Because (almost) no browser supported it, and in the end it was retired by the W3C a month ago, on 2018-06-05.

History

Twenty years ago CSS 2.0 was published, and part of this release was ACSS, the Aural CSS module. But the module was quickly replaced with the ‘speech’ keyword in CSS 2.1. This specification just reserved the keyword but didn’t specify any of its properties or values.

On 2012-03-20 the CSS speech module reached CR (Candidate Recommendation). It remained in this state until it was retired about a month ago, before any of the most-used browsers had implemented it.

But what was the idea behind the module?

The idea

The speech module enables you to style how the elements in your document are spoken. It contains properties that specify how a document is rendered by a speech synthesizer: volume, voice, speed, pitch, cues, pauses and so on.

The module aimed to assist people who are blind, visually-impaired or otherwise print-disabled by enabling websites to optimize their content aurally. This technology could also be used for other things like teaching kids how to read.

So now that you know about the goals of the CSS speech module let’s see some code!

Properties

Let’s start with some straightforward properties. I will take some parts from the CSS specification, so if you want to learn more about them I recommend you check out the draft!

voice-volume
With the voice-volume property you can control the volume.

Some valid values are silent, x-soft, soft, medium, loud or x-loud. But you can also use decibels if you’d like! The decibel value represents the change (positive or negative) relative to the given keyword value (see enumeration above) or to the default value for the element.

h1
{
  voice-volume: medium 6dB;
}

voice-balance
With voice-balance you can control the spatial distribution of audio output. You can use a number between -100 and 100 or any of the following values: left, center, right, leftwards, rightwards.

h1
{
  voice-balance: left;
}
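
The numeric form is easier to picture with a small sketch. It assumes, as the draft describes, that -100 means fully left, 0 means center and 100 means fully right; the class names are made up for illustration.

```css
/* Hypothetical class names, not taken from the draft */
p.narrator
{
  voice-balance: 0;      /* same position as the center keyword */
}
p.aside
{
  voice-balance: -50;    /* halfway between center and fully left */
}
```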

speak
The speak property determines whether or not to render text aurally.
You can use auto, never or always. It’s important to know that the initial value of speak is auto, and with auto the result depends on the display property of the element: if you set your element to display: none, speak behaves as if it were never; otherwise it behaves like always.

h1
{
  speak: always;
}
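
To make the interaction with display more concrete, here is a minimal sketch. It assumes, based on my reading of the draft, that an explicit always is honoured even when the element is not displayed; the class names are hypothetical.

```css
/* Hypothetical class names, for illustration only */
.spoken-hint
{
  display: none;   /* not rendered visually */
  speak: always;   /* assumption: still spoken despite display: none */
}
.decoration
{
  speak: never;    /* shown on screen, skipped by the speech synthesizer */
}
```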

speak-as
You can use speak-as to determine in what manner text gets rendered aurally, based upon a predefined list of possibilities.

Valid values are normal, spell-out, digits, literal-punctuation or no-punctuation. So you could use e.g. spell-out to spell the text one letter at a time or digits to speak numbers one digit at a time.

h1
{
  speak-as: spell-out;
}
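
Two of the other values could be used like this. This is only a sketch with made-up class names, following the descriptions of digits and literal-punctuation in the draft.

```css
/* Hypothetical class names, for illustration only */
.phone-number
{
  speak-as: digits;               /* "12" is read as "one two", not "twelve" */
}
.code-sample
{
  speak-as: literal-punctuation;  /* semicolons, braces and the like are named aloud */
}
```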

The aural formatting model

You can imagine the properties pause, cue and rest as the aural equivalents of margin, border and padding. They surround the styled element.

The CSS formatting model for aural media is based on a sequence of sounds and silences that occur within a nested context similar to the visual box model, which we name the aural “box” model.

The image above visualises the order of pause, cue and rest as well as their visual box model equivalents. From the outside in, the order is pause, then cue, then rest, and finally the spoken content itself.
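
A rough sketch of how the three property groups could be combined on one element follows. The selector and the audio file names are hypothetical, and the values are chosen only to show the syntax.

```css
/* Hypothetical selector and audio file names */
blockquote
{
  pause: strong;                           /* outermost: silence before and after the element */
  cue-before: url(../audio/quote-in.wav);  /* sound played after the leading pause */
  cue-after: url(../audio/quote-out.wav);  /* sound played before the trailing pause */
  rest: 250ms;                             /* innermost: silence between each cue and the spoken text */
}
```

With these rules, a listener would hear a pause, the opening sound, a short rest, the quoted text, another short rest, the closing sound and a final pause.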

Voice characteristics

The CSS speech module also provides properties to modify the voice characteristics. You can change the sound of the voice with voice-family (an equivalent to font-family), change the speed of the spoken text with voice-rate, and adjust the pitch (voice-pitch), range (voice-range) and stress (voice-stress).
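
Put together on a single rule, these properties might look like the sketch below. The class name is made up, and the values are simply keywords the draft lists for each property.

```css
/* Hypothetical class name; values picked only to show the syntax */
p.grandfather
{
  voice-family: old male;   /* generic voice: age first, then gender */
  voice-rate: slow;         /* slower than the default speaking rate */
  voice-pitch: low;         /* lower average pitch */
  voice-range: low;         /* little pitch variation, a flatter delivery */
  voice-stress: reduced;    /* weaker emphasis than normal */
}
```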

Example

Now that you’ve had a brief introduction to the properties of the CSS speech module, we can take a look at an example. Below is a snippet taken directly from the draft, showing what styling with the module would look like:

h1, h2, h3, h4, h5, h6
{
  voice-family: paul;
  voice-stress: moderate;
  cue-before: url(../audio/ping.wav);
  voice-volume: medium 6dB;
}
p.heidi
{
  voice-family: female;
  voice-balance: left;
  voice-pitch: high;
  voice-volume: -6dB;
}
p.peter
{
  voice-family: male;
  voice-balance: right;
  voice-rate: fast;
}
span.special
{
  voice-volume: soft;
  pause-after: strong;
}

...

<h1>I am Paul, and I speak headings.</h1>
<p class="heidi">Hello, I am Heidi.</p>
<p class="peter">
  <span class="special">Can you hear me ?</span>
  I am Peter.
</p>

Conclusion

I had never heard about the CSS speech module before I found it while searching randomly through the CSS specifications. None of the major browsers implemented it after it reached CR status, and it was retired just a few weeks ago (sadly, I couldn’t find the exact reason for the retirement anywhere).

I also have mixed opinions on whether the module was actually a good idea. It could help users who are blind or visually-impaired by optimizing the content for a better experience. But maybe these users actually find it more disturbing than helpful if you mess around with things like the speed of voice on your website because they are used to their personal settings of the screen reader.

What do you think about the idea and goals of the CSS speech module?
Let’s discuss it in the comments!