What is Speech Markdown?

Many smart assistants use Speech Synthesis Markup Language (SSML), an XML-based markup language for speech synthesis. It lets you customize how your text-to-speech output sounds by adjusting things like pitch, speech rate and volume.

Every smart assistant vendor implements its own flavour of SSML, which makes it difficult to customize your speech for each specific platform.

Speech Markdown was introduced as a platform-independent way of giving you control over the generation of speech. Like Markdown for text, it is designed to be appealing to human readers, so that people can write in an easy-to-read and easy-to-write plain text format.

What are the Benefits?

  • Speech Markdown is simple. Don’t know the difference between prosody and phoneme? No problem. Speech Markdown is simpler than Speech Synthesis Markup Language (SSML).
  • Converts to SSML or Text. Just because today’s voice platforms use SSML doesn’t mean that content authors must. Write content in Speech Markdown and let the tools and frameworks convert it to SSML or plain text.
  • Supports Multiple Platforms. Don’t worry about how each platform supports (or does not support) the SSML specs v1.0 & v1.1. Use separate formatters to convert Speech Markdown to platform-specific SSML flavours, such as those for Amazon Alexa and Google Assistant.
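
To make the "converts to plain text" idea concrete, here is a minimal Python sketch that strips modifiers and keeps only the spoken words. It is not one of the official Speech Markdown tools. The function name `to_text` and the regular expression are assumptions made for illustration, and the pattern covers only the simple `(text)[modifiers]` form, not the full syntax.

```python
import re

# Toy pattern for "(text)[modifiers]"; an assumption for this sketch,
# not the full Speech Markdown grammar.
MODIFIER = re.compile(r"\(([^()]*)\)\[[^\]]*\]")


def to_text(speech_markdown: str) -> str:
    """Drop the modifier brackets and keep only the text to be spoken."""
    return MODIFIER.sub(r"\1", speech_markdown)


print(to_text("It's important that you (act now)[emphasis] so you don't miss out."))
# It's important that you act now so you don't miss out.
```

A platform-specific formatter would follow the same shape, but emit that platform's SSML tags instead of discarding the modifiers.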

Simple Example

It's important that you (act now)[emphasis] so you don't miss out.

The example shows how we can ask the speech synthesizer to emphasise the words "act now" when reading out the sentence.
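
As a rough illustration of what a converter might produce for this sentence, the Python sketch below rewrites `(text)[emphasis]` as an SSML `<emphasis>` element. The function name `emphasis_to_ssml` is hypothetical, and mapping a bare `[emphasis]` to `level="moderate"` is an assumption. Real formatters may choose a different level or a platform-specific tag.

```python
import re
from xml.sax.saxutils import escape

# Matches only the value-less emphasis modifier: "(text)[emphasis]".
EMPHASIS = re.compile(r"\(([^()]*)\)\[emphasis\]")


def emphasis_to_ssml(speech_markdown: str) -> str:
    """Wrap emphasised spans in <emphasis> and XML-escape everything else."""
    parts = []
    last = 0
    for match in EMPHASIS.finditer(speech_markdown):
        parts.append(escape(speech_markdown[last:match.start()]))
        # Assumption: a bare [emphasis] maps to the "moderate" level.
        parts.append('<emphasis level="moderate">' + escape(match.group(1)) + "</emphasis>")
        last = match.end()
    parts.append(escape(speech_markdown[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(emphasis_to_ssml("It's important that you (act now)[emphasis] so you don't miss out."))
```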


Modifiers

A modifier changes the way a specific word or phrase is spoken. The text to be modified is enclosed in parentheses, and the modification rules follow in square brackets as one or more key/value pairs. Not all modifiers take a value, as the [emphasis] example above shows.

The typical format of a modifier is:

(text to modify)[key:"value";key:"value"]

  • In the standard Speech Markdown format, the value is always quoted. Either single or double quotes can be used.
  • The order of key:"value"; pairs is not important.
  • Semi-colons are only required when there are two or more modifiers. The trailing semi-colon is optional.
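
The rules above can be sketched as a small Python parser for the text inside the square brackets. The helper name `parse_modifiers` and its regular expression are assumptions made for illustration, not part of any official tool. Because the sketch splits on semi-colons, it does not handle values that themselves contain a semi-colon.

```python
import re

# One modifier: a key, optionally followed by :"value" or :'value'.
PAIR = re.compile(r"""\s*([A-Za-z][\w-]*)\s*(?::\s*(?:"([^"]*)"|'([^']*)'))?\s*$""")


def parse_modifiers(body: str) -> dict:
    """Parse the text between [ and ] into a {key: value} dict.

    Value-less modifiers such as "emphasis" map to None.
    """
    modifiers = {}
    for chunk in body.split(";"):
        if not chunk.strip():
            continue  # tolerates the optional trailing semi-colon
        match = PAIR.match(chunk)
        if match is None:
            raise ValueError(f"Malformed modifier: {chunk!r}")
        key, double_quoted, single_quoted = match.groups()
        modifiers[key] = double_quoted if double_quoted is not None else single_quoted
    return modifiers


print(parse_modifiers('volume:"loud";rate:"slow"'))
print(parse_modifiers("rate:'slow';volume:'loud';"))
print(parse_modifiers("emphasis"))
```

The first two calls return equal dictionaries, which reflects the rule that the order of pairs and the choice of quote style do not matter.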

Multiple modifiers can be applied to the same text:

Please (drive carefully)[volume:"loud";rate:"slow"] on icy roads.
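
One plausible SSML rendering of this example uses a single `<prosody>` element that carries both attributes. The Python sketch below shows that mapping. The function name `prosody_to_ssml` and the choice of `volume`, `rate` and `pitch` as prosody keys are assumptions for illustration. The sketch returns an SSML fragment and leaves text outside the modified span untouched.

```python
import re
from xml.sax.saxutils import escape, quoteattr

SPAN = re.compile(r"\(([^()]*)\)\[([^\]]*)\]")   # (text)[modifiers]
PAIR = re.compile(r"""(\w+):["']([^"']*)["']""")  # key:"value" or key:'value'
PROSODY_KEYS = ("volume", "rate", "pitch")        # assumed prosody-style keys


def prosody_to_ssml(speech_markdown: str) -> str:
    """Rewrite spans that carry prosody modifiers as <prosody> elements."""
    def render(match):
        text, body = match.group(1), match.group(2)
        pairs = dict(PAIR.findall(body))
        attrs = "".join(
            f" {key}={quoteattr(pairs[key])}" for key in PROSODY_KEYS if key in pairs
        )
        if not attrs:
            return match.group(0)  # not a prosody modifier; leave it as written
        return f"<prosody{attrs}>{escape(text)}</prosody>"

    return SPAN.sub(render, speech_markdown)


print(prosody_to_ssml('Please (drive carefully)[volume:"loud";rate:"slow"] on icy roads.'))
```

Attributes are emitted in a fixed order, so swapping the two modifiers in the source text produces the same SSML.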

If your speech synthesizer is able to play audio files, you can use the [audio] tag.

The second little chicken said (cluck cluck cluck)[audio:"https://www.freesoundslibrary.com/wp-content/uploads/2018/01/chicken-clucking.mp3"] as it walked home. 
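
For illustration, the audio modifier could be mapped to the SSML `<audio>` element, with the parenthesised text kept as fallback content for synthesizers that cannot fetch the file. This is a sketch under assumptions: the function name `audio_to_ssml` is hypothetical, the URL is a placeholder, and not every platform accepts fallback text inside `<audio>`.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# (fallback text)[audio:"url"] with either quote style around the URL.
AUDIO = re.compile(r"""\(([^()]*)\)\[audio:(["'])(.*?)\2\]""")


def audio_to_ssml(speech_markdown: str) -> str:
    """Rewrite audio modifiers as <audio src="..."> elements (SSML fragment)."""
    def render(match):
        fallback, url = match.group(1), match.group(3)
        # Assumption: the target platform speaks the element content
        # when the audio file cannot be played.
        return f"<audio src={quoteattr(url)}>{escape(fallback)}</audio>"

    return AUDIO.sub(render, speech_markdown)


# Placeholder URL, used here only for the demonstration.
print(audio_to_ssml('The chicken said (cluck cluck)[audio:"https://example.com/cluck.mp3"] loudly.'))
```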


Sections

A section tag marks the beginning of a section. The section continues until the next section tag is found or the end of the content is reached.

The format of a section is:

#[key:"value";key:"value"]



And now for the latest traffic report, let's cross to Mike, our eye in the sky: 

Thanks Geoff. Traffic is light to moderate this afternoon, no accidents to report. Drive safe out there on the icy roads. Back to the studio. 

Thanks Mike, glad to hear things are travelling well.
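
The "continues until the next section tag" rule can be sketched in Python as a simple splitter. The sketch assumes that a section tag sits on a line of its own in the form `#[key:"value"]`. The function name `split_sections` and the voice names in the sample are hypothetical and do not come from the example above.

```python
import re

# Assumed section tag form: "#[...]" alone on a line.
SECTION_TAG = re.compile(r"^#\[([^\]]*)\]\s*$", re.MULTILINE)


def split_sections(content: str) -> list:
    """Return (tag_body, text) pairs in document order.

    Text that appears before the first section tag gets a tag_body of None.
    """
    sections = []
    current_tag = None
    last = 0
    for match in SECTION_TAG.finditer(content):
        text = content[last:match.start()].strip()
        if text or current_tag is not None:
            sections.append((current_tag, text))
        current_tag = match.group(1)  # a new tag closes the previous section
        last = match.end()
    tail = content[last:].strip()
    if tail or current_tag is not None:
        sections.append((current_tag, tail))  # last section runs to the end
    return sections


# Hypothetical voices, for demonstration only.
sample = 'Welcome.\n#[voice:"A"]\nHello from A.\n#[voice:"B"]\nHello from B.\n'
for tag, text in split_sections(sample):
    print(tag, "->", text)
```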

What's Next