Special Characters in SMS and How They Affect Message Delivery

Overview

Special characters are symbols, letters, or punctuation marks that are not part of the standard GSM-7 alphabet, which is the default character set used for most SMS messages worldwide.

When a message includes even one special character, the whole message is encoded as Unicode (UCS-2). This cuts the capacity of a single SMS from 160 to 70 characters and can cause the message to be split into multiple parts (segments).

Understanding how special characters affect encoding, segmentation, and billing helps you manage delivery costs and ensure your messages display correctly on all devices.


Why encoding matters

Each SMS carries at most 140 bytes of message data.
The encoding determines how these bytes are used and how many characters can fit in one message.

Encoding Type      Characters per Single SMS    Characters per Segment (Concatenated SMS)
GSM-7              160                          153
Unicode (UCS-2)    70                           67
  • GSM-7 supports the basic Latin alphabet, numbers, and common punctuation.

  • Unicode supports all characters, including emojis and accented letters, but allows fewer characters per message. Emojis usually take two 16-bit units, so each one counts as two characters.

If even one unsupported character is used, the message switches from GSM-7 to Unicode, lowering the character limit and increasing segmentation.
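
The limits in the table follow from the 140-byte payload. The short Python sketch below reproduces them. It assumes each part of a concatenated message carries a 6-byte user data header (the common case, with an 8-bit reference number); the constant and function names are illustrative only.

```python
PAYLOAD_BYTES = 140  # message data available in one SMS
UDH_BYTES = 6        # assumed concatenation header (8-bit reference number)


def capacity(bits_per_char: int, header_bytes: int = 0) -> int:
    """Whole characters that fit in the payload once the header is subtracted."""
    return (PAYLOAD_BYTES - header_bytes) * 8 // bits_per_char


print("GSM-7 single SMS: ", capacity(7))
print("GSM-7 per segment:", capacity(7, UDH_BYTES))
print("UCS-2 single SMS: ", capacity(16))
print("UCS-2 per segment:", capacity(16, UDH_BYTES))
```

GSM-7 uses 7 bits per character and UCS-2 uses 16, which is why the same payload holds 160 or 70 characters, and 153 or 67 once the header is included.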


Examples

GSM-7 message (standard):

Get ready! Enjoy 25% off your next purchase. Visit storeexample.com today.
  • Encoding: GSM-7

  • Characters: 74

  • Sent as: 1 SMS

Message with special characters:

¡Get ready! Enjoy 25% off your next purchase. Visit storeexample.com today 😊
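  • Includes 😊, which is not supported in GSM-7 (¡ is part of the standard GSM-7 basic set, so the emoji alone forces the switch)

  • Encoding: Unicode (UCS-2)

  • Characters: 76, counted as 77 in Unicode because the emoji takes two 16-bit units

  • Sent as: 2 SMS segments (77 exceeds the 70-character limit for a single Unicode SMS)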
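
The two examples above can be checked with a small script. This is a minimal sketch, not how any particular platform implements detection: it assumes the standard GSM-7 basic and extension tables from 3GPP TS 23.038 (which include ¡, ¿, é, ñ, and ü), and the name `describe` is hypothetical. A platform or carrier that supports a narrower set may report a different encoding.

```python
# Standard GSM-7 basic table (3GPP TS 23.038), without the escape code itself.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = set("^{}\\[~]|€")  # sent as escape code + character


def describe(text: str) -> tuple[str, int]:
    """Return (encoding, units): septets for GSM-7, UTF-16 code units for UCS-2."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        return "GSM-7", units
    # Characters outside the Basic Multilingual Plane (most emojis) take two units.
    return "UCS-2", len(text.encode("utf-16-be")) // 2


plain = "Get ready! Enjoy 25% off your next purchase. Visit storeexample.com today."
fancy = "¡Get ready! Enjoy 25% off your next purchase. Visit storeexample.com today 😊"

print(describe(plain))
print(describe(fancy))
```

The first message stays in GSM-7 and fits in one SMS; the second is counted in UTF-16 units and exceeds the 70-unit limit for a single Unicode SMS.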


GSM-7 character set overview

The GSM-7 character set supports uppercase and lowercase Latin letters (A–Z, a–z), numbers (0–9), and common symbols such as @, #, !, &, and +.
Each of these counts as one character toward the 160-character limit.

Some additional symbols are supported through escape sequences, also known as extended GSM-7 characters. Each one is sent as an escape code followed by the character, so it occupies two 7-bit character slots and counts as two characters.

Extended Character    Description             Counts As
^                     Circumflex accent       2
{                     Left curly bracket      2
}                     Right curly bracket     2
\                     Backslash               2
[                     Left square bracket     2
~                     Tilde                   2
]                     Right square bracket    2
|                     Vertical bar            2
€                     Euro symbol             2

For the complete list of GSM-7 characters, including their hexadecimal values and ISO-8859-1 mappings, see our GSM-7 Character Set Reference and Encoding Guide.
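
To illustrate the double counting, the sketch below computes the counted length of a message that is already known to contain only GSM-7 characters. The function name `gsm7_length` is hypothetical, and the set contains only the nine extended characters listed in the table.

```python
GSM7_EXTENDED = set("^{}\\[~]|€")  # the nine extended characters from the table


def gsm7_length(text: str) -> int:
    """Counted length of a GSM-7 message: extended characters count twice."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


message = "Total: 20€ [incl. tax]"
print(len(message), "typed characters")
print(gsm7_length(message), "counted characters")
```

The sample message has 22 typed characters, three of which (€, [ and ]) are extended, so it counts as 25 toward the limit.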


Common special characters that trigger Unicode

When any character outside of the GSM-7 set is used, the message automatically switches to Unicode (UCS-2) encoding.
Common examples include emojis, typographic symbols such as curly quotes, and accented letters that the GSM-7 set does not cover, such as á, í, ó, and ú.

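Special Character    Description                  Recommended Replacement
á                    a with accent                a
é                    e with accent                e
í                    i with accent                i
ó                    o with accent                o
ú                    u with accent                u
ü                    u with diaeresis             u
ñ                    n with tilde                 n
¡                    inverted exclamation mark    !
¿                    inverted question mark       ?
“ ”                  curly quotes                 "
‘ ’                  curly apostrophes            '
°                    degree symbol                o
→                    arrow                        ->
•                    bullet                       *

Note: é, ü, ñ, ¡, and ¿ belong to the standard GSM-7 basic character set (3GPP TS 23.038), so they do not force Unicode wherever the full set is supported. Replace them only if the message preview shows a switch to Unicode.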
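
The replacements in the table can be applied automatically before sending. The following is a minimal sketch with hypothetical names (`REPLACEMENTS`, `to_gsm7_friendly`). It covers only the lowercase table entries that fall outside the standard GSM-7 basic set; é, ü, ñ, ¡, and ¿ are left untouched because that set includes them. Add them to the dictionary if your route treats them as Unicode.

```python
# Table entries that fall outside the standard GSM-7 basic set.
REPLACEMENTS = {
    "á": "a", "í": "i", "ó": "o", "ú": "u",
    "“": '"', "”": '"',   # curly quotes
    "‘": "'", "’": "'",   # curly apostrophes
    "°": "o",
    "→": "->",
    "•": "*",
}
TRANSLATION = str.maketrans(REPLACEMENTS)


def to_gsm7_friendly(text: str) -> str:
    """Swap common Unicode-triggering characters for GSM-7 equivalents."""
    return text.translate(TRANSLATION)


print(to_gsm7_friendly("“Café” está a 30° → mañana"))
```

Replacing accented letters changes spelling, so review the result before applying it to text where the accent carries meaning.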

Impact on segmentation and billing

When a message exceeds the character limit for its encoding, it is divided into multiple segments using concatenation.
Each segment is billed as one SMS, even if the recipient sees the message as a single, joined message.

Example 1 — GSM-7 (standard)

  • Message length: 240 characters

  • Encoding: GSM-7

  • Calculation: 240 ÷ 153 = 1.57 → 2 SMS segments

Example 2 — Unicode (special characters)

  • Message length: 240 characters

  • Encoding: Unicode (UCS-2)

  • Calculation: 240 ÷ 67 = 3.58 → 4 SMS segments

Example 3 — GSM-7 with extended characters

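If your message includes extended characters such as { or €, each one counts as two characters.
Include them in your calculation to estimate how many segments (and SMS charges) will apply. For example, a 155-character message that contains six extended characters counts as 161 characters and is sent as 2 segments.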
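
The calculations in the examples above can be sketched as a small helper. It uses the limits from this article (160/153 for GSM-7, 70/67 for Unicode); the names `LIMITS` and `segments` are hypothetical, and the length passed in is assumed to be the counted length, with extended characters already doubled.

```python
import math

# (single-SMS limit, per-segment limit for concatenated messages)
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def segments(length: int, encoding: str) -> int:
    """Number of billed segments for a message of the given counted length."""
    single, per_segment = LIMITS[encoding]
    if length <= single:
        return 1
    return math.ceil(length / per_segment)


print(segments(240, "GSM-7"))  # Example 1
print(segments(240, "UCS-2"))  # Example 2
print(segments(161, "GSM-7"))  # 155 typed characters, six of them extended
```

A message only switches to the smaller per-segment limit once it no longer fits in a single SMS, which is why 160 GSM-7 characters cost one segment and 161 cost two.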


How to check message encoding

Before sending, review your message in the message composer or preview window.
Most SMS platforms (including Messangi) show the following details:

  • Encoding type (GSM-7 or Unicode)

  • Total number of characters

  • Total number of SMS segments

If your message unexpectedly switches to Unicode, check for:

  • Accented letters outside the GSM-7 set (á, í, ó, ú)

  • Smart quotes or apostrophes copied from Word or email (“ ” or ‘ ’)

  • Emojis or non-Latin characters
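
If the composer does not point out the offending character, a short script can. This sketch assumes the standard GSM-7 basic and extension tables from 3GPP TS 23.038; the function name `non_gsm7` is hypothetical, and a platform with a narrower supported set may flag more characters than this does.

```python
import unicodedata

# Standard GSM-7 basic table plus the extended characters (3GPP TS 23.038).
GSM7_ALL = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€"
)


def non_gsm7(text: str) -> list[tuple[int, str, str]]:
    """Return (position, character, Unicode name) for each non-GSM-7 character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch not in GSM7_ALL
    ]


for position, char, name in non_gsm7("It’s 20% off “today” only"):
    print(position, repr(char), name)
```

For the sample text, the script reports the curly apostrophe and both curly quotes, which are typical leftovers from text pasted out of a word processor.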


Best practices

  • Use GSM-7 characters whenever possible to keep messages within one segment.

  • Avoid emojis, curly quotes, non-Latin characters, and accented letters outside the GSM-7 set (such as á or ó) to prevent automatic Unicode encoding.

  • Be aware of extended characters (^, {, }, €, etc.) that count as two.

  • Shorten URLs to reduce total character length.

  • Test your messages before sending campaigns to verify encoding and segment count.


Summary

Special characters can impact how your SMS messages are encoded and billed.
Including unsupported or extended characters may:

  • Trigger Unicode encoding

  • Reduce the number of characters per message

  • Cause segmentation and increase cost

By following encoding best practices and referencing the GSM-7 Character Set Reference and Encoding Guide, you can ensure your messages are delivered efficiently, displayed correctly, and sent at the lowest possible cost.
