The 70-Character Limit of Unicode Messages: A Deep Dive
Unicode Standard at Connection Software: At Connection Software, our SMS service supports sending text messages in the Unicode character set, which is limited to 70 characters per SMS. Our APIs also handle longer messages: these are delivered to the recipient's device as a series of concatenated SMS messages. For more background on Unicode, see the resources and frequently asked questions published by The Unicode Consortium.
Rationale Behind the 70-Character Limitation for Unicode Messages: An SMS in the GSM (Global System for Mobile Communications) format can carry a message body of up to 140 octets (8-bit bytes). The default GSM alphabet uses 7 bits per character, so a single SMS can hold up to 140 × 8 / 7 = 160 characters. Unicode messages, however, use the UCS-2 encoding, which allocates 2 octets to every character. This limits each SMS to 140 / 2 = 70 characters.
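The arithmetic behind the 160 and 70 character limits can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the 6-octet header reserved in each part of a concatenated message is an assumption taken from common SMS practice, not something stated above.

```python
GSM_PAYLOAD_OCTETS = 140  # message body size of one SMS, as described above

# Assumption (not stated in the text): each part of a concatenated SMS
# carries a 6-octet User Data Header, which leaves less room for text.
CONCAT_HEADER_OCTETS = 6


def chars_per_sms(bits_per_char: int, header_octets: int = 0) -> int:
    """Whole characters that fit in one SMS body after an optional header."""
    usable_bits = (GSM_PAYLOAD_OCTETS - header_octets) * 8
    return usable_bits // bits_per_char


def parts_needed(char_count: int, bits_per_char: int) -> int:
    """Number of SMS parts needed to carry char_count characters."""
    if char_count <= chars_per_sms(bits_per_char):
        return 1  # fits in a single, header-free SMS
    per_part = chars_per_sms(bits_per_char, CONCAT_HEADER_OCTETS)
    return -(-char_count // per_part)  # ceiling division


print(chars_per_sms(7))      # 7-bit GSM alphabet: 160
print(chars_per_sms(16))     # UCS-2, 2 octets per character: 70
print(parts_needed(71, 16))  # one character over the Unicode limit: 2
```

Under the header assumption, each part of a concatenated message holds 153 GSM characters or 67 Unicode characters, which is why a 71-character Unicode message already needs two parts.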
The Evolution and Impact of Unicode
Introduction to Unicode: Unicode is an international character encoding standard. It assigns a distinct number to each character, irrespective of language or script. This approach ensures compatibility across platforms, applications, and devices.
Historical Perspective on Character Encoding: In the pre-Unicode era, there were countless character encodings, each assigning numbers to symbols and letters for computer interpretation. These encodings shared an inherent limitation: none could encode enough characters to cover all of the world's languages. Even the technical symbols, letters, and punctuation in common use could not be fully covered by any one of them. The encodings also conflicted with one another, so the same number could represent different characters, and a single character could have several numerical representations. As a result, computers had to support an array of encodings, and data was frequently corrupted when it moved between machines or encodings.
Realization of the Unicode Vision: In October 1991, the Unicode Consortium's ambition to replace these conflicting encodings with a single, universal standard bore fruit with the release of Unicode Standard version 1.0.
Fundamentals of Unicode: At its core, Unicode offers a distinct number for every character. This spans punctuation, mathematical symbols, and arrows as well as non-Latin scripts like Thai, Chinese, or Arabic. Thanks to Unicode, text can be transferred reliably across devices, applications, and platforms without corruption. The standard has become the backbone of modern software: it is supported by all major operating systems, web browsers, laptops, and smartphones, and is used across almost all of the internet.
Guardians of Unicode: The Unicode Consortium, a non-profit organization, is responsible for developing and promoting the Unicode Standard. Changes to the standard are coordinated with the international standard ISO/IEC 10646, which keeps the two character sets consistent. The Unicode Standard and ISO/IEC 10646 both support three encoding forms: UTF-8, UTF-16, and UTF-32. All three share the same character repertoire and can encode more than a million characters.
Deciphering Unicode SMS: A "Unicode SMS" is one whose content includes characters outside the GSM-7 character set. A standard SMS can hold up to 160 characters from the GSM-7 set, which comprises Latin characters (A-Z), numerals (0-9), and a handful of special symbols. While Unicode can represent any known character, it takes up more SMS space than GSM's compact 7-bit code. As a result, Unicode SMS messages are limited to 70 characters each, and longer messages are split into segments.
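Whether a given message counts as a "Unicode SMS" can be decided by checking each character against the GSM-7 set. The sketch below is one way to do that in Python. The character tables are written out from the GSM 03.38 default alphabet and its extension table (the extension table is not mentioned above and is included here as an assumption about how gateways typically behave), and the function names are hypothetical.

```python
# GSM 03.38 default alphabet (basic table), without the escape code itself.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Assumption: the extension table is also allowed. Each of these characters
# is sent as an escape code plus one more 7-bit value, so it costs double.
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def needs_unicode(text: str) -> bool:
    """True if any character falls outside the GSM-7 tables above."""
    return any(
        ch not in GSM7_BASIC and ch not in GSM7_EXTENDED for ch in text
    )


def single_sms_limit(text: str) -> int:
    """Character limit of one SMS for this text: 70 if Unicode, else 160."""
    return 70 if needs_unicode(text) else 160


print(needs_unicode("Hello, world!"))  # plain Latin text
print(needs_unicode("Привет"))         # Cyrillic text
print(single_sms_limit("你好"))         # Chinese text
```

A single character outside the tables is enough to switch the whole message to the 70-character limit, which is why one stray curly quote or emoji can unexpectedly split a message into several segments.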
Understanding Unicode Text
Introduction to Unicode: Unicode isn't just a simple method of numbering characters. It's a globally recognized character encoding standard that has been adopted by major platforms and systems, including Microsoft software. In essence, if you're using modern technology, you're already benefiting from Unicode!
Core Concept of Unicode: At its heart, Unicode operates on a fundamental principle: every character, from letters to numbers and special symbols, is represented by a unique number. While this may seem abstract, remember that computers inherently work with numbers. They therefore represent and store every character, whether a letter, a numeral, or a special symbol, by assigning it a specific number.
Decoding Hex/UTF-16 Characters: Decoding Unicode, particularly in formats like UTF-8 or UTF-16, can be a technical process. For instance, a string of Latin text encoded in UTF-16 and written out in hexadecimal is often recognizable because each character begins with two zeroes. You can treat it much like an ordinary hexadecimal string in which each byte is 2 hex characters, except that each UTF-16 code unit takes 4 hex characters. The process typically involves looping through every set of 4 characters, then using a base-conversion function (like inputBaseN) to turn the hexadecimal code into a number and retrieve the original character from it.
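The decoding loop described above can be sketched in Python as follows. The function name is hypothetical, and `int(..., 16)` stands in for the base-conversion step that the text attributes to inputBaseN. The sketch assumes big-endian UTF-16 hex with no byte order mark, and the manual loop only handles characters that fit in a single 16-bit code unit.

```python
def decode_utf16_hex(hex_text: str) -> str:
    """Decode hex text made of 4-digit UTF-16 code units (big-endian)."""
    if len(hex_text) % 4 != 0:
        raise ValueError("expected a multiple of 4 hex digits")
    chars = []
    for i in range(0, len(hex_text), 4):
        code_unit = int(hex_text[i:i + 4], 16)  # base-16 text -> number
        chars.append(chr(code_unit))            # number -> character
    return "".join(chars)


print(decode_utf16_hex("00480069"))  # Hi

# Characters beyond U+FFFF (emoji, for example) are stored as two code
# units, a surrogate pair. The built-in codec joins such pairs correctly.
print(bytes.fromhex("D83DDE00").decode("utf-16-be"))
```

In practice the built-in `bytes.fromhex(...).decode("utf-16-be")` is the safer choice, since it also validates the input; the explicit loop is shown only because it mirrors the steps in the description.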
The Power of the Unicode Standard
What is Unicode? The Unicode standard is a transformative approach to text representation. It makes it possible to send SMS messages containing characters from virtually any written language. Connection Software proudly supports this standard in SMS messaging, though there is a limit of 70 characters per message. Longer messages can be sent through the EasySendSMS APIs as concatenated SMS messages to maintain continuity.
Understanding the Unicode Mechanism: Unlike the traditional 7-bit code, Unicode in its UTF-8 encoding works with 8-bit "code units" and can combine up to four of these units to represent a single character. This multi-unit system raises the encoding capacity from a mere 128 characters to 1,112,064 characters. Such capacity ensures that virtually all of the world's languages have a place in a single unified character set. Code ranges have even been unofficially assigned to constructed scripts like Klingon, although it's worth noting that this has not been officially endorsed by the Unicode Consortium. A clever aspect of this encoding is its efficiency. Instead of using four code units for every character, it uses only as many as are needed. Take the capital letter "A" as an example: while its full four-unit binary representation could be 00000000 00000000 00000000 01000001, the encoding shortens it to the single unit 01000001 to save space. Without this kind of optimization, a text message's character capacity would drop drastically.
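The variable-length behavior described above corresponds to the UTF-8 encoding, and Python's built-in codec makes it easy to observe. The helper name below is made up for this illustration; the sample characters are arbitrary choices that need one, two, three, and four code units respectively.

```python
def utf8_code_units(ch: str) -> list[str]:
    """Return the UTF-8 code units of one character as 8-bit binary text."""
    return [f"{byte:08b}" for byte in ch.encode("utf-8")]


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:  # A, é, €, and an emoji
    units = utf8_code_units(ch)
    print(f"U+{ord(ch):04X}: {len(units)} code unit(s): {' '.join(units)}")

# The 1,112,064 figure: every code point from U+0000 to U+10FFFF, minus
# the 2,048 surrogate code points that are reserved for UTF-16.
ENCODABLE_CHARACTERS = 0x110000 - 0x800
print(ENCODABLE_CHARACTERS)  # 1112064
```

The letter "A" comes out as the single unit 01000001, while the emoji needs all four units.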
Understanding Unicode SMS
What Exactly is a Unicode SMS? "Unicode SMS" describes text messages that contain characters outside the default GSM character set. The default alphabet of GSM (Global System for Mobile Communications, originally named "Groupe Spécial Mobile") contains 128 characters. This set covers letters (A-Z), numbers (0-9), and various commonly used symbols like @, ?, !, and &, which are prevalent in mobile communication.
GSM's Limitations and the Rise of Unicode: GSM originated in Europe and uses a 7-bit code, which brings inherent limitations. One of its major drawbacks is the inability to represent alphabets like Cyrillic or languages such as Chinese, Arabic, and Thai, among others. This 128-character constraint needed a solution for more inclusive communication. Over time, several methods were proposed to address GSM's shortcomings. Among them, the Unicode standard emerged as the most effective solution, because it can represent a vast array of characters and so supports communication across languages and cultures.