Unicode and UTF-8 in C
By: Ramlak
Starting with GNU glibc 2.2, the type wchar_t is officially intended to be used only for 32-bit ISO 10646 values, independent of the currently used locale. This is signalled to applications by the definition of the __STDC_ISO_10646__ macro as required by ISO C99. The ISO C multi-byte conversion functions (mbsrtowcs(), wcsrtombs(), etc.) are fully implemented in glibc 2.2 or higher and can be used to convert between wchar_t and any locale-dependent multibyte encoding, including UTF-8, ISO 8859-1, etc.
For example, you can write
#include <stdio.h> #include <locale.h> int main() { if (!setlocale(LC_CTYPE, "")) { fprintf(stderr, "Can't set the specified locale! " "Check LANG, LC_CTYPE, LC_ALL.\n"); return 1; } printf("%ls\n", L"Schöne Grüße"); return 0; }
Call this program with the locale setting LANG=de_DE and the output will be in ISO 8859-1. Call it with LANG=de_DE.UTF-8 and the output will be in UTF-8. The %ls format specifier in printf calls wcsrtombs in order to convert the wide character argument string into the locale-dependent multi-byte encoding.
Many of C’s string functions are locale-independent and they just look at zero-terminated byte sequences:
strcpy strncpy strcat strncat strcmp strncmp strdup strchr strrchr strcspn strspn strpbrk strstr strtok
Some of these (e.g. strcpy) can equally be used for single-byte (ISO 8859-1) and multi-byte (UTF-8) encoded character sets, as they need no notion of how many byte long a character is, while others (e.g., strchr) depend on one character being encoded in a single char value and are of less use for UTF-8 (strchr still works fine if you just search for an ASCII character in a UTF-8 string).
Other C functions are locale dependent and work in UTF-8 locales just as well:
strcoll strxfrm
Archived Comments
Comment on this tutorial
- Data Science
- Android
- AJAX
- ASP.net
- C
- C++
- C#
- Cocoa
- Cloud Computing
- HTML5
- Java
- Javascript
- JSF
- JSP
- J2ME
- Java Beans
- EJB
- JDBC
- Linux
- Mac OS X
- iPhone
- MySQL
- Office 365
- Perl
- PHP
- Python
- Ruby
- VB.net
- Hibernate
- Struts
- SAP
- Trends
- Tech Reviews
- WebServices
- XML
- Certification
- Interview
categories
Related Tutorials
Sum of the elements of an array in C
Printing a simple histogram in C
Find square and square root for a given number in C
Simple arithmetic calculations in C
Passing double value to a function in C
Passing pointer to a function in C
Infix to Prefix And Postfix in C
while, do while and for loops in C