
Programming-Idioms


Idiom #304 Encode string into UTF-8 bytes

Create the array of bytes data by encoding the string s in UTF-8.

use v5.10;
use open ':std', ':encoding(UTF-8)';
use utf8;
use Encode qw(encode);
my $text = 'Café';

# encode() returns the UTF-8 octets of $text; unpack 'C*' yields one integer (0..255) per octet
my @utf8 = unpack 'C*', encode('UTF-8', $text);


The use utf8 pragma lets the script's source contain UTF-8 literals such as 'Café'. The use open ':std', ':encoding(UTF-8)' line sets the UTF-8 encoding layer on STDIN, STDOUT and STDERR.

Encode's encode() function converts $text from Perl's internal character string into a string of UTF-8 octets. unpack with the 'C*' template then turns that byte string into a list of unsigned byte values (0 to 255), one per octet.
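The following small, self-contained sketch uses the same Encode approach and prints the resulting bytes for 'Café'. The hex formatting is only for illustration. Because é (U+00E9) takes two octets in UTF-8, the four characters should become five bytes.

```perl
use strict;
use warnings;
use utf8;                      # the literal 'Café' below is UTF-8 source text
use Encode qw(encode);

my $text = 'Café';             # 4 characters: C, a, f, é (U+00E9)

# encode() returns a byte string; unpack 'C*' turns each octet into a number 0..255
my @utf8 = unpack 'C*', encode('UTF-8', $text);

# é needs two octets in UTF-8 (0xC3 0xA9), so 4 characters become 5 bytes
printf "%d chars -> %d bytes: %s\n",
    length($text), scalar(@utf8), join(' ', map { sprintf('%02X', $_) } @utf8);
```

Running it prints "4 chars -> 5 bytes: 43 61 66 C3 A9".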
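
use v5.10;
use open ':std', ':encoding(UTF-8)';
use utf8;
my $text = 'Café';

# Converts $text in place to its UTF-8 octets (the original character string is replaced)
utf8::encode($text);

# One unsigned integer (0..255) per octet
my @utf8 = unpack('C*', $text);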

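utf8::encode converts $text in place into its UTF-8 byte sequence. unpack with the C* template then reads that byte string one octet at a time. It returns each octet as an unsigned integer in the @utf8 array, so a multi-byte character such as é contributes two elements.

use utf8 tells Perl that the source file itself is encoded in UTF-8.

use open sets UTF-8 encoding for STDIN, STDOUT and STDERR.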
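utf8::encode overwrites its argument. Code that still needs the character string afterwards therefore has to encode a copy. The following minimal sketch shows that pattern with the same 'Café' example. The variable name $bytes is illustrative only. utf8::encode is a core function and does not need use utf8; the pragma is there only for the source literal.

```perl
use strict;
use warnings;
use utf8;                      # source literal below is UTF-8

my $text = 'Café';

# utf8::encode modifies its argument, so work on a copy to keep $text intact
my $bytes = $text;
utf8::encode($bytes);          # $bytes is now a 5-octet byte string

# One unsigned integer (0..255) per octet of the copy
my @utf8 = unpack 'C*', $bytes;

printf "text still has %d chars; encoded copy has %d bytes\n",
    length($text), length($bytes);
```

Running it prints "text still has 4 chars; encoded copy has 5 bytes".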

using System.Text;
// Encoding.UTF8.GetBytes returns the UTF-8 encoding of s as a new byte array
byte[] data = Encoding.UTF8.GetBytes(s);
