Programming-Idioms

Idiom #231 Test if bytes are a valid UTF-8 string

Set b to true if the byte sequence s consists entirely of valid UTF-8 character code points, false otherwise.

Perl
# The utf8:: functions are built in; "use utf8" is not required
$b = utf8::is_utf8($s);

(Since Perl 5.8.1.) This tests whether $s is flagged internally as holding UTF-8-encoded characters. It does not check whether the bytes of $s form valid UTF-8.
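That flag is off for a plain byte string, so is_utf8 returns false even for octets that are perfectly valid UTF-8. A minimal sketch of an actual validity check, assuming $s holds raw octets, decodes a copy with the core Encode module in strict mode and treats a croak as "invalid". The helper name is_valid_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if the octets in $bytes are valid UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # decode() may modify its source when CHECK is set, so pass a copy.
    my $copy = $bytes;
    # 'UTF-8' (with the hyphen) selects Encode's strict decoder rather than
    # Perl's lax 'utf8'; FB_CROAK makes it die on the first malformed sequence.
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $s = "caf\xC3\xA9";    # "café" encoded as UTF-8 octets
$b = is_valid_utf8($s);   # $b is exempt from strict vars, as in the snippet above
print "b = $b\n";
```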

utf8::valid() checks whether a string is internally consistent: it returns true if the string is held as plain bytes, or if it is flagged as UTF-8 and is well-formed Perl extended UTF-8.
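A short comparison, assuming a Latin-1 byte string as input, shows how the core utf8:: functions behave on bytes that are not UTF-8. utf8::decode is another built-in in the same namespace; it decodes in place and returns false for malformed input, so it is applied to a copy here. It accepts Perl's lax extended UTF-8 (for example surrogates), so it is not a strict Unicode validator.

```perl
use strict;
use warnings;

my $bytes = "caf\xE9";   # "café" in Latin-1: not valid UTF-8

# Internal flag only: off for any plain byte string.
my $flag = utf8::is_utf8($bytes) ? 1 : 0;

# True for every string held as bytes, so it cannot reject "caf\xE9".
my $valid = utf8::valid($bytes) ? 1 : 0;

# utf8::decode converts in place and returns false when the octets are not
# well-formed (Perl extended) UTF-8, so run it on a copy.
my $copy = $bytes;
my $decoded = utf8::decode($copy) ? 1 : 0;

print "is_utf8=$flag valid=$valid decode=$decoded\n";
```

This prints `is_utf8=0 valid=1 decode=0`: only utf8::decode notices that the bytes are not UTF-8.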
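C#
using System.Text;

// throwOnInvalidBytes: true switches decoding from the default replacement
// fallback to the exception fallback, so invalid bytes throw.
var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool b;
try
{
    // Counting the characters decodes every byte of s (a byte[]),
    // which is enough to detect any invalid sequence.
    encoding.GetCharCount(s);
    b = true;
}
catch (DecoderFallbackException)
{
    b = false;
}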

By default, .NET encodings use a replacement fallback, substituting U+FFFD for invalid bytes instead of failing. Passing throwOnInvalidBytes: true to the UTF8Encoding constructor selects the exception fallback instead.
