Programming-Idioms

Idiom #231 Test if bytes are a valid UTF-8 string

Set b to true if the byte sequence s consists entirely of valid UTF-8 encoded code points, and to false otherwise.

Pascal:
uses LazUtf8;
b := FindInvalidUTF8Codepoint(s) = -1;

C#:
using System.Text;
var encoding = new UTF8Encoding(false, true);
bool b;
try
{
    encoding.GetCharCount(s);
    b = true;
}
catch (DecoderFallbackException)
{
    b = false;
}

Go:
import "unicode/utf8"
b := utf8.Valid(s)

Groovy:
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException

import static java.nio.charset.StandardCharsets.UTF_8
final decoder = UTF_8.newDecoder()
final buffer = ByteBuffer.wrap(s)
try {
    decoder.decode(buffer)
    b = true
} catch (CharacterCodingException e) {
    b = false
}
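
Perl:
use Encode qw(decode FB_CROAK LEAVE_SRC);
# utf8::is_utf8 only reports Perl's internal UTF8 flag, so decode strictly instead.
$b = eval { decode('UTF-8', $s, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;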

Python:
try:
    s.decode('utf8')
    b = True
except UnicodeError:
    b = False

Ruby:
b = s.dup.force_encoding("UTF-8").valid_encoding?

Rust:
let b = std::str::from_utf8(&s).is_ok();