Programming-Idioms

Idiom #231 Test if bytes are a valid UTF-8 string

Set b to true if the byte sequence s is a valid UTF-8 encoding of a sequence of Unicode code points, false otherwise.

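import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException

import static java.nio.charset.StandardCharsets.UTF_8

// A new decoder reports malformed input by throwing, rather than replacing it
final decoder = UTF_8.newDecoder()
final buffer = ByteBuffer.wrap(s)
try {
    // decode() throws a CharacterCodingException on the first invalid byte sequence
    decoder.decode(buffer)
    b = true
} catch (CharacterCodingException e) {
    b = false
}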

Decoders are not thread-safe: create a new one for each check rather than sharing an instance across threads.

This is @CompileStatic compatible.
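
One way to sidestep the thread-safety caveat is to wrap the check in a method that creates its own decoder on every call. The following is a minimal Java sketch of that idea, using the same java.nio API as the Groovy snippet. The class name Utf8Check and method name isValidUtf8 are made up for illustration, and the explicit REPORT actions only restate what is assumed to be the decoder's default behavior.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this example only.
final class Utf8Check {
    private Utf8Check() {}

    /** Returns true if s is a well-formed UTF-8 byte sequence. */
    static boolean isValidUtf8(byte[] s) {
        // A fresh decoder per call: no decoder state is shared between threads.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                // Throw on bad input instead of substituting a replacement character.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Decodes the whole buffer; a truncated trailing sequence also counts as malformed.
            decoder.decode(ByteBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Calling Utf8Check.isValidUtf8(s) from several threads at once is then safe, at the cost of allocating one decoder per call.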
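
using System.Text;

// throwOnInvalidBytes: true makes decoding throw instead of substituting U+FFFD
var encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool b;
try
{
    // GetCharCount decodes the bytes without allocating the resulting string
    encoding.GetCharCount(s);
    b = true;
}
catch (DecoderFallbackException)
{
    b = false;
}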

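By default, .NET encodings use replacement fallback, which substitutes a replacement character for invalid bytes. Passing true as the second argument (throwOnInvalidBytes) of the UTF8Encoding constructor selects exception fallback instead, so invalid input throws a DecoderFallbackException.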