uniwidth

package module
v0.2.0
Published: Feb 4, 2026 | License: MIT | Imports: 2 | Imported by: 1

README

uniwidth - Modern Unicode Width Calculation for Go

uniwidth is a modern, high-performance Unicode width calculation library for Go 1.25+. It provides 3-46x faster width calculation compared to existing solutions through a 4-tier O(1) lookup architecture, SWAR optimization, and a ZWJ-aware emoji state machine.

Performance

Based on comprehensive benchmarks vs go-runewidth:

  • ASCII strings: 15-46x faster (SWAR, 8 bytes/iter)
  • CJK strings: 4-14x faster (O(1) table lookup)
  • Mixed/Emoji strings: 6-8x faster
  • ZWJ emoji: Correct width (👨‍👩‍👧‍👦 = 2, ~95 ns)
  • Zero allocations: 0 B/op, 0 allocs/op for ASCII paths

Run benchmarks yourself: cd bench && go test -bench=. -benchmem

Features

  • 3-46x faster than go-runewidth (proven in benchmarks)
  • All tiers O(1) — 4-tier lookup with 3-stage hierarchical table (3.8KB)
  • ZWJ-aware — family emoji, skin tones, flags handled correctly
  • SWAR optimized — ASCII detection and width counting at 8 bytes/iter
  • Zero allocations for ASCII strings (no GC pressure)
  • Thread-safe (immutable design, no global state)
  • Unicode 16.0 support
  • Modern API (Go 1.25+, functional options pattern)

Installation

go get github.com/unilibs/uniwidth

Requirements: Go 1.25 or later

Usage

Basic Usage

package main

import (
    "fmt"
    "github.com/unilibs/uniwidth"
)

func main() {
    // Calculate width of a string
    width := uniwidth.StringWidth("Hello 世界")
    fmt.Println(width) // Output: 10 (Hello=5, space=1, 世界=4)

    // Calculate width of a single rune
    w := uniwidth.RuneWidth('世')
    fmt.Println(w) // Output: 2

    // ASCII-only strings take the SWAR fast path
    width = uniwidth.StringWidth("Hello, World!")
    fmt.Println(width) // Output: 13
}

ZWJ Emoji Sequences

// ZWJ family emoji — correctly returns 2, not 8
width := uniwidth.StringWidth("👨‍👩‍👧‍👦")
fmt.Println(width) // Output: 2

// Skin tone modifiers — correctly returns 2, not 4
width = uniwidth.StringWidth("👍🏽")
fmt.Println(width) // Output: 2

// Rainbow flag
width = uniwidth.StringWidth("🏳️‍🌈")
fmt.Println(width) // Output: 2

// Country flags
width = uniwidth.StringWidth("🇺🇸")
fmt.Println(width) // Output: 2

Options API

Configure handling of ambiguous-width characters:

import "github.com/unilibs/uniwidth"

// East Asian locale (ambiguous characters are wide)
opts := []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
}
width := uniwidth.StringWidthWithOptions("±½", opts...)
fmt.Println(width) // Output: 4 (each character is 2 columns)

// Neutral locale (ambiguous characters are narrow) - DEFAULT
opts = []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow),
}
width = uniwidth.StringWidthWithOptions("±½", opts...)
fmt.Println(width) // Output: 2 (each character is 1 column)

Real-World TUI Examples

// Terminal prompt
prompt := "❯ Enter command: "
width := uniwidth.StringWidth(prompt)
fmt.Printf("Prompt width: %d columns\n", width)

// Table cell padding (clamped so over-wide text cannot make strings.Repeat panic)
text := "Hello 世界"
padding := max(0, 20-uniwidth.StringWidth(text))
fmt.Printf("%s%s\n", text, strings.Repeat(" ", padding))

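// Truncate to fit terminal width, reserving one column for the ellipsis
// ("…" is 1 column under the default EANarrow setting).
// Widths are summed per rune, so a cut can fall inside a multi-rune emoji sequence.
func truncate(s string, maxWidth int) string {
    if uniwidth.StringWidth(s) <= maxWidth {
        return s // already fits, nothing to cut
    }
    if maxWidth < 1 {
        return "" // no room even for the ellipsis
    }
    width := 0
    for i, r := range s {
        w := uniwidth.RuneWidth(r)
        if width+w > maxWidth-1 {
            return s[:i] + "…"
        }
        width += w
    }
    return s
}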

Architecture

4-Tier O(1) Lookup

uniwidth uses a multi-tier approach where all tiers are O(1):

  1. Tier 1: ASCII Fast Path (O(1))

    • Covers ~95% of typical terminal content
    • SWAR isASCIIOnly() + asciiWidth() process 8 bytes/iter
    • Short strings (< 8 bytes) use fused single-pass loop
  2. Tier 2: Common CJK (O(1))

    • CJK Unified Ideographs, Hangul Syllables, Hiragana/Katakana
    • Simple range checks for 32,000+ characters
  3. Tier 3: Common Emoji (O(1))

    • Emoticons, Pictographs, Dingbats, Symbols
    • Range checks for ~1,200 emoji codepoints
  4. Tier 4: 3-Stage Table (O(1))

    • ROOT[256] → MIDDLE[17×64] → LEAVES[78×32]
    • 2-bit width encoding, 3.8KB total
    • Covers all remaining Unicode codepoints in 3 array lookups

ZWJ State Machine

Forward-scan state machine for correct emoji sequence handling:

  • 3 states: default → emoji → emojiZWJ
  • Handles: ZWJ sequences, skin tone modifiers, variation selectors, flag pairs
  • Inspired by Ghostty's approach, adapted for width calculation
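
A minimal sketch of such a forward scan, under stated assumptions: isEmoji and baseWidth are coarse stand-ins for the real width tables, flag pairs are tracked with an extra boolean rather than a dedicated state, and variation selectors are simply counted as zero width instead of changing the preceding character's width. The library's actual transitions may differ.

```go
package main

import "fmt"

type state int

const (
	stDefault  state = iota // not inside an emoji sequence
	stEmoji                 // just saw an emoji; modifiers or ZWJ may follow
	stEmojiZWJ              // just saw emoji + ZWJ; the next emoji joins the cluster
)

const (
	zwj  = 0x200D // ZERO WIDTH JOINER
	vs15 = 0xFE0E // text presentation selector
	vs16 = 0xFE0F // emoji presentation selector
)

func isRegionalIndicator(r rune) bool { return r >= 0x1F1E6 && r <= 0x1F1FF }
func isSkinTone(r rune) bool          { return r >= 0x1F3FB && r <= 0x1F3FF }

// isEmoji is a coarse approximation covering a few emoji blocks only.
func isEmoji(r rune) bool {
	return (r >= 0x1F300 && r <= 0x1F64F) ||
		(r >= 0x1F680 && r <= 0x1F6FF) ||
		(r >= 0x1F900 && r <= 0x1F9FF)
}

// baseWidth is a toy per-rune width for everything that is not emoji.
func baseWidth(r rune) int {
	switch {
	case r < 0x20 || r == 0x7F:
		return 0
	case r >= 0x4E00 && r <= 0x9FFF:
		return 2
	}
	return 1
}

// stringWidth counts each emoji cluster once, however many runes it spans.
func stringWidth(s string) int {
	width, st, flagOpen := 0, stDefault, false
	for _, r := range s {
		switch {
		case r == zwj:
			if st == stEmoji {
				st = stEmojiZWJ
			} else {
				st = stDefault
			}
			flagOpen = false
		case r == vs15 || r == vs16:
			// Zero width; stay in stEmoji so "emoji VS16 ZWJ emoji" still joins.
			if st != stEmoji {
				st = stDefault
			}
		case isSkinTone(r) && st == stEmoji:
			// Modifier attaches to the preceding emoji: adds no width.
		case isRegionalIndicator(r):
			if st == stEmoji && flagOpen {
				st, flagOpen = stDefault, false // second half of a flag pair
			} else {
				width += 2
				st, flagOpen = stEmoji, true
			}
		case isEmoji(r):
			if st != stEmojiZWJ {
				width += 2 // starts a new cluster
			}
			st, flagOpen = stEmoji, false
		default:
			width += baseWidth(r)
			st, flagOpen = stDefault, false
		}
	}
	return width
}

func main() {
	family := "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
	fmt.Println(stringWidth(family))                            // family emoji
	fmt.Println(stringWidth("\U0001F44D\U0001F3FD"))            // thumbs up + skin tone
	fmt.Println(stringWidth("\U0001F3F3\uFE0F\u200D\U0001F308")) // rainbow flag
	fmt.Println(stringWidth("\U0001F1FA\U0001F1F8"))            // regional indicator pair
}
```

The scan never looks backward or allocates: each rune either adds width and moves the state, or is absorbed into the cluster that is already open.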

SWAR Optimization

ASCII paths use SIMD Within A Register (SWAR) for high throughput:

  • isASCIIOnly(): uint64 word AND with 0x8080808080808080 mask
  • asciiWidth(): Daniel Lemire's underflow trick for control character detection
  • Both process 8 bytes per iteration with zero allocations
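
The two SWAR routines can be sketched roughly as below. The function names follow the bullets above, but the bodies are assumptions: load64 is a hypothetical helper, and the control-character step uses a borrow-free per-byte comparison in the same spirit as the underflow trick, which may not match the library's exact formulation. Control bytes (0x00-0x1F and 0x7F) are counted as width 0.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	hi = 0x8080808080808080 // high bit of every byte
	lo = 0x0101010101010101 // low bit of every byte
)

// load64 reads 8 bytes of s starting at i as one little-endian word.
func load64(s string, i int) uint64 {
	_ = s[i+7] // one bounds check for all eight reads
	return uint64(s[i]) | uint64(s[i+1])<<8 | uint64(s[i+2])<<16 | uint64(s[i+3])<<24 |
		uint64(s[i+4])<<32 | uint64(s[i+5])<<40 | uint64(s[i+6])<<48 | uint64(s[i+7])<<56
}

// isASCIIOnly reports whether every byte of s is below 0x80.
func isASCIIOnly(s string) bool {
	i := 0
	for ; i+8 <= len(s); i += 8 {
		if load64(s, i)&hi != 0 {
			return false // some byte has its high bit set
		}
	}
	for ; i < len(s); i++ { // tail of fewer than 8 bytes
		if s[i] >= 0x80 {
			return false
		}
	}
	return true
}

// asciiWidth returns the width of a string already known to be ASCII-only.
func asciiWidth(s string) int {
	width, i := 0, 0
	for ; i+8 <= len(s); i += 8 {
		w := load64(s, i)
		// Setting the high bit first keeps each byte >= 0x80, so subtracting
		// 0x20 never borrows from a neighbour; the high bit survives only
		// when the original byte was >= 0x20.
		below20 := ^((w | hi) - 0x20*lo) & hi
		// Adding 1 sets the high bit only for 0x7F (DEL); no carry can
		// cross bytes because every byte is at most 0x7F.
		del := (w + lo) & hi
		width += 8 - bits.OnesCount64(below20|del)
	}
	for ; i < len(s); i++ {
		if s[i] >= 0x20 && s[i] != 0x7F {
			width++
		}
	}
	return width
}

func main() {
	s := "Hello, World!"
	fmt.Println(isASCIIOnly(s), asciiWidth(s))
	fmt.Println(isASCIIOnly("Hello 世界"))
	fmt.Println(asciiWidth("tab\tand\x7fdelete line\n"))
}
```

A caller would check isASCIIOnly first and fall back to the rune-by-rune path when it returns false, since asciiWidth relies on every byte being below 0x80.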

Benchmarks

goos: windows
goarch: amd64

BenchmarkStringWidth_ASCII_Short     ~7 ns/op     0 B/op   0 allocs/op
BenchmarkStringWidth_ASCII_Medium   ~20 ns/op     0 B/op   0 allocs/op
BenchmarkStringWidth_CJK_Short     ~25 ns/op     0 B/op   0 allocs/op
BenchmarkStringWidth_ZWJ_Family    ~95 ns/op     0 B/op   0 allocs/op
BenchmarkStringWidth_EmojiModifier ~40 ns/op     0 B/op   0 allocs/op

Run benchmarks yourself:

go test -bench=. -benchmem

Use Cases

Perfect for:

  • TUI frameworks (terminal rendering hot paths)
  • Terminal emulators (text layout calculations)
  • CLI tools (table alignment, formatting)
  • Text editors (cursor positioning, column calculation)
  • Any high-performance text width calculation

Migration from go-runewidth

uniwidth provides a compatible API for easy migration:

// Before (go-runewidth)
import "github.com/mattn/go-runewidth"
width := runewidth.StringWidth(s)

// After (uniwidth) - drop-in replacement!
import "github.com/unilibs/uniwidth"
width := uniwidth.StringWidth(s)

Performance improvement: 3-46x faster. Only the import path and the package qualifier change.

Testing

# Run tests
go test -v

# Run benchmarks
go test -bench=. -benchmem

# Run with coverage
go test -cover

Current test coverage: 96.4%

Development Status

Current: v0.2.0

This library is stable and production-ready. The API is backward-compatible across minor versions. ZWJ emoji sequences, skin tone modifiers, variation selectors, and flag emoji are all handled correctly.

v0.2.0 Highlights:

  • All 4 lookup tiers are now O(1) (3-stage table replaced binary search)
  • SWAR ASCII optimization (8 bytes/iter)
  • ZWJ emoji state machine (👨‍👩‍👧‍👦 = width 2)
  • Emoji modifier support (👍🏽 = width 2)
  • 96.4% test coverage

Roadmap (v0.3.0+):

  • Profile-Guided Optimization (PGO)
  • Benchmark CI for regression detection
  • Explicit SIMD via Go assembly and archsimd
  • Unicode 17.0 preparation

Contributing

Contributions welcome! This is part of the unilibs organization - modern Unicode libraries for Go.

License

MIT License - see LICENSE file

Built by the Phoenix TUI Framework team.

Part of the unilibs ecosystem:

  • uniwidth - Unicode width calculation (this project)
  • unigrapheme - Grapheme clustering (planned)
  • More Unicode utilities coming soon!

Special Thanks

Professor Ancha Baranova - This project would not have been possible without her invaluable help and support. Her assistance was crucial in bringing uniwidth to life.


Made with care by the Phoenix team | Powered by Go 1.25+

Documentation

Overview

Package uniwidth provides modern Unicode width calculation for Go 1.25+.

uniwidth uses a tiered lookup strategy for optimal performance:

  • Tier 1: ASCII (O(1), ~95% of typical content)
  • Tier 2: Common CJK (O(1), ~90% of non-ASCII together with Tier 3)
  • Tier 3: Common Emoji (O(1))
  • Tier 4: 3-stage table lookup for all other characters (O(1))

All tiers are O(1) with zero allocations for single-rune lookups. This approach is 3-46x faster than traditional binary-search-only methods like go-runewidth, while maintaining full Unicode 16.0 compliance.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func RuneWidth

func RuneWidth(r rune) int

RuneWidth returns the visual width of a rune in monospace terminals.

Returns:

  • 0 for control characters, zero-width joiners, combining marks
  • 1 for most characters (ASCII, Latin, Cyrillic, etc.)
  • 2 for wide characters (CJK, Emoji, etc.)

This function uses a tiered lookup strategy:

  • O(1) for ASCII (most common case)
  • O(1) for common CJK and emoji (hot paths)
  • O(1) for all other characters (3-stage table lookup)

func RuneWidthWithOptions

func RuneWidthWithOptions(r rune, opts ...Option) int

RuneWidthWithOptions returns the visual width of a rune with custom options.

This function applies the same tiered lookup strategy as RuneWidth, but allows customization of ambiguous character handling and emoji presentation.

Example:

// East Asian locale (ambiguous characters are wide)
width := uniwidth.RuneWidthWithOptions('±', uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide))
// width = 2

// Neutral locale (ambiguous characters are narrow)
width = uniwidth.RuneWidthWithOptions('±', uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow))
// width = 1

func StringWidth

func StringWidth(s string) int

StringWidth calculates the visual width of a string in monospace terminals.

This function provides a fast path for ASCII-only strings, and uses a state machine for correct handling of multi-rune sequences.

Special handling:

  • ZWJ emoji sequences (👨‍👩‍👧‍👦) are treated as width 2, not the sum of parts
  • Emoji modifier sequences (👍🏽) are treated as width 2
  • Variation selectors (U+FE0E/U+FE0F) modify the width of the preceding character
  • Regional indicator pairs (flags) are counted as width 2, not 4

func StringWidthWithOptions

func StringWidthWithOptions(s string, opts ...Option) int

StringWidthWithOptions calculates the visual width of a string with custom options.

This function applies the same fast paths as StringWidth, but allows customization of ambiguous character handling and emoji presentation.

Example:

// East Asian locale (ambiguous characters are wide)
opts := []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
}
width := uniwidth.StringWidthWithOptions("Hello ±½", opts...)
// width = 10 (Hello=5, space=1, ±=2, ½=2)

// Neutral locale (ambiguous characters are narrow)
opts = []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow),
}
width = uniwidth.StringWidthWithOptions("Hello ±½", opts...)
// width = 8 (Hello=5, space=1, ±=1, ½=1)

Types

type EAWidth

type EAWidth int

EAWidth represents the width for East Asian Ambiguous characters.

const (
	// EANarrow treats ambiguous characters as narrow (width 1).
	// This is the default for non-East Asian locales.
	EANarrow EAWidth = 1

	// EAWide treats ambiguous characters as wide (width 2).
	// This is appropriate for East Asian (CJK) locales.
	EAWide EAWidth = 2
)

type Option

type Option func(*Options)

Option is a functional option for configuring Unicode width calculation.

func WithEastAsianAmbiguous

func WithEastAsianAmbiguous(width EAWidth) Option

WithEastAsianAmbiguous sets the width for East Asian Ambiguous characters.

Example:

// Treat ambiguous characters as wide (East Asian locale)
width := uniwidth.StringWidthWithOptions("±½", uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide))
// width = 4 (each character is 2 columns wide)

// Treat ambiguous characters as narrow (neutral locale)
width = uniwidth.StringWidthWithOptions("±½", uniwidth.WithEastAsianAmbiguous(uniwidth.EANarrow))
// width = 2 (each character is 1 column wide)

func WithEmojiPresentation

func WithEmojiPresentation(emoji bool) Option

WithEmojiPresentation sets whether emoji should be rendered as emoji (wide) or text (narrow).

Example:

// Emoji as emoji (wide, width 2) - default
width := uniwidth.StringWidthWithOptions("😀", uniwidth.WithEmojiPresentation(true))
// width = 2

// Emoji as text (narrow, width 1)
width = uniwidth.StringWidthWithOptions("😀", uniwidth.WithEmojiPresentation(false))
// width = 1

Note: This primarily affects emoji that have both text and emoji presentation variants. Most emoji are always rendered as wide regardless of this setting.

type Options

type Options struct {
	// EastAsianAmbiguous specifies how to handle ambiguous-width characters.
	// Default: EANarrow (width 1)
	EastAsianAmbiguous EAWidth

	// EmojiPresentation specifies whether emoji should be rendered as emoji (width 2)
	// or text (width 1). When true, emoji are treated as width 2.
	// Default: true (emoji presentation)
	EmojiPresentation bool
}

Options configures Unicode width calculation behavior.

Use the functional options pattern to create customized configurations:

opts := []uniwidth.Option{
    uniwidth.WithEastAsianAmbiguous(uniwidth.EAWide),
    uniwidth.WithEmojiPresentation(true),
}
width := uniwidth.StringWidthWithOptions("Hello 世界", opts...)
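
How a list of Option values turns into an Options struct can be sketched as follows. The type and constructor names mirror the declarations documented above, re-declared locally so the example is self-contained; resolve is a hypothetical helper, and the library's internal mechanism for applying options is an assumption here. The defaults are the ones stated in the Options field comments.

```go
package main

import "fmt"

// Local copies of the documented types, for illustration only.
type EAWidth int

const (
	EANarrow EAWidth = 1
	EAWide   EAWidth = 2
)

type Options struct {
	EastAsianAmbiguous EAWidth
	EmojiPresentation  bool
}

type Option func(*Options)

func WithEastAsianAmbiguous(width EAWidth) Option {
	return func(o *Options) { o.EastAsianAmbiguous = width }
}

func WithEmojiPresentation(emoji bool) Option {
	return func(o *Options) { o.EmojiPresentation = emoji }
}

// resolve (hypothetical) starts from the documented defaults and applies
// each option in order, so later options override earlier ones.
func resolve(opts ...Option) Options {
	o := Options{EastAsianAmbiguous: EANarrow, EmojiPresentation: true}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(WithEastAsianAmbiguous(EAWide), WithEmojiPresentation(false)))
}
```

Calling with no options yields the defaults, which is why StringWidth and StringWidthWithOptions with an empty option list are expected to agree.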

Directories

Path: cmd/generate-tables (command)
Synopsis: generate-tables generates Unicode width tables from official Unicode 16.0 data.
