Hi everyone! This is Jimmy, and this is the third article in my series “Breaking Things with Go.” In this series, I document my journey through Jon Bodner’s Learning Go: An Idiomatic Approach to Real-World Go Programming (Second Edition) and explore how to use Go in the most practical way I can.
The resources for this series are the book itself, the Go documentation, and any AI model that helps clarify things.
Let's jump into it.

Strings
Strings in Go are immutable UTF-8 byte sequences, not character arrays
- We can get the length of a string (in bytes) using len(x)
- We can extract a single value from a string by using an index expression
var s string = "hello there"
var b byte = s[6] // indexing returns a byte: 't'
// var x string = s[6] // compile-time error: s[6] is a byte (uint8), not a string
var y string = string(s[6]) // "t"
var z string = s[6:] // "there"
- Strings are immutable, so we won't face the modification problems that slices of slices face
- But we do have a problem, though
- Indexing returns one byte only, and a UTF-8 code point can be anywhere from one to four bytes long
- So be careful if you deal with languages other than English or use emojis
- This emoji, for example, 🌞 needs 4 bytes to be stored. If you index or slice only part of it, it won't decode correctly; you have to take all the bytes of the code point
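To see the problem from the bullets above in action, here is a minimal sketch. The helper name validPrefix is my own, and I'm using utf8.ValidString from the standard library to check whether a slice still holds complete code points.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validPrefix reports whether the first n bytes of s are valid UTF-8.
// (Hypothetical helper for this article.) Slicing works on byte offsets,
// so s[:n] can cut a code point in half.
func validPrefix(s string, n int) bool {
	return utf8.ValidString(s[:n])
}

func main() {
	s := "🌞 sun"

	fmt.Println(s[:4])             // 🌞 (all 4 bytes of the emoji)
	fmt.Println(validPrefix(s, 4)) // true
	fmt.Println(validPrefix(s, 2)) // false: only half of the emoji's bytes
	fmt.Printf("%q\n", s[:2])      // "\xf0\x9f"
}
```

The last line shows what is left when the emoji is cut in half: two raw bytes that no longer decode to a character.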
A code point is a number assigned to represent a character in the Unicode standard
For example, the letter A has the code point U+0041 (65 in decimal), and 🌞 has the code point U+1F31E.

Encoding works by taking the code point of a character and turning it into bytes; with UTF-8 that can be anywhere from 1 to 4 bytes. For decoding, we need all the encoded bytes of the same code point to get the original character back.
Character ⇄ Code Point (number) ⇄ Encoding (bytes)
Unicode is the name of the standard that maps characters to code points, and UTF-8 is one way of encoding those code points as bytes. That's why we didn't see a problem with the English letters: each letter needs only 1 byte to be represented.
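The Character ⇄ Code Point ⇄ Encoding chain can be sketched in a few lines. The encode helper is a name I made up for this example; it just relies on Go's built-in string and []byte conversions.

```go
package main

import "fmt"

// encode returns the UTF-8 bytes Go stores for the rune r.
// (Hypothetical helper for this article.)
func encode(r rune) []byte {
	return []byte(string(r))
}

func main() {
	for _, r := range "A🌞" {
		// %c prints the character, %U the code point, % x the raw bytes.
		fmt.Printf("%c  %U  % x\n", r, r, encode(r))
	}
	// A  U+0041  41
	// 🌞  U+1F31E  f0 9f 8c 9e
}
```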
For a better understanding:
var s string = "Hello 🌞"
fmt.Println(len(s)) // prints 10, not 7
That is because len, like slicing and indexing, works on the 10 bytes instead of the 7 code points.
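If you want the number of code points instead of bytes, one option is utf8.RuneCountInString from the standard library, and a for range loop over a string decodes one code point per iteration. Here is a small sketch; runeCount is just a wrapper name I chose.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCount reports how many code points s holds, as opposed to len(s) bytes.
// (Hypothetical wrapper for this article.)
func runeCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	s := "Hello 🌞"
	fmt.Println(len(s), runeCount(s)) // 10 7

	// for range decodes one code point per iteration; i is the byte offset
	// where that code point starts, not a character index.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
}
```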
Special Conversion in Go
You can do type conversions between runes, strings, and bytes in Go:
var a rune = 'x'
var s string = string(a)
var b byte = 'y'
var s2 string = string(b)
But converting an int to a string does not give you its digits:
var x int = 65
var y = string(x)
fmt.Println(y) // this prints A, not 65
This happens because a conversion from int to string yields a string of one rune, not a string of digits. go vet flags this conversion for that reason.
- The result isn't a rune; it is a string holding the rune (code point) with that value
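If what you actually want is the digits, strconv.Itoa from the standard library is the usual tool, and going through rune makes the "give me the character" intent explicit. A minimal sketch, with helper names of my own choosing:

```go
package main

import (
	"fmt"
	"strconv"
)

// asDigits formats n as decimal text. (Hypothetical helper.)
func asDigits(n int) string {
	return strconv.Itoa(n)
}

// asCharacter returns the character whose code point is n.
// Converting through rune makes the intent explicit. (Hypothetical helper.)
func asCharacter(n int) string {
	return string(rune(n))
}

func main() {
	x := 65
	fmt.Println(asDigits(x))    // 65
	fmt.Println(asCharacter(x)) // A
}
```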
We can also convert a string to a slice of bytes or a slice of runes:
var s string = "Hello, 🌞"
var bs []byte = []byte(s)
var rs []rune = []rune(s)
fmt.Println(bs) // the string as UTF-8 bytes
fmt.Println(rs) // the string as runes (code points)
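Here is the same conversion as a runnable sketch with the values it prints, so you can see the emoji taking four entries in the byte slice but only one in the rune slice. The sizes helper is a name I added for illustration.

```go
package main

import "fmt"

// sizes returns the number of bytes and the number of runes in s.
// (Hypothetical helper for this article.)
func sizes(s string) (nBytes, nRunes int) {
	return len([]byte(s)), len([]rune(s))
}

func main() {
	s := "Hello, 🌞"
	fmt.Println([]byte(s)) // [72 101 108 108 111 44 32 240 159 140 158]
	fmt.Println([]rune(s)) // [72 101 108 108 111 44 32 127774]
	fmt.Println(sizes(s))  // 11 8

	// Both slices convert back to the same string.
	fmt.Println(string([]byte(s)) == string([]rune(s))) // true
}
```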
Why is UTF-8 smart?
UTF-32 uses 32 bits to store every code point, even if that code point needs only 1 byte to be represented. UTF-16 uses one or two 16-bit units per code point. And then there is UTF-8.
The good thing about UTF-8 is that it lets you use a single byte to represent the Unicode characters whose values are below 128 (which covers the letters, digits, and punctuation commonly used in English), but it can expand up to 4 bytes to represent Unicode code points with larger values.
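To make the "1 to 4 bytes" idea concrete, this sketch asks utf8.RuneLen how many bytes a few sample characters need. The samples and the width wrapper are my own picks for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// width returns how many bytes UTF-8 needs to encode r.
// (Hypothetical wrapper for this article.)
func width(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// Sample characters: A, é, €, 🌞 (written as escapes to be unambiguous).
	samples := []rune{'A', '\u00e9', '\u20ac', '\U0001F31E'}
	for _, r := range samples {
		fmt.Printf("%c %U needs %d byte(s)\n", r, r, width(r))
	}
	// A U+0041 needs 1 byte(s)
	// é U+00E9 needs 2 byte(s)
	// € U+20AC needs 3 byte(s)
	// 🌞 U+1F31E needs 4 byte(s)
}
```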
How did UTF-8 solve endianness problems?
You can find the complete discussion here, and it is a very good read, but the short answer is:
The reason is very simple. There are big and little endian versions of UTF-16 and UTF-32 because there are computers with big and little endian registers. If the endianness of a Unicode file matches the endianness of the processor the character value can be read directly from memory in a single operation. If they do not match, a second conversion step is required to flip the value around.
In contrast the endianness of the processor is irrelevant when reading UTF-8. The program must read the individual bytes and perform a series of tests and bit shifts to get the character value into a register. Having a version where the byte order was reversed would be pointless.
Takeaways
- Strings in Go are immutable UTF-8 byte sequences
- Indexing and slicing operate on bytes, not code points
- Conversions between strings, bytes, and runes are legal as special conversions in Go due to their special relationship
- UTF-8 stays at one byte for characters below 128, grows up to four bytes when needed, and has no big-endian or little-endian variants
Coming Next
The next break will be about maps, and that’s a hell of a topic to break, so stick around for the next break where we will break more stuff.
Feel free to reach out to me.
