passit: a password generation toolkit for Go

A few years ago I started working on a password manager, I’d grown tired of the limitations of KeePass and decided I’d make my own. As you might be able to guess I still haven’t actually finished it.

But finished or not, why should all that effort go unappreciated? In particular the password generation code seemed like a good candidate to split out into it’s own module and release. Today I’m doing just that.

passit is a password generation toolkit for Go. It features a variety of different password generators from charsets to regular expressions, wordlists and emoji.

Commands

Let’s start by looking at the two CLI commands it includes:

passphrase

passphrase is a tool that generates random passphrases using either Sam Schlinkert’s ‘1Password Replacement List’, the EFF Large Wordlist for Passphrases, the EFF Short Wordlist for Passphrases #1, or the EFF Short Wordlist for Passphrases #2.

$ go install go.tmthrgd.dev/passit/cmd/passphrase@latest
$ passphrase -n 5 -s -
rescind-phosphor-impedes-sitting-stripe

It takes three command line arguments: -l for the wordlist to use, -n for the number of words to generate and -s for the separator to use. Checkout the help text (passphrase -h) for more details.

twoproblems

twoproblems^[1] is a tool that generates random passwords based on a regular expression template.

$ go install go.tmthrgd.dev/passit/cmd/twoproblems@latest
$ twoproblems '[[:alpha:]]{15}-[[:digit:]]{3}[[:punct:]]{2}'
rLAEdkzBFvhgpBM-858!~
$ twoproblems '[[:alpha:][:digit:]]{5}-(?P<word>/5/-)-[[:punct:]]{5}'
C6tIs-fried-outside-vivid-rhubarb-filming-*`>]?

Two special captures (i.e. (?P<name>)) are supported in the pattern: (?P<word>) for a random word and (?P<emoji>) for a random emoji. Checkout the README for more details.

Package

What if the commands above aren’t enough or you want to generate passwords programmatically? Well then the go.tmthrgd.dev/passit package is where you should look. It contains a number of different password generators that you can use and combine to your hearts content.

All the password generators implement the following interface:

// Generator is an interface for generating passwords.
type Generator interface {
	// Password returns a randomly generated password using r as the source of
	// randomness.
	//
	// The returned password may or may not be deterministic with respect to r.
	// All generators in this package are deterministic unless otherwise noted.
	//
	// The output of r should be indistinguishable from a random string of the
	// same length. This is a property of a good CSPRNG. Fundamentally the
	// strength of the generated password is only as good as the provided source
	// of randomness.
	//
	// r should implement the io.ByteReader interface for improved performance.
	Password(r io.Reader) (string, error)
}

It’s pretty simple. You have a Generator and you call Password with your source of randomness.

Generators

So what about those generators?

The package provides a number of generators that produce output from a fixed set:

Generator	Description	Examples
`Digit`	`[0-9]`	`"0" "7"`
`LatinLower`	`[a-z]`	`"a" "j"`
`LatinLowerDigit`	`[a-z0-9]`	`"a" "j" "0" "7"`
`LatinUpper`	`[A-Z]`	`"A" "J"`
`LatinUpperDigit`	`[A-Z0-9]`	`"A" "J" "0" "7"`
`LatinMixed`	`[a-zA-Z]`	`"a" "j" "A" "J"`
`LatinMixedDigit`	`[a-zA-Z0-9]`	`"a" "j" "A" "J" "0" "7"`
`STS10Wordlist`	A word from Sam Schlinkert’s ‘1Password Replacement List’	`"aback" "loophole"`
`EFFLargeWordlist`	A word from the EFF Large Wordlist for Passphrases	`"abacus" "partition"`
`EFFShortWordlist1`	A word from the EFF Short Wordlist for Passphrases #1	`"acid" "match"`
`EFFShortWordlist2`	A word from the EFF Short Wordlist for Passphrases #2	`"aardvark" "jaywalker"`
`Emoji13`	A Unicode 13.0 fully-qualified emoji	`"⌚" "🕸️" "🧎🏾‍♀️"`
`HexLower`	Lowercase hexadecimal encoding	`"66e94bd4ef8a2c3b"`
`HexUpper`	Uppercase hexadecimal encoding	`"66E94BD4EF8A2C3B"`
`Base32`	Base32 standard encoding	`"M3UUXVHPRIWDW"`
`Base32Hex`	Base32 hexadecimal encoding	`"CRKKNL7FH8M3M"`
`Base64`	Base64 standard encoding	`"ZulL1O+KLDs"`
`Base64URL`	Base64 URL encoding	`"ZulL1O-KLDs"`
`Ascii85`	Ascii85 encoding	`"B'Dt<mtrYX"`
`SpectreMaximum`	The Spectre maximum template	`"i7,o%yC4&fmQ1r*qfcWq"`
`SpectreLong`	The Spectre long template	`"ZikzXuwuHeve1("`
`SpectreMedium`	The Spectre medium template	`"Zik2~Puh"`
`SpectreBasic`	The Spectre basic template	`"izJ24tHJ"`
`SpectreShort`	The Spectre short template	`"His8"`
`SpectrePIN`	The Spectre PIN template	`"0778"`
`SpectreName`	The Spectre name template	`"hiskixuwu"`
`SpectrePhrase`	The Spectre phrase template	`"zi kixpu hoy vezamcu"`
`Empty`	Empty string	`""`
`Hyphen`	ASCII hyphen-minus	`"-"`
`Space`	ASCII space	`" "`

The package also provides a number of generators that produce output based on user input:

Generator	Description
`String`	A fixed string
`RegexpParser`	Password that matches a regular expression pattern
`FromCharset`	A rune from a charset string
`FromRangeTable`	A rune from a `unicode.RangeTable`
`FromSlice`	A string from a slice of strings

There are also a number of ‘helper’ generators that interact with the output of other generators:

Generator	Description
`Alternate`	Select a generator at random
`Join`	Concatenate the output of multiple generators
`Repeat`	Invoke a generator multiple times and concatenate the output
`RandomRepeat`	Invoke a generator a random number of times and concatenate the output
`RejectionSample`	Continually invoke a generator until the output passes a test

Most generators only generate a single of something, be it a rune, an ASCII character or a word. For generating longer passwords you can use Repeat or RandomRepeat, possibly with Join or Alternate. In this way the various generators can be composed to generator arbitrarily long and complex passwords, or short and simple passwords as is needed.

A variant of the Spectre encoding algorithm (formerly known as Master Password) is implemented as a nod back to the mpw-js project I worked on many years ago. The difference from the official v1+ algorithm is the replacement of biased modulo operations (x[r.ReadByte() % len(x)]) with an unbiased readIntN function call (x[readIntN(r, len(x))]). (See § Bias).

Unless otherwise noted, the output of the various generators is now locked in and won’t change.

Regular expressions

One of the cooler password generators is RegexpParser / ParseRegexp. It parses a regular expression pattern with regexp/syntax.Parse and then generates a password that matches that regular expression. (twoproblems exposes this generator in a convenient command line tool).

How does it work?

To start with regexp/syntax.Parse produces a tree of regexp/syntax.Regexp nodes, each one representing an operation from the regular expression syntax tree. We then walk over that creating small generators that produce an output that matches that particular operation.

For instance, the character class [a-z] is mapped to a generator that produces a single character from lowercase a to lowercase z. The repetition operator z{5,10} is mapped to a generator that repeats the inner generator (just a literal z in this case) between five and ten times.

It also supports case folding, with (?i:[a-z]) becoming [A-Za-zſK] which is then constrained to printable ASCII characters and becomes [A-Za-z]. (See SetAnyRangeTable for how to change this behaviour). The literal string (?i:123test123) becomes the pattern 123[Tt][Ee][Ss][Tt]123.

All regular expression features supported by regexp/syntax are supported, though some may have no effect on the output. One difference is that named captures (?P<name>) refer to special capture factories that can be added to the parser with SetSpecialCapture. These special captures invoke caller provided factories that return a Generator. This allows for things like including wordlist support from within the regular expression pattern. (See twoproblems above).

This was inspired by zach-klippenstein/goregen by Zachary Klippenstein.

Wordlists

Four separate wordlists (plus emoji) are included and embedded in the package. These are:

Sam Schlinkert’s ‘1Password Replacement List’ [18,208 words, 14.152 bits/word],
EFF Large Wordlist for Passphrases [7,776 words, 12.925 bits/word],
EFF Short Wordlist for Passphrases #1 [1,296 words, 10.340 bits/word], and
EFF Short Wordlist for Passphrases #2 [1,296 words, 10.340 bits/word].

Sam Schlinkert’s ‘1Password Replacement List’ was, as you might imagine, designed with the hope it would replace the wordlist 1Password currently uses. Whether or not AgileBits ever take Sam up on that offer, we can benefit from a well thought out wordlist based on the most common words in the English language.

The EFF passphrase wordlists were intended for use with dice-based password generation, being inspired by Arnold Reinhold’s Diceware list, but they work perfectly fine here.

Custom wordlists generators can be created with FromSlice.

Wordlists are typically used for generating passphrases like “correct horse battery staple” and make a useful memorable alternative to random passwords.

Emoji13 is very similar in implementation to a wordlist. It contains a list of all 3,295 fully-qualified emoji in Unicode 13.0. Each time it produces a single emoji at random from the list. As future Unicode versions are released with new emoji and supported by Go, new generators will be added, like Emoji14 for Unicode 14.0.

Bias

How do you turn a stream of bytes (unsigned integers between 0 and 255) into an arbitrary integer between 0 and some arbitrary n?

The simplest way to do this is with the modulo operation: a % n (or in mathematical notation: $a \bmod n$ ). You read an unsigned integer $a \in \mathbb{F}_{2^s}$ from the stream, such that $n \in \mathbb{F}_{2^s}$ and $s \equiv 0 \pmod 8$ ( $s$ may be fixed), and output $v = a \bmod n$ . That leaves you with a number in the range you wanted: $v \in [0,n)$ . But there’s a problem: bias.

Take these two graphs below. We’re reading from an infinite stream that returns incrementing bytes, i.e. $\left(a_i\right)_{i \in \mathbb{N}_0} = \left(i \bmod 2^8\right)_{i \in \mathbb{N}_0} = (0, 1, 2, \dotsc, 253, 254, 255, 0, 1, 2, \dotsc)$ , and mapping those values to a smaller range, the character range [a-z] (i.e. $n = 26$ ) in the first case and [0-9A-Za-z] (i.e. $n = 62$ ) in the second.

For each graph we’re reading $\mathrm{lcm}(n, 256)$ characters and seeing how common each generated character is. For both cases $\mathrm{lcm}(n, 256) = n \cdot 128$ so we should see each character being generated exactly 128 times.

It should be immediately obvious what the problem is.

So why does this happen? Well let’s take a look at a graph of $a_i \bmod n$ :

We’ve zoomed into the section around $a_i = 255$ and you can see that the progression of $a_i \bmod n$ from $0$ to $n - 1$ doesn’t get to complete before $a_i = 0$ again and the sequence resets. This is why we see certain characters over-represented, they happen to correspond to the lower values that we get in this final section. While the characters we see under-represented are those that happen to correspond to the higher values we miss out on.

There are many different ways to get rid of this bias. One of the simplest way is to always read a value of $a$ that is far bigger than $n$ and then ignore the bias. This works pretty well, but still has a very small amount of bias and requires reading more bytes from the underlying io.Reader.

The approach we take is a combination of rejection sampling and the modulo operation. If we read a value of $a$ that’s in the red zone on the graph above, we reject it and read a new value until we find one outside the biased range. Once we have our value $a$ , we calculate $v = a \bmod n$ just as we initially would have. By rejecting values in the biased range, we ensure that our calculated value $v$ won’t exhibit any bias. With a few additional optimisations, we end up with something similar to:

func readIntN(r io.ByteReader, n byte) byte {
	if n == 1 {
		return 0
	}

	a, _ := r.ReadByte()
	if n&(n-1) == 0 { // n is power of two, can mask
		return a & (n - 1)
	}

	// If n does not divide a, to avoid bias we must not use a a that is within
	// math.MaxUint8%n of the top of the range.
	if a > math.MaxUint8-n { // Fast check.
		ceiling := math.MaxUint8 - math.MaxUint8%n
		for a >= ceiling {
			a, _ = r.ReadByte()
		}
	}

	return a % n
}

Sources of randomness

One unique feature of go.tmthrgd.dev/passit is that all generators (unless otherwise noted) are deterministic. You might be wondering what that means. In short it means that for the same io.Reader input stream, the Generator will always generate the same output.

That sounds like a security vulnerability! Are you telling me it will always generate the same password!?

Only when it’s provided exactly the same input.

Ultimately this is a really good thing as it separates the randomness generation from the password encoding step. That allows both to be analysed and understood independently. You can think of this package a bit like a custom Base64 encoder, bytes come in, text comes out, same bytes, same text.

Most callers should be using crypto/rand.Reader which provides a random stream that changes on every read. When using something like this, every password generated will be random and different. crypto/rand.Reader uses the system’s random number generator—which is always the right choice for randomness.

But for some callers, it’s possible to build deterministic password generators using this package. Spectre is an example of this, it uses scrypt and HMAC to produce per-site passwords derived from a master password without using a vault. cloudflare/gokey is another example that uses PBKDF, HKDF and AES-CTR to create passwords and keys derived either from a master password or from a seed file.

Great care must be taken when using deterministic password generation as the generated password is only ever as good as the provided source of randomness.

Examples

So what might it look like to generate a password? Well here are some examples:

package passit_test

import (
	"crypto/rand"
	"fmt"
	"regexp/syntax"

	"go.tmthrgd.dev/passit"
)

func ExampleEFFLargeWordlist() {
	pass, _ := passit.Repeat(passit.EFFLargeWordlist, "-", 4).Password(rand.Reader)
	fmt.Println(pass) // Example output: winner-vertigo-spurs-believed
}

func ExampleLatinMixed() {
	pass, _ := passit.Repeat(passit.LatinMixed, "", 25).Password(rand.Reader)
	fmt.Println(pass) // Example output: YxIShGyLUaRUKYwWTcxDJirMd
}

func ExampleLatinLowerDigit() {
	pass, _ := passit.Repeat(passit.LatinLowerDigit, "", 25).Password(rand.Reader)
	fmt.Println(pass) // Example output: 4rd6x4ix2e8rwqhkqk08smzst
}

func ExampleDigit() {
	pass, _ := passit.Repeat(passit.Digit, "", 4).Password(rand.Reader)
	fmt.Println(pass) // Example output: 2352
}

func ExampleParseRegexp() {
	gen, _ := passit.ParseRegexp("[a-z]{7}[0-9]{3}[!@#$%^&*]{2}", syntax.Perl)
	pass, _ := gen.Password(rand.Reader)
	fmt.Println(pass) // Example output: wboonxd576#^
}

func ExampleSpectreMaximum() {
	pass, _ := passit.SpectreMaximum.Password(rand.Reader)
	fmt.Println(pass) // Example output: R2.%r7#UK60qtJ!2wT23
}

Of course you should be checking the error return values—these are just examples after all.

For performance sensitive applications it’s advisable to ensure that the io.Reader also implements io.ByteReader. This can be done by using bufio.NewReader.

License

Fin

Go forth and generate passwords.

Source of the famous “Now you have two problems” quote ↩︎