Perfuming the primitives

January 31, 2024

Strings, gotta love them. They're universally available and easy to work with through a bunch of built-in methods and operators in the standard library. That flexibility carries over to the type system:

declare const a: string;
declare const b: string;
declare function foo(arg: string): void;
 
foo(a);
foo(a.concat(b));
foo(b + 'c');

A variable of type string, once declared, assigned a value, transformed, and concatenated with another value of type string, remains a string, and the assignability to string still holds.

Everything is fine as long as we're dealing with the actual string type. Problems start to manifest themselves once we unwittingly narrow the string type down by using it to model more specific data. Is an email address a string? Yes. Is string an email address? Likely not. Can one send an email to string?

Digression: the word "string" has appeared 13 times so far.

In certain cases, quite often actually, it makes sense to narrow down the string type to a subset of values that share a common quality, such as those to which one can send emails. What do we often find ourselves doing instead?

type User = {
  name: string;
  email: string;
};
 
declare const user: User;
 
declare function sendEmail(email: string, body: string): void;
 
sendEmail(user.name, 'Hello!'); // Compiles just fine 🤷‍♂️

Can you spot the problem? This issue arises frequently and is not limited to emails. Let's expand the User type:

type User = {
  id: string;
  name: string;
  email: string;
};
 
declare const user: User;
 
declare function getPostCountByAuthor(userId: string): Promise<number>;
 
const postCount = await getPostCountByAuthor(user.email); // Compiles just fine 🤷‍♂️
 
export {};

Rings a bell yet?

We find ourselves in a peculiar situation where the TypeScript compiler willingly accepts any compatible (string) value, regardless of its semantic validity. In the mid to long term, as the codebase grows, mix-ups like the ones above become harder and harder to spot. And the reason behind this? Yes, you guessed it - a structural type system.

🏆 Achievement unlocked: link to Wikipedia
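
To make the "structural" part concrete, here's a minimal sketch. The Email and UserId types and the describeEmail function are made up for illustration; the point is that two differently named types with the same shape are interchangeable as far as the compiler is concerned:

```typescript
// Hypothetical types: different names, identical shape.
type Email = { value: string };
type UserId = { value: string };

const describeEmail = (email: Email): string => `email:${email.value}`;

const id: UserId = { value: '42' };

// Accepted: only the shape is compared, never the name.
const described = describeEmail(id);
```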

How do we fix this? We've already established that the string type is too broad; let's try narrowing it down. Let's focus on the id property. The simplest way out would be to know all the values upfront and narrow the type down by simply enumerating all of them:

type UUID =
  | '4ebb4c24-c34e-4919-aba7-9232f74c310d'
  | '042013c4-f98d-452d-b735-493174e6f6ab'
  | '2bdb6019-18e9-48d8-ac41-6243359fe79a';
 
type User = {
  id: UUID;
  name: string;
  email: string;
};

Okay, that was a poor joke. Let's get back down to earth and try something serious. How about we alias the string type to indicate that something's going on?

type UUID = string;
 
type User = {
  id: UUID;
  name: string;
  email: string;
};

We've now of course gained some refactoring capabilities (one can easily track down all the UUID references), but nothing more than that really. Although aliased, UUID is still a string; we've simply come full circle.

type UUID = string;
 
declare const id: UUID;
 
const foo: string = id;
const bar: UUID = 'Hello, World!'; // Compiles just fine 🤷‍♂️

We clearly need a way to encode the "kind" of the value somehow. A value that is of one, broad type, but also carries extra information, metadata that lets one distinguish it from the others, like a scent (yes - hence the title 🎉). How do other programming languages (other than TypeScript) deal with that?

In Elm we can make use of algebraic data types and simply create one with a single constructor that wraps the String type, like so: type UUID = UUID String. That would, of course, require some effort to extract the value from it, but we would achieve our goal:

type UUID = UUID String
 
getValue (UUID value) = value
 
main =
  UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
    |> getValue
    |> Html.text

PureScript takes a similar approach, yet there's a special syntax for cases where one only wants to wrap a single type - the newtype keyword. One can define a UUID-flavoured String type like this: newtype UUID = UUID String. When it comes to extracting the value from it, we can take a familiar approach involving a helper function:

newtype UUID = UUID String
 
getValue (UUID value) = value
 
main :: Effect Unit
main =
  UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
    # getValue
    # log

Alternatively, we can "teach" PureScript how to deal with our newly created type using typeclasses:

newtype UUID = UUID String
 
instance showUUID :: Show UUID where
  show (UUID value) = value
 
main :: Effect Unit
main =
  UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
    # show
    # log

As it turns out, the problem is easily solved with functional languages, using beautiful, idiomatic code. The special, narrowed-down types are first-class citizens; furthermore, pattern-matching and typeclasses make working with them a delight. Okay, but what about TypeScript; can we achieve a similar experience? Well, there's both good and bad news.

Bad: no, we can't.
Good: we can get pretty close.

What do I mean by "no, we can't"?

While both "boxing" a value and pattern matching are feasible in TypeScript (there have even been attempts to bring the latter to the ECMAScript standard), at the time of writing the overhead is considerable, to the point where the code is not idiomatic and the developer experience is not smooth.

What do I mean by "we can get pretty close"?

The aforementioned overhead arises from the fact that additional information about the type of value makes its way to the runtime, thus needing to be handled there. Additionally, pattern matching, which is unsupported by the standard library, requires extensive runtime validation logic.
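
To get a feel for that overhead, here's a rough sketch of what "boxing" could look like in TypeScript. BoxedUUID, boxUUID and unboxUUID are names invented for this example, not something borrowed from a library:

```typescript
// Hypothetical boxed representation: the tag exists at runtime.
type BoxedUUID = { readonly _tag: 'UUID'; readonly value: string };

const boxUUID = (value: string): BoxedUUID => ({ _tag: 'UUID', value });
const unboxUUID = (boxed: BoxedUUID): string => boxed.value;

const boxed = boxUUID('4aad0bf5-a10e-4a2f-af9c-536ee849ee11');

// String methods are only reachable after unboxing.
const upper = unboxUUID(boxed).toUpperCase();

// The wrapper leaks into the serialized output unless it is unwrapped first.
const serialized = JSON.stringify({ id: boxed });
// → {"id":{"_tag":"UUID","value":"4aad0bf5-a10e-4a2f-af9c-536ee849ee11"}}
```

Every consumer now has to wrap, unwrap and remember to unwrap before serializing, which is exactly the kind of friction we'd like to avoid.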

Thing is, there are times when we simply don't need all that. Sometimes, we possess information beyond the compiler's knowledge, and that alone suffices for ensuring type safety in the code. It only takes storing that knowledge of ours in the type system, so that the TypeScript compiler can make use of it and act accordingly. This is where "branded types" enter the stage.

Consider the following code snippet:

declare const __brand: unique symbol; // 1
 
type Brand<T, B extends string> = T & { [__brand]: B }; // 2
 
type Foo = Brand<string, "foo">; // 3
 
declare const myFoo: Foo; // 4
 
const testA: Foo = myFoo; // 5
const testB: Foo = 123; // 6 💥
const testC: string = myFoo; // 7
const testD: Foo = ''; // 8 💥

1. We first declare a value that's guaranteed by the type system to be unique. This is just an extra layer of protection against name clashes; in reality, it's not that important, and we could simply skip this step.
2. Brand is a helper type that we're gonna use for producing our "special" strings. It works by intersecting the original type (T) with an object containing a special property - __brand. By using Brand, we add an individual touch to T (remember the scent reference from a few paragraphs back?). That extra information will only live in the type system.
3. Now, let's create our first special string - Foo, by "branding" string with "foo".
4. Let's declare a variable of type Foo and run a few tests against it.
5. A value of type Foo is of course assignable to any other value of that type. Duh!
6. An attempt to assign a value of arbitrary type to one of type Foo will trigger an error. So far so good.
7. Here's where it gets interesting. Foo may be a "special" string but it continues to be a string, thus is still assignable to string. If you think about it, it makes a lot of sense - every operation one can perform on a string will make sense performed on an email address; it's the converse that's likely false.
8. And finally - the magic happens. Although Foo is still technically a string, assigning string to a value of type Foo will no longer work, which is exactly what we wanted.
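
The same recipe keeps two "special" strings apart from each other, not just from plain string. The sketch below assumes UUID and Email are defined the same way as Foo (these definitions, along with describeUser, are mine, for illustration only) and uses plain assertions to conjure the values:

```typescript
declare const __brand: unique symbol;

type Brand<T, B extends string> = T & { [__brand]: B };

// Assumed definitions, following the same recipe as Foo.
type UUID = Brand<string, 'UUID'>;
type Email = Brand<string, 'Email'>;

const describeUser = (id: UUID): string => `user:${id}`;

// Assertions stand in for values that would normally come from elsewhere.
const id = '29f72bcf-3093-4a20-9137-84319277cb0d' as UUID;
const email = 'jane@example.com' as Email;

const described = describeUser(id); // accepted

// @ts-expect-error Email and UUID carry different brands.
describeUser(email);

// String methods return plain strings, so the brand does not survive them.
const upper = id.toUpperCase();
// @ts-expect-error string is not assignable to UUID.
const rebranded: UUID = upper;
```

Note the last two lines: once a branded value is transformed, it's back to being a regular string, which seems like the right default (an upper-cased UUID may or may not be something we still want to call a UUID).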

Armed with that knowledge, let's revisit our User example:

type User = {
  id: UUID;
  name: string;
  email: string;
};
 
declare const user: User;
 
declare function getPostCountByAuthor(userId: UUID): Promise<number>;
 
const postCount = await getPostCountByAuthor(user.email); // 💥
 
export {};

The line marked with 💥 no longer compiles! Now, try replacing user.email in the snippet with:

Any other string (expected: compilation error)
user.id (expected: compiles with no errors 💪)

Until now, we've been quite theoretical; let's dive into the practical aspects!

What's a typical lifecycle of a value?

it is retrieved from a data source (e.g. from an API)
it is created (e.g. data only lives client-side)
it is validated (e.g. does my data adhere to the schema?)
it is serialized (e.g. before sending back to the API)

Does the Brand workflow cover every single one of the aforementioned stages? Let's see.

We've just received raw user data from the API. The next step is turning it into something more robust. It's gonna take a helper function:

const makeUUID = (candidate: string): UUID => candidate as any;
 
const userId = makeUUID('29f72bcf-3093-4a20-9137-84319277cb0d'); // UUID

Is this any good? Well, it's good enough:

We have faith in the API, don't we? If not, the problem lies somewhere else.
At this point, we're protected against accidentally passing an arbitrary string to a piece of code that expects UUID; that's a lot already.

A word on the candidate as any assertion. This is the exact moment (mentioned earlier) when we know more than the compiler; we're giving it a hint that the value is actually of a different type.
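
As an aside, asserting straight to the branded type should work just as well and keeps any out of the picture. The sketch below assumes UUID is defined as Brand<string, 'UUID'> (my assumption, mirroring the Foo example), and also shows that the hint leaves no trace at runtime:

```typescript
declare const __brand: unique symbol;

type Brand<T, B extends string> = T & { [__brand]: B };

// Assumed definition, mirroring Foo from the earlier snippet.
type UUID = Brand<string, 'UUID'>;

// Same helper as before, with a narrower assertion than `as any`.
const makeUUID = (candidate: string): UUID => candidate as UUID;

const userId = makeUUID('29f72bcf-3093-4a20-9137-84319277cb0d');

// At runtime the value is still an ordinary string; the brand lives in the types only.
const runtimeType = typeof userId;
```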

Now to the value creation. Suppose we're handling the user lifecycle entirely client-side (this is a contrived example, but that's not relevant). We need a way to generate values of type UUID out of thin air. Let's modify makeUUID to handle optional input:

const makeUUID = (candidate?: string): UUID => (candidate ?? crypto.randomUUID()) as any;
 
const userId = makeUUID(); // UUID

Up to this point we've been certain that the value is in fact of type UUID - either it has been returned from the API or we've created it ourselves. There will be situations, however, when we won't be 100% sure and will need to verify that. This is where type guards come into play:

declare function validator<T>(value: T): T;
 
const isUUID = (candidate: string): candidate is UUID => {
  try {
    validator(candidate);
    return true;
  } catch {
    return false;
  }
};
 
declare const someValue: any;
 
if (isUUID(someValue)) {
  someValue; // Inferred as UUID
}

The validator function (implementation details are irrelevant) will throw an error when given a non-UUID argument.

Note that, upon calling isUUID(someValue), the result is stored in the type system and is accessible within the body of the if statement. Therefore, the type of someValue is accurately inferred.
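
If you'd like something to play with, here's one way the validator could look. It's a deliberately loose, hypothetical implementation (the UUID_SHAPE pattern only checks the 8-4-4-4-12 hex layout and ignores version and variant bits), and UUID is again assumed to be Brand<string, 'UUID'>:

```typescript
declare const __brand: unique symbol;

type Brand<T, B extends string> = T & { [__brand]: B };

// Assumed definition of the branded type.
type UUID = Brand<string, 'UUID'>;

// Loose shape check only: 8-4-4-4-12 hex digits.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical validator: returns its input or throws.
const validator = (value: string): string => {
  if (!UUID_SHAPE.test(value)) {
    throw new TypeError(`Not a UUID: ${value}`);
  }
  return value;
};

const isUUID = (candidate: string): candidate is UUID => {
  try {
    validator(candidate);
    return true;
  } catch {
    return false;
  }
};

const accepted = isUUID('29f72bcf-3093-4a20-9137-84319277cb0d'); // true
const rejected = isUUID('Hello, World!'); // false
```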

Let's pause for a moment. With the ability to validate UUID candidates, why not incorporate this validation directly into the makeUUID function? It would surely be an improvement, as no invalid value could ever slip in and mislead the TypeScript compiler.

Let's do this:

const makeUUID = (candidate?: string): UUID =>
  (typeof candidate !== "undefined"
    ? validator(candidate)
    : crypto.randomUUID()) as any;
 
try {
  const userId = makeUUID('29f72bcf-3093-4a20-9137-84319277cb0d');
} catch {
  // 💥
}

Note that now we also need proper error handling. The validation error is no longer "swallowed", thus we need to think beyond the happy path. How we encode failure is another thing; here the validator simply throws.
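
For the curious, one possible alternative to throwing is returning the outcome as a value. This is just a sketch: ParseResult, parseUUID and UUID_SHAPE are names invented here, and UUID is assumed to be Brand<string, 'UUID'>:

```typescript
declare const __brand: unique symbol;

type Brand<T, B extends string> = T & { [__brand]: B };

// Assumed definition of the branded type.
type UUID = Brand<string, 'UUID'>;

// Loose shape check only: 8-4-4-4-12 hex digits.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Failure is part of the return type instead of being thrown.
type ParseResult =
  | { ok: true; value: UUID }
  | { ok: false; reason: string };

const parseUUID = (candidate: string): ParseResult =>
  UUID_SHAPE.test(candidate)
    ? { ok: true, value: candidate as UUID }
    : { ok: false, reason: `Not a UUID: ${candidate}` };

const good = parseUUID('29f72bcf-3093-4a20-9137-84319277cb0d');
const bad = parseUUID('nope');

// The caller has to check `ok` before the compiler hands over the UUID.
const summary = good.ok ? `parsed ${good.value}` : good.reason;
```

The trade-off is a bit more ceremony at the call site in exchange for the compiler reminding us about the unhappy path.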

The last item on our agenda is value serialization. Since everything we've discussed so far exists solely within the type system (meaning it doesn't transition to runtime), there's no requirement for any runtime transformation to revert to the original value. Additionally, considering that the Brand type only affects assignability in one direction (a "branded" string remains a string), we can even skip the type-level transformation:

declare function doSomething(value: string): void;
 
declare const userId: UUID;
 
doSomething(userId); // Compiles just fine 💪
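
The same goes for actual serialization. In the sketch below (UUID is assumed to be Brand<string, 'UUID'>, and the user data is made up), the branded id comes out of JSON.stringify as a regular string, with no trace of the brand:

```typescript
declare const __brand: unique symbol;

type Brand<T, B extends string> = T & { [__brand]: B };

// Assumed definition of the branded type.
type UUID = Brand<string, 'UUID'>;

type User = {
  id: UUID;
  name: string;
  email: string;
};

const user: User = {
  // The assertion stands in for makeUUID here.
  id: '29f72bcf-3093-4a20-9137-84319277cb0d' as UUID,
  name: 'Jane',
  email: 'jane@example.com',
};

const payload = JSON.stringify(user);
// → {"id":"29f72bcf-3093-4a20-9137-84319277cb0d","name":"Jane","email":"jane@example.com"}
```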

That's all, folks! It doesn't take much to transition from the string type to a more robust alternative. Next time you're tempted to abuse primitives, please consider applying a bit of perfume.

Nominal typing techniques in TypeScript

Photo by Laura Chouette on Unsplash.