Perfuming the primitives
Strings, gotta love them. They're universally available and easy to work with through a bunch of built-in methods and operators in the standard library. That flexibility carries over to the type system:
declare const a: string;
declare const b: string;
declare function foo(arg: string): void;
foo(a);
foo(a.concat(b));
foo(b + 'c');
A variable of type string, once declared, assigned a value, transformed, and concatenated with another value of type string, remains a string, and the assignability to string still holds.
Everything is fine as long as we're dealing with the actual string type. Problems start to manifest themselves once we unwittingly narrow the meaning of the string type down by using it to model more specific data. Is an email address a string? Yes. Is every string an email address? Likely not. Can one send an email to an arbitrary string?
Digression: the word "string" has appeared 13 times so far.
In certain cases (quite often, actually), it makes sense to narrow down the string type to a subset of values that share a common quality, such as those to which one can send emails. What do we often find ourselves doing instead?
type User = {
name: string;
email: string;
};
declare const user: User;
declare function sendEmail(email: string, body: string): void;
sendEmail(user.name, 'Hello!'); // Compiles just fine 🤷‍♂️
Can you spot the problem? This issue arises frequently and is not limited to emails. Let's expand the User type:
type User = {
id: string;
name: string;
email: string;
};
declare const user: User;
declare function getPostCountByAuthor(userId: string): Promise<number>;
const postCount = await getPostCountByAuthor(user.email); // Compiles just fine 🤷‍♂️
export {};
Rings a bell yet?
We find ourselves in a peculiar situation where the TypeScript compiler willingly accepts any compatible (string) value, regardless of its semantic validity. In the mid to long term, as the codebase grows, this invites subtle bugs. And the reason behind this? Yes, you guessed it: the structural type system.
Achievement unlocked: link to Wikipedia
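A quick illustration of what "structural" means, as a minimal sketch with made-up type names: two separately declared types with the same shape are interchangeable, because the compiler compares shapes, not names. A string alias is the extreme case, since its shape is simply string.

```typescript
// Two independently declared types with an identical shape.
type Celsius = { value: number };
type Fahrenheit = { value: number };

const boiling: Celsius = { value: 100 };

// No error: only the shapes are compared, so a Celsius is accepted
// wherever a Fahrenheit is expected, even though the numbers mean different things.
const misread: Fahrenheit = boiling;

console.log(misread.value); // 100
```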
How do we fix this? We've already established that the string type is too broad; let's try narrowing it down. Let's focus on the id property. The simplest way out would be to know all the values upfront and narrow the type down by simply enumerating all of them:
type UUID =
| '4ebb4c24-c34e-4919-aba7-9232f74c310d'
| '042013c4-f98d-452d-b735-493174e6f6ab'
| '2bdb6019-18e9-48d8-ac41-6243359fe79a';
type User = {
id: UUID;
name: string;
email: string;
};
Okay, that was a poor joke. Let's get back down to earth and try something serious. How about we alias the string type to indicate that something's going on?
type UUID = string;
type User = {
id: UUID;
name: string;
email: string;
};
We've now, of course, gained some refactoring capabilities (one can easily track down all the UUID references), but nothing more than that really. Although aliased, UUID is still a string; we've simply come full circle.
type UUID = string;
declare const id: UUID;
const foo: string = id;
const bar: UUID = 'Hello, World!'; // Compiles just fine 🤷‍♂️
We clearly need a way to encode the "kind" of the value somehow: a value that is of one broad type but also carries extra information, metadata that lets one distinguish it from the others, like a scent (yes, hence the title). How do other programming languages deal with that?
In Elm we can make use of algebraic data types and simply create one with a single constructor that wraps the String type, like so: type UUID = UUID String. That would, of course, require some effort to extract the value from it, but we would achieve our goal:
module Main exposing (main)
import Html
type UUID = UUID String
getValue : UUID -> String
getValue (UUID value) = value
main =
    UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
        |> getValue
        |> Html.text
PureScript takes a similar approach, yet there's a special syntax for cases where one only wants to wrap a single type: the newtype keyword. One can define a UUID-flavoured String type like this: newtype UUID = UUID String. When it comes to extracting the value from it, we can take a familiar approach involving a helper function:
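module Main where
import Prelude
import Effect (Effect)
import Effect.Console (log)

newtype UUID = UUID String
getValue :: UUID -> String
getValue (UUID value) = value
main :: Effect Unit
main =
  UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
    # getValue
    # log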
Alternatively, we can "teach" PureScript how to deal with our newly created type using typeclasses:
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (log)

newtype UUID = UUID String
instance showUUID :: Show UUID where
  show (UUID value) = value
main :: Effect Unit
main =
  UUID "4aad0bf5-a10e-4a2f-af9c-536ee849ee11"
    # show
    # log
As it turns out, the problem is easily solved with functional languages, using beautiful, idiomatic code. The special, narrowed-down types are first-class citizens; furthermore, pattern-matching and typeclasses make working with them a delight. Okay, but what about TypeScript; can we achieve a similar experience? Well, there's both good and bad news.
- Bad: no, we can't.
- Good: we can get pretty close.
What do I mean by "no, we can't"?
While both "boxing" a value and pattern matching are feasible in TypeScript (there have even been attempts to bring the latter to the ECMAScript standard), at the time of writing the overhead is considerable, to the point where the code is not idiomatic and the developer experience is far from smooth.
What do I mean by "we can get pretty close"?
The aforementioned overhead arises from the fact that additional information about the type of a value makes its way to the runtime and thus needs to be handled there. Additionally, pattern matching, which the language doesn't support natively, requires extensive runtime validation logic.
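For comparison, here's roughly what the "boxing" route could look like in TypeScript. This is a minimal sketch with a hypothetical UUIDBox class, not a recommended pattern: the wrapper is a real object at runtime, so it has to be unwrapped wherever a plain string is needed, and it changes how the value serializes.

```typescript
// A hypothetical runtime wrapper, roughly the TypeScript analogue of
// `newtype UUID = UUID String`. The box is a real object at runtime.
class UUIDBox {
  // A private member makes the class effectively nominal: an object literal
  // with the same shape is not assignable to UUIDBox.
  private readonly value: string;

  constructor(value: string) {
    this.value = value;
  }

  // The counterpart of `getValue (UUID value) = value`.
  getValue(): string {
    return this.value;
  }
}

const boxed = new UUIDBox('4aad0bf5-a10e-4a2f-af9c-536ee849ee11');

// Every use site that needs the raw string has to unwrap explicitly...
const raw: string = boxed.getValue();

// ...and the wrapper leaks into runtime behaviour, e.g. serialization.
console.log(JSON.stringify({ id: boxed }));
// {"id":{"value":"4aad0bf5-a10e-4a2f-af9c-536ee849ee11"}}
```

Every boundary (string APIs, JSON, even equality, since two boxes holding the same string are different objects) now has to know about the box, which is the kind of overhead referred to above.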
Thing is, there are times when we simply don't need all that. Sometimes we possess information beyond the compiler's knowledge, and that alone suffices to ensure type safety in the code. All it takes is storing that knowledge of ours in the type system, so that the TypeScript compiler can make use of it and act accordingly. This is where "branded types" enter the stage.
Consider the following code snippet:
declare const __brand: unique symbol; // 1
type Brand<T, B extends string> = T & { [__brand]: B }; // 2
type Foo = Brand<string, "foo">; // 3
declare const myFoo: Foo; // 4
const testA: Foo = myFoo; // 5
const testB: Foo = 123; // 6
const testC: string = myFoo; // 7
const testD: Foo = ''; // 8 (error)
1. We first declare a value that's guaranteed by the type system to be unique. This is just an extra layer of protection against name clashes; in reality, it's not that important, and we could simply skip this step (a regular string key such as '__brand' would work too).
2. Brand is a helper type that we're gonna use for producing our "special" strings. It works by intersecting the original type (T) with an object containing a special property: the brand. By using Brand, we add an individual touch to T (remember the scent reference from a few paragraphs back?). That extra information will only live in the type system.
3. Now, let's create our first special string, Foo, by "branding" string with 'foo'.
4. Let's declare a variable of type Foo and run a few tests against it.
5. A value of type Foo is, of course, assignable to another variable of that type. Duh!
6. An attempt to assign a value of an arbitrary type (here, a number) to a variable of type Foo will trigger an error. So far so good.
7. Here's where it gets interesting. Foo may be a "special" string, but it continues to be a string, and thus is still assignable to string. If you think about it, it makes a lot of sense: every operation one can perform on a string can also be performed on an email address; it's the converse that likely doesn't hold.
8. And finally, the magic happens. Although Foo is still technically a string, assigning a plain string to a variable of type Foo no longer works, which is exactly what we wanted.
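One consequence worth spelling out: because each brand carries its own tag, two branded strings with different tags are not assignable to each other either. A minimal sketch, redeclaring the Brand helper so the snippet stands alone; the Email brand and the asUUID/asEmail helpers are illustrative names, not part of any library.

```typescript
declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };

type UUID = Brand<string, 'UUID'>;
type Email = Brand<string, 'Email'>;

// Hypothetical constructors; the assertion is where we vouch for the value.
const asUUID = (value: string) => value as UUID;
const asEmail = (value: string) => value as Email;

const id = asUUID('29f72bcf-3093-4a20-9137-84319277cb0d');
const email = asEmail('jane@example.com');

// @ts-expect-error: the 'UUID' and 'Email' tags are incompatible.
const wrong: UUID = email;

// Both are still strings, so string operations keep working.
const shout: string = email.toUpperCase();
```

This is exactly the property that rules out the getPostCountByAuthor(user.email) mix-up once both fields are branded.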
Armed with that knowledge, let's revisit our User example:
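declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };
type UUID = Brand<string, 'UUID'>; // the 'UUID' tag is our choice
type User = {
  id: UUID;
  name: string;
  email: string;
};
declare const user: User;
declare function getPostCountByAuthor(userId: UUID): Promise<number>;
const postCount = await getPostCountByAuthor(user.email); // Compilation error
export {};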
The getPostCountByAuthor(user.email) call no longer compiles! Now, try replacing user.email with:
- Any other string (expected: compilation error)
- user.id (expected: compiles with no errors 💪)
Until now, we've been quite theoretical; let's dive into the practical aspects!
What's a typical lifecycle of a value?
- it is retrieved from a data source (e.g. from an API)
- it is created (e.g. data only lives client-side)
- it is validated (e.g. does my data adhere to the schema?)
- it is serialized (e.g. before sending back to the API)
Does the Brand workflow cover every single one of the aforementioned stages? Let's see.
We've just received raw user data from the API. The next step is turning it into something more robust. It's gonna take a helper function:
const makeUUID = (candidate: string): UUID => candidate as any;
const userId = makeUUID('29f72bcf-3093-4a20-9137-84319277cb0d'); // UUID
Is this any good? Well, it's good enough:
- We have faith in the API, don't we? If not, the problem lies somewhere else.
- At this point, we're protected against accidentally passing an arbitrary string to a piece of code that expects UUID; that's a lot already.
A word on the candidate as any assertion. This is the exact moment, mentioned earlier, when we know more than the compiler: we're giving it a hint that the value is actually of a different type.
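As a side note, the assertion doesn't have to go through any. A minimal sketch: asserting directly to UUID also compiles, because UUID is a subtype of string, and it keeps the compiler checking the rest of the expression. The sloppy and strict helpers are deliberately broken, hypothetical examples.

```typescript
declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };
type UUID = Brand<string, 'UUID'>;

// Narrower assertion: only the string -> UUID step is vouched for.
const makeUUID = (candidate: string): UUID => candidate as UUID;

// With `as any`, a mistake in the expression is silently accepted:
const sloppy = (candidate: string): UUID => candidate.length as any;

// With `as UUID`, the same mistake is caught (number doesn't overlap with UUID).
// @ts-expect-error
const strict = (candidate: string): UUID => candidate.length as UUID;
```

Either way, the assertion remains the single place where correctness rests on our word rather than the compiler's.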
Now to value creation. Suppose we're handling the user lifecycle entirely client-side (a contrived example, but that's beside the point). We need a way to generate values of type UUID out of thin air. Let's modify makeUUID to handle optional input:
const makeUUID = (candidate?: string): UUID => (candidate ?? crypto.randomUUID()) as any;
const userId = makeUUID(); // UUID
Up to this point we've been certain that the value is in fact of type UUID: either it has been returned from the API or we've created it ourselves. There will be situations, however, when we won't be 100% sure and will need to verify that. This is where type guards come into play:
declare function validator<T>(value: T): T;
const isUUID = (candidate: string): candidate is UUID => {
try {
validator(candidate);
return true;
} catch {
return false;
}
};
declare const someValue: any;
if (isUUID(someValue)) {
someValue; // Inferred as UUID
}
The validator function (implementation details are irrelevant) will throw an error when given a non-UUID argument.
Note that, upon calling isUUID(someValue), the result is recorded in the type system (that's what the candidate is UUID return type, a type predicate, is for) and is accessible within the body of the if statement. Therefore, someValue is narrowed to UUID there.
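The validator was left abstract on purpose. For completeness, here's one possible sketch, assuming a hypothetical assertUUID helper that only checks the canonical 8-4-4-4-12 hexadecimal layout with a regular expression (it does not inspect version or variant bits):

```typescript
declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };
type UUID = Brand<string, 'UUID'>;

// Canonical textual form: 8-4-4-4-12 hex digits, case-insensitive.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical validator: throws on anything that isn't shaped like a UUID.
function assertUUID(value: string): string {
  if (!UUID_PATTERN.test(value)) {
    throw new TypeError(`Not a UUID: ${value}`);
  }
  return value;
}

// The type guard, built on top of the throwing validator.
const isUUID = (candidate: string): candidate is UUID => {
  try {
    assertUUID(candidate);
    return true;
  } catch {
    return false;
  }
};

const input: string = '29f72bcf-3093-4a20-9137-84319277cb0d';
if (isUUID(input)) {
  const id: UUID = input; // narrowed by the type predicate
  console.log(id);
}
```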
Let's pause for a moment. With the ability to validate UUID candidates, why not incorporate this validation directly into the makeUUID function? It would surely be an improvement, as no invalid value would ever slip in (thus misleading the TypeScript compiler).
Let's do this:
const makeUUID = (candidate?: string): UUID =>
  (typeof candidate !== "undefined"
    ? validator(candidate)
    : crypto.randomUUID()) as any;
try {
  const userId = makeUUID('29f72bcf-3093-4a20-9137-84319277cb0d');
} catch {
  // Validation failed: handle the invalid input here
}
Note that we now also need proper error handling. The validation error is no longer "swallowed", so we need to think beyond the happy path. How we encode failure is another matter; here, the validator simply throws.
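One alternative to throwing, sketched under the assumption that callers prefer values over exceptions: return a discriminated union and let the compiler force the caller to check which case it got. The Result type and parseUUID name are illustrative, not part of any library, and the check is the same simplified regex as a stand-in for the real validator.

```typescript
declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };
type UUID = Brand<string, 'UUID'>;

// A minimal discriminated union for success/failure.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Same idea as makeUUID, but failure is part of the return type.
const parseUUID = (candidate: string): Result<UUID, string> =>
  UUID_PATTERN.test(candidate)
    ? { ok: true, value: candidate as UUID }
    : { ok: false, error: `Not a UUID: ${candidate}` };

const parsed = parseUUID('29f72bcf-3093-4a20-9137-84319277cb0d');
if (parsed.ok) {
  // Only reachable after the check, so `value` is known to be a UUID here.
  const userId: UUID = parsed.value;
  console.log(userId);
} else {
  console.error(parsed.error);
}
```

The trade-off: no try/catch at the call site, but every caller has to unpack the result before it can reach the branded value.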
The last item on our agenda is value serialization. Since everything we've discussed so far exists solely within the type system (meaning it doesn't carry over to runtime), no runtime transformation is needed to revert to the original value. Additionally, since the Brand type only affects assignability in one direction (a "branded" string remains a string), we can even skip the type-level transformation:
declare function doSomething(value: string): void;
declare const userId: UUID;
doSomething(userId); // Compiles just fine, no conversion needed
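To make the serialization point concrete, a minimal sketch: since the brand exists only at the type level, JSON.stringify sees a plain string. The reverse direction is where the type information is lost, so a parsed payload has to be validated again (here with a hypothetical isUUID guard, simplified to a regex check).

```typescript
declare const __brand: unique symbol;
type Brand<T, B extends string> = T & { [__brand]: B };
type UUID = Brand<string, 'UUID'>;

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const isUUID = (candidate: unknown): candidate is UUID =>
  typeof candidate === 'string' && UUID_PATTERN.test(candidate);

type User = { id: UUID; email: string };

const user: User = {
  id: '29f72bcf-3093-4a20-9137-84319277cb0d' as UUID,
  email: 'jane@example.com',
};

// Serialization: nothing to unwrap, the brand has no runtime footprint.
const payload = JSON.stringify(user);
console.log(payload);
// {"id":"29f72bcf-3093-4a20-9137-84319277cb0d","email":"jane@example.com"}

// Deserialization: JSON.parse returns `any`, so the brand must be re-earned.
const parsed: { id?: unknown } = JSON.parse(payload);
if (isUUID(parsed.id)) {
  const restoredId: UUID = parsed.id;
  console.log(restoredId === user.id); // true
}
```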
That's all, folks! It doesn't take much to transition from the string type to a more robust alternative. Next time you're tempted to abuse primitives, please consider applying a bit of perfume.
Photo by Laura Chouette on Unsplash.