Feature flags done right

December 17, 2024

This article shows how to use PostHog feature flags in a fully type-safe way. While PostHog offers much more than just feature flags - web analytics, session recordings, and surveys, to name a few - we won't be covering those here. Also, to avoid being mainstream at all costs, we'll use a lesser-known combo as our reference point: TypeScript and React.

ℹ️ At the time of writing, the tools discussed are at the following versions: posthog-js@1.188.0, typescript@5.7.2, react@18.3.1.

ℹ️ This article is not sponsored by PostHog. However, if the good folks at PostHog would like to sponsor it, my bank details are in the pinned comment 🙃.

Why?

Why would anyone even need feature flags? First, the idea of "on or off" is a false dilemma. There's a lot in between: percentage rollouts, custom rules, variants, and more. Flags can also carry payloads, effectively acting as a headless CMS. And that's just the user-facing component of feature flags. With these capabilities, you can finely tailor your users' experiences to their specific needs (e.g., based on location or device), compare performance, make informed decisions on feature variants, and much more. Another often underrated area where feature flags shine is developer experience. Imagine maintaining long-running feature branches, which become increasingly difficult to merge cleanly with each passing week. By flagging a feature, you can deploy continuously (essentially shipping unfinished features) and flip the switch only when it’s ready to go public. How cool is that?

Alright, but why would anyone need flags to be fully type-safe? Let's zoom in a bit - what does a typical feature flag workflow look like in a real (maybe even enterprise-grade?) application? There are at least two, but likely three or more environments (dev, stage, prod...). What's unconditionally visible in one environment might be hidden or subject to A/B testing in others. To facilitate that, each environment is backed by its own PostHog project. Every project might be configured differently, and so might every flag (e.g., a boolean in dev, or variants with dynamic distribution in prod). A flag present in dev and stage might (accidentally) be missing in prod. Another might have its name misspelled, causing the client-side code to reference a non-existent flag (which PostHog simply reports as disabled). Now, multiply all that by 5, 10, or eventually 100 (5+ years into the project). Are you scared yet? Wouldn't it be lovely to have flag names autocompleted for you, payload types properly inferred, and the compiler provide early feedback when the code references a missing flag? Better yet, imagine having this done independently for every environment, allowing for environment-specific yet still type-safe configurations. For the rest of the article, I'm just going to assume the answer is yes.

PostHog JS library

Let's first examine the API surface of the posthog-js library. As mentioned earlier, we'll focus exclusively on the React-specific parts. However, all the concepts discussed here would be equally applicable in a React-free context (no pun intended).

The posthog-js/react module provides the following three hooks (it actually offers a bit more, but these are particularly relevant to us):

const enabled = useFeatureFlagEnabled("foo"); // `boolean | undefined`
const payload = useFeatureFlagPayload("foo"); // `JsonType | undefined`
const variant = useFeatureFlagVariantKey("foo"); // `string | boolean | undefined`

While they generally do what their names suggest, they don't necessarily make it easier to ensure the overall soundness of the system. Specifically:

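Loosely typed input: any string is accepted as a flag key, so typos and missing flags go unnoticed.
Loosely typed output: payloads come back as a generic JsonType and variants as string | boolean, regardless of how the flag is actually configured.
Fragmentation: why are there so many hooks to begin with?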
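To make the first two points concrete, here's a tiny sketch. It doesn't call PostHog at all: isEnabledLoose is a hypothetical stand-in that only mimics the "any string in, boolean | undefined out" shape of useFeatureFlagEnabled, and knownFlags is made-up data.

```typescript
// Made-up flag state; in a real app this would come from PostHog.
const knownFlags: Record<string, boolean> = { foo: true };

// Hypothetical stand-in with the same loose signature as `useFeatureFlagEnabled`.
// Unknown keys are reported as disabled, mirroring the behavior described above.
const isEnabledLoose = (key: string): boolean | undefined =>
  key in knownFlags ? knownFlags[key] : false;

// The typo compiles fine and silently reads as "disabled".
const typo = isEnabledLoose("fooo");

// Restricting the key to the known flags turns the same typo into a compile-time error.
type FlagKey = "foo";
const isEnabledStrict = (key: FlagKey): boolean | undefined => isEnabledLoose(key);
// isEnabledStrict("fooo"); // would not compile: "fooo" is not assignable to "foo"
const ok = isEnabledStrict("foo");
```

The rest of the article is essentially about deriving that FlagKey union (and much more) automatically instead of writing it by hand.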

There's also a component, PostHogFeature, which, aside from some basic event-capturing capabilities (easily implementable on your own), primarily builds on the aforementioned hooks to enable conditional UI rendering and fallbacks. It inherits nearly all the issues described above:

<PostHogFeature flag="foo" match={true}>
  <h1>Hello, World!</h1>
</PostHogFeature>

Is this sufficient? Depending on your situation, it may or may not be. The economic justification for making the API stricter - accepting both the maintenance cost and the learning curve - will vary from project to project. However, if you sense that the adoption of feature flags in your project will only grow, read on; you're in for a treat!

The source of truth

All the promises I made above rely heavily on type inference, which only works when there is a source to infer from. To address this, we need to capture the PostHog feature flag configuration in a data structure from which we can extract insights, such as:

What are the allowed flag keys?
What is the type of a particular flag?
What payload is associated with a specific variant?

As of the time of writing, a PostHog feature flag can come in one of two flavors: boolean or variant. Let's model these using types:

import type { JsonType } from "posthog-js";
 
type BooleanFeature<P extends JsonType> = {
  kind: "boolean";
  payload: P;
};
 
type VariantFeature<V extends Record<string, JsonType>> = {
  kind: "variant";
  variants: V;
};
 
type Features = {
  foo: BooleanFeature<never>;
  bar: VariantFeature<{
    a: never;
    b: {
      PI: 3.14;
    };
  }>;
};

We've now defined a feature flag configuration with the following characteristics:

A feature flag foo exists, of type boolean, with no payload.
A feature flag bar exists, of type variant, with two variants:
- Variant a, with no payload.
- Variant b, with a payload of { "PI": 3.14 }.
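With this type in hand, the three questions from the beginning of this section can be answered with plain indexed access types. The sketch below repeats the definitions so that it's self-contained; JsonType is a local stand-in for the type exported by posthog-js, and the alias names (FeatureKey, BarKind, BarVariantKey, BarPayloadB) are made up for illustration.

```typescript
// Local stand-in for posthog-js's `JsonType` (an assumption, so the sketch needs no imports).
type JsonType = string | number | boolean | null | JsonType[] | { [key: string]: JsonType };

type BooleanFeature<P extends JsonType> = { kind: "boolean"; payload: P };
type VariantFeature<V extends Record<string, JsonType>> = { kind: "variant"; variants: V };

type Features = {
  foo: BooleanFeature<never>;
  bar: VariantFeature<{ a: never; b: { PI: 3.14 } }>;
};

// What are the allowed flag keys?
type FeatureKey = keyof Features; // "foo" | "bar"
// What is the type of a particular flag?
type BarKind = Features["bar"]["kind"]; // "variant"
// What payload is associated with a specific variant?
type BarVariantKey = keyof Features["bar"]["variants"]; // "a" | "b"
type BarPayloadB = Features["bar"]["variants"]["b"]; // { PI: 3.14 }

// Values are checked against those types at compile time.
const allKeys: FeatureKey[] = ["foo", "bar"];
const barKind: BarKind = "variant";
const barVariants: BarVariantKey[] = ["a", "b"];
const payloadB: BarPayloadB = { PI: 3.14 };
// const typo: FeatureKey = "baz"; // would not compile
```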

We could call it a day and move on, but there's something in the air, isn't there? An all-encompassing sense of fragility. Or, less dramatically - what are the odds that a PostHog configuration change won't be properly reflected in the type system? I'd say that's quite likely - more a matter of when than if.

What if instead we try...

Keeping things in sync

It should be obvious by now, but let me reiterate: things will likely go wrong. The more complex your PostHog configuration is, the sooner it will get out of sync with its mirror, which rests peacefully in Git. Let's now focus on preventing that.

Luckily, PostHog provides all the information we need through its API. Why not use it to generate code from which we can infer the types we previously hardcoded? Ideally, the generated code would reside in a location excluded from version control, as it would not only change frequently but also vary across different environments.

A few things to keep in mind:

You'll need an API key with permissions to access the feature flag configuration for a specific environment.
The API returns feature payloads as JSON strings. While this makes sense, it doesn't make inferring payload types any easier. Since PostHog ensures these payloads are valid JSON, we can simply parse them in the sync script rather than infer types from a string (though, admittedly, that could be a fun exercise, wouldn't it?).

To keep this article concise, here's a repository containing an example sync script. If everything works as expected, you'll end up with the following data structure (with some fields omitted for brevity):

export const features = {
  results: [
    {
      key: "foo",
      filters: {
        payloads: {},
        multivariate: null,
      },
    },
    {
      key: "bar",
      filters: {
        payloads: {
          b: { PI: 3.14 },
        },
        multivariate: {
          variants: [{ key: "a" }, { key: "b" }],
        },
      },
    },
  ],
} as const;
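For a rough idea of how such a file could be produced, here's a minimal sketch of the payload-parsing and rendering step. It's not the script from the repository: fetching from the PostHog API is left out entirely, the RawFlag shape is a trimmed assumption based on the structure above, and normalizeFlag and renderModule are hypothetical names.

```typescript
// Assumed, trimmed shape of one flag in the API response: payloads arrive as
// JSON-encoded strings keyed by variant (or by "true" for boolean flags).
type RawFlag = {
  key: string;
  filters: {
    payloads?: Record<string, string>;
    multivariate?: { variants: { key: string }[] } | null;
  };
};

// Keep only the fields the types rely on and parse every payload string.
const normalizeFlag = (flag: RawFlag) => {
  const rawPayloads = flag.filters.payloads ?? {};
  const payloads: Record<string, unknown> = {};
  for (const variant of Object.keys(rawPayloads)) {
    payloads[variant] = JSON.parse(rawPayloads[variant]);
  }
  const variants = flag.filters.multivariate?.variants ?? null;
  return {
    key: flag.key,
    filters: {
      payloads,
      multivariate: variants ? { variants: variants.map(({ key }) => ({ key })) } : null,
    },
  };
};

// Render a TypeScript module; `as const` keeps keys and payloads as literal types.
const renderModule = (flags: RawFlag[]): string =>
  "export const features = " +
  JSON.stringify({ results: flags.map(normalizeFlag) }, null, 2) +
  " as const;\n";
```

Writing the returned string to a Git-ignored file is all that's left. The trailing `as const` is what makes the literal keys and payload values available to the type system later on.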

Let's now do something cool with it.

The source of truth, revisited

Enough with hardcoding stuff - no more mirroring PostHog config; it's inference time. The goal is to end up with the exact same Features type as the hand-written one, this time derived from the generated data:

type BooleanFeatureWithoutPayloadRaw = {
  filters: {
    multivariate: null;
    payloads: Record<string, never>;
  };
};
 
type BooleanFeatureWithPayloadRaw<P extends JsonType> = {
  filters: {
    multivariate: null;
    payloads: {
      true: P;
    };
  };
};
 
type VariantFeatureRaw<V extends string, PK extends V> = {
  filters: {
    multivariate: {
      variants: ReadonlyArray<{
        key: V;
      }>;
    };
    payloads: Record<PK, JsonType>;
  };
};
 
type FeaturesRaw = typeof features; // `features` is the generated structure shown earlier
 
type Features = {
  [R in FeaturesRaw["results"][number] as R["key"]]: R extends BooleanFeatureWithoutPayloadRaw
    ? BooleanFeature<never>
    : R extends BooleanFeatureWithPayloadRaw<infer P>
    ? BooleanFeature<P>
    : R extends VariantFeatureRaw<infer V, infer PK>
    ? VariantFeature<{
        [K in V]: K extends PK ? R["filters"]["payloads"][K] : never;
      }>
    : never;
};

Mission accomplished. Now is a good time to revisit the posthog-js library and determine the best way to expose that data to the consumer.
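If key remapping in mapped types is new to you, the core trick behind the Features type can be seen in isolation. The sketch below uses a made-up, cut-down structure (sample) and a hypothetical SampleKinds type that only distinguishes boolean flags from variant ones; it assumes strict null checks are enabled.

```typescript
// A made-up, cut-down structure in the same shape as the generated one.
const sample = {
  results: [
    { key: "foo", filters: { multivariate: null } },
    { key: "bar", filters: { multivariate: { variants: [{ key: "a" }, { key: "b" }] } } },
  ],
} as const;

// Union of all raw flag objects.
type SampleRaw = (typeof sample)["results"][number];

// `as R["key"]` turns each member of the union into a property named after its key.
type SampleKinds = {
  [R in SampleRaw as R["key"]]: R["filters"]["multivariate"] extends null
    ? "boolean"
    : "variant";
};
// SampleKinds is { foo: "boolean"; bar: "variant" }

const kinds: SampleKinds = { foo: "boolean", bar: "variant" };
// const wrong: SampleKinds = { foo: "variant", bar: "variant" }; // would not compile
```

Without `as const`, every key would widen to string and there would be nothing left to remap.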

The API surface

When it comes to the API surface, two factors in particular make building a type-safe PostHog experience challenging:

Fragmentation. The API consists of several hooks that operate in complete isolation, with no awareness of one another. Gathering full information about a flag can require up to three hook calls. Even then, it falls on the caller to stitch the data together in a meaningful way.
Asynchrony. Since retrieving feature flag information is asynchronous, there's a brief window between the API call and the client receiving a response during which the flag is considered unresolved, which the hooks represent as undefined.

Let's try to fix that by inventing a new type - one that captures the asynchronous nature of feature flags in an unambiguous way, consolidates all the information available for a given feature flag, and builds on the data we've extracted from PostHog. Now, we need a fancy name. How about FeatureResult?

type FeatureResultResolving = {
  type: "resolving";
};
 
type FeatureResultResolved<T extends keyof Features> = {
  type: "resolved";
} & (
  | ({
      enabled: true;
    } & (Features[T] extends infer F
      ? F extends BooleanFeature<infer P>
        ? {
            variant: true;
            payload: P;
          }
        : F extends VariantFeature<infer V>
        ? {
            [K in keyof V]: {
              variant: K;
              payload: V[K];
            };
          }[keyof V]
        : never
      : never))
  | {
      enabled: false;
      payload: undefined;
      variant: undefined;
    }
);
 
type FeatureResult<T extends keyof Features> =
  | FeatureResultResolving
  | FeatureResultResolved<T>;

That's a lot of code - let's see if it even works.

declare const fooTest: FeatureResult<"foo">;
declare const barTest: FeatureResult<"bar">;
 
try {
  fooTest.type === "resolved" && fooTest.enabled && fooTest.payload; // `never`
 
  barTest.type === "resolved" &&
    barTest.enabled &&
    barTest.variant === "b" &&
    barTest.payload.PI; // `3.14`
} catch {
  // 😱
}

Absolutely. We can now properly narrow down the variant and payload by pattern-matching on the FeatureResult type. Time to put that superpower to the test in battle.
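It may help to see what FeatureResult<"bar"> boils down to for the example configuration. The BarResult type below is a hand-expanded equivalent written out for illustration (it's not generated by anything), and describeBar is a hypothetical consumer that narrows it step by step.

```typescript
// Hand-expanded equivalent of FeatureResult<"bar"> for the example configuration:
// variant "a" carries no payload, variant "b" carries { PI: 3.14 }.
type BarResult =
  | { type: "resolving" }
  | { type: "resolved"; enabled: false; variant: undefined; payload: undefined }
  | { type: "resolved"; enabled: true; variant: "a"; payload: never }
  | { type: "resolved"; enabled: true; variant: "b"; payload: { PI: 3.14 } };

// Each check removes members from the union until only one is left.
const describeBar = (result: BarResult): string => {
  if (result.type === "resolving") return "still resolving";
  if (!result.enabled) return "disabled";
  if (result.variant === "a") return "variant a, no payload";
  // Only variant "b" remains, so `payload` is narrowed to { PI: 3.14 }.
  return "variant b, PI = " + result.payload.PI;
};
```

The order of the checks matters: accessing result.payload.PI before the variant has been narrowed to "b" would not compile.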

The hook

Yes, you heard that right - the hook, not three. We've already established that consolidation works wonders for type safety, and now, with the introduction of the FeatureResult type, it makes even more sense to consolidate. Shall we?

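import { useMemo } from "react";
import {
  useFeatureFlagEnabled,
  useFeatureFlagPayload,
  useFeatureFlagVariantKey,
} from "posthog-js/react";
 
const useFeature = <T extends keyof Features>(feature: T): FeatureResult<T> => {
  const enabled = useFeatureFlagEnabled(feature);
  const payload = useFeatureFlagPayload(feature);
  const variant = useFeatureFlagVariantKey(feature);
 
  return useMemo(() => {
    // Anything other than a boolean means the flags haven't been loaded yet.
    if (typeof enabled !== "boolean") return { type: "resolving" };
 
    // The underlying hooks are loosely typed, hence the cast.
    return {
      type: "resolved",
      enabled,
      payload,
      variant,
    } as any;
  }, [enabled, payload, variant]);
};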

Unlike the original hooks, this one can be trusted. Let's build on top of it.
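Since the hook itself needs React and PostHog to run, the decision it makes can also be written as a pure function, which is easier to unit-test. The sketch below is a hypothetical variation, not the code above: toFeatureResult and LooseFeatureResult are made-up names, the types are deliberately loose, and the disabled branch explicitly resets payload and variant on the assumption that the underlying hooks might report something other than undefined for a disabled flag.

```typescript
// Loose stand-in for what the variant hook may return.
type RawVariant = string | boolean | undefined;

type LooseFeatureResult =
  | { type: "resolving" }
  | { type: "resolved"; enabled: false; payload: undefined; variant: undefined }
  | { type: "resolved"; enabled: true; payload: unknown; variant: RawVariant };

// Hypothetical helper: the same decision as the hook, minus React.
const toFeatureResult = (
  enabled: boolean | undefined,
  payload: unknown,
  variant: RawVariant
): LooseFeatureResult => {
  // Not a boolean yet: the flags haven't been loaded.
  if (typeof enabled !== "boolean") return { type: "resolving" };
  // Force the disabled shape regardless of what the other hooks reported.
  if (!enabled) {
    return { type: "resolved", enabled: false, payload: undefined, variant: undefined };
  }
  return { type: "resolved", enabled: true, payload, variant };
};
```

A hook built on top of it would only need to call the three PostHog hooks and pass their results through.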

The component

There's something odd about how PostHog represents variants across different flag types, and this oddness carries over to the PostHogFeature component. Specifically, even boolean flags are treated as having a single variant with the key true, requiring you to pass that key as a prop. This feels like an artifact of a variant-oriented design rather than something natural. One of our goals below is to address this.

This time, we'll stick to the basics and start with a set of requirements for the new component:

It must support all flag types.
It should only require additional input (e.g., a variant) for flags that need it.
It should allow specifying fallback UI for cases when the feature flag is still being resolved or when no match is found (e.g., the flag is disabled or a different variant is active).
It should provide access to the flag payload.

Since we need to make decisions based on the flag type, let's group them accordingly:

type BooleanFeatures = KeysWithType<Features, BooleanFeature<any>>; // `'foo'`
type VariantFeatures = KeysWithType<Features, VariantFeature<any>>; // `'bar'`
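KeysWithType isn't defined anywhere in this article. A minimal sketch, assuming it's meant to return the keys of T whose values are assignable to V, could look like the following. The feature types are repeated so the snippet is self-contained, and JsonType is again a local stand-in for the posthog-js type.

```typescript
// Local stand-in for posthog-js's `JsonType` (an assumption, so the sketch needs no imports).
type JsonType = string | number | boolean | null | JsonType[] | { [key: string]: JsonType };

type BooleanFeature<P extends JsonType> = { kind: "boolean"; payload: P };
type VariantFeature<V extends Record<string, JsonType>> = { kind: "variant"; variants: V };

type Features = {
  foo: BooleanFeature<never>;
  bar: VariantFeature<{ a: never; b: { PI: 3.14 } }>;
};

// Assumed definition: map every key to itself or to `never`, then collect the survivors.
type KeysWithType<T, V> = { [K in keyof T]-?: T[K] extends V ? K : never }[keyof T];

type BooleanFeatures = KeysWithType<Features, BooleanFeature<any>>; // "foo"
type VariantFeatures = KeysWithType<Features, VariantFeature<any>>; // "bar"

const booleanFlag: BooleanFeatures = "foo";
const variantFlag: VariantFeatures = "bar";
// const mixedUp: BooleanFeatures = "bar"; // would not compile
```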

Now, the meat. Since we're after a generic component with polymorphic behavior, let's use function overloads:

type IfBooleanFeatureProps<T extends BooleanFeatures> = {
  feature: T;
  resolving?: ReactNode;
  otherwise?: ReactNode;
  children: ReactNode | ((payload: Features[T]["payload"]) => ReactNode);
};
 
type IfVariantFeatureProps<
  T extends VariantFeatures,
  V extends keyof Features[T]["variants"]
> = {
  feature: T;
  variant: V;
  resolving?: ReactNode;
  otherwise?: ReactNode;
  children: ReactNode | ((payload: Features[T]["variants"][V]) => ReactNode);
};
 
type IfFeatureProps = {
  feature: string;
  variant?: string | true;
  resolving?: ReactNode;
  otherwise?: ReactNode;
  children: ReactNode | ((payload: JsonType) => ReactNode);
};
 
function IfFeature<T extends BooleanFeatures>(
  props: IfBooleanFeatureProps<T>
): ReactElement | null;
function IfFeature<
  T extends VariantFeatures,
  V extends keyof Features[T]["variants"]
>(props: IfVariantFeatureProps<T, V>): ReactElement | null;
function IfFeature({
  feature,
  variant = true,
  resolving = null,
  otherwise = null,
  children,
}: IfFeatureProps): ReactElement | null {
  const result = useFeature<any>(feature);
 
  if (result.type === "resolving") return <Fragment>{resolving}</Fragment>;
 
  if (!result.enabled || result.variant !== variant)
    return <Fragment>{otherwise}</Fragment>;
 
  return (
    <Fragment>
      {typeof children === "function" ? children(result.payload) : children}
    </Fragment>
  );
}

That's quite a lot of code again. Let's add some examples while cutting out the noise:

<IfFeature feature="foo">🙂</IfFeature>
<IfFeature feature="foo" resolving="🤔" otherwise="🙁">
  🙂
</IfFeature>
<IfFeature feature="bar" variant="a">
  🤷‍♂️
</IfFeature>
<IfFeature feature="bar" variant="b">
  {(payload) => <span>{payload.PI}</span>}
</IfFeature>

Real world scenarios

Let's review several scenarios typical of software projects that use feature flags. All the examples we'll discuss share the following context:

PostHog integration is in place, similar to the one described above.
The project includes a script that synchronizes the feature flag configuration with PostHog. This script can be run manually but also executes automatically during the CI build process.
The project has multiple environments or instances, such as dev, stage, and prod. Each instance is linked to a separate PostHog project. Changes progress through all instances in the same sequence (e.g., dev ➡️ stage ➡️ prod).

Scenario 1: Setting up the project from scratch.

Run the feature flags sync script.
Start the project. If the previous step is skipped, compilation errors will occur, requiring your intervention.

Scenario 2: A non-breaking change is made to the PostHog configuration, e.g., adding a new (not yet used) feature flag.

Run the feature flags sync script. This step is entirely optional here, because the existing code doesn't reference the new flag yet.
Start the project, make your changes, and commit.
The change smoothly progresses through all environments.

Scenario 3: A breaking change is made to the PostHog configuration, e.g., removing a flag or modifying its payload.

Run the feature flags sync script.
Start the project. If the code is incompatible, the compiler will provide feedback, giving you an opportunity to fix it.
If the first step is skipped and you commit an incompatible change, the CI build process will fail, requiring your intervention.

Scenario 4: Significant differences exist between PostHog projects that the code does not yet handle, e.g., a flag is missing in the production configuration.

The change progresses through other environments.
When the change reaches the incompatible environment, the CI build process fails, causing the deployment process to fail as well. This prevents incompatible code from reaching your users.

That's all I have for today. Thanks!

Photo by Marco Bianchetti on Unsplash.