Andy Hattemer > Posts

Visual Explanation of Segment

Segment (the customer data / analytics tool) can be confusing at first. This page provides a visual map and a collection of recommendations and tips on using Segment.

What is Segment?

Everyone I’ve talked to about Segment started out thinking it is only one of these things:

  • Web Analytics tool
  • Tag Manager
  • Marketing Data Platform
  • API
  • Middleware / ETL Tool
  • Data Warehouse

It takes lots of probing around and flipping of switches in Segment to get that it is actually all of these things and more. To save time, I created an unofficial map of the Segment data empire:

Segment Data Map

This map was created using Draw.IO. If you’d like to modify or update it, here’s the XML file.

My notes below aren’t meant to replace the documentation, but rather supplement it with clarifications, recommendations, and tips.

The Spec

Segment’s data spec is a simple set of rules for formatting customer-centric data. All data sent to Segment via the API must follow these rules, and all 3rd party integrations that accept the data are plug-and-play because they know what data to expect (thanks to the spec!). Read about it.
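As a rough illustration of what the spec asks for, here is a minimal sketch of a Track call built as a plain Python dictionary. The field names (type, userId, event, properties, timestamp) follow Segment's published spec; the build_track helper and the sample values are hypothetical and not part of any Segment library.

```python
from datetime import datetime, timezone


def build_track(user_id, event, properties=None, timestamp=None):
    """Build a Spec-shaped Track payload (hypothetical helper, not a Segment API)."""
    if not user_id:
        raise ValueError("a Track call needs a userId (or an anonymousId)")
    if not event:
        raise ValueError("a Track call needs an event name")
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": dict(properties or {}),
        "timestamp": ts.isoformat(),
    }


payload = build_track("user_123", "Order Completed", {"revenue": 49.0})
print(payload["type"], payload["event"])
```

Because every event has this predictable shape, an integration only needs to know the spec, not your codebase.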

Segment Browser Events

Browser Events

The Segment Tag

Every event from the browser starts with the javascript tag embedded on your site. The tag is actually two things: A web analytics event library, and a simple tag-manager.

Analytics.js Core - Web Analytics Library

The analytics library is the very first thing Segment launched. It is open-source, and at least one competitor has standardized on it. If you’re coming from Google Analytics it has a lot of the same concepts: Page events (GA: pageviews) are fired by default, you can send Track events (GA: events) for things like conversions. Every event includes all the fields you expect to find in web analytics: Referrer, UTM tags, Page Title, URL, Timestamps, etc… Analytics.js is simple, elegant and easy to incorporate into your site.
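To make the list of fields concrete, here is a small Python sketch of how the web-analytics fields of a Page event could be collected from a URL. It is not how Analytics.js is implemented; the page_event_fields function, the example URL, and the exact nesting of the campaign data are assumptions for illustration only.

```python
from urllib.parse import urlparse, parse_qs


def page_event_fields(url, referrer="", title=""):
    """Collect the web-analytics fields a Page event typically carries (illustrative only)."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Keep only utm_* parameters and strip the "utm_" prefix.
    campaign = {
        key[len("utm_"):]: values[0]
        for key, values in query.items()
        if key.startswith("utm_")
    }
    return {
        "type": "page",
        "properties": {
            "url": url,
            "path": parsed.path,
            "referrer": referrer,
            "title": title,
        },
        "context": {"campaign": campaign},
    }


event = page_event_fields(
    "https://example.com/pricing?utm_source=newsletter&utm_medium=email",
    referrer="https://www.google.com/",
    title="Pricing",
)
```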

Analytics.js Tag Integrations - simple Tag Manager

The other role of Analytics.js is to embed and integrate a javascript tag for many of the Segment integrations you turn on. In some cases (e.g. the Quora Conversion Pixel), that is literally all that happens when you turn on the integration. I call these Tag Integrations.

But, for Tag Integrations that are used by many customers, Segment (and sometimes the 3rd party integration) have done a pretty good job of customizing the browser tag integration to sort of “meld” with the Segment Data spec and become much more powerful.

Many Tag Integrations now have surprise extras so it’s worth reading the documentation page on each one you turn on. Not understanding how browser-based integrations work can lead to major issues:

  • “Set-it-and-forget-it” leads to Bloat - Remember to turn off integrations you no longer use. The worst tags can use ~700 KB of your users’ bandwidth per page load.

New Segment feature changes this section:

A new “Connection Modes” feature allows you to activate certain integrations without having to embed the integration tag into the browser. This is great for reduced tag bloat.

  • Segment tags run Everything for Everyone, Everywhere - Need to run an integration on a small section of your site, or only for a small audience? You may be better off embedding it via a more configurable Tag Manager like GTM.

  • Watch for data leakage - If you take the YOLO approach and start flipping integrations on in production, watch out! Some analytics integrations will default to recording everything a user types into your site, including billing info and other PII (Personally Identifiable Information). To avoid leaking data: at the very least, go into the settings page for the integration itself (e.g. FullStory > Settings) and filter out sensitive data, THEN turn on the integration. Ideally, have a separate dev/test project in Segment.

  • Don’t use the GTM Integration - [personal opinion alert!] You’re opening yourself up to difficult debugging issues when you let the Segment Tag Manager load the Google Tag Manager (and the same goes for the inverse). If you really need GTM, go through the effort of embedding it directly into the page.
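The integration's own settings are the main defense against the data leakage described above. For data you send to Segment yourself, a complementary habit is to redact sensitive keys before the event leaves your code. The sketch below is one hypothetical way to do that; the key list and the scrub function are assumptions, not a Segment feature.

```python
SENSITIVE_KEYS = {"password", "card_number", "cvv", "ssn"}  # hypothetical list


def scrub(properties, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of properties with sensitive values redacted, recursing into nested dicts."""
    cleaned = {}
    for key, value in properties.items():
        if key.lower() in sensitive_keys:
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = scrub(value, sensitive_keys)
        else:
            cleaned[key] = value
    return cleaned


safe = scrub({"plan": "pro", "billing": {"card_number": "4111111111111111", "zip": "94107"}})
```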

Segment Mobile Events

Mobile Events - the Mobile SDK

Segment’s mobile analytics libraries are meant to be rolled into your app’s code. Once integrated, basic events like App Opens, Installs and Updates are included by default. Beyond that, there’s an extension of the Segment Spec for common mobile events, in addition to Screens (Mobile equivalent of pageview) and the same Tracks and Identifies, which need to be written into your app’s code.

Mobile seems to be a huge growth area and differentiator for Segment. Working for an infrastructure provider makes me woefully out-of-touch with mobile, so if anyone would like to contribute context here, let me know.

Segment Server Events

Server Events - Python/Go/Node.js/…

Browser events are the gateway drug, Server events are the heroin.

Server events are extremely powerful, but if you’re using them you’re probably hooked on Segment for life.

Segment Server Events follow the same Spec as Browser and Mobile; they’re just fired from code running on your servers instead of in the client’s browser or device. Only integrate Server events if you can hook them into a single event-stream in your code. If you pepper server events throughout your codebase, you’ll spend too much time finding and chasing down event breakage due to unrelated code changes.
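One way to read "a single event-stream" is a single wrapper that every server event passes through. The sketch below is a minimal, hypothetical version: EventStream and its emit method are names made up for illustration, and the send callable stands in for whatever actually delivers the event (for example, the track function of Segment's Python library).

```python
class EventStream:
    """Single choke point for server events (hypothetical wrapper)."""

    def __init__(self, send):
        # `send` is any callable shaped like send(user_id, event, properties).
        self._send = send

    def emit(self, user_id, event, properties=None):
        # Copy the properties so later mutation by the caller cannot change the event.
        self._send(user_id, event, dict(properties or {}))


# A list stands in for the real delivery function so the sketch runs offline.
sent = []
stream = EventStream(lambda user_id, event, props: sent.append((user_id, event, props)))
stream.emit("user_123", "Subscription Renewed", {"plan": "pro"})
```

With one choke point, a change to naming, validation, or delivery happens in one file instead of across the codebase.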

“When do I need Server Events?”

This comes up so often there are two different Segment pages and a sweet flowchart about it.

Reason to use Server Events | Real-world Examples
Data only available on Server-Side | Tracking customers interacting with your API; tracking customer events not triggered by the customer (e.g. customer suspended, customer’s recurring bill paid)
100% Data coverage is vital | Triggering important emails based on Segment Events
Sending Sensitive Data (sensitive = you wouldn’t want the customer seeing it, different from PII) | Customer profitability, priority, Sales stage

It is especially important to be careful with data organization and formatting when creating server-side events because you tend to have more leeway (i.e. it’s easier to send garbage data, and garbage in leads to garbage out).
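A lightweight guard against garbage data is to check each server event against a small schema before sending it. The following is a sketch under stated assumptions: the event names, property names, and the validate_event function are all hypothetical examples, not something Segment provides.

```python
EVENT_SCHEMAS = {
    # event name -> required property names and their expected types (hypothetical)
    "Customer Suspended": {"reason": str},
    "Recurring Bill Paid": {"amount": float, "currency": str},
}


def validate_event(event, properties):
    """Return a list of problems; an empty list means the event is safe to send."""
    schema = EVENT_SCHEMAS.get(event)
    if schema is None:
        return [f"unknown event name: {event!r}"]
    problems = []
    for name, expected in schema.items():
        if name not in properties:
            problems.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


ok = validate_event("Recurring Bill Paid", {"amount": 20.0, "currency": "USD"})
bad = validate_event("Recurring Bill Paid", {"amount": "20", "currency": "USD"})
```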

Before a single Event is fired

Think about the ideal structure of the data you want to move around. It’s much easier to change things now than later. Here are a few recommendations:

  1. Read Segment’s Documentation - have I mentioned that yet? It is thorough and opinionated in all the right places. These parts are especially important when starting out:

  2. Read the documentation on Integrations you plan on turning on right away - Some of these integrations will auto-magically ingest certain types of events and properties only. For example, the AdRoll Integration will automatically map any properties named revenue to the adroll_conversion_value field in their tool. This is useful to know before you begin creating events.

  3. Make sure the events are actually needed - It’s easy to get carried away with data. For every type of event you create, ask yourself “how would I ever act on this data?”

  4. Differentiate Permanent vs Temporary Data Needs - sometimes an event (data) is needed to answer a question e.g. “How many users interact with our monthly/hourly pricing toggle?”, but once the question is answered, the event can go away. Use a generic “Experiment Viewed” Track event for temporary data needs, and fire it from a tag manager instead of permanently building into your codebase.

  5. Watch out for unique event names, traits or properties - If you are firing Server Events, you’ve likely abstracted the code, e.g.:

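    import analytics  # Segment's Python library
    import web

    # The unloadhook fires at the end of every request
    def segment_unloadhook():
        # The request path is used as the event name; `action` and `app`
        # are assumed to be defined elsewhere in the application
        analytics.track(web.session.user_id, web.ctx.path, action.metadata)

    app.add_processor(web.unloadhook(segment_unloadhook))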
    

    That’s great until someone else comes along and
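Note that the snippet above passes the request path as the event name, so any path containing an ID would produce a new event name per ID. One possible guard, sketched here with hypothetical routes and a made-up event_name_for helper, is to map paths onto a fixed set of names before tracking:

```python
import re

# Hypothetical route patterns mapped to stable event names.
ROUTES = [
    (re.compile(r"^/users/\d+/invoices/\d+$"), "Invoice Viewed"),
    (re.compile(r"^/users/\d+$"), "Profile Viewed"),
]


def event_name_for(path):
    """Map a request path to a stable event name; IDs in the path never reach the name."""
    for pattern, name in ROUTES:
        if pattern.match(path):
            return name
    # Anything unrecognized shares one name instead of creating a new event.
    return "Unmapped Request"


names = [event_name_for(p) for p in ("/users/42", "/users/7/invoices/3", "/health")]
```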

Segment Ad Data

Ad Data

Ad data (part of what Segment calls “Cloud App” Sources) is an extra bonus: they pull all raw ad platform data (impressions, clicks, cost) from Facebook and Google and dump it into a warehouse of your choice.

Some notes about Ad Data:

  • Batched, not streamed: every 3 hours
  • Ad Data is “Out-of-Spec”, so they can’t send it to your integrations.
  • Ad Data only makes sense to use if you:
      • Are a big company - otherwise just use the Ad/Analytics Tool UI
      • Do your marketing analysis through a BI tool like Looker or Tableau
      • Have a big database - it takes up a lot of space
  • AFAIK it is impossible to create a one-to-one link between AdWords data and Web Analytics data - Aspiring data nerds like myself have tried. Google has the capability to do this using the GCLID UUIDs that you can add to URLs in AdWords, but they are greedy and keep that data hidden.

Segment Integrations

I group Integrations differently than Segment

Segment’s own categorization of Integrations is a source of a lot of new-user confusion. You can filter Integrations by:

  • Categories (A/B Testing, Analytics, Security…)
  • Platforms (Browser, Mobile, Server)
  • Levels (Developer, Project, Growth…?)

But these are really TAGS (not mutually exclusive categories). Turning on an integration labeled “Server” could result in a new javascript tag on your website!

Segment’s Integrations Dashboard

So here’s my different way of grouping Integrations:

  • Tag Integrations - super basic, covered in browser section above.
  • One-Way Integrations
  • One-point-five Way Integrations
  • Two-way Integrations

Segment One-Way Integrations

One-Way Integrations

One-way integrations are extremely simple and risk-free to turn on. The breadth of tools in this category demonstrates how valuable a real-time customer data feed can be to many parts of your business. Here are some highlights:

  • Niche Analytics - Attribution, Real-Time Metrics dashboards
  • Security - Stream login attempts to Castle and they’ll do account takeover detection and more
  • Notifications - Easily set up a Slack notification when a user submits a Form
  • Machine Learning and Prediction - Pipe everything to MadKudu and they’ll tell you who is a qualified lead
  • Data Transformation Tools - Tray.io allows you to create “If this then that” style workflows that further transform the Segment events in real-time, creating endless possibilities.
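Segment's integrations handle all of this without code, but the idea behind a one-way consumer is easy to sketch. The hypothetical function below turns a Track event into the kind of payload a Slack incoming webhook accepts (a dictionary with a text key). The "Form Submitted" event name and the form_name property are assumptions for illustration.

```python
def slack_message_for(event):
    """Turn a Track event into a Slack incoming-webhook payload, or None to skip it."""
    if event.get("type") != "track" or event.get("event") != "Form Submitted":
        return None
    props = event.get("properties", {})
    form = props.get("form_name", "unknown form")
    user = event.get("userId", "anonymous")
    return {"text": f"User {user} submitted {form}"}


message = slack_message_for({
    "type": "track",
    "userId": "user_123",
    "event": "Form Submitted",
    "properties": {"form_name": "Contact Sales"},
})
```

The consumer only reads events and never writes anything back to Segment, which is what makes this class of integration low-risk to turn on.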

Segment One-point-Five-Way Integrations

One-point-five-way Integrations

These all started out as one-way integrations: a tool like Mailchimp was consuming Segment events and firing out emails based on what it received. But then you quickly run into the issue of having two separate places to report on.

Segment Two-Way Integrations

Two-way Integrations

The BEST integrations - two-way streaming communication with Segment. To do this right, tools basically have to be built with Segment in mind, so there are only three right now. But I bet we will see more cool additions to this group as Segment grows.

Customer.io, Drip and Mailjet

Closing Arguments

Product evolves based on customer/market needs - Segment got their software architecture right. Instead of using an army of engineers to plug holes and bugs (like 1st gen CRM and Data services), they can focus on digesting customer feedback and building solutions. Ad Data (Cloud App Sources) and Enrichment are major power upgrades.

Taking the long view -

  • No nickel-and-diming - The 1st generation model of

  • Cautious about ethical tracking - This is a tough one. I think some marketers might disagree with Segment’s approach to tracking.

Growth Loop

Segment’s integration strategy is straight from the Salesforce playbook.