How to have a successful data integration with the Faraday platform
Faraday has always said: the more data, the better, no matter what shape it's in. That said, there are a few ways to make your integration with us faster, easier, and more effective.
US people in US datacenters
Faraday is a batteries-included service, but those "batteries" (our Faraday Identity Graph, or FIG) cover US persons only. We match your data against US addresses, emails, and phone numbers; people outside the US are invisible to us. What's more, in some cases non-US data is a hot potato that we don't want to hold. In particular, we do not use European data covered by the GDPR. If possible, filter it out before it gets to Faraday (and, just in case, we delete anything we recognize as such); one way to pre-filter is sketched below. Finally, we want to connect to data sources located in the United States, because transferring data across national borders often has legal ramifications.
Key question: Is all of our data in the United States?
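For illustration, here's a minimal sketch of pre-filtering with pandas. The file name, column name, and country values are all hypothetical; your export will differ.

```python
import pandas as pd

# Hypothetical export: a contacts file with a "country" column.
# File and column names are illustrative, not a Faraday requirement.
contacts = pd.read_csv("contacts.csv")

# Keep only rows explicitly marked as US before anything leaves your systems.
us_values = {"US", "USA", "UNITED STATES"}
us_only = contacts[contacts["country"].str.upper().isin(us_values)]

us_only.to_csv("contacts_us_only.csv", index=False)
```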
Straight from the source
Faraday wants to be as close as possible to your authoritative database. This might be a data warehouse (Google BigQuery, AWS Redshift, Snowflake, etc.), a cloud database (AWS RDS, Azure SQL Server, etc.), or an on-prem database (Microsoft SQL Server, etc.). It might be a next-generation system like Podio or Domo, or even an e-commerce platform like Shopify or BigCommerce. This is because we want raw, up-to-date, unfiltered data straight from the source. We might not be able to connect to it directly (firewalls, private networks, etc.), but just knowing about it will help us choose the right integration.
Key question: What database could my business not live without?
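As one example among many, pulling from a warehouse can be as simple as running a query against the raw table. This sketch assumes Google BigQuery with default credentials; the table name is a placeholder.

```python
from google.cloud import bigquery  # assumes the google-cloud-bigquery package

client = bigquery.Client()  # picks up your default GCP credentials

# Pull the raw, unfiltered table; the table name is a placeholder.
rows = client.query("SELECT * FROM `my_project.my_dataset.orders`").result()

for row in rows:
    print(dict(row))
```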
Every distinct event

Faraday wants to know about every click, signup, purchase, and cancellation. If your customers do something, we want to know when, what, who, and how much. We call these "event streams," and you can recognize them because they just keep growing and growing. There could be thousands of events per customer, stretching back over years, and we want to see each one. If the data isn't in your data warehouse, we can connect to your HubSpot, Klaviyo, Mailchimp, Shopify, etc. and pull the event stream straight into our system.
Key question: What row of data gets created every time we make money?
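To make the shape concrete, here's a sketch of an append-only events table, using SQLite as a stand-in for whatever actually records your transactions. The schema is hypothetical.

```python
import sqlite3  # a stand-in for whatever database records your transactions

conn = sqlite3.connect("example.db")

# An event stream: one row per event, appended forever, never rewritten.
conn.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id INTEGER PRIMARY KEY,    -- unique per event
        customer_email TEXT,             -- who
        occurred_at TEXT,                -- when
        product TEXT,                    -- what
        amount_cents INTEGER             -- how much
    )
""")

# Every purchase appends a new row, so history keeps growing.
conn.execute(
    "INSERT INTO orders (customer_email, occurred_at, product, amount_cents)"
    " VALUES (?, ?, ?, ?)",
    ("jane@example.com", "2024-01-15T10:30:00Z", "widget", 4999),
)
conn.commit()
```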
Customer by customer

Faraday wants to know what you know about your customers: email, favorite color, elite status, contact preferences, secondary address. This is different from an event stream because there's generally one big record per person, updated over time as new data becomes available. We combine this with event-stream data to build a rich model of your customers and predict their behavior.
Key question: If I had to call a customer, where would I look them up?
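In contrast with an event stream, a customer table is typically upserted: one row per person, overwritten as attributes change. A hypothetical sketch, again using SQLite:

```python
import sqlite3

conn = sqlite3.connect("example.db")

# Customer data: one record per person, updated in place over time.
conn.execute("""
    CREATE TABLE IF NOT EXISTS customers (
        email TEXT PRIMARY KEY,
        favorite_color TEXT,
        elite_status TEXT
    )
""")

# An upsert: fresh attributes replace the old ones instead of adding rows.
conn.execute("""
    INSERT INTO customers (email, favorite_color, elite_status)
    VALUES (?, ?, ?)
    ON CONFLICT(email) DO UPDATE SET
        favorite_color = excluded.favorite_color,
        elite_status = excluded.elite_status
""", ("jane@example.com", "green", "gold"))
conn.commit()
```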
Not rolled up

Faraday doesn't want to get confused, or do a lot of custom work to untangle your data. We don't want to infer event streams from customer data (for example, "it's an order the first time they appear in our files"). Complicated "rollups" that mix event streams and customer data into one table can confuse us; please just give us the data in 2 separate tables. Complicated "diffing" schemes can also result in duplicates or overcounting -- for example, giving us a list of accounts with daily balances and expecting us to diff them to recover the underlying transactions. We prefer to see the transactions themselves and sum them up ourselves, as in the sketch below.
Key question: Is there a reason we can't just provide the raw data?
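Here's a toy example of why raw transactions are easier: given the individual amounts, any rollup (like a balance) can be derived unambiguously. The data is made up.

```python
import pandas as pd

# Raw transactions: one row per event, with made-up values.
transactions = pd.DataFrame({
    "account_id": [1, 1, 2],
    "occurred_at": ["2024-01-01", "2024-01-05", "2024-01-02"],
    "amount": [100.0, -40.0, 250.0],
})

# Any rollup, such as a current balance, falls out of a simple sum:
# 60.0 for account 1, 250.0 for account 2.
print(transactions.groupby("account_id")["amount"].sum())
```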
Uniquely identifiable
Each row should have some kind of unique identifier. For example, an event might have an ID that looks like a huge number, such as an auto-incrementing integer or a UUID. This helps us run our standard suite of analysis on your data, because that analysis is often self-referential: we need to be able to tie derived rows back to the originals.
Key question: Does every table have a primary key?
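A quick sanity check is easy to run before sending data. This sketch assumes a hypothetical orders.csv with an order_id column.

```python
import pandas as pd

events = pd.read_csv("orders.csv")  # hypothetical export

# Every table should have a column that uniquely identifies each row.
assert events["order_id"].notna().all(), "some rows are missing an ID"
assert events["order_id"].is_unique, "some IDs appear more than once"
```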
Simple parts
Faraday wants to build robust, long-lasting integrations out of simple parts: a few tables pulled from a database, or a daily CSV copied to an SFTP server or an AWS S3 bucket. Not every integration requires hitting a real-time API thousands of times a second. We have a lot of options, but we always seek the simplest solution.
Key question: What is the most boring way to transfer our data?
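For instance, the "boring" option can be a daily job that drops a dated CSV into a shared S3 bucket. A sketch using boto3, with placeholder bucket and file names:

```python
import datetime
import boto3  # assumes AWS credentials are already configured

# The boring option: upload one dated CSV per day.
# Bucket and key are placeholders you'd agree on with Faraday.
today = datetime.date.today().isoformat()
s3 = boto3.client("s3")
s3.upload_file("orders.csv", "my-company-faraday", f"exports/orders_{today}.csv")
```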