MadKudu Docs

Home

Flat files with Amazon S3

If you use an internal database for your product usage or CRM, you can send us customer data easily using Amazon S3 and flat files.

1. Content

To get started, we need two types of data feeds:

  • Identify: who is the user?

  • Track: what are they doing?

A third data feed is optional:

  • Group: what accounts do my users belong to?

1.1 Track

The following properties are required for the track file:

  • event_text: the action taken by the user.
    Example: “signup”, “login”, “invited a friend”

  • event_timestamp: the time at which the event happened, in Unix time
    Example: “1436172703”

  • contact_key: the unique identifier of the user who performed the action. This needs to be the same as the contact_key field in the identify file.
    Example: “abc123”, “paul@madkudu.com”

In addition to those required columns, you can add any attributes of your events you would like us to use.

{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}
{"event_text":"added a friend", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "some_other_event_field":"some_value"}
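As a sketch (using Python's standard library, with illustrative values), here is how a track line can be assembled, converting an event's datetime to Unix time for event_timestamp:

```python
import json
from datetime import datetime, timezone

def track_record(event_text, contact_key, occurred_at, **extra):
    """Build one newline-delimited JSON track record.

    `occurred_at` is a timezone-aware datetime, converted to
    Unix time (seconds since the epoch) for event_timestamp.
    Any extra keyword arguments become additional event attributes.
    """
    record = {
        "event_text": event_text,
        "event_timestamp": int(occurred_at.timestamp()),
        "contact_key": contact_key,
    }
    record.update(extra)
    return json.dumps(record)

line = track_record(
    "signed up",
    "abc1234",
    datetime(2015, 7, 6, 8, 51, 43, tzinfo=timezone.utc),
)
print(line)
```

The datetime above corresponds to the Unix time 1436172703 used as the event_timestamp example earlier.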

1.2 Identify

The following properties are required for the identify file:

  • contact_key: the unique identifier of the user. This needs to match the contact_key field in the track file.
    Example: “abc123”, “paul@madkudu.com”

  • email: the email of the contact. Pass it even if the contact_key already contains the email.
    Example: “paul@madkudu.com”

In addition to those two required fields, you can add any attributes of your users you would like us to know about:

{"contact_key":"abc1234", "email":"paul@madkudu.com"}
{"contact_key":"432535", "email":"paul@madkudu.com", "some_other_contact_field":"some value"}

1.3 Group

The following properties are required for the group file:

  • contact_key: the unique identifier of the user. This needs to match the contact_key field in the identify and track files.
  • account_key: a unique identifier for the account the user belongs to.

Important: Our system supports one account per contact. If there are several, we’ll use the latest one. If you have a use case for a user belonging to several accounts, we’d love to hear about it. Please let us know at support@madkudu.com.

In addition to those two, you can pass any attributes of your accounts you would like us to know about:

{"contact_key":"abc1234", "account_key":"madkudu.com", "name": "madkudu"}
{"contact_key":"432535", "account_key":"madkudu.com", "some_other_account_field":"some value"}

1.4 CRM data for Customer Fit training

If you are working with MadKudu to configure a customer fit model based on data extracted from your CRM, you may provide a unique file with the following fields:

Required:

  • email: the email of the lead
  • account: the identifier of the account the lead belongs to
  • created_at: the date the lead was created at, in Unix time
  • is_worked: boolean flag to indicate if the lead was worked by sales
  • is_converted: boolean flag to indicate the account became a paying customer. If you would like to define conversion differently (Opp created, Opp stage 2…), please adapt the value accordingly
  • amount: average monthly revenue generated over the first 3 months after conversion (MRR at time of close if you are SaaS)

Optional:

  • is_unqualified: boolean value to indicate sales flagged this lead as an inappropriate fit for your business. Leads set to nurture should not be flagged here.
  • worked_date: the date the lead was first reached out to by Sales, in Unix time
  • converted_date: the date the account converted at, in Unix time
  • deal_amount: the total amount generated from the first contract with the account. This can include services, annualized amounts…
  • self-reported information captured at the time of lead creation
  • any other field that you’ve augmented your leads with and want MadKudu to evaluate

NB: This is currently only available for Enterprise plan customers.

{"email":"elon@tesla.com", "account":"tesla", "created_at": "1234567890", "is_worked":"true", "is_converted":"true", "amount": "2499"}
{"email":"elon@tesla.com", "account":"tesla", "created_at": "1234567890", "is_worked":"true", "is_converted":"true", "amount": "2499", "is_unqualified":"false", "team_size__c": "10-49"}

2. Transfer

Upload the data to our Amazon S3 bucket in the format specified below and we will automatically import the new data.

The credentials to use for the upload will be communicated to you separately.

If you prefer to use your own S3 bucket, we can pull your data from there. Please contact hello@madkudu.com for details.

2.1 File naming

In the S3 bucket, you will see three folders:

  • identify
  • track
  • group

Make sure to upload each file to the correct folder.

If you use the S3 API, simply “prefix” your destination file name. For example, uploading to "identify/name_of_file.csv" will add a file named name_of_file.csv to the identify folder.

You can name the file however you’d like. We’ll automatically import any new file in the folder; just make sure that the right type of file goes in the right folder.

3. File format

We currently support two file formats:

  • Newline-delimited JSON (preferred)
  • CSV

3.1 Newline-delimited JSON

Our preferred format for upload is newline-delimited JSON, which is more standardized and less error-prone than CSV.

In this format, records are separated by the newline (\n) character. Each line is a valid JSON object:

{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}
{"event_text":"added a friend", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "some_other_event_field":"some_value"}
{"contact_key":"abc1234", "email":"paul@madkudu.com"}
{"contact_key":"432535", "email":"paul@madkudu.com", "some_other_contact_field":"some value"}
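For illustration, a minimal Python sketch that serializes records to this format and checks that each line parses back on its own:

```python
import json

records = [
    {"event_text": "signed up", "event_timestamp": 1234567890, "contact_key": "abc1234"},
    {"contact_key": "432535", "email": "paul@madkudu.com"},
]

# One JSON object per line, joined by the newline character
ndjson = "\n".join(json.dumps(r) for r in records)

# Each line must parse on its own as a standalone JSON object
parsed = [json.loads(line) for line in ndjson.splitlines()]
assert parsed == records
```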

3.2 CSV

We also support the .csv format, with the following recommended settings:

  • separator: ,
  • delimiter: "
  • line separator: line-break (\n)
  • column names in the first line
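As an illustrative sketch, Python’s csv module can produce this exact format (comma separator, double-quote delimiter, \n line endings, column names in the first line):

```python
import csv
import io

rows = [
    {"contact_key": "abc1234", "email": "paul@madkudu.com"},
    {"contact_key": "432535", "email": "paul@madkudu.com"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["contact_key", "email"],  # column names go in the first line
    delimiter=",",                        # separator: ,
    quotechar='"',                        # delimiter: "
    quoting=csv.QUOTE_ALL,                # quote every field
    lineterminator="\n",                  # line separator: \n
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```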

3.3 Data validation

JSON Lines and CSV are relatively easy to corrupt (for example, with " or , characters in the data).

We will validate the data on our side and warn you about any corruption issues, but it helps a lot if you make sure to:

JSONL

  • escape any double quote in your data with a \ (e.g. replace " with \")
Incorrect
{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val"ue"}
Correct
{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val\"ue"}
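In practice, the safest way to avoid this is to let a JSON serializer build each line rather than concatenating strings. A small Python illustration:

```python
import json

value = 'val"ue'

# Hand-building the line leaves the inner quote unescaped -> invalid JSON
broken = '{"key": "' + value + '"}'

# json.dumps escapes the embedded quote automatically
safe = json.dumps({"key": value})
print(safe)  # {"key": "val\"ue"}
```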

CSV

  • Escape any " character in your data by adding a second " character in front of it
  • Remove all line break characters (for example \n) from your fields
  • Make sure the number of fields is the same for each line
Incorrect
abc,cde,efg
Correct
"abc","cde","efg"
Incorrect
"abc","cd"e","efg"
Correct
"abc","cd""e","efg"
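A CSV library handles the quote doubling for you, and a quick field-count check catches malformed lines before upload. A Python sketch:

```python
import csv
import io

# csv.writer doubles embedded " characters automatically
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["abc", 'cd"e', "efg"])
print(buf.getvalue())  # "abc","cd""e","efg"

def field_counts_ok(text):
    """True if every CSV line has the same number of fields."""
    return len({len(row) for row in csv.reader(io.StringIO(text))}) <= 1
```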

3.4 Compression

To speed up the upload, we highly recommend that you compress your files with gzip before uploading them to S3.

You can name your file however you want; just make sure to add the correct extension for your file format:

  • .json.gz for compressed JSON (recommended)
  • .json for uncompressed JSON
  • .csv.gz for compressed CSV
  • .csv for uncompressed CSV
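A minimal Python sketch of producing a compressed feed (the file name here is just an example; only the .json.gz extension matters):

```python
import gzip
import json
import os
import tempfile

records = [{"contact_key": "abc1234", "email": "paul@madkudu.com"}]
payload = "\n".join(json.dumps(r) for r in records) + "\n"

# Write compressed newline-delimited JSON with the .json.gz extension
path = os.path.join(tempfile.gettempdir(), "identify_sample.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(payload)
```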

4. File upload

Files can be uploaded to Amazon S3 via:

  • a Command Line Interface (CLI). Ideal for testing and debugging.
  • SDKs in many common languages (Java, JavaScript, Python…). Perfect for automating your data transfer.

4.1 Upload files via the CLI

Install and configure the CLI

Get started by following the instructions to install the AWS CLI.

Configure the AWS CLI with your credentials

Once installed, configure the AWS CLI with the credentials we sent you.

The easiest way to do this is to use the aws configure command in your command line and enter the credentials.

When prompted, use us-east-1 as the default region.

If you already use the AWS CLI, you can configure multiple authentication profiles. Please refer to this part of the AWS documentation.

Upload a file

From this point on, we will be using the s3api part of the AWS CLI to upload and retrieve files. For options not described here, please refer to the s3api reference.

To upload files to AWS S3, we will use the put-object command of the S3 API.

The upload command will look like this:

aws s3api put-object --bucket madkudu-data-in-XXXX --key destination_file_name.csv --body ./Documents/file_to_upload.csv

where:

  • madkudu-data-in-XXXX is replaced by the bucket name that we created for you
  • destination_file_name.csv is replaced by the name to give to the uploaded file on S3
  • ./Documents/file_to_upload.csv is replaced by the path to the file on your local machine

4.2 Folders

If you want to upload a file to the identify folder, use identify/destination_file_name.csv as the --key parameter.

Example:

aws s3api put-object --bucket madkudu-data-in-XXXX --key identify/contact_file_name.csv --body ./Documents/contacts.csv

4.3 Compression

To speed up file transfer, you can compress files locally before transferring them to Amazon S3. If you compress your files, please use the GZIP compression method and use .gz or .gzip as your file extension (we currently don’t support other methods or extensions).

Frequently Asked Questions

Your file format doesn’t work for me. What do I do?

If you’re having any issue with the file format, please reach out to us at support@madkudu.com and we’ll be happy to help.

How often is the data refreshed?

As soon as you drop data to the S3 bucket, expect results to be updated in the application within 6 hours.

Do you have any requirements for the event naming?

As a general rule of thumb:

  • The event_text should be a hardcoded value
  • If a piece of information is dynamic (e.g. the category of the column being edited), send it as an extra column.
    If you have extra properties for events, you can add them as extra columns. For example, an “Edit Product” event could have a product_name property. However, if something in the editing is core to the value proposition of the app, we’d recommend you make that its own event (e.g. adding an alert to a Product).

Also, if a property is only relevant for one type of event, choose an explicit column_name, add it to the file, and leave it NULL for every other type of event.
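For example (with a hypothetical product_name column), the track file could look like:

{"event_text":"Edit Product", "event_timestamp":1436172703, "contact_key":"abc1234", "product_name":"road bike"}
{"event_text":"login", "event_timestamp":1436172703, "contact_key":"abc1234", "product_name":null}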

What happens if I send the same event more than once? Will it appear twice in MadKudu?

Our system will dedupe events based on contact_key / event_text / timestamp. If you send the same event twice, only one will be kept:

  • If sent in two separate batches: only the most recent one is kept
  • If sent in the same batch: only the first one in the file is kept
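To illustrate the within-batch behavior, here is a Python sketch that mimics the rule described above (keep the first occurrence of each contact_key / event_text / timestamp combination); this is an illustration, not MadKudu’s actual implementation:

```python
import json

def dedupe_batch(lines):
    """Keep the first occurrence of each dedupe key within one batch."""
    seen, kept = set(), []
    for line in lines:
        record = json.loads(line)
        key = (record["contact_key"], record["event_text"], record["event_timestamp"])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept

batch = [
    '{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}',
    '{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234"}',
]
deduped = dedupe_batch(batch)
```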

Can we add other attributes to the Identify records?

Yes, please send any attributes you have stored in your user table (except sensitive ones such as passwords or credit card numbers).

In particular, it is always helpful to get the following:

  • signup_date
  • current plan / value of the plan
  • is_owner or is_admin or some kind of role (if you have multiple users for an account but only some have the ability to upgrade)