Data Transformation Using Dataflow Builder and Importing It into Einstein Analytics

Posted by Dr. Shiulee Chakraborty | Posted On November 9th, 2017 | Salesforce, Salesforce Einstein

In continuation with my previous blog on how to prepare a dataset using dataset recipes, the next topic in this series is the transformation of data using Dataflow Builder. Dataflow JSON is set of instructions contained in a JSON file for loading data from multiple Salesforce objects, or existing datasets pulled from an external source, and transforming them to create a new dataset in Einstein Analytics.

Dataflow Process

The modification of data using Dataflow Builder in Einstein Analytics involves five steps (Figure 1):

  1. Create a dataflow design and identify which transformation is required.
  2. Integrate external data into a dataset. The external data in dataflow is named as “Edgemart.”
  3. Configure the new definition file with either the dataset builder, or dataflow JSON, or by manually uploading the definition file.
  4. The next step is to start the dataflow, monitor the dataflow, and troubleshoot if errors are notified in the dataflow.
  5. And the last step is to schedule the dataflow for daily runtime.

Now we will discuss each step of the dataflow process in detail.

Einstein Analytics data flow process

Figure 1. This figure illustrates the Dataflow Process.

Dataflow Design

The first step is to create a dataflow definition file, which is a JSON file. Consider what data should be available for queries, from where to extract that data, and if you need to modify the extracted data to prepare your final dataset. To explain significant decisions, let’s examine one instance. In this example, the goal is to combine Salesforce data with external data (Figure 2). The dataset will contain Salesforce data viz., Opportunity, Account, and Users joining with some external data file (CSV format).

Einstein Analytics Salesforce data

Figure 2. Design of Dataflow for Combining Salesforce Data with External Data by Dataflow JSON

The dataflow extracts opportunity, account and user data from the Opportunity, Account and User objects. For each extracted object, the dataflow creates a new dataset called “sfdc Digest.” The external data can be extracted through external connectors (which we’ll address in the next blog series) to create a new dataset called “Edgemart.”

Dataflow Builder then transforms the datasets created from the internal and external extracted data. Primarily, the dataflow joins the opportunity, account and user data into a new dataset by a transformation called “Augment.” Next, the dataflow augments Edgemart with the Salesforce augmented dataset. Finally, so end users can access the augmentation joined, you configure the dataflow to register the final dataset.

Types of Transformation Based on Dataset

Transformation is the manipulation of data. Data Builder offers many different manipulations to transform your dataset (Figure 3), including:

  • Sfdc Digest Transformation: This feature extracts data from Salesforce objects.
  • Digest Transformation: This transformation joins the data extracted from different connectors to create a new dataset.
  • Edgemart Transformation: This feature extracts the data from external sources (such as CSV files).
  • Append Transformation: This transformation combines records from different datasets into one dataset but does not remove duplicate records.
  • Augment Transformation: This transformation augments/joins the extracted input dataset by merging columns from other related datasets by using a unique ID.
  • Compute Expression Transformation: This transformation allows you to add derived fields to a dataset. The values for calculated fields generated through the values using a SAQL query, which can be derived from one or more fields of the input data.
  • Compute Relative Transformation: You can use this transformation to evaluate the trends in the data by adding calculated fields to a dataset based on values available in other rows.
  • Dim2mea Transformation: This transformation generates a new measure based on a dimension.
  • Flatten Transformation: This transformation flattens hierarchical role data. For instance, you can flatten the Salesforce role hierarchy. This transformation is used only for those datasets in which role-based row-level security is applied.
  • Filter Transformation: This transformation is used when you need to check a specific criterion. This feature defines the condition that identifies which record to preserve in the dataset.
  • Slice Dataset Transformation: This transformation removes fields from a dataset, leaving a subset of fields for practice in a new dataset.
  • Update Transformation: This feature updates the specified field values in an existing dataset based on the information from another dataset, called the “lookup dataset.” The transformation looks up the new values from matching fields in lookup dataset. The update transformation feature stores the results in a new dataset.
  • Sfdc Register Transformation: This function registers the dataset, so it will be available for queries for further building of lenses and dashboards.
Einstein Analytics dataflow builder

Figure 3. Transformations in Dataflow Builder

Configure the Dataflow (based on the business requirement)

After designing of dataflow, the next step is to set the dataflow. You can configure the dataflow to extract data, transform datasets based on the business requirement, and register the datasets that you want to make accessible for queries. To configure the dataflow, add transformations to the dataflow definition file (JSON file).

Start, Monitor and Reschedule the Dataflow

After creating the dataflow, you have to manually start a dataflow job to load the data into datasets immediately. There is a limit in Dataflow Builder: you can run 24 dataflow jobs within 24 hours, but if you run the dataflow job within 2 minutes, it is not countable. After the initial dataflow job runs, you can schedule it to run daily at a particular time. The scheduling option will detect the updated Salesforce data. You can stop, reschedule, and monitor a dataflow job at any time to troubleshoot issues.

After transformation of a dataset, the next important step is to import data into Einstein Analytics. There are various methods of data integrations:

1. User Interface: From the Analytics page using User interface, click on the Create button and choose CSV from the drop-down menu. Figure 4 shows the screenshot of the User interface where you can upload your CSV file. You can add external CSV data by clicking on Section 1. After uploading an external file, the JSON file will automatically generate, which you can download to change optional metadata (Section 2). Later you can upload the updated JSON file in Section 3. You can preview your data by clicking on the Preview Data button (Section 4).

Einstein Analytics external CSV data

Figure 4. Uploading External CSV Data by User Interface

2. Wave Connector: To upload Excel data, use the Wave Connector app, which offers an easy method to import data from Microsoft Excel 2013 to Einstein Analytics. The Wave Connector is available as an application from the Microsoft Store. After installation, click on Wave Connector to import data from Excel to Salesforce.

To install Wave Connector, open Microsoft Excel 2013 (or later version) and click Insert tab and then go to the Office app store. Search for Wave Connector and click to install it. After installation, the app will ask you to connect your Salesforce account to start uploading the data (Figure 5).

Einstein Analytics Microsoft Excel

Figure 5. Uploading Microsoft Excel Data via Wave Connector

3. Dataset Builder: From the Analytics Studio homepage, click on Create and select Dataset. You will see various options, choose Salesforce. Pick the Salesforce object to start. This is called a “grain” object, with which we will create the relationship. Next click on the Plus button and select appropriate fields. If you want to add more Salesforce objects, click on the Relationship tab, select another salesforce object and click on Join. This is how you transform Salesforce data into Einstein Analytics (refer Figure 6).

Einstein Analytics dataset builder

Figure 6. Loading of Salesforce Data by Dataset Builder

4. External API: If you wish to implement your own connector to load CSV or Zip files into Einstein Analytics, you can use the External Data API. The architecture of External Data API is shown in Figure 7.

There are multiple connectors available in Einstein Analytics:

  • Salesforce local connector replicates the additional objects without having to touch the dataflow. Salesforce
  • Multi-org Connector connects to an external org to upload the external data into Einstein Analytics.
  • Marketing Cloud Connector enables access from Marketing campaign data.
  • Heroku Postgres connector moves data from Heroku Postgres into Einstein Analytics.
  • Amazon Redshift connector transfer raw data from Amazon Redshift in a aggregated form into Einstein Analytics.
  • Microsoft Dynamics CRM Connector can replicate the data from Microsoft Dynamics 365 to Einstein Analytics for further building of lenses and dashboards.
  • Google BigQuery Connector is a new connector infused in Einstein Analytics Winter 18 release to get the Google Analytics data, which can be useful for the marketing team to track the website content and online engagement. Now the company can duplicate this data in Einstein Analytics using lenses and dashboards (Figure 8).
Einstein Analytics architecture external data API

Figure 7. Architecture for External Data API

Einstein Analytics connectors

Figure 8. Types of Connectors in Einstein Analytics

In this blog series, we covered the competencies of Dataflow Builder to transform the data for building specific dashboards and various data integration methods to import data offered by Einstein Analytics.

The next topic in this series will include the Business Problem solved by features of Einstein Analytics mentioned above using Team Foundation Server data for Project Management. Stay tuned for my upcoming blogs to know more about Einstein Analytics.

Related posts

Dr. Shiulee Chakraborty

Dr. Shiulee Chakraborty has six years of research experience in the field of data science, which includes machine learning, predictive analysis, forecasting, and statistical modeling. She has completed her Ph.D. from IIT and was honored with a gold medal by the President of India for her research.

Leave a Reply