How to Connect Salesforce to External Database Sources with Einstein Analytics

One of our clients was looking for a Sales Performance Pipeline dashboard capable of connecting to a large external data source. Most admins are experienced in creating dashboards using the data which is available in the Salesforce org, but in this example, the sales data is in a SQL Server. The question is, how can we pull the SQL Server data into Einstein?

In Salesforce's Winter '18 release, they introduced several new connectors including:

Redshift, Microsoft Dynamics CRM, Salesforce Multi-Org, Salesforce Marketing cloud, Heroku Postgres and Google Bigquery.

While these connectors are a great step forward in connecting to outside data sources, a connector for SQL Server has not yet been established. We need to go with an alternative approach until Salesforce adds a SQL Server connection in the upcoming release.

Approach

We need to implement a custom integration application by using an open source programming language like java/python. The primary focus of this application is there should be an automated job which runs on a regular basis to pull the data from our external data source (SQL server) and push into Einstein Analytics; then we can create and schedule a data flow for updating the dataset with live data in Einstein Analytics.

Process

Step 1: Check the users for SQL Authentication in SQL server. If no user is found, then create the database user with SQL Authentication Role and provide read access to this user.

Step 2: Check for any authorized API’s available with authentication for pulling data from SQL server, if no API available then create a Rest/SOAP Application to get the data from the SQL server database (for past 1 or 2 days’ data).

Step 3: Once finished with API for pulling data from SQL server, then we need to consume this in Salesforce. To do so, generate the Enterprise WSDL file in Salesforce Org. Using this WSDL file, generate the jar file.

Step 4: Create the job and schedule it. The focus of this job is to get the data from the SQL server (via API built in step 2) and push the response data into Salesforce. Next, we need to set the schedule for this job as for example 12 am every day.

Step 5: Create a data flow in Einstein Analytics for datasets, and schedule the data flow for live data.

Step 6: Create Sales Performance Pipeline dashboard with resultant datasets.

In the Spring '18 release, Salesforce Einstein provided a variety of connectors for the corresponding data sources, such as Microsoft Azure SQL Database, Amazon RDS Oracle, Amazon RDS PostgreSQL, Amazon RDS SQL Server, Amazon RDS MySQL, Amazon RDS MariaDB, Amazon RDS Aurora MySQL, Amazon RDS Aurora PostgreSQL and Snowflake Computing. We can use these connectors to pull the data from the external cloud databases.

Now we have Amazon RDS SQL server connector from Spring 18 release; this connector will satisfy our requirement.

Amazon RDS SQL Server Connector

By using Amazon RDS connecter, we can connect SQL server database and pull the data into Einstein Analytics directly. In Einstein Analytics go to data manager click on setup, go to set up replication, add remote connection then provide several connection parameters. Once this is completed, a data flow should be created for live data.

Now, we are happy with the connectors for connecting external data sources, like SQL server, Mysql, PostgreSQL, etc., so all we need is cloud database servers. Suppose if data is available in non-cloud database servers. For example, SQL server installed on a local server. This case we need to go for Integration Application for pulling data into Salesforce Einstein. We hope in next release Salesforce will introduce a few more new connectors which might satisfy this case.