Hitachi Vantara Pentaho Community Forums

Thread: Advice for syncing Salesforce and MySQL

  1. #1
    Join Date
    Sep 2012
    Posts
    16

    Question Advice for syncing Salesforce and MySQL

    Hi there - I'm new to Kettle, so apologies if the answer to this is obvious.

    My situation is that I need to periodically (say once a day) sync data in Salesforce with data in our existing MySQL db - changes in one need to be propagated to the other. The schemas for the tables in question are different. We can handle any (very unlikely) conflicts manually, just so long as we know that they exist.

    I'm wondering if anyone has any ideas on the best approach for me to take? Can Kettle do this out of the box, or will there be extra steps in the process? I had thought about using triggers in Salesforce and MySQL to keep track of the changesets in each db in a separate table, and then comparing the differences (using Merge Rows?), but figured there must be a better way.
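    For reference, the changeset-comparison part of this idea maps onto PDI's "Merge Rows (diff)" step, which flags each key as identical, changed, new, or deleted. Purely as an illustration of that comparison logic in plain Python (the record ids and fields below are made up, not from any real table):

    Code:
    # Rough sketch of comparing two changesets keyed by record id, the way
    # PDI's "Merge Rows (diff)" step flags rows on two sorted streams.
    def diff_changesets(reference, compare):
        """reference/compare: dicts of {record_id: tuple of compared fields}."""
        flags = {}
        for key in reference.keys() | compare.keys():
            if key not in compare:
                flags[key] = "deleted"    # only in the reference snapshot
            elif key not in reference:
                flags[key] = "new"        # only in the compare snapshot
            elif reference[key] == compare[key]:
                flags[key] = "identical"
            else:
                flags[key] = "changed"
        return flags

    # Hypothetical snapshots: MySQL as reference, Salesforce as compare.
    mysql_rows = {"001A": ("Acme", "NY"), "001B": ("Beta", "TX")}
    sfdc_rows = {"001A": ("Acme", "CA"), "001C": ("Gamma", "WA")}
    print(diff_changesets(mysql_rows, sfdc_rows))
    # e.g. {'001A': 'changed', '001B': 'deleted', '001C': 'new'} (order may vary)

    Keys that come back "changed" in both directions on the same run would be the (rare) conflicts to surface for manual handling.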

    Does anyone have any ideas? Any help would be much appreciated - thanks.
    Last edited by Sammo; 09-25-2012 at 10:19 AM.

  2. #2

    Default

    I doubt there is any single correct answer to this without a better understanding of your needs. We use Kettle to synchronize our Salesforce customer and product tables to MySQL each day. Our approach is very simple: the tables are not especially large, so we read the entire table and apply updates to a Type 1 dimension (Dimension lookup/update). This automatically creates rows if they do not exist and updates rows if needed.

    If you have too much data in Salesforce to process it all, you might be able to pull it by date range instead.
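    As a rough illustration of this approach outside Kettle (in PDI it is a Salesforce Input step feeding a Dimension lookup/update step), the sketch below pulls a date window from Salesforce and applies Type 1 style upserts to MySQL. The simple_salesforce library, the credentials, and the accounts_dim table and its columns are assumptions made for the example, not details from this thread:

    Code:
    # Illustrative only: pull a LastModifiedDate window from Salesforce and
    # upsert into MySQL (insert new keys, overwrite existing ones), i.e. the
    # Type 1 dimension behaviour described above.
    from simple_salesforce import Salesforce
    import mysql.connector

    sf = Salesforce(username="user@example.com", password="secret",
                    security_token="token")

    # Date-range pull instead of the whole table (SOQL datetimes are unquoted).
    soql = ("SELECT Id, Name, BillingCity FROM Account "
            "WHERE LastModifiedDate >= 2012-09-24T00:00:00Z")
    records = sf.query_all(soql)["records"]

    conn = mysql.connector.connect(host="localhost", user="etl",
                                   password="secret", database="dw")
    cur = conn.cursor()
    upsert = ("INSERT INTO accounts_dim (sfdc_id, name, billing_city) "
              "VALUES (%s, %s, %s) "
              "ON DUPLICATE KEY UPDATE name = VALUES(name), "
              "billing_city = VALUES(billing_city)")
    cur.executemany(upsert, [(r["Id"], r["Name"], r["BillingCity"])
                             for r in records])
    conn.commit()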

  3. #3
    Join Date
    Sep 2012
    Posts
    16

    Default

    Quote Originally Posted by jsturm View Post
    I doubt there is any single correct answer to this without a better understanding of your needs. We use Kettle to synchronize our Salesforce customer and product tables to MySQL each day. Our approach is very simple: the tables are not especially large, so we read the entire table and apply updates to a Type 1 dimension (Dimension lookup/update). This automatically creates rows if they do not exist and updates rows if needed.

    If you have too much data in Salesforce to process it all, you might be able to pull it by date range instead.

    Thanks jsturm - is your synchronization method bidirectional? (A row that's changed in MySQL is also changed in Salesforce?) It was the bidirectional aspect that really had me stumped.

  4. #4

    Default

    After trying for a long time to use the Salesforce plugin in Spoon and hitting many errors, I finally gave up and started using Jitterbit for Salesforce integration. I would prefer to use Spoon/Kettle because it is more robust than Jitterbit, but I got tired of it failing on me. I think the Salesforce plugin works fine for smaller tables, but when we try to sync our leads table, for example, it constantly fails. I even tried breaking the job up into smaller chunks, pulling only leads modified during a three-hour window, but it still fails constantly. I don't like having to use two different data integration tools, but unfortunately I don't have an option at this point. I would love to get more feedback on the Salesforce plugin in Spoon. I wonder if it is something we are doing that is specifically messing with it.

  5. #5
    Join Date
    Aug 2011
    Posts
    236

    Default

    Hi,

    You can do it, but you need to use the API, not the Salesforce input component (that is only good for small updates). It is more involved because you have to form the XML yourself, but I use it all the time. You do have to set up a 'polling' loop to see when the batch is complete, but it works great for large volumes of data.

    There is plenty of documentation:

    http://wiki.developerforce.com/page/...ages_(10.0_API) - API examples

    http://www.salesforce.com/us/develop.../api/index.htm - SOAP API docs

    HTH.
    PDI 8.0.0
    MySQL - 5.6.27
    Redshift - 1.0.1485
    PostgreSQL 8.0.2
    OS - Ubuntu 10.04.2
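    As a rough illustration of the pattern tnewman describes (build the request XML yourself, then poll until the batch completes), here is a sketch against the Salesforce Bulk API over plain HTTP. The thread does not say exactly which API or version is in use, so treat the instance URL, API version, and session id below as placeholders and check the details against the docs linked above:

    Code:
    # Sketch of "submit a batch, then poll until it completes".
    # Instance URL, API version, and session id are placeholders.
    import time
    import requests
    import xml.etree.ElementTree as ET

    INSTANCE = "https://na1.salesforce.com"        # placeholder
    API = f"{INSTANCE}/services/async/47.0"        # version is an assumption
    SESSION_ID = "<session id from a prior login call>"
    HEADERS = {"X-SFDC-Session": SESSION_ID}
    NS = "{http://www.force.com/2009/06/asyncapi/dataload}"

    # 1. Create a query job (this is the hand-built XML the post mentions).
    job_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<jobInfo xmlns="http://www.force.com/2009/06/asyncapi/dataload">'
        "<operation>query</operation><object>Account</object>"
        "<contentType>CSV</contentType></jobInfo>"
    )
    resp = requests.post(f"{API}/job", data=job_xml,
                         headers={**HEADERS, "Content-Type": "application/xml"})
    job_id = ET.fromstring(resp.content).find(f"{NS}id").text

    # 2. Add one batch: for a query job the body is just the SOQL text.
    resp = requests.post(f"{API}/job/{job_id}/batch",
                         data="SELECT Id, Name FROM Account",
                         headers={**HEADERS, "Content-Type": "text/csv"})
    batch_id = ET.fromstring(resp.content).find(f"{NS}id").text

    # 3. Poll until the batch finishes (the 'polling loop' from the post).
    while True:
        info = requests.get(f"{API}/job/{job_id}/batch/{batch_id}",
                            headers=HEADERS)
        state = ET.fromstring(info.content).find(f"{NS}state").text
        if state in ("Completed", "Failed", "NotProcessed"):
            break
        time.sleep(30)
    print("batch finished with state:", state)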

  6. #6

    Default

    Hmm... Thanks for the advice. I will look into it.

    Quote Originally Posted by tnewman View Post
    Hi,

    You can do it, but you need to use the API, not the Salesforce input component (that is only good for small updates). It is more involved because you have to form the XML yourself, but I use it all the time. You do have to set up a 'polling' loop to see when the batch is complete, but it works great for large volumes of data.

    There is plenty of documentation:

    http://wiki.developerforce.com/page/...ages_(10.0_API) - API examples

    http://www.salesforce.com/us/develop.../api/index.htm - SOAP API docs

    HTH.

  7. #7
    Join Date
    Aug 2011
    Posts
    236

    Default

    Hi,

    Here's an annotated picture of my main calling job to SFDC.

    Several things to note:
    1) You need to use your user id/security token to make a call to get session details first. I store these details in parameters to use later.
    2) Because there are several calls to Salesforce, I created a parameterized script to make all the various calls; the format of the calls is the same.
    3) The results you get back from Salesforce have no identifiers, so I create my own sequence number for the input and the results and match on that. Salesforce returns results in the same order you sent the request.
    4) You need some way to find out whether there were errors and react accordingly; I put good/bad results into separate files and then just do a record count on each.

    This might explain things a bit better (a sketch of points 3 and 4 follows after this post).

    [Attachment: sfdc_call.jpg - annotated screenshot of the main calling job]
    PDI 8.0.0
    MySQL - 5.6.27
    Redshift - 1.0.1485
    PostgreSQL 8.0.2
    OS - Ubuntu 10.04.2
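    As a rough sketch of points 3 and 4 in the post above - numbering the input rows, relying on Salesforce returning results in the same order they were sent, then splitting good and bad results into separate files and counting them - with a hypothetical row layout, result structure, and file names:

    Code:
    # Pair row N of the request with result N of the response, then split
    # successes and failures into separate files and count each.
    # Row layout, result fields, and file names here are hypothetical.
    import csv

    def match_and_split(input_rows, results,
                        good_path="sfdc_good.csv", bad_path="sfdc_bad.csv"):
        good, bad = [], []
        for seq, (row, result) in enumerate(zip(input_rows, results)):
            record = {"seq": seq, **row, **result}
            (good if result.get("success") else bad).append(record)

        for path, rows in ((good_path, good), (bad_path, bad)):
            if rows:
                with open(path, "w", newline="") as f:
                    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
                    writer.writeheader()
                    writer.writerows(rows)

        # A record count on each file is enough to spot a bad run.
        print(f"{len(good)} succeeded, {len(bad)} failed")
        return good, bad

    # Example with made-up rows and results:
    rows = [{"Name": "Acme"}, {"Name": "Beta"}]
    results = [{"success": True, "id": "001A"},
               {"success": False, "errors": "REQUIRED_FIELD_MISSING"}]
    match_and_split(rows, results)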
