Solved

Outer Join (Retain All Records From Both Datasets) Question

  • 30 October 2017
  • 14 replies
  • 172 views

Badge
I'm trying to use the outer join "Retain all records from both datasets" option in a Merge task in a Bionic Rule that is bringing together data from three separate MDA tables.  I want to merge on the field that corresponds to the Email Address of the client.  Not all clients are represented in each set of data (ie - in this case there is no one table that includes all clients).  



However, because I can't include both "User Email" and "User Email" in the Show Fields,



some of the email addresses are getting left off of the results and I'm left with data points (number of downloads, for example) that have no contact identification information.



I therefore can't load this data to the MDA table using this field as an identifier. 



When I rename the fields to "User Email" "User Email 2" "User Email 3", etc, I am able to view the email for all  results however I'm not sure of the best way to load this to the MDA table, as I now have multiple fields containing the email address.  



Looking for some help or advice on how best to perform Outer Joins or to merge data when using the same field to merge on and identify data.
icon

Best answer by dan_ahrens 5 October 2018, 03:44

View original

14 replies

Hi Kate,



Is the Email Id unique to an account [i]ID? If Yes, then merge on following two fields:

Account ID, Email ID



This should solve your problem.



If the Email ID is not unique to an account ID, then there might be a case where you will have more than one Email Id's for an account ID. How do you want your output data to look like in that case? 

2 different rows for the same account ID or just one row with two different columns for the email ID?



Regards,

Jitin
Badge
My first screenshot was misleading in that it shows Account ID being used as a Merge On field.  There is definitely a case in which there are more than one Email for an Account ID.  



I want my output data to have 1 row for each Email, and there would be 2 different rows for the account ID.  I'm just struggling to have ALL emails show in the same column when they come from different sources.  Ideally, I'd want all the Emails aligned in the same column.  Does that make sense?
HI Kate,



This is achievable via Bionic Rules. Assuming there are 3 datasets and all these 3 datasets have an email ID field. 

Let me name these datasets as D1, D2 and D3.

Create a merge task M1, Merge on Email ID field (Retain all records from both datasets), select all show fields(Email ID only once).



Then Merge M1 and D3, Merge on Email ID field (Retain all records from both datasets), select all show fields(Email ID only once).



This will result in just one Email ID column with all the show fields which you want in your output task.



Try this and let me know if this doesn't help.



Regards,

Jitin
Badge
Hi Jitin - thanks for your response.  This is what I first tried and yet when I looked at the output of the merge, I found that the values did not always show.  Attached is the output of the merge set up in the way that you describe here.  The lines highlighted yellow did not contain the email address.

Userlevel 7
Badge +2
Kate, does every one of your records in each of the data sets have a valid email address or might some be missing an email address?
Hi Kate,



I understand your problem now. We are working on a feature called vertical merge (timelines not yet decided), which will help all emails to be listed in a single row for your use case. 

Alternative way to do it right now is:



If there are 2 objects D1 and D2. 

Merge on the email id field. (Retain all records from both datasets)



This will result in a dataset which will have 2 columns for email IDs (some values will be blank/null in both the columns)



Create an action "Load to MDA" on object D2, select the above resulting Dataset as source. Do an upsert based on email Id.



This will populate D2 with email ID's from D1. 



After that, you can re-merge D2 with D1 to ensure all fields are available in your resulting dataset and one column of email id has all the email id's from both D1 and D2. 



This will become much simpler once Vertical merge gets shipped (Timelines not yet decided). I will keep you updated. 



Let me know if the above approach doesn't work..



Thank you,

Jitin
Badge
Thanks Jitan - I think I will just pull in a "master emails" table and proceed using a left-join for now. Looking forward to learning more about this magical Vertical Merge.
I have a similar issue where we are trying to Join Activities and Open Opportunities tied to accounts in order to calculate a count for sales engagement. When joining the two datasets (Activities + Open Oppty), the output will result in a column with blank Account ID's. 

The above mentioned idea will not work in this case. Screenshots attached for reference. 

Is there any other workaround to get a true count of Open Opptys+ Activities for all accounts where either one is populated.









Userlevel 7
Badge +2
Hi Leslie, can you explain a little more why using the formula field to add the two values is not working for you? Can you share a screen shot of the merge task?
Here you go Dan.

I was noodling over this issue and might have come up with an interim solution



Rule 1: Find count of all Activities matching a certain criteria.

Action 1: Write this data to the Account on a custom field (numeric) using the rule engine



Rule 2: Find open opportunities

Action 2 Write this to another custom field on the Account (can also use a rollup field here)



Rule 3: Find all accounts where count of activities or count of open opportunities ( from above 2 rules) is greater than zero.

Action 3: Set Scorecard.
Userlevel 7
Badge +2
The issue you're running into with that merge is that if a record does not have any activities but does have opportunities, it will appear in your "pull opps" fetch, but not in your "pull activities" fetch.



Then when you merge, you bring in the count of opportunities, but don't bring in the Account ID or Account ID name (they are unchecked in your Show Fields area.



The way around this is to change the output name for Account ID and Account ID Name so that records that do not exist in both fetches still retain their account ID and name.



Another way to skin this would be to do a third fetch of all accounts. Then do two merges, first merge has the "all accounts" fetch in the left dataset with a left join merged with "pull activities" - write the count of activities to an output field. 



Then merge this task with your "pull opps" (in the right position) using a left merge on the output from the first merge. Bring in count of activities and add count of opps in your output. 



That will give you a record for each account that has a column for "count of activities" and "count of opps"
Excellent, that worked like a charm! Wonder why I didn't think of that before. Thank you Dan.
Userlevel 7
Badge +2
Glad I could help. Working with data is a little tricky sometimes and having someone else 'check your work' can be a godsend. Have a great day and happy bionic ruling! 🙂

Reply