Outer Join (Retain All Records From Both Datasets) Question

  • 1
  • Question
  • Updated 8 months ago
  • Acknowledged
  • (Edited)
I'm trying to use the outer join "Retain all records from both datasets" option in a Merge task in a Bionic Rule that is bringing together data from three separate MDA tables.  I want to merge on the field that corresponds to the Email Address of the client.  Not all clients are represented in each set of data (ie - in this case there is no one table that includes all clients).  

However, because I can't include both "User Email" and "User Email" in the Show Fields,

some of the email addresses are getting left off of the results and I'm left with data points (number of downloads, for example) that have no contact identification information.


I therefore can't load this data to the MDA table using this field as an identifier. 

When I rename the fields to "User Email" "User Email 2" "User Email 3", etc, I am able to view the email for all  results however I'm not sure of the best way to load this to the MDA table, as I now have multiple fields containing the email address.  

Looking for some help or advice on how best to perform Outer Joins or to merge data when using the same field to merge on and identify data.
Photo of Kate Green

Kate Green, Champion

  • 4,368 Points 4k badge 2x thumb

Posted 8 months ago

  • 1
Photo of Jitin Mehndiratta

Jitin Mehndiratta, Product Manager

  • 2,862 Points 2k badge 2x thumb
Hi Kate,

Is the Email Id unique to an account ID? If Yes, then merge on following two fields:
Account ID, Email ID

This should solve your problem.

If the Email ID is not unique to an account ID, then there might be a case where you will have more than one Email Id's for an account ID. How do you want your output data to look like in that case? 
2 different rows for the same account ID or just one row with two different columns for the email ID?

Regards,
Jitin
Photo of Kate Green

Kate Green, Champion

  • 4,368 Points 4k badge 2x thumb
My first screenshot was misleading in that it shows Account ID being used as a Merge On field.  There is definitely a case in which there are more than one Email for an Account ID.  

I want my output data to have 1 row for each Email, and there would be 2 different rows for the account ID.  I'm just struggling to have ALL emails show in the same column when they come from different sources.  Ideally, I'd want all the Emails aligned in the same column.  Does that make sense?
Photo of Jitin Mehndiratta

Jitin Mehndiratta, Product Manager

  • 2,862 Points 2k badge 2x thumb
HI Kate,

This is achievable via Bionic Rules. Assuming there are 3 datasets and all these 3 datasets have an email ID field. 
Let me name these datasets as D1, D2 and D3.
Create a merge task M1, Merge on Email ID field (Retain all records from both datasets), select all show fields(Email ID only once).

Then Merge M1 and D3, Merge on Email ID field (Retain all records from both datasets), select all show fields(Email ID only once).

This will result in just one Email ID column with all the show fields which you want in your output task.

Try this and let me know if this doesn't help.

Regards,
Jitin
Photo of Kate Green

Kate Green, Champion

  • 4,368 Points 4k badge 2x thumb
Hi Jitin - thanks for your response.  This is what I first tried and yet when I looked at the output of the merge, I found that the values did not always show.  Attached is the output of the merge set up in the way that you describe here.  The lines highlighted yellow did not contain the email address.
Photo of Dan Ahrens

Dan Ahrens, Gainster (employee)

  • 22,210 Points 20k badge 2x thumb
Kate, does every one of your records in each of the data sets have a valid email address or might some be missing an email address?
Photo of Jitin Mehndiratta

Jitin Mehndiratta, Product Manager

  • 2,862 Points 2k badge 2x thumb
Hi Kate,

I understand your problem now. We are working on a feature called vertical merge (timelines not yet decided), which will help all emails to be listed in a single row for your use case. 
Alternative way to do it right now is:

If there are 2 objects D1 and D2. 
Merge on the email id field. (Retain all records from both datasets)

This will result in a dataset which will have 2 columns for email IDs (some values will be blank/null in both the columns)

Create an action "Load to MDA" on object D2, select the above resulting Dataset as source. Do an upsert based on email Id.

This will populate D2 with email ID's from D1. 

After that, you can re-merge D2 with D1 to ensure all fields are available in your resulting dataset and one column of email id has all the email id's from both D1 and D2. 

This will become much simpler once Vertical merge gets shipped (Timelines not yet decided). I will keep you updated. 

Let me know if the above approach doesn't work..

Thank you,
Jitin
Photo of Kate Green

Kate Green, Champion

  • 4,368 Points 4k badge 2x thumb
Thanks Jitan - I think I will just pull in a "master emails" table and proceed using a left-join for now. Looking forward to learning more about this magical Vertical Merge.