I have found several methods of calculating the open rate:
1. dividing the aggregated "sum of opened" by the aggregated "sum of sent" x100
2. dividing the aggregated "sum of opened" by the aggregated "count_distinct of email addresses" x100
3. calculating the "count_distinct of emails" number twice, by running the dataset twice. First, finding the total number of distinct email addresses. Second, adding a filter for "opened = 1" to find the number of distinct email addresses that have been marked as having opened the email. I divide the second number by the first number, x100, to find the open rate.
Of these three methods, number 3 seems the most accurate, but it requires running the dataset twice. All three methods produce slightly different numbers, differing by around 1%. Which one is the source of truth?
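To make the difference concrete, here is a minimal Python sketch of the three calculations on a tiny made-up dataset. The field names (`email`, `sent`, `opened`) and the rows are assumptions for illustration only, not the real schema. The gap is exaggerated here because the sample is so small, but the cause is visible: one address received and opened two emails, so it counts twice in "sum of opened" and "sum of sent" but only once in any distinct count.

```python
# Hypothetical rows: one row per email sent. Field names are assumptions.
rows = [
    {"email": "a@example.com", "sent": 1, "opened": 1},
    {"email": "a@example.com", "sent": 1, "opened": 1},  # same address, second send
    {"email": "b@example.com", "sent": 1, "opened": 0},
    {"email": "c@example.com", "sent": 1, "opened": 1},
    {"email": "d@example.com", "sent": 1, "opened": 0},
]


def open_rates(rows):
    """Return the open rate (in percent) under each of the three methods."""
    sum_opened = sum(r["opened"] for r in rows)
    sum_sent = sum(r["sent"] for r in rows)
    # Both distinct counts are collected in a single pass over the rows.
    all_emails = {r["email"] for r in rows}
    opener_emails = {r["email"] for r in rows if r["opened"] == 1}

    method_1 = 100 * sum_opened / sum_sent                  # opens per send
    method_2 = 100 * sum_opened / len(all_emails)           # opens per distinct address
    method_3 = 100 * len(opener_emails) / len(all_emails)   # distinct openers per distinct address
    return method_1, method_2, method_3


print(open_rates(rows))
```

In this sketch each method answers a slightly different question (opens per send, opens per recipient, share of recipients who opened at least once), which would explain why the three numbers do not match.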
Best answer by abhishek_sivaraman