>If you only have an email address and you make zero effort to cross-reference that with other data (using, for example, any datasets you purchased, or a marketing data enhancement system) then you have not connected their email address to their identity.
That is not my interpretation. As I'm reading it, all data that is routinely collected has to be disclosed, even if it is never cross referenced with any third party datasets.
I think if you create a record on your server for each user (identified by some user ID) and you store the user's email address in that record, then you must disclose that fact.
You have to disclose this, but the problem is the question that follows. If you collect the email address, Apple wants to know whether you use it to link to the user’s identity. And this is where it’s confusing. Without a definition of "identity", I don’t know whether I’m answering this question properly.
If the same person registers two accounts with two email addresses, but provides the same information for both, would you know that they're the same person?
If the same person registers two accounts with two email addresses, but provides the same mailing address for both, and you send a postal catalog to each of them, would your systems detect the duplication and only send one catalog?
If either answer is Yes, and for some companies both are, then you are linking their email address to their identity: their personhood, their struct {} of data fields.
If both answers are No, as they are for many companies, then you are not linking their email address to their identity.
(Obviously having a postal address creates other problems for you; I'm just doing my best to draw an analogy here. For definite answers you have presumably already contacted Apple, as Apple is clearly reserving the right to make judgement calls when asked questions about this.)
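To make the catalog question concrete, here is a minimal sketch of what "detecting the duplication" could look like. Everything in it is hypothetical: the account fields, the `normalize_address` helper, and the crude matching rule are assumptions for illustration, not anything Apple specifies. A system that runs something like this is tying two email addresses to one person; a system that just mails every account is not.

```python
from collections import defaultdict


def normalize_address(address: str) -> str:
    """Crude normalization (assumption): lowercase and collapse whitespace."""
    return " ".join(address.lower().split())


def group_accounts_by_address(accounts):
    """Map each normalized mailing address to the emails registered with it."""
    groups = defaultdict(list)
    for account in accounts:
        key = normalize_address(account["mailing_address"])
        groups[key].append(account["email"])
    return dict(groups)


def catalog_recipients(accounts):
    """The 'Yes' case: one catalog per distinct mailing address."""
    return sorted(group_accounts_by_address(accounts))


# Hypothetical sample data: the first two accounts share a mailing address.
accounts = [
    {"email": "a@example.com", "mailing_address": "1 Main St,  Springfield"},
    {"email": "b@example.com", "mailing_address": "1 main st, springfield"},
    {"email": "c@example.com", "mailing_address": "9 Oak Ave, Shelbyville"},
]

linked = group_accounts_by_address(accounts)
for address, emails in linked.items():
    if len(emails) > 1:
        print(f"{address}: same person? {emails}")
```

Real deduplication is far fuzzier than this, but the point stands: the moment two emails land in the same bucket, the email is linked to something larger than the account.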
> If the same person registers two accounts with two email addresses, but provides the same information for both, would you know that they're the same person?
Probably. I find it odd that Apple didn't choose words that are already clear with respect to privacy laws such as GDPR. The GDPR doesn't talk about identity. It defines personal data, roughly what is elsewhere called personally identifiable information (PII). If you collect this data, you're subject to GDPR compliance.
Apple has a weird phrasing of this. You apparently can collect an email address, but not link it to an identity, which is different from collecting an email address and linking it to an identity. It's unclear to me what they mean by this and what "identity" is supposed to mean.
It's way easier to say: an email address is a piece of data that could identify a person, hence you must treat it carefully and comply with the GDPR (collect it only with consent, delete it when you're done, and honor the user's right to correct their data and to get a copy of everything you hold on them).
I agree with you that "identity" is not well defined in Apple's document.
The way I'm reading it is that "identity" is anything that uniquely identifies each user of your app, i.e. something like a GUID or any generated user ID. It does not necessarily mean that you are able to identify the real-world person behind the user record.
So for instance, if you collect the number of steps each user has taken each day and you store that information on your server associated with a user ID, then you have collected that data and you have linked it to the user's identity, even if you know absolutely nothing else about that user.
What would it mean to collect data without linking it to a user's identity? I think it means collecting aggregate or statistical data. If you transmit the number of steps taken by each user to your server, but you only ever store the average number of steps taken across all your users, then you have collected data without linking it to a user's identity.
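Here is a rough sketch of that distinction as I read it. The class names and the in-memory storage are made up for illustration, and this is only my interpretation of "linked", not Apple's definition. Both stores receive the same data from the app; they differ only in what they keep.

```python
class LinkedStepStore:
    """Keeps each step count under a user ID: collected and linked."""

    def __init__(self):
        self.steps_by_user = {}

    def record(self, user_id: str, steps: int) -> None:
        self.steps_by_user.setdefault(user_id, []).append(steps)


class AggregateStepStore:
    """Keeps only a running total and count: collected, but not linked."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def record(self, user_id: str, steps: int) -> None:
        # The user ID is deliberately discarded; nothing per-user is stored.
        self.total += steps
        self.count += 1

    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


linked_store = LinkedStepStore()
aggregate_store = AggregateStepStore()
for user_id, steps in [("u1", 4000), ("u2", 6000)]:
    linked_store.record(user_id, steps)
    aggregate_store.record(user_id, steps)

print(linked_store.steps_by_user)  # per-user history is recoverable
print(aggregate_store.average())   # only the overall average survives
```

With the first store you can answer "how many steps did u1 take?"; with the second you cannot, no matter how hard you try.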
For email addresses the distinction between collection and linking to users makes no sense. It's always going to be both or neither.
So that's what I believe. What's important though is what Apple actually means. And I fully agree with you that this document needs clarification.