Apple’s Device Analytics Can Identify iCloud Users

Researchers claim that supposedly anonymous device analytics information can identify users:

On Twitter, security researchers Tommy Mysk and Talal Haj Bakry have found that Apple’s device analytics data includes an iCloud account and can be linked directly to a specific user, including their name, date of birth, email, and associated information stored on iCloud.

Apple has long claimed otherwise:

On Apple’s device analytics and privacy legal page, the company says no information collected from a device for analytics purposes is traceable back to a specific user. “iPhone Analytics may include details about hardware and operating system specifications, performance statistics, and data about how you use your devices and applications. None of the collected information identifies you personally,” the company claims.

Apple was just sued for tracking iOS users without their consent, even when they explicitly opt out of tracking.

Posted on November 22, 2022 at 10:28 AM • 3 Comments

Comments

Winter November 22, 2022 11:13 AM

Cardinal Richelieu has been quoted as saying he only needs six lines of the most honest man to condemn him to death.

That might not be as easy nowadays. And you do not need even three lines to get death threats.

But six data points on any person could very well be enough to identify any person, honest or not. 6 location + time stamps would be more than enough to identify anyone.

There are 8 billion people in the world, so 33 bits should suffice. Time stamps already give a lot of information about the person doing a task.
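As a rough check on that figure, here is a minimal Python sketch (the function name `bits_needed` is only illustrative) that counts how many bits it takes to give every person in a population a distinct label:

```python
import math

def bits_needed(population: int) -> int:
    """Smallest number of bits that can give every person a distinct label."""
    return math.ceil(math.log2(population))

world = 8_000_000_000
print(bits_needed(world))   # 33
print(2 ** 33 >= world)     # True
print(2 ** 32 >= world)     # False
```

Any set of data points that together carries about 33 bits of information is, in principle, enough to single out one person.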

Winter November 22, 2022 11:35 AM

And here are the numbers on re-identifiability:

https://www.nature.com/articles/s41467-019-10933-3/

Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes.

Example:

ZIP code, date of birth, gender, and number of children would also identify 79.4% of the population in Massachusetts with high confidence

It is now common practice to never include a date of birth in any published data set. Age, or even age bracket (in decades) is now SOP. ZIP codes below city level are also banned. I have never seen number of children in any data set not specifically related to family structure.
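To illustrate why that coarsening helps, here is a small Python sketch. The records are entirely made up, and the choice of a 3-digit ZIP prefix and a decade of birth is an assumption for the example, not a rule taken from the paper:

```python
from collections import Counter

# Entirely made-up records: (ZIP code, date of birth, gender)
records = [
    ("02139", "1984-03-12", "F"),
    ("02139", "1984-07-30", "F"),
    ("02139", "1987-01-05", "F"),
    ("02141", "1984-11-21", "M"),
    ("02141", "1989-06-02", "M"),
    ("02141", "1981-09-17", "M"),
]

def unique_fraction(rows):
    """Fraction of rows whose attribute combination appears exactly once."""
    counts = Counter(rows)
    return sum(1 for r in rows if counts[r] == 1) / len(rows)

def generalise(row):
    """Coarsen a record: 3-digit ZIP prefix, decade of birth, gender."""
    zip_code, dob, gender = row
    decade = dob[:3] + "0s"
    return (zip_code[:3], decade, gender)

print(unique_fraction(records))                           # 1.0
print(unique_fraction([generalise(r) for r in records]))  # 0.0
```

In this toy set every raw record is unique, while after coarsening each record is shared by three people. Real data sets are far less tidy, so this shows only the direction of the effect.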

But then, these practices were all designed after Latanya Sweeney’s groundbreaking work.

Still, most people are re-identified because they post everything on social media. It used to be pretty easy to track Russian oligarchs by following the Instagram posts of their children.

Clive Robinson November 23, 2022 8:55 AM

@ ALL,

The answer to the question of whether tracking is possible is “it depends”…

We glibly talk of “Data Points” (DPs) and to a lesser extent “Signal to Noise Ratio” (SNR).

But the SNR to what?

1, Individual DPs.
2, Random groups of DPs.
3, Chosen groups of DPs.

Apple talks about “Individual” DPs from their “Chosen Group” of DPs.

This process is covered under,

“Lies, damn lies, and statistics.”

And the easy way to give an analog of the process is to talk about “jigsaws”.

Each piece has an SNR, but that SNR is also relative to adjacent pieces in the completed puzzle. It consists of an easy SNR (from the shape) and a variable SNR (from the image).

What most people know is most jigsaws are a rectangle and usually cut on a grid. This tells us there are three basic groups,

1, Corner pieces.
2, Edge pieces.
3, Inside pieces.

As a rule of thumb this tells us further information, when we apply a “locking mechanism” to the shapes. This is,

4, Each piece is approximately a rectangle and thus has two axes.
5, On each axis there will be two lock features unless “cut”.
6, A lock feature can be a “lug” or a “hole”.

The “cut” rules are based on having straight edges so,

7, A Corner piece has two cuts thus only two lock features.
8, An Edge piece has one cut thus three lock features.
9, An Inside piece usually has no cuts so has four lock features
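The counting implied by rules 7 to 9 can be sketched in a few lines of Python. This is only an illustration, and it assumes that every lock feature is independently a lug or a hole and that rotated versions of a piece count as distinct outlines:

```python
from itertools import product

# Lock features per piece type, following rules 7 to 9 above.
LOCK_FEATURES = {"corner": 2, "edge": 3, "inside": 4}

def outlines(n_features):
    """All lug/hole assignments for a piece with n_features lock features."""
    return list(product(("lug", "hole"), repeat=n_features))

for kind, n in LOCK_FEATURES.items():
    print(kind, len(outlines(n)))
# corner 4
# edge 8
# inside 16
```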

Further, we know that for a cut axis there are only two lock feature combinations, “lug” or “hole”. For an uncut axis there are four: “l,l”, “l,h”, “h,l”, “h,h”. We can apply further logic to come up with sixteen shapes for the Internal pieces and so on, into which they can be sorted. However, there is ambiguity, because,

10, The actual shape of a piece is only 100% known when the jigsaw is completed.

But is the inversion of that statement true?

Actually it’s not. And for that a bit of further information is required. Most pictures come in one of two orientations that painters traditionally call “landscape” or “portrait”. What artists and some mathematicians know is that the shapes of these are defined by optimising the cut of materials[1].

If you assume that the pieces are cut on a square grid, then from the number of pieces you can know the size of the puzzle, or vice versa. Thus, knowing that your Edge pieces break down into four groups, where the groups have only two sizes, you can easily calculate…
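One way to make that calculation concrete is the Python sketch below. It assumes a plain rectangular grid with no missing pieces; `grid_from_counts` is a hypothetical helper that recovers the grid dimensions from the total piece count and the number of non-corner Edge pieces:

```python
import math

def grid_from_counts(total_pieces, edge_pieces):
    """Recover (short_side, long_side) of a rectangular grid.

    total_pieces = rows * cols
    edge_pieces  = 2 * (rows - 2) + 2 * (cols - 2)   (corners excluded)
    """
    half_perimeter = (edge_pieces + 8) // 2          # rows + cols
    disc = half_perimeter ** 2 - 4 * total_pieces    # (cols - rows) ** 2
    if (edge_pieces % 2) or disc < 0 or math.isqrt(disc) ** 2 != disc:
        raise ValueError("counts do not fit a rectangular grid")
    diff = math.isqrt(disc)
    return (half_perimeter - diff) // 2, (half_perimeter + diff) // 2

print(grid_from_counts(1000, 122))   # (25, 40)
```

For a 1000-piece puzzle with 122 Edge pieces the grid comes out as 25 by 40, so the four Edge groups have sizes 23, 23, 38 and 38.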

Further, by looking at the types of groups of Inner pieces you have, you can work out further information, namely how the grid cut to locking mechanism is applied by the manufacturer. I won’t go into those details as the explanation, whilst simple, is also long-winded. All you need to know is that more often than not the cut pattern is “deterministic”, not actually “random”, though it might be a simple pseudo-random sequence that is both linear and of short length (for material and cutting die strength).

The “take away” is that for any given piece there is a limited number of places it can be in, which in turn is fixed by the pieces around it forming a larger regular placement grouping, using a simple mechanism that mathematicians call “tiling” (see works by Roger Penrose to see how fun that can get).

So without even looking at the image information you have a much greater SNR than you might have expected. The same applies to individual data points: they do not appear in isolation, they have structure. One such is “Social Security Number” and “age”, likewise “bank account number”; these are “issued sequentially” and against a time demand pattern that can be determined without much difficulty. Likewise, the bank a person uses is in many cases related to where that person lives or works.

There are lots of such patterns, and though they appear to have a low signal to noise ratio individually, the reality is that when cross-correlated the noise quickly averages down and the signal multiplies up… Those involved with extracting signals from noise in communications systems have a number of rather useful formulae; knowing how they were derived tells you how to do similar with the different types of data points. Thus the domain of “Digital Signal Processing” (DSP) is very, very similar to “Machine Learning”, which is something many in the areas around the ML domain are unaware of.
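The “noise averages down” effect can be seen in a small simulation. This Python sketch assumes the simplest textbook case, independent Gaussian noise on repeated readings of one fixed value, which real data points rarely match exactly; the names and parameter values are illustrative only:

```python
import random
import statistics

def spread_of_average(n_points, trials=2000, signal=1.0, noise_sigma=5.0, seed=1):
    """Standard deviation of the mean of n_points noisy readings of `signal`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        readings = [signal + rng.gauss(0.0, noise_sigma) for _ in range(n_points)]
        estimates.append(sum(readings) / n_points)
    return statistics.pstdev(estimates)

single = spread_of_average(1)      # one noisy data point
combined = spread_of_average(100)  # one hundred combined
print(single / combined)           # close to sqrt(100) = 10
```

Under these assumptions, combining N weak data points shrinks the noise by roughly the square root of N, while the signal stays put.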

I could go on to show how the jigsaw image SNR likewise provides rapid identification, again based on underlying rules like perspective and general light intensity/colour of

1, Top : distant, low complexity, light, upper end of the colour spectrum.
2, Bottom : near, high complexity, dark, lower end of the colour spectrum.
3, Middle : graduating transition bands.

Similar but less obvious rules apply for left to right and directional transition based on the assumed light source position.

The point is it allows further placing of a piece based on “sorting by data value”.

So you can see that knowing a few basic rules you can sort jigsaw pieces into quite small groupings without actually knowing what the picture is. By examining those groups you can work out the 2D structure of piece placement. You can even work out where pieces might be missing…

As the old saying has it,

“The devil is in the details”

And whilst a supposed “deity’s” actions might appear unfathomable, the reality (as they are really the actions of the mental defective) is that they follow very, very simple rules.

Once you know this it’s no longer “fun” and can also appear quite “sinister” depending on where you are looking from. It’s why I stopped doing actual jigsaws years ago, but more importantly why I’m always conscious of “The hand that is not spoken of”.

At the bottom of any hierarchy you look up and see complexity, from the top you look down and see simplicity. This has nothing to do with skill or intelligence, because in part those at the top at the very least practice “lies of omission” and, to use another analogy, as with “Brownian motion” they deal with crowds not individuals, so see predictable fluid behaviour not chaotic particle behaviour. Those at the bottom see mostly chaotic behaviour that only has a mostly unfathomable larger structure…

[1] The cut optimisation is based on area halving/doubling and a 90-degree rotation. That is, two portraits side by side are the same shape as a landscape of twice the area; likewise, two landscapes on top of each other are the same shape as a portrait of twice the area. It’s the same “rules” as for the “An” paper sizing, of which we mostly see A4 (portrait) for “letters” or A3 (landscape) for “diagrams”, and sometimes A5 (portrait) for “note books”, going down through the other sizes. It is most often seen, but with slightly different areas, for “labels”, because although the “cut grid” follows the rules, they have an additional reduction due to the way most self-adhesive labels have a separating area on each edge of 1/16th of an inch or so.
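The halving rule can be checked numerically. The Python sketch below uses the idealised A-series definition (A0 has an area of one square metre and every sheet has a height to width ratio of the square root of 2); `a_size_mm` is an illustrative helper, and published sizes are these values rounded to whole millimetres:

```python
import math

def a_size_mm(n):
    """Ideal (width, height) in mm of a portrait A<n> sheet."""
    width = 1000 * 2 ** (-(2 * n + 1) / 4)
    height = 1000 * 2 ** (-(2 * n - 1) / 4)
    return width, height

w3, h3 = a_size_mm(3)
w4, h4 = a_size_mm(4)
print(round(w4), round(h4))                  # 210 297
print(math.isclose(h4 / w4, math.sqrt(2)))   # True
# Two A4 portraits side by side have the outline of a landscape A3.
print(math.isclose(2 * w4, h3) and math.isclose(h4, w3))   # True
```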

