Questions about the Massive South African “Master Deeds” Data Breach Answered (troyhunt.com)
61 points by robin_reala on Oct 19, 2017 | 12 comments


The mystery of there being more ID numbers in the breach than people in SA could be explained if they were being used in the database as a generated primary key, and since the remaining columns are nullable, they could be filled in with data from various sources as it is sucked in. The fact that New_IDn [1] is a bigint leads me to strongly suspect this, since most developers in SA would treat ID numbers as strings, rather than numbers, unless they had a very good reason to do otherwise.

It would be pretty easy to figure this out by stripping out the last 3 digits from the ID numbers in the DB, and then ordering them by the first 6 (DOB) digits. Assuming that digits 7-10 are allocated to males and females sequentially in their respective ranges, there should be much sparser data for higher numbers in digits 7-10 (say, above 5500 for males and 0500 for females) [2], since those people wouldn't exist.
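A minimal sketch of that check in Python, assuming 13-digit ID strings laid out as YYMMDD, then a 4-digit sequence (digits 7-10), then 3 trailing digits, with 0000-4999 used for females and 5000-9999 for males. The function names and the sample IDs are made up for illustration; nothing here comes from the breached data.

```python
from collections import Counter


def split_id(id_number: str) -> tuple[str, int]:
    """Return (YYMMDD, sequence) for a 13-digit ID, dropping the last 3 digits."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        raise ValueError(f"expected 13 digits, got {id_number!r}")
    return id_number[:6], int(id_number[6:10])


def sequence_histogram(id_numbers, bucket=500):
    """Count IDs per (sex, sequence bucket).

    Assumes sequences 0000-4999 are female and 5000-9999 are male, and
    measures each sequence from the start of its own range.
    """
    counts = Counter()
    for id_number in id_numbers:
        _, seq = split_id(id_number)
        sex = "M" if seq >= 5000 else "F"
        offset = seq - 5000 if sex == "M" else seq
        counts[(sex, offset // bucket * bucket)] += 1
    return counts


# Fabricated sample IDs, all with the same date of birth.
sample = ["9001010001080", "9001015001081", "9001015600082"]
for key, count in sorted(sequence_histogram(sample).items()):
    print(key, count)
```

If the IDs belong to real people, the buckets at higher offsets should thin out quickly; if they are closer to evenly filled, that would point towards generated keys.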

As for the source of the data, I have never owned property, but my email was in the breach, so the Deeds Office isn't the (only?) source of the data. I also doubt that the Deeds Office even records email addresses.

I would strongly suspect a credit bureau or two, since they track historic address data, emails and employers. My credit reports from SA credit bureaus contain slightly incorrect address data, and it would be interesting to see if it matches the data in the DB. My wife, mother, and children don't have credit records of any significance, and none of their emails came up in the breach, whereas my father and I, who do, had our emails exposed, according to haveibeenpwned.

[1]New_IDn is the column name because the original ID numbers tracked race in Digit "A", and this fell away as apartheid was dismantled, so people born before around 1990 had their original ID numbers reissued, and position A now has an "8" for everyone.

[2]A good upper bound guess would be to look at the number of births in a particular year/365/2 (for each sex)
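The arithmetic in [2] as a small Python sketch. The function name is hypothetical, and the births figure below is a round placeholder chosen to make the division obvious, not a real statistic. The result is a daily average, so busy days would run somewhat above it.

```python
import math


def expected_daily_sequence_limit(births_per_year: int) -> int:
    """Births per year / 365 days / 2 sexes, rounded up."""
    return math.ceil(births_per_year / (365 * 2))


# Placeholder input only: 730,000 births in a year.
print(expected_daily_sequence_limit(730_000))  # 1000
```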


> The mystery of there being more ID numbers in the breach than people in SA

Do you have a reference for this? From what I've seen, Troy managed to restore 31.6 million records, and said previously that the full backup seems to have 45 million. The last national census (2011) put the (current, living) population at 51.7 million, and rapidly growing. Granted, many of those are not of home ownership age, but the ballpark is very close all of a sudden (especially if you include the deceased).

There's also the question of whether those records are unique ID numbers; from what I understand (based on discussions of colleagues who work with actual SA deeds data), each record represents one status change in title deed of a property. So, each property purchase you've made will be in there as a row, with some people in there dozens of times.

Edit: I see he talks about 60 million rows; though I wonder if the statement about uniqueness is correct. Since the table he refers to seems to have two columns with the letters "ID" in them, I'd wager that the first one is a simple autonumber, and the second one is the actual ID number. But who knows, he's the one with the data.


> Do you have a reference for this? From what I've seen, Troy managed to restore 31.6 million records, and said previously that the full backup seems to have 45 million

The reference for an update on those numbers is in the article linked above. He later got all the data in.

From TFA: "My original import of the South African "Master Deeds" data didn't complete. Just ran a complete one: 60,323,827 rows with unique gov IDs."


I noticed that just after posting; made an edit though. Since he's basing that on a row count, I'm not convinced it's necessarily unique IDs. To be fair, I don't have the data, and he does.

We do like our autonumbers here in SA, so I wouldn't be surprised, despite the column name.


The row count is puzzling, yes. There simply are not that many property-owning South African citizens.

Some expats are known to be in that data. Then there are the rows marked deceased.

But as others have pointed out, so many South African citizens (out of the circa 55 million) are poor and are not property owners. The 60 million rows of data haven't been comprehensively accounted for yet.


Yepp. As far as I can tell, about 15 million of the population are in the 0 to 14 age group, so are highly unlikely to appear on a deed (not sure if it's even technically possible). The total number of properties in the country has got to be a fraction of that; the average number of people per home is apparently at 2.2 [1].

Stats SA seems to report in the region of 400 to 500K deaths per year, so if it was just down to deceased, it'd have to go back a fairly long way.
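A back-of-the-envelope version of that in Python, using only figures quoted in this thread and the article. It assumes every row is a distinct person and that every living person (or every living adult) is in the table, which is almost certainly too generous, so treat the output as a rough lower bound on how far back the deceased records would need to go.

```python
ROWS = 60_323_827                        # row count quoted from the article
POPULATION = 55_000_000                  # "circa 55 million" from this thread
UNDER_15 = 15_000_000                    # rough 0-14 estimate from this thread
DEATHS_PER_YEAR = (400_000, 500_000)     # Stats SA range quoted above


def years_of_deaths(excess_rows: int, deaths_per_year: int) -> float:
    """Years of deaths needed to account for the excess rows."""
    return excess_rows / deaths_per_year


excess_all = ROWS - POPULATION                   # if everyone living is in the table
excess_adults = ROWS - (POPULATION - UNDER_15)   # if only those 15 and over are

for label, excess in (("everyone", excess_all), ("15 and over", excess_adults)):
    low = years_of_deaths(excess, DEATHS_PER_YEAR[1])
    high = years_of_deaths(excess, DEATHS_PER_YEAR[0])
    print(f"{label}: {excess:,} excess rows, about {low:.0f} to {high:.0f} years of deaths")
```

Under those assumptions it comes to roughly 11 to 13 years if the whole living population is in the table, and roughly 41 to 51 years if the under-15s are left out.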

[1] - https://www.arcgis.com/home/item.html?id=582208ececa2424ab6e...


>Do you have a reference for this

Yes, the article.

>There's also the question of whether those records are unique ID numbers

Primary key constraints, plus the name of the column "NEW_IDN" (see GP) would strongly suggest that this is the case. It could be an autonumber, but NEW_IDN as a column name has a clear meaning. Also I think Troy would know how to check for duplicates.
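For what it's worth, the duplicate check itself is trivial once the column has been exported. A minimal Python sketch, with a hypothetical function name and made-up values standing in for the NEW_IDN column:

```python
from collections import Counter


def duplicate_ids(id_numbers):
    """Return {id: count} for every ID that appears more than once."""
    counts = Counter(id_numbers)
    return {id_number: n for id_number, n in counts.items() if n > 1}


# Made-up values, not real ID numbers.
print(duplicate_ids(["A", "B", "A", "C"]))  # {'A': 2}
```

If the column really is under a primary key constraint, this would come back empty by construction, and the row count would equal the number of distinct IDs.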


The excess ID numbers are AFAIK due to deceased people being in the table.


That's a guess from Troy at this point; he also points out that there may be expats. It should be easy enough to define some statistical tests on the ID numbers and dead/alive status to see if they fit the demographics of SA as of 2015. Also, the SA population pyramid has, for example, a bulge in the 25-29 group...the breached ID numbers should reflect those trends, rather than being uniformly distributed.
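A rough Python sketch of the first step, bucketing IDs into 5-year age bands as of 2015 from the YYMMDD prefix. The function names are hypothetical and the sample IDs are fabricated. Two assumptions to flag: the two-digit year is resolved by assuming nobody is older than 99, and age is taken from the birth year only, ignoring month and day.

```python
from collections import Counter


def birth_year(id_number: str, as_of_year: int = 2015) -> int:
    """Resolve the 2-digit year, assuming nobody is over 99 (a simplification)."""
    year = as_of_year // 100 * 100 + int(id_number[:2])
    return year if year <= as_of_year else year - 100


def age_band_counts(id_numbers, as_of_year: int = 2015, width: int = 5):
    """Count IDs per age band, keyed by (lowest age, highest age)."""
    counts = Counter()
    for id_number in id_numbers:
        age = as_of_year - birth_year(id_number, as_of_year)
        low = age // width * width
        counts[(low, low + width - 1)] += 1
    return counts


# Fabricated sample IDs.
sample = ["9001015001081", "8801010001080", "1001015001081"]
for band, count in sorted(age_band_counts(sample).items()):
    print(band, count)
```

The band counts could then be compared against published census age bands with something like a chi-squared goodness-of-fit test; the census figures themselves are not included here.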

Also, remember that a big percentage of South Africans are poor, and likely to not appear in any databases except for Home Affairs itself, so unless that's the source of the breach I doubt there are any other DBs with even close to that number of ID numbers.


People's inability to understand basic security shouldn't amaze me anymore, yet every time I see things like this I still wonder how many stupid things we don't ever find out about.


I suppose you mean that leaving an open-to-the-world webserver copy lying about without even a password, much less encryption, was an idiot's blunder - but that wasn't a legitimate site. It's way downstream of the actual information heist, if I read the article correctly.

Nobody got phished. The information was originally gathered by some legitimate institution, perhaps government, and then sold or stolen. I wouldn't be surprised if Dracore was founded from the start by a leak from an insider who figured out he could quit his govt job, take the database with him, and found a nice business that would make money while he slept. Basic security by individuals at home can't prevent such "inside jobs."

I once knew someone who was offered a million dollars for a copy of a very small corner of a records pile that was part of their normal work responsibilities over a few drinks and said no; I'm less surprised that others might have said yes.

What I call the "mass effect", that is, the fact that 60 million records could fit onto anyone's phone (if it had an external chip smaller than the one in my otherwise cheap phone) makes such inside jobs very hard to defend against; on the "who will watch the watchers" principle. "Mass effect" so called because information is losing mass extremely quickly, a whole lot of it now weighs almost nothing. You can just saunter out the door with it.


I'm sure whoever started this got the main db set from DHA (probably illegally). And that would quite likely be all citizens both alive and dead.



