It is possible to use publicly available data on state and date of birth to predict someone’s Social Security number, particularly if they were born after 1988 and in smaller states, according to an article published Monday in The Proceedings of the National Academy of Sciences.
The ability to use statistical inference to predict such sensitive data exposes Social Security numbers to identity fraud risks on “mass scales,” the article said.
Social Security numbers “were designed as identifiers at a time when personal computers and identity theft were unthinkable; today, abused as authentication devices, they enable an ‘architecture of vulnerability,’ in which losses are incurred even in absence of fraud, because of costs caused by attempts to defend, and exploit, the system,” the article concluded.
The researchers from Carnegie Mellon University analyzed Social Security numbers of people who have died to detect statistical patterns in the assignment of numbers. They were then able to use those patterns to predict a range of values likely to include a living person’s Social Security number. Birth data, meanwhile, can be inferred from data brokers, voter registration lists, online white pages, and social-networking profiles, the report said.
In a single attempt, the researchers identified the first five Social Security digits for 44 percent of the records of deceased people born from 1989 to 2003. For 8.5 percent of those records, they identified the complete Social Security number in fewer than 1,000 attempts.
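To put those attempt counts in perspective, here is a rough back-of-the-envelope sketch of the search space involved. It assumes, for simplicity, that every nine-digit string is a possible number, which overstates the real space because some ranges are never assigned. The function and variable names are illustrative and do not come from the article.

```python
# Back-of-the-envelope search-space arithmetic. No real SSN data is involved.
# Simplifying assumption: every digit string of the given length is possible.

TOTAL_DIGITS = 9          # a Social Security number has nine digits
KNOWN_PREFIX_DIGITS = 5   # the "first five digits" mentioned in the article


def remaining_candidates(total_digits: int, known_digits: int) -> int:
    """Count the digit strings still possible once a prefix is known."""
    return 10 ** (total_digits - known_digits)


blind = remaining_candidates(TOTAL_DIGITS, 0)
after_prefix = remaining_candidates(TOTAL_DIGITS, KNOWN_PREFIX_DIGITS)
three_digit_pin = remaining_candidates(3, 0)

print(f"No information:        {blind:,} candidates")
print(f"First five known:      {after_prefix:,} candidates")
print(f"Three-digit PIN space: {three_digit_pin:,} candidates")
```

Under that assumption, knowing the first five digits shrinks the space from a billion candidates to 10,000, and a number recoverable in fewer than 1,000 attempts is roughly as hard to guess as a three-digit PIN.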
For people born nationwide between 1973 and 1988, the researchers matched the first five digits on the first attempt for an average of 7 percent of all records.
“Extrapolating to the U.S. living population, this would imply the potential identification of millions of SSNs for individuals whose birth data were available,” the article says.
The report goes on to give an example of how an attacker could obtain entire Social Security numbers by renting a botnet to apply for credit cards while impersonating 18-year-old residents born in West Virginia. The example rests on numerous assumptions, including that the attacker can find birth data for 50 percent of the potential targets and that, in half of the cases, an inquiry with the correct first seven of nine digits is enough for a credit reporting agency to return a positive match. Under those assumptions, an attacker could potentially harvest credentials at rates as high as 47 per minute, obtaining 4,000 credentials within two hours before the IP addresses used in the botnet were blacklisted, the article said.
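As a quick consistency check on those figures, the sketch below works through the arithmetic. It assumes the peak rate of 47 per minute is sustained for the whole window, which the article does not claim (it says "as high as"), so the results are upper bounds. The variable names are illustrative.

```python
# Consistency check on the figures quoted from the article.
# Assumption: the peak rate holds for the entire window (an upper bound).

RATE_PER_MINUTE = 47      # "rates as high as 47 per minute"
TARGET_CREDENTIALS = 4_000
WINDOW_MINUTES = 120      # "within two hours"

minutes_needed = TARGET_CREDENTIALS / RATE_PER_MINUTE
ceiling_in_window = RATE_PER_MINUTE * WINDOW_MINUTES

print(f"Minutes to reach {TARGET_CREDENTIALS:,} at peak rate: {minutes_needed:.1f}")
print(f"Ceiling over {WINDOW_MINUTES} minutes at peak rate: {ceiling_in_window:,}")
```

At the peak rate, 4,000 credentials would take about 85 minutes, and the two-hour ceiling would be 5,640, so the article's figure of 4,000 within two hours is consistent with an average rate somewhat below the peak.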