Unprotected MongoDB instance containing almost 200 million CVs exposed online

  • A MongoDB instance containing 854 GB of data, including 202,730,434 CVs of Chinese jobseekers, was exposed online.
  • The exposed CVs contained personal information such as full names, dates of birth, addresses, phone numbers, email addresses, marital status, education, salary expectations, previous job experience, and more.

An open and unprotected MongoDB instance which contained 202,730,434 resumes of Chinese jobseekers was left publicly accessible for at least one week. The database held almost 854 GB of data.

On December 28, 2018, Bob Diachenko, Director of Cyber Risk Research at Hacken and the bug bounty platform HackenProof, discovered the unsecured database while analyzing the data stream of the BinaryEdge search engine.

What was exposed?

The exposed CVs contained personal information such as full names, dates of birth, addresses, phone numbers, email addresses, marital status, number of children, political affiliations, body measurements like height and weight, literacy level, salary expectations, education, previous job experience, and more, Diachenko explained in a blog post.

Unable to identify the owner of the database, Diachenko turned to Twitter for help. He tweeted, “Just came across a giant database with more than 202 Million Chinese CVs, with pretty much detailed info. - name, email, phones, genders, child counts, marriage status, politics stat (?) - and, of course, skills, work history etc. Any ideas where to report?”

Origin of the data

One of Diachenko’s followers pointed out a GitHub repository containing the source code of an app named data-import. The app’s purpose was to scrape CVs from legitimate job-finding portals.

“One of the primary sources from where the app appears to have scraped CVs is bj.58.com, a popular Chinese job portal, however other portals could have been scraped as well,” Diachenko told ZDNet.

Diachenko contacted bj.58.com and one of its representatives confirmed that the data came from a data scraper and did not leak from its storage.

“We have searched all over the database and investigated all the other storage, and concluded that the sample data is not leaked from us,” a representative for bj.58.com said. “It seems that the data is leaked from a third party who scrapes data from many CV’s websites,” the representative concluded.

Interestingly enough, the unprotected open database was secured a week later.
