April 5, 2015
A client emailed me the other day asking why personal information about a former teacher was showing up in Google search results even though she had blocked it out in a .pdf document before posting it on their website. After hearing from the teacher, my client had removed the hyperlink to the .pdf document from the webpage and had deleted the document from the web hosting account, yet the teacher's email address was still displaying on page one of the Google search results when her name was searched. The teacher was upset that her personal email address was available to the world, and they all wondered how this had happened and what to do about it.
After several emails back and forth with questions and answers, I asked my client whether she still had a copy of the document in question and whether she would email it to me. I know that search engines can read text in .pdf files, and I wanted to see how the email address had been handled. When I received the document, I opened it in Acrobat and entered the email address in the search box. Sure enough, the search function found the email address underneath a white box that had been used to cover it up.
A white box over text will hide the text from the viewer, but unlike an image file such as a .jpg or .gif, a .pdf file still contains the underlying text, and it can easily be read by Adobe Acrobat, search engines, and web browsers. Google found this text and displayed it in the search results even though it was covered up. The best way to remove content from a .pdf file that was created with text (not scanned) is to delete the text itself, so it is permanently removed!
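To illustrate why a white box does not protect anything, here is a minimal Python sketch using only the standard library. It is not how Acrobat or Google actually process files. It assumes a hypothetical, hand-written page content stream (the address teacher@example.com and the helper names `extract_text` and `find_emails` are made up for the example) in which the text is drawn first and a white rectangle is painted over it afterwards. Real .pdf files are more complex, so for an actual document a proper PDF library or Acrobat's own search would be the better check.

```python
import re
import zlib

# Hypothetical page content: show a line of text, then paint a white
# rectangle on top of it. Both instructions stay in the file.
COVERED = (
    b"BT /F1 12 Tf 72 700 Td (Contact: teacher@example.com) Tj ET\n"
    b"1 1 1 rg 70 695 220 16 re f\n"
)

# The same page after the address has actually been deleted from the text.
DELETED = b"BT /F1 12 Tf 72 700 Td (Contact: ) Tj ET\n"

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_text(stream):
    """Return the literal strings shown by Tj operators in a zlib-compressed stream."""
    raw = zlib.decompress(stream)
    return [s.decode("latin-1") for s in re.findall(rb"\(([^()]*)\)\s*Tj", raw)]


def find_emails(stream):
    """Return every email address found in the text of the stream."""
    found = []
    for text in extract_text(stream):
        found.extend(EMAIL.findall(text))
    return found


if __name__ == "__main__":
    # Content streams are commonly stored zlib-compressed (the FlateDecode filter).
    print(find_emails(zlib.compress(COVERED)))  # ['teacher@example.com']
    print(find_emails(zlib.compress(DELETED)))  # []
```

The white rectangle is just one more drawing instruction that comes after the text, so anything that reads the text instead of looking at the rendered page never notices it. Only the second version, where the address was deleted, comes back empty.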
Now that the mystery of how the email address ended up in Google was solved, the next question was how to remove it from Google. Well, that part was easy. Google Webmaster Tools includes a Remove URLs function. You will need a Google account and the access required to add the website to that account. Then just submit the URL, and Google will take care of the rest.
And, yes, there are Bing, Yahoo!, and thousands of other search engines out there. Bing also has Webmaster Tools with a URL removal tool. Plus, many of the search engines use the same databases, so once Google and Bing take it down, the others will eventually follow suit.
Assuming no one else took the file and posted it on another website, the personal data will be removed from the search engine indexes over time. However, there are websites such as the Wayback Machine that archive other websites, making it possible for a copy of this file to remain in cyberspace forever!