Researcher Releases Tool to Unravel Pixelated Text

Ever shared photos on social media and taken the precaution of redacting potentially sensitive data such as vehicle license plates, ID numbers, or addresses? That’s wise of you.

But if you redacted them with pixelation, you might want to consider taking those photos down as soon as possible.

It turns out that it is possible to reconstruct text from pixelated images. And a researcher has just released the tools he created for this on GitHub.

Defeating pixelation

In a lengthy blog post, Dan Petro, a lead researcher at security firm Bishop Fox, outlined how pixelation works and how it can be defeated.

To be clear, an existing tool called Depix already does this using a brute-force approach. However, it copes poorly with minor variations and noise, and appears to be easily stumped in real-world settings.
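To see why pixelation is reversible in principle, here is a minimal Python sketch of the guess-and-compare idea behind a brute-force attack. It is not Depix's code or Petro's tool: the 3x5 bitmap font, the `render`, `pixelate`, and `brute_force` helpers, and the block size of 3 are all made up for illustration, and the sketch assumes the attacker knows the exact font and the length of the text.

```python
from itertools import product

# Hypothetical 3x5 bitmap font (1 = ink), invented for this sketch.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def render(text):
    """Draw text as rows of 0/1 pixels, with one blank column after each glyph."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for y, line in enumerate(FONT[ch]):
            rows[y].extend(int(c) for c in line)
            rows[y].append(0)
    return rows


def pixelate(img, block=3):
    """Mosaic filter: return the average value of every block x block tile."""
    height, width = len(img), len(img[0])
    tiles = []
    for top in range(0, height, block):
        tile_row = []
        for left in range(0, width, block):
            cells = [img[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            tile_row.append(sum(cells) / len(cells))
        tiles.append(tile_row)
    return tiles


def brute_force(target, length, block=3):
    """Return every candidate string whose pixelation equals the target."""
    matches = []
    for chars in product(sorted(FONT), repeat=length):
        guess = "".join(chars)
        if pixelate(render(guess), block) == target:
            matches.append(guess)
    return matches


secret = "407"
target = pixelate(render(secret))  # all the attacker gets to see
candidates = brute_force(target, len(secret))
print(candidates)
```

Because pixelation is deterministic, the original string is always among the candidates in this idealized setting. Coarser blocks may let several guesses tie, which narrows the search rather than ending it. Real images add noise and rendering differences, which is where exact matching of this kind breaks down.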

“In real-world examples, you’re likely to get minor variations and noise that throws a wrench into the gears,” explained Petro, who chanced upon a blog post by a researcher at cybersecurity consultancy Jumpsec outlining the issues with Depix.

The Jumpsec researcher also posted a sample of pixelated text, challenging readers with GPUs that are “beefy enough” to de-obfuscate it. Petro took up the challenge.

According to Petro, the challenges with pixelated text include characters bleeding over into adjacent blocks, as well as the unexpected effects of whitespace and variable-width fonts.
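The bleed-over problem can be shown with a small Python sketch. It assumes a made-up 3x5 bitmap font and a block size of 3, chosen so that block boundaries do not line up with character boundaries; the `render` and `pixelate` helpers are illustrative and are not taken from Petro's tool.

```python
# Hypothetical 3x5 bitmap glyphs (1 = ink), invented for this sketch.
FONT = {
    "1": ["010", "110", "010", "010", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "7": ["111", "001", "001", "001", "001"],
}


def render(text):
    """Draw text as rows of 0/1 pixels, with one blank column after each glyph."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for y, line in enumerate(FONT[ch]):
            rows[y].extend(int(c) for c in line)
            rows[y].append(0)
    return rows


def pixelate(img, block=3):
    """Mosaic filter: return the average value of every block x block tile."""
    height, width = len(img), len(img[0])
    return [
        [
            sum(img[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width)))
            / ((min(top + block, height) - top) * (min(left + block, width) - left))
            for left in range(0, width, block)
        ]
        for top in range(0, height, block)
    ]


# Same first two characters, different third character.
a = pixelate(render("171"))
b = pixelate(render("174"))

# Tile columns whose values changed when only the last character changed.
differing = [x for x in range(len(a[0]))
             if any(a[y][x] != b[y][x] for y in range(len(a)))]
print(differing)  # [2, 3]
```

Each glyph plus its gap is four pixels wide, so tile column 2 covers the last pixel column of the middle "7" as well as the first pixel column of the character after it. Changing only the third character therefore changes a tile that also encodes part of the second, which is why characters cannot simply be matched in isolation.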

In addition, it turns out that different rendering engines produce slightly different images of the same font. Nevertheless, Petro successfully deciphered the text, though he only posted the first four letters at the request of Jumpsec.

The takeaway? When publishing images online, be sure to cover sensitive information with opaque shapes rather than pixelating it. While pixelation looks nice, it falls flat when it comes to hiding the information from anyone but casual browsers.
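The difference between the two redaction styles can be sketched in a few lines of Python. The two-glyph bitmap font and the `render`, `pixelate`, and `black_bar` helpers are assumptions invented for this illustration, not part of any real image-editing tool.

```python
# Hypothetical 3x5 bitmap glyphs (1 = ink), invented for this sketch.
FONT = {
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "001", "001", "001"],
}


def render(text):
    """Draw text as rows of 0/1 pixels, with one blank column after each glyph."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for y, line in enumerate(FONT[ch]):
            rows[y].extend(int(c) for c in line)
            rows[y].append(0)
    return rows


def pixelate(img, block=3):
    """Mosaic filter: return the average value of every block x block tile."""
    height, width = len(img), len(img[0])
    tiles = []
    for top in range(0, height, block):
        tile_row = []
        for left in range(0, width, block):
            cells = [img[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            tile_row.append(sum(cells) / len(cells))
        tiles.append(tile_row)
    return tiles


def black_bar(img):
    """Overwrite every pixel with solid ink, discarding the original values."""
    return [[1] * len(row) for row in img]


first, second = render("17"), render("71")

# Pixelation is a function of the underlying pixels, so different text
# leaves different averages behind.
print(pixelate(first) == pixelate(second))    # False

# An opaque bar ignores the underlying pixels entirely.
print(black_bar(first) == black_bar(second))  # True
```

In this sketch the bar still reveals how wide the covered region is, so it is worth covering the whole field rather than just the characters.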

It is worth noting that the approach Petro adopted works on a letter-by-letter basis. A machine learning approach that takes entire paragraphs and natural language processing into consideration could plausibly decipher entire pages of pixelated text quickly.

“The bottom line is that when you need to redact text, use black bars covering the whole text. Never use anything else. No pixelization, no blurring, no fuzzing, no swirling,” wrote Petro.

“The last thing you need after making a great technical document is to accidentally leak sensitive information because of an insecure redaction technique.”

Image credit: iStockphoto/lolostock