Users were left startled as Google Drive’s automated detection systems flagged a nearly empty file for copyright infringement.
The file, according to one Drive user, contained nothing but the digit “1.”
Is the digit ‘1’ copyrighted?
This week, Dr. Emily Dolson, an assistant professor at Michigan State University, reported seeing some odd behavior when using Google Drive.
One of the files in Dolson’s Google Drive, ‘output04.txt’, was nearly empty, with nothing other than the digit ‘1’ inside it.
But according to Google, this file violated the company’s “Copyright Infringement policy” and was hence flagged.
Worse, the warning sent to the professor ended with “A review cannot be requested for this restriction.”
Dolson’s file ‘output04.txt’ was stored at the path ‘CSE 830 Spring 2022/Testcases/Homework3/Q3/output’ in Drive, which led the professor to wonder whether the file path contributed to the false alarm.
Present on Dolson’s “non-educational Google account,” the file was among a batch of TXTs containing output generated as part of a homework assignment.
One too many digits
A pseudonymous user also shared screenshots of their Google Drive account in which files containing just the digit “1,” with or without newline characters, were flagged.
“The 1 byte files contain just ‘1’, the 2-byte file is ‘1\n’, and the 3-byte (not flagged yet) file has ‘1\r\n’,” wrote the user.
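To make the byte counts in that quote concrete, here is a minimal Python sketch of the three variants as raw bytes. The labels and the `describe` helper are illustrative names, not anything taken from the user's report.

```python
# The three file contents described in the quote, as raw bytes.
variants = {
    "digit only": b"1",        # 1 byte
    "digit + LF": b"1\n",      # 2 bytes (Unix-style newline)
    "digit + CRLF": b"1\r\n",  # 3 bytes (Windows-style newline)
}

def describe(variants):
    """Return (label, size_in_bytes, hex_dump) for each variant."""
    return [(label, len(data), data.hex()) for label, data in variants.items()]

for label, size, hex_dump in describe(variants):
    print(f"{label}: {size} byte(s), hex {hex_dump}")
```

The only difference between the three files is the line ending, which is why their sizes are 1, 2, and 3 bytes.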
And, it turns out the behavior isn’t limited to just files containing the digit “1.”
Dr. Chris Jefferson, an AI and mathematics researcher at the University of St Andrews, was also able to reproduce the issue when uploading multiple computer-generated files to Drive.
Jefferson generated over 2,000 files, each containing just a number between -1000 and 1000.
The files containing the numbers 173, 174, 186, 266, 285, 302, 336, 451, 500, and 833 were flagged by Google Drive for copyright infringement shortly afterward.
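A set of test files like the one Jefferson describes could be generated locally with a short script. The sketch below is an assumption about how such an experiment might look, not Jefferson's actual code: the file naming scheme, the absence of a trailing newline, and the `write_number_files` helper are all hypothetical.

```python
import tempfile
from pathlib import Path

def write_number_files(directory, low=-1000, high=1000):
    """Write one text file per integer in [low, high]; return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(low, high + 1):
        path = directory / f"number_{n}.txt"  # hypothetical naming scheme
        # Assumption: each file holds only the number, with no trailing newline.
        path.write_text(str(n), encoding="ascii")
        paths.append(path)
    return paths

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_number_files(tmp)
        print(len(files), "files written")
```

Covering every integer from -1000 to 1000 inclusive yields 2,001 files, consistent with the "over 2,000" figure.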
Some allege that if a file contains just the digit “0,” Google will permanently disable the account, although that outcome more likely applies to users whom Google deems to be repeat infringers.
“I deleted the experiment, just in case I got my account deleted for too many naughty numbers,” writes Jefferson.
Mikko Ohtamaa, founder of DeFi company Capitalgram, alleged that Google’s automated flagging of suspected copyright infringement could conflict with parts of the GDPR.
Note, however, that GDPR Article 22, “Automated individual decision-making, including profiling,” refers more specifically to automated decisions made about individuals by profiling their online behavior, such as before granting a loan or when making hiring decisions, as explained by the UK’s ICO.
“I’d have more sympathy if it weren’t ‘A review cannot be requested for this restriction,’” writes Hacker News user OneLeggedCat. “It’s designed to be as brutal and draconian as possible. They chose this. It is guilty until proven innocent, with no recourse.”
It isn’t known yet what causes this behavior, and BleepingComputer has been unable to reproduce the issue at the time of writing.
In 2018, Google published a detailed document explaining how the company fights piracy. But when specifically talking about Google Drive, the report states a “full-time abuse engineering team” was set up by Google for tackling illegal streams served on Google Drive. As such, not much information is available on how Google’s algorithms process non-video content stored on Drive.
BleepingComputer reached out to Google well in advance of publishing with specific questions, such as whether Google relied on checksums to keep track of copyrighted content and whether this behavior arose from a possible hash collision between copyrighted files and benign ones sharing the same hash.
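For readers unfamiliar with checksum-based matching, the sketch below shows the general idea in Python. It is purely illustrative and does not model Google Drive, whose detection method is not known; the `blocklist` set, the `is_flagged` function, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical blocklist seeded with the checksum of one "known" file.
blocklist = {sha256_hex(b"1")}

def is_flagged(data: bytes) -> bool:
    """Flag content whose checksum appears in the blocklist."""
    return sha256_hex(data) in blocklist

print(is_flagged(b"1"))    # True: byte-identical content, identical checksum
print(is_flagged(b"1\n"))  # False: one extra byte gives a different checksum
```

In a scheme like this, any file that is byte-for-byte identical to a listed file gets flagged regardless of who created it or why. A hash collision in the strict sense is a different situation, in which two files with different contents happen to produce the same checksum.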