Could malicious backdoors be hiding in code that otherwise appears perfectly clean to the human eye and text editors alike?
A security researcher has shed light on how invisible characters can be snuck into JavaScript code to introduce security risks, like backdoors, into your software.
Not everything is what it seems, in Unicode
Earlier this month, University of Cambridge researchers revealed a clever attack dubbed ‘Trojan Source’ for injecting vulnerabilities into source code in a way that human reviewers cannot easily detect.
The method works with some of the most widely used programming languages today and adversaries could use it for supply-chain attacks.
The Trojan Source attack leverages the ambiguity introduced by homoglyphs and by the Unicode bidirectional mechanism (Bidi), a feature used for accommodating both left-to-right and right-to-left character sets.
This week, a researcher has disclosed how certain characters could be injected into JavaScript code to introduce invisible backdoors and security vulnerabilities.
Security researcher Wolfgang Ettlinger, who is also the Director of Certitude Consulting, asked, “what if a backdoor literally cannot be seen and thus evades detection even from thorough code reviews?”
And sure enough, it didn’t take long for Ettlinger to come up with proof-of-concept (PoC) code. Can you spot the invisible backdoor?
“The script implements a very simple network health check HTTP endpoint that executes ping -c 1 google.com as well as curl -s http://example.com and returns whether these commands executed successfully. The optional HTTP parameter timeout limits the command execution time,” explains the researcher in his blog post.
Turns out, the backdoor sits on two lines of the script, the destructuring assignment and the checkCommands array, where the invisible character U+3164, aka ‘Hangul Filler’, resides.
Being a Unicode “letter,” the Hangul Filler can be trivially used as a JavaScript variable name.
In effect, the logic of the two lines differs from what a reviewer would assume on reading them.
“A destructuring assignment is used to deconstruct the HTTP parameters from req.query,” explains the researcher.
Previously, it appeared as if the timeout parameter was the only parameter being unpacked from the req.query attribute. But in actuality, an additional variable denoted by the invisible character is also retrieved.
“If a HTTP parameter named [invisible character] is passed, it is assigned to the invisible variable. Similarly, when the checkCommands array is constructed, this [invisible variable] is included into the array,” continues Ettlinger.
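The two affected lines can be approximated in a minimal sketch. This is not Ettlinger's original PoC: the function name `buildCheckCommands` and the sample query objects are hypothetical, and the Hangul Filler is written as the identifier escape `\u3164` so that it stays visible here, whereas the attack relies on the literal, invisible character. Nothing is executed; the sketch only shows which strings would end up in the command list.

```javascript
// Hypothetical helper mirroring the two backdoored lines described above.
// `\u3164` is the Hangul Filler written as an identifier escape so it is visible;
// the attack itself uses the literal (invisible) character instead.
function buildCheckCommands(query) {
  // Looks like only `timeout` is unpacked, but a second variable named
  // U+3164 is also read from the query object.
  const { timeout, \u3164 } = query;

  // Looks like a harmless trailing comma, but the invisible variable
  // is appended as a third array element.
  const checkCommands = [
    'ping -c 1 google.com',
    'curl -s http://example.com',
    \u3164,
  ];

  // Drop falsy entries (the invisible variable is undefined when the
  // parameter is absent from the query).
  return { timeout, commands: checkCommands.filter(Boolean) };
}

// Assumed example inputs: parsed query objects with and without the hidden parameter.
const benign = buildCheckCommands({ timeout: '1000' });
const attacked = buildCheckCommands({ timeout: '1000', '\u3164': 'echo injected' });

console.log(benign.commands.length); // 2
console.log(attacked.commands);      // the third entry is 'echo injected'
```

With the literal character in place of the escape, both lines render as an ordinary destructuring assignment and an ordinary trailing comma, depending on the editor and font.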
All of this means, assuming the above JavaScript code was placed on a web server, reachable at host:8080, an attacker could sneak in a GET parameter representing the invisible variable, in its URL-encoded form, to execute arbitrary code:
http://host:8080/network_health?%E3%85%A4=
“Each element in the array, the hardcoded commands as well as the user-supplied parameter, is then passed to the exec function. This function executes OS commands,” states Ettlinger.
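To see why `%E3%85%A4` is the parameter name, the following sketch percent-encodes U+3164 and decodes it again with the standard `URL` API. The payload value `echo injected` is a harmless placeholder chosen for illustration and is not taken from the researcher's report.

```javascript
// U+3164 (Hangul Filler) percent-encodes to its three UTF-8 bytes.
const encodedName = encodeURIComponent('\u3164');
console.log(encodedName); // %E3%85%A4

// Hypothetical request URL; "echo injected" is a placeholder payload.
const url = new URL('http://host:8080/network_health');
url.searchParams.set('\u3164', 'echo injected');
console.log(url.search);

// A query-string parser decodes the name back to the invisible character,
// so the value is reachable under the key U+3164.
const decodedValue = new URL(url.href).searchParams.get('\u3164');
console.log(decodedValue); // echo injected
```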
Another interesting PoC example shared by Ettlinger in his report is a conditional statement that leverages homoglyphs:
if(environmentǃ=ENV_PROD){
Except, the ǃ= characters are not the same as the “not equal to” operator we are used to, because ‘ǃ’ is not an exclamation mark but a Unicode character known as Alveolar click.
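As a result, the expression does not compare anything. Because ‘ǃ’ is a valid identifier character, environmentǃ is a separate variable, and the statement assigns ENV_PROD to it. The condition therefore evaluates as ‘true’ whenever ENV_PROD holds a truthy value, regardless of what environment is actually set to.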
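A small sketch makes the difference between comparison and assignment concrete. The values of `ENV_PROD` and `environment` are assumptions made for illustration, the Alveolar click is written as the escape `\u01C3` so it stays visible, and the look-alike variable is declared explicitly so the snippet also runs in strict mode.

```javascript
// Assumed values: the application really is running in production.
const ENV_PROD = 'PROD';
const environment = 'PROD';

// `environment\u01C3` is the identifier "environment" followed by U+01C3
// (Alveolar click). It is a different variable from `environment`.
// Declared here so the sketch works in strict mode and ES modules too.
let environment\u01C3;

let nonProdBranchTaken = false;

// Reads like `environment != ENV_PROD`, but is actually an assignment to
// the look-alike variable; the assigned string 'PROD' is truthy.
if (environment\u01C3 = ENV_PROD) {
  nonProdBranchTaken = true;
}

console.log(nonProdBranchTaken);           // true
console.log(environment !== ENV_PROD);     // false: the real comparison
```

Under these assumptions, the branch meant for non-production environments runs even though the real comparison is false.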
“There are many other characters that look similar to the ones used in code which may be used for such purposes (e.g. “/”, “−”, “+”, “⩵”, “❨”, “⫽”, “꓿”, “∗”). Unicode calls these characters ‘confusables’,” states Ettlinger.
Your text editor’s mileage may vary
Depending on your development toolkit, not all text editors may be able to highlight the mysterious or invisible characters.
Syntax highlighting isn’t a reliable approach, as invisible characters may not be shown at all, let alone be colorized by the text editor or IDE.
“The attack requires the IDE/text editor (and the used font) to correctly render the invisible characters,” explains Ettlinger.
“At least Notepad++ and VS Code render it correctly (in VS Code the invisible character is slightly wider than ASCII characters). The script behaves as described at least with Node 14.”
However, some IDEs do seem to be clearly highlighting these invisible characters, including JetBrains WebStorm and PhpStorm, making this attack more difficult to pull off.
Playing around with invisible Unicode characters isn’t new knowledge either.
Previously, a Rust developer tried using these characters for a prank, which failed, possibly because of Rust’s “compiler-level defenses against these glyph based attacks.”
Last month, the popular Node.js library ‘ua-parser-js’ was hijacked, with threat actors injecting code in its npm releases to install cryptominers and password stealers on victims’ machines.
And just last week, malicious releases of hijacked ‘coa’ and ‘rc’ libraries broke React pipelines around the world, as first reported by BleepingComputer.
Luckily, these supply-chain attacks were caught because despite containing obfuscated code, it remained possible for a human reviewer or a bot to spot malicious activity—especially since software builds around the world started breaking as soon as the hijacked versions of these packages hit npm.
Similarly, in April this year, misleading ‘patches’ made by the University of Minnesota researchers that in turn introduced vulnerabilities were eventually caught by the Linux kernel maintainers.
But, what happens when threat actors are able to hide backdoors in quasi-benign code that can’t be easily spotted?
“The Cambridge team proposes restricting Bidi Unicode characters [as a solution to Trojan Source]. As we have shown, homoglyph attacks and invisible characters can pose a threat as well,” says Ettlinger.
“It might therefore be a good idea to disallow any non-ASCII characters,” advises the researcher.
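One way to act on that advice is to scan source files for non-ASCII code points before they are merged. The sketch below is an assumption about how such a check might look, not a tool from the researcher's report; the helper name `findNonAscii` and the sample input are hypothetical.

```javascript
// Hypothetical helper: report every non-ASCII code point in a source string.
function findNonAscii(source) {
  const findings = [];
  const lines = source.split('\n');
  lines.forEach((text, index) => {
    // The `u` flag makes the regex match whole code points,
    // not individual UTF-16 surrogate halves.
    for (const match of text.matchAll(/[^\x00-\x7F]/gu)) {
      const codePoint = match[0].codePointAt(0);
      findings.push({
        line: index + 1,
        column: match.index + 1, // 1-based, counted in UTF-16 code units
        codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
      });
    }
  });
  return findings;
}

// Assumed sample input containing a Hangul Filler and an Alveolar click.
const sample =
  'const { timeout,\u3164} = req.query;\nif (environment\u01C3=ENV_PROD) {}';
const findings = findNonAscii(sample);
console.log(findings);
```

A check like this flags legitimate non-ASCII text as well, such as localized strings or names in comments, so a project that needs those would have to maintain an allowlist.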