aravindkumarsvg/unicode_js_obfuscation.md

Writing JavaScript Entirely with Unicode (Escapes & Homoglyphs)

JavaScript identifiers and string contents can be written entirely with Unicode escape sequences, and identifiers can also use Unicode homoglyphs. Keywords and punctuation must still appear literally.
The result is valid to the JS engine but unreadable or misleading to humans.
These tricks show up in obfuscated code, XSS payloads, and supply chain attacks.

1. Unicode Escapes in Identifiers

var \u0061 = 5;    // same as: var a = 5
console.log(\u0061); // prints 5

\u0061 → "a"
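
As a quick check of the point above, the sketch below shows that the escaped and literal spellings name one and the same variable, and that the ES2015 braced form `\u{61}` behaves the same way. The name `viaBraces` is chosen only for illustration.

```javascript
// \u0061, \u{61} and a are three spellings of the same identifier.
var \u0061 = 5;
a += 1;                 // updates the variable declared on the line above
var viaBraces = \u{61}; // ES2015 code point escape, reads the same variable

console.log(a, viaBraces); // prints: 6 6
```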


2. Unicode Escapes in Strings

console.log("\u0048\u0065\u006C\u006C\u006F"); // "Hello"

3. Unicode Escapes in Keywords

Keywords cannot be written using escapes. Since ES2015 an escaped keyword is a SyntaxError:
\u0069\u0066 (true) {   // SyntaxError: "\u0069\u0066" is not accepted as the keyword "if"
  console.log("unicode if");
}
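
How a given engine treats an escaped keyword can be checked without running the code. The sketch below is one way to do that, assuming an ES2015+ engine such as a current Node.js; `parsesAsFunctionBody` is a hypothetical helper, not a built-in. On such engines the escaped `if` is expected to be rejected, while an ordinary identifier that merely starts with the same escapes is accepted.

```javascript
// Hypothetical helper: report whether `source` parses as a function body.
// new Function() only compiles the text; the body is never called here.
function parsesAsFunctionBody(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// Escaped spelling of the keyword "if".
const escapedKeyword = parsesAsFunctionBody("\\u0069\\u0066 (true) {}");
// Escaped spelling of the ordinary identifier "ifx".
const escapedIdentifier = parsesAsFunctionBody("var \\u0069\\u0066x = 1;");

console.log(escapedKeyword, escapedIdentifier); // prints: false true
```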

4. Full "Hello World" in Unicode Escapes

// The dot stays literal: escapes outside strings may only spell identifier characters.
\u0063\u006F\u006E\u0073\u006F\u006C\u0065.\u006C\u006F\u0067(
  "\u0048\u0065\u006C\u006C\u006F\u0020\u0057\u006F\u0072\u006C\u0064"
);
Output:
Hello World


5. Full Script with Function + Return in Unicode Escapes

// Keywords (function, return) and punctuation stay literal; only identifiers and string contents are escaped.
function \u006D\u0079\u0046\u0075\u006E\u0063\u0074\u0069\u006F\u006E() {
  return "\u0048\u0065\u006C\u006C\u006F\u0020\u0057\u006F\u0072\u006C\u0064";
}

\u0063\u006F\u006E\u0073\u006F\u006C\u0065.\u006C\u006F\u0067(\u006D\u0079\u0046\u0075\u006E\u0063\u0074\u0069\u006F\u006E());
Output:
Hello World
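
Escaped source like this is usually generated rather than typed. A minimal sketch of such a generator follows; `toUnicodeEscapes` is a hypothetical helper, and it assumes BMP-only input when the result is used as an identifier. Keywords, dots, parentheses and quotes are written literally, and only names and string contents are escaped.

```javascript
// Hypothetical helper: escape each UTF-16 code unit of `text` as \uXXXX.
function toUnicodeEscapes(text) {
  return text
    .split("")
    .map((unit) => "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0"))
    .join("");
}

const escapedName = toUnicodeEscapes("myFunction");
const escapedText = toUnicodeEscapes("Hello World");

// Build a function body: declare the escaped function, then call it.
const source =
  "function " + escapedName + "() { return \"" + escapedText + "\"; } " +
  "return " + escapedName + "();";

console.log(source);
console.log(new Function(source)()); // prints: Hello World
```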


6. Unicode Homoglyphs (Lookalikes)

Unicode homoglyphs are distinct code points whose glyphs look identical or nearly identical in most fonts.
Example with Cyrillic vs Latin:
// Looks like "password"
var рassword = "secret";  

console.log(рassword);

р (U+0440, Cyrillic) vs p (U+0070, Latin)

⚠️ They look the same, but are treated as different variables.
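
The difference becomes visible once the code points are printed. In the sketch below, `codePoints` is a hypothetical helper, and the Cyrillic letter is written as an escape inside a string so that it stands out in the source.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

const latinName = "password";      // starts with Latin p (U+0070)
const mixedName = "\u0440assword"; // starts with Cyrillic er (U+0440)

console.log(latinName === mixedName);  // prints: false
console.log(codePoints(latinName)[0]); // prints: U+0070
console.log(codePoints(mixedName)[0]); // prints: U+0440
```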

7. Function Hijacking with Homoglyphs

// Safe function
function login() {
  console.log("Safe login");
}

// Attacker’s homoglyph function
function logіn() {
  console.log("Evil login");
}

logіn(); // Calls the homoglyph function, not login()

і = Cyrillic small letter "і" (U+0456), not Latin i.
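
One way to catch names like this is to flag identifiers that mix scripts. The sketch below is a minimal check under that assumption; `mixesLatinAndCyrillic` is a hypothetical helper built on Unicode property escapes (ES2018). It would not flag a name written entirely in Cyrillic, so it is a heuristic rather than a complete defense.

```javascript
// Hypothetical check: does `name` contain both Latin and Cyrillic letters?
function mixesLatinAndCyrillic(name) {
  return /\p{Script=Latin}/u.test(name) && /\p{Script=Cyrillic}/u.test(name);
}

console.log(mixesLatinAndCyrillic("login"));      // prints: false
console.log(mixesLatinAndCyrillic("log\u0456n")); // prints: true
```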


8. Combining Escapes + Homoglyphs

// Normal (Unicode escapes)
\u0063\u006F\u006E\u0073\u006F\u006C\u0065.\u006C\u006F\u0067("Hello");

// Attacker version
\u0063\u006F\u006E\u0073\u006F\u006C\u0065.\u006C\u006F\u0067("Stolen data: " + document.cookie);
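Neither line contains the literal text console.log, so a filter that matches on raw source text misses both. Homoglyphs cause the opposite problem: reviewers and filters see a familiar-looking name that is actually a different identifier.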
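
A text-based filter has a better chance if escapes are decoded before matching. The sketch below illustrates the idea with a hypothetical `decodeUnicodeEscapes` helper and a deliberately naive filter. It is a simplification: it does not tokenize the source, so it treats identifiers and string contents alike.

```javascript
// Hypothetical helper: turn \uXXXX and \u{X...} escapes back into characters.
function decodeUnicodeEscapes(source) {
  return source.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, fixed) => {
      const codePoint = parseInt(braced || fixed, 16);
      // Leave out-of-range values untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
  );
}

// Source text with escaped letters inside both identifiers.
const payload = "d\\u006Fcument.c\\u006Fokie";
const naiveFilter = /document\.cookie/;

console.log(naiveFilter.test(payload));                       // prints: false
console.log(naiveFilter.test(decodeUnicodeEscapes(payload))); // prints: true
```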

9. Common Dangerous Unicode Homoglyphs


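| Latin | Homoglyph | Unicode | Script   |
|-------|-----------|---------|----------|
| a     | а         | U+0430  | Cyrillic |
| e     | е         | U+0435  | Cyrillic |
| i     | і         | U+0456  | Cyrillic |
| o     | о         | U+043E  | Cyrillic |
| p     | р         | U+0440  | Cyrillic |
| c     | с         | U+0441  | Cyrillic |
| y     | у         | U+0443  | Cyrillic |
| x     | х         | U+0445  | Cyrillic |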
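
The pairs listed above can be turned into a small lookup for review tooling. The sketch below covers only these eight pairs; `CYRILLIC_TO_LATIN` and `toLatinLookalike` are hypothetical names, and a real confusables list is far longer.

```javascript
// Lookalike pairs from the table: Cyrillic code point -> Latin letter.
const CYRILLIC_TO_LATIN = new Map([
  ["\u0430", "a"], ["\u0435", "e"], ["\u0456", "i"], ["\u043E", "o"],
  ["\u0440", "p"], ["\u0441", "c"], ["\u0443", "y"], ["\u0445", "x"],
]);

// Hypothetical helper: replace listed homoglyphs with their Latin lookalikes.
function toLatinLookalike(text) {
  return Array.from(text, (ch) => CYRILLIC_TO_LATIN.get(ch) || ch).join("");
}

const suspicious = "\u0440assword"; // Cyrillic er followed by Latin letters
const folded = toLatinLookalike(suspicious);

// A name that changes when folded contains at least one listed homoglyph.
console.log(folded);                // prints: password
console.log(folded !== suspicious); // prints: true
```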


✅ Takeaways


Unicode Escapes (\uXXXX) → Make identifiers and strings unreadable but still valid; keywords and punctuation stay literal.
Unicode Homoglyphs → Make code look normal but behave differently.
Both together → Powerful obfuscation & exploitation trick, often seen in XSS or supply chain attacks.


JavaScript Obfuscation Cheatsheet

This document covers different ways JavaScript code can be obfuscated using string escapes, identifiers, and numbers.

1. String-Based Obfuscation

Hexadecimal

eval('\x61\x6c\x65\x72\x74(1)'); // alert(1)
Octal (legacy escapes; sloppy mode only, a SyntaxError in strict mode)

eval('\141\154\145\162\164(1)'); // alert(1)
Unicode

eval('\u0061\u006c\u0065\u0072\u0074(1)'); // alert(1)
Decimal (via String.fromCharCode)

eval(String.fromCharCode(97,108,101,114,116,40,49,41)); // alert(1)
Binary + Hex + Mixed

eval(String.fromCharCode(0b1100001,0x6c,101,0o162,0x74,40,0b110001,0x29)); // alert(1)
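
The encodings can be compared without calling eval, since each one is just another spelling of the same string. The sketch below does that comparison. The legacy octal escapes are compiled through `new Function`, on the assumption that the surrounding file may be strict (a module, for example), where such escapes are rejected.

```javascript
const hex = "\x61\x6c\x65\x72\x74(1)";
const unicode = "\u0061\u006c\u0065\u0072\u0074(1)";
const decimal = String.fromCharCode(97, 108, 101, 114, 116, 40, 49, 41);
const mixed = String.fromCharCode(0b1100001, 0x6c, 101, 0o162, 0x74, 40, 0b110001, 0x29);

// Function bodies built with new Function are sloppy unless they opt into
// strict mode, so the legacy octal escapes parse here.
const octal = new Function('return "\\141\\154\\145\\162\\164(1)";')();

// The same kind of escape inside a strict body is rejected at parse time.
let strictOctalRejected = false;
try {
  new Function('"use strict"; return "\\141";');
} catch (err) {
  strictOctalRejected = err instanceof SyntaxError;
}

const allEqual = [hex, octal, unicode, decimal, mixed].every((s) => s === "alert(1)");
console.log(allEqual);            // prints: true
console.log(strictOctalRejected); // prints: true
```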

2. Identifier Obfuscation

JavaScript allows Unicode escapes inside identifiers:
var \u0061 = "Hello";
alert(\u0061); // alerts "Hello"
⚠️ Only Unicode escapes (\uXXXX, or \u{X...} since ES2015) work inside identifiers. Hex (\x) and octal escapes are invalid here.
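
The rule is easy to confirm by compiling each variant. In the sketch below, `compiles` is a hypothetical helper that only parses the text and never runs it.

```javascript
// Hypothetical helper: true when `source` compiles, false on a SyntaxError.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("var \\u0061 = 1;")); // prints: true  (Unicode escape)
console.log(compiles("var \\u{61} = 1;")); // prints: true  (code point escape)
console.log(compiles("var \\x61 = 1;"));   // prints: false (hex escape)
console.log(compiles("var \\141 = 1;"));   // prints: false (octal escape)
```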

3. Number-Based Obfuscation

Numbers can be written in different bases:
Hexadecimal

alert(String.fromCharCode(0x61,0x6c,0x65,0x72,0x74)); // alerts the string "alert"
Octal

alert(String.fromCharCode(0o141,0o154,0o145,0o162,0o164)); // alerts the string "alert"
Binary

alert(String.fromCharCode(0b1100001,0b1101100,0b1100101,0b1110010,0b1110100)); // alerts the string "alert"
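
Expressions like these are tedious to write by hand. The sketch below generates them for any text and radix; `toFromCharCodeExpr` is a hypothetical helper, and the result is evaluated with `new Function` only to confirm that it round-trips.

```javascript
// Hypothetical helper: build a String.fromCharCode(...) expression for `text`
// with every char code written in the given radix (2, 8 or 16).
function toFromCharCodeExpr(text, radix) {
  const prefixes = { 2: "0b", 8: "0o", 16: "0x" };
  const prefix = prefixes[radix];
  if (!prefix) throw new RangeError("radix must be 2, 8 or 16");
  const codes = text
    .split("")
    .map((unit) => prefix + unit.charCodeAt(0).toString(radix));
  return "String.fromCharCode(" + codes.join(",") + ")";
}

// The three literal forms denote the same number.
console.log(0x61 === 97 && 0o141 === 97 && 0b1100001 === 97); // prints: true

const hexExpr = toFromCharCodeExpr("alert", 16);
console.log(hexExpr); // prints: String.fromCharCode(0x61,0x6c,0x65,0x72,0x74)

const roundTrip = new Function("return " + toFromCharCodeExpr("alert", 2))();
console.log(roundTrip); // prints: alert
```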

4. Execution Contexts

Encodings can be executed via:
eval("...");
setTimeout("...");
Function("...")();
Examples:
setTimeout('\x61\x6c\x65\x72\x74(1)', 0);
new Function('\u0061\u006c\u0065\u0072\u0074(1)')();
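
The same mechanics can be tried safely outside a browser with a harmless payload. The sketch below uses an escaped `1+1` instead of `alert(1)` and runs it through indirect eval and the Function constructor. String-argument setTimeout is left out because it is asynchronous and Node.js expects a function there.

```javascript
// Harmless payload: the escapes spell the text 1+1.
const payload = "\x31\x2b\x31";

// Indirect eval runs the text in the global scope.
const viaIndirectEval = (0, eval)(payload);

// The Function constructor compiles the text as a function body.
const viaFunction = new Function("return " + payload)();

console.log(payload, viaIndirectEval, viaFunction); // prints: 1+1 2 2
```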

✅ Summary

Strings: Hex, octal (sloppy mode only), and Unicode escapes work directly in string literals; decimal and binary values go through String.fromCharCode.
Identifiers: Only Unicode escape sequences work.
Numbers: Can obfuscate using base representations + String.fromCharCode.
Execution: eval, Function, and setTimeout allow hidden payloads.