Parse Query String by pure JavaScrirpt

Here is a snipped I wrote today to parse Query String by pure JavaScript. Some people uses different JavaScript libraries. But from my older days when I used to parse GET parameter by perl I had a habit of writing this algorithm. I just translated it to JavaScript.

Here goes the code.

[code language=”javascript”]
function getUrlParts(url){
// url contains your data.
var qs = url.indexOf("?");
if(qs==-1) return [];
  var fr = url.indexOf("#");
var q="";
q = (fr==-1)? url.substr(qs+1) : url.substr(qs+1, fr-qs-1);
var parts=q.split("&");
var vars={};
for(var i=0;i<parts.length; i++){
var p = parts[i].split("=");
if(p[1]){
vars[decodeURIComponent(p[0])] = decodeURIComponent(p[1]);
}else{
vars[decodeURIComponent(p[0])] = "";
}
}
// vars contain all the variables in an array.
return vars;
}
[/code]

Fastest way to find difference of very large PHP arrays

You have 2 arrays and you want to find the differences. How do you do that?

Using array_diff. Obviously using array_diff. But array_diff works well on small arrays. How about large arrays. Say arrays with elements more than 2 millions! If you ever have to find array difference in php for large array you’d find array_diff is painfully slow. Today I had to do the same thing.

I had 2 arrays of integers. In fact they are mobile numbers. I needed to find the difference of these 2 lists. Each list contained almost 2.5 millions of element. When I invoke the array_dif solution it took more than 20 minutes. And then I had to press Ctrl+C. As I am working with large arrays, I was equipped with huge memory. I started php with 4 gigabytes of memory. So the following option.

[code language=”bash”]
php -d memory_limit=4G
[/code]

So I thought to improve it a little. I flip both arrays and find differneces using array_diff_key. It took about 10 seconds. What a surprise.  Why it took so little time? Because flipping an array made its values as keys. So earlier when I was searching for values, it becomes a search for keys. And keys are hash now. Searching hash is not a big deal. So the speed boost. Here is the code.

[code language=”php”]
function flip_array_diff_key($b, $a) {
$at = array_flip($a);
$bt = array_flip($b);
$d = array_diff_key($bt, $at);
return array_keys($d);
}
[/code]

Its a small function. Only problem is I had to call array_* functions too many times.  So I modified it a little. I flipped both arrays as I had to use array_diff_keys function. If I hadn’t used array_diff_keys  may be I’d do this,

[code language=”php”]
function flip_isset_diff($b, $a) {
$at = array_flip($a);
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
[/code]

See, Its just one call to array_flip.

And here is a complete custom way to achieve this.

[code language=”php”]
function large_array_diff($b, $a) {
$at = array();
foreach ($a as $i)
$at[$i] = 1;

$d = array();

foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;

return $d;
}
[/code]

I create a benchmark script to compare all these methods. When I run it the result was a surprise.

----------------------------------------------------
Name                            Execution Time (sec)
----------------------------------------------------
----------------------------------------------------
Complete Custom                 7.122
----------------------------------------------------
Flip one and isset              5.450
----------------------------------------------------
Flip both and array_diff_key    9.915
----------------------------------------------------

I thought using array_* would be the fastest as I was using all PHPs native extensions. But no Its the slowest. The solution with one single call to array_filp seems the fastest. And surprisingly the complete custom solution is in the middle while It seems it’d be slowest.

In the conclusion I’d say, try to use as much as language construct as possible. But when It comes to a time consuming function use a native extension.

Iterate through each valid calendar days

If you develop software like me, you may face a scenario where you need to iterate through calendar days.  These are specially needed in event management application or any other application that has some sorts of daily tracker. The main point is you want to iterate through dates.

For example, If the current date is “February 28th 2001”, next date will be  “March 1st 2001”. Obviously not “February 29th 2001”. If the current date is “February 28th 2004”, next date will be “February 29th 2004”, Not “March 1st 2004”.  Yes, that’s quite tricky.  So how do you achieve it in programming?

The solution is to use time function families found in standard C header file time.h. The main idea is to first create the time you want to show by mktime(). This is the initial time. To advance to the next day/hour/min/sec/month just convert the resultant value of mktime() to struct tm structure by localtime() or gmtime(). Then add 1 day/hour/min/sec/month to struct tm structure. Use this structure to create time again by mktime().

The above strategy works because of the following lines from manual.

     In addition to computing the calendar time, mktime() normal-
     izes  the  supplied tm structure. The original values of the
     tm_wday and tm_yday components of the structure are ignored,
     and the original values of the other components are not res-
     tricted to the ranges indicated in  the  definition  of  the
     structure

Here is a sample code that iterates through the first 200 dates from epoch.

[code language=”c”]

#include <time.h>;
#include <stdio.h>;

int main(){
struct tm *lt;
int i=200;
time_t t=1; // first second on epoch

while(i–){
lt = localtime(&t);
printf("%s\n", asctime(lt));
lt->tm_hour+=24; // adding 1 day = 24 hours
t = mktime(lt);
}
return 0;
}

[/code]

If you are working with other language you may wonder if this will apply to your language. The good news is most language has wrapper for C routines.  May be the name is different. But  most probably you have it. Just see the manual.  Same technique will work on other language too.