HTTP Cookies from VirtualBox are not sent back

VirtualBox is a great virtualization solution. I use it to host my website and test it. It lets me set up the very same environment that I use on the server, so I don't have to worry about whether a recent change in the web application will break it. If you are a web developer and are not using VirtualBox, you should start right now.

Today I faced a weird problem. I could not log in to the web application in the vbox, but I could log in on the live server. The two setups were identical except that one was a physical server and the other was virtual. After observing the HTTP headers carefully, I found that the PHP session IDs sent from the virtual box were not preserved, while those from the live server were. The session ID is usually saved in a cookie. It is the HTTP client's responsibility to save the cookie and send it back with each subsequent request. I tested it in curl, and it was not saving the cookie. Google Chrome was not saving the cookie either. Only Firefox was saving it.

At first I thought it was a problem with Google Chrome, and I was close to submitting a bug report to the Chrome team. But then I tested in curl and it was not working there either. Two unrelated clients are unlikely to share the same bug, so the problem had to be on my server. I compared all the headers sent by the live server and the virtual box server side by side. And guess what I found? The expiry time of the cookie sent by the virtual server was in the past, so the cookie was already expired when it was generated. That means the clock on my virtual box server was out of sync, and I had to synchronize it with a time server. The following command is enough for this.

ntpdate pool.ntp.org

After this everything worked smoothly. I always sync the time when I start the vbox server. If you boot the server normally, the time is synchronized automatically. But if you save the machine state and resume it later, you have to synchronize it manually. I had never missed this step before, so when I forgot it today it did not occur to me as a possible cause. I even went through the last 30 revisions in my svn repository trying to track down the problem.

My suggestion: always synchronize the time of a vbox server after you resume it. Use the command above for this.
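
A quick way to spot this kind of problem is to compare the expiry time in a Set-Cookie header with the client's clock. The following is only a minimal sketch of that check. The function name cookieAlreadyExpired, the sample header value and the sample clock time are all made up for illustration and are not taken from my application.

```php
<?php
// Hypothetical helper (not part of any library): returns true when the
// Expires attribute of a Set-Cookie header value is at or before $now.
// Returns null when there is no parseable Expires attribute.
function cookieAlreadyExpired(string $setCookie, int $now): ?bool
{
    if (!preg_match('/;\s*expires=([^;]+)/i', $setCookie, $m)) {
        return null; // session cookie: no explicit expiry
    }
    $expires = strtotime(trim($m[1]));
    if ($expires === false) {
        return null; // the date could not be parsed
    }
    return $expires <= $now;
}

// Sample header value, invented for this example.
$header = 'PHPSESSID=abc123; expires=Mon, 19 Dec 2011 10:00:00 GMT; path=/';

// Pretend the client's clock says 12:00 GMT on the same day.
$clientNow = gmmktime(12, 0, 0, 12, 19, 2011);

var_dump(cookieAlreadyExpired($header, $clientNow)); // bool(true)
```

If this reports true for a cookie the server has just generated, the server's clock is probably behind the client's, which is what happened to me.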

Now a new question arises: why did Firefox use an expired cookie? I'll look into that later.

Data extraction from external url via php made easy

How many times have you extracted data from another website to include in your own site? I guess many times. In this web 2.0 era we all mix different types of content to make our own. Sometimes you grab data from youtube.com, sometimes from amazon.com, and then you create a mashup. All of this needs a little bit of data parsing knowledge. You also need to interact with an HTTP resource. Many PHP coders do this with extensions such as curl and DOMDocument. The job is quite tedious. To solve this problem I wrote a set of classes a long time ago. I have now put them on http://github.com/shiplu. The classes I am talking about can be found at http://github.com/shiplu/dxtool.

“dxtool” stands for data extraction tools. It's very easy to use. Here is the README file from GitHub.

Requirements

  • php5
  • php5-curl extension
  • php5-json extension (already included with php5)

Features

  • Extract Data from any http resource
  • Use simple regular expression to extract data
  • Hassle free http transaction
  • Supports cookie (via curl)
  • Can cache http response

Here is an example of how to use it.

<?php
require 'DataExtractor.php';
require 'WebGet.php';
$google_new_feed = 'http://news.google.com/news?pz=1&cf=all&ned=in&hl=en&output=rss';
$w = new WebGet();
$content = $w->requestContent($google_new_feed);
$dx = new DataExtractor($content);
$dx->titles = '|title>([^<]+)</title|a';
$dx->rsstitle = '|title>([^<]+)</title|';
$data = $dx->extractArray();
print_r($data);
?>

If you run it you’ll see output like the following. Your output will differ, because the Google News RSS feed changes over time.

Array
(
    [titles] => Array
        (
            [0] => Top Stories - Google News
            [1] => Top Stories - Google News
            [2] => Draft of Lokpal Bill discussed informally at cabinet meet - Hindustan Times
            [3] => cabinet clears Food Security Bill - Hindustan Times
            [4] => Obama hails Havel's 'moral leadership' and 'dignity' - AFP
            [5] => Philippines struggles to cope after storm leaves 650 dead - Telegraph.co.uk
            [6] => Civil nuclear liability rules balanced, India to Russia - Hindustan Times
            [7] => 'Include Muslims, expand OBC quota' - Hindustan Times
            [8] => Sadhbhavana's success answer to Gujarat detractors: Narendra Modi - Daily News & Analysis
            [9] => Female protestor's beating sparks Egypt outrage - Telegraph.co.uk
            [10] => Romney says US withdrawal from Iraq 'precipitous' - AFP
            [11] => Come clean on Chidambaram's alleged favours to ex-client: BJP to Centre - The Hindu
        )

    [rsstitle] => Top Stories - Google News
)
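
Judging from the output above, a pattern ending in "a" collects every match, while a pattern without it keeps only the first one. That reading is an assumption based on the output, not a description of how dxtool is implemented. Under that assumption, the same two rules can be sketched with PHP's built-in preg_match_all and preg_match. The RSS fragment here is made up so the example runs without a network request.

```php
<?php
// Small made-up RSS fragment standing in for the downloaded feed.
$content = '<rss><channel><title>Top Stories - Google News</title>'
         . '<item><title>First headline</title></item>'
         . '<item><title>Second headline</title></item></channel></rss>';

// Same expression as in the example, without the trailing "a".
$pattern = '|title>([^<]+)</title|';

// All matches, comparable to the "titles" rule.
preg_match_all($pattern, $content, $all);
$titles = $all[1];

// First match only, comparable to the "rsstitle" rule.
$rsstitle = preg_match($pattern, $content, $first) ? $first[1] : null;

print_r(['titles' => $titles, 'rsstitle' => $rsstitle]);
```

Here $titles holds all three titles in document order and $rsstitle holds only the first one, "Top Stories - Google News".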