Data extraction from external URLs via PHP made easy

How many times have you extracted data from another website to include in your own? I guess it's many times. In this Web 2.0 era we all mix different types of content to make our own. Sometimes you grab data from youtube.com, sometimes from amazon.com, and then you create a mashup. All of this needs a little data-parsing knowledge, and you also need to interact with HTTP resources. Many PHP coders do it with extensions such as cURL and DOMDocument, which is quite tedious. To solve this problem I wrote some classes a long time ago, and I have now put them on http://github.com/shiplu. The classes I am talking about can be found at http://github.com/shiplu/dxtool.

“dxtool” stands for data extraction tools. It's very easy to use. Here is the README file from GitHub.

Requirements

  • PHP 5
  • php5-curl extension
  • php5-json extension (bundled with PHP since 5.2)

Features

  • Extract data from any HTTP resource
  • Use simple regular expressions to extract data
  • Hassle-free HTTP transactions
  • Supports cookies (via cURL)
  • Can cache HTTP responses
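The README doesn't say how WebGet caches responses, so the following is only a sketch of the general idea in JavaScript, not dxtool's implementation: wrap any fetch function so that repeated requests for the same URL within a time-to-live reuse the earlier response. The names `makeCachedGet` and `ttlMs` are hypothetical.

```javascript
// Sketch only (not dxtool's code): memoize a fetch-like function per URL.
// fetchFn(url) may return a value or a Promise; responses are kept for ttlMs.
function makeCachedGet(fetchFn, ttlMs) {
  const cache = new Map(); // url -> { expires, promise }
  return function cachedGet(url) {
    const now = Date.now();
    const hit = cache.get(url);
    if (hit && hit.expires > now) {
      return hit.promise; // still fresh: reuse the cached response
    }
    const promise = Promise.resolve(fetchFn(url));
    const entry = { expires: now + ttlMs, promise };
    cache.set(url, entry);
    // Don't keep failed requests around; the next call will retry.
    promise.catch(() => {
      if (cache.get(url) === entry) cache.delete(url);
    });
    return promise;
  };
}

// Usage with a stand-in fetcher that counts how often it is called.
let fetchCount = 0;
const cachedGet = makeCachedGet((url) => {
  fetchCount++;
  return "response for " + url;
}, 60 * 1000);
const first = cachedGet("http://example.com/feed");
const second = cachedGet("http://example.com/feed"); // served from the cache
```

In a browser or Node 18+, the real fetcher could be `(url) => fetch(url).then((r) => r.text())`.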

Here is an example of how to use dxtool.

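<?php
require 'DataExtractor.php';
require 'WebGet.php';

$google_news_feed = 'http://news.google.com/news?pz=1&cf=all&ned=in&hl=en&output=rss';

// Fetch the feed over HTTP
$w = new WebGet();
$content = $w->requestContent($google_news_feed);

// Each property assigned on the extractor is a named regular expression.
// The trailing "a" makes it return every match (as the output below shows);
// without it only the first match is returned.
$dx = new DataExtractor($content);
$dx->titles = '|title>([^<]+)</title|a';
$dx->rsstitle = '|title>([^<]+)</title|';

$data = $dx->extractArray();
print_r($data);
?>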

If you run it, you’ll see output like the following. Your output will differ, since the Google News RSS feed changes over time.

Array
(
    [titles] => Array
        (
            [0] => Top Stories - Google News
            [1] => Top Stories - Google News
            [2] => Draft of Lokpal Bill discussed informally at cabinet meet - Hindustan Times
            [3] => cabinet clears Food Security Bill - Hindustan Times
            [4] => Obama hails Havel's 'moral leadership' and 'dignity' - AFP
            [5] => Philippines struggles to cope after storm leaves 650 dead - Telegraph.co.uk
            [6] => Civil nuclear liability rules balanced, India to Russia - Hindustan Times
            [7] => 'Include Muslims, expand OBC quota' - Hindustan Times
            [8] => Sadhbhavana's success answer to Gujarat detractors: Narendra Modi - Daily News & Analysis
            [9] => Female protestor's beating sparks Egypt outrage - Telegraph.co.uk
            [10] => Romney says US withdrawal from Iraq 'precipitous' - AFP
            [11] => Come clean on Chidambaram's alleged favours to ex-client: BJP to Centre - The Hindu
        )

    [rsstitle] => Top Stories - Google News
)
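The same idea, named regular expressions applied to downloaded content where each returns either its first capture or all captures, can be sketched in JavaScript. This is a rough illustration, not a port of DataExtractor; `extract` and its `all` option are hypothetical stand-ins for dxtool's property assignments and its trailing `a`.

```javascript
// Sketch only (not dxtool's code): apply named regular expressions to text.
// Each rule is { pattern, all }: with all=true every first capture group is
// collected into an array, otherwise only the first match's group is kept.
function extract(content, rules) {
  const result = {};
  for (const [name, { pattern, all }] of Object.entries(rules)) {
    if (all) {
      // matchAll() requires the global flag, so add it if it's missing.
      const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
      result[name] = Array.from(
        content.matchAll(new RegExp(pattern.source, flags)),
        (m) => m[1]
      );
    } else {
      // Without the global flag, match() returns the first match and its groups.
      const m = content.match(new RegExp(pattern.source, pattern.flags.replace("g", "")));
      result[name] = m ? m[1] : null;
    }
  }
  return result;
}

// A tiny stand-in for an RSS feed.
const sampleFeed =
  "<rss><channel><title>Top Stories</title>" +
  "<item><title>First</title></item>" +
  "<item><title>Second</title></item></channel></rss>";
const sampleData = extract(sampleFeed, {
  titles: { pattern: /title>([^<]+)<\/title/, all: true },
  rsstitle: { pattern: /title>([^<]+)<\/title/ },
});
// sampleData.titles is ["Top Stories", "First", "Second"],
// sampleData.rsstitle is "Top Stories".
```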

HOWTO: Convert a 6-digit CSS color code to 3 digits

Okay, so you want to convert 6-digit CSS color codes to 3-digit ones. The short form is more compact and easier to remember. Before jumping into the code, let me explain what the 3 digits represent. The CSS color #abc means #aabbcc: every digit is doubled. No, it is NOT #a0b0c0. The second form may look more logical, but it's not what browsers use, and you can test this yourself. See the following table.

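#aabbcc    #abc    #a0b0c0
[Color swatches for the three codes: #aabbcc and #abc render as the same color, #a0b0c0 as a different one.]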

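So converting an arbitrary 6-digit CSS color to a 3-digit one needs some calculation: only colors whose channels each repeat a digit (like #aabbcc) have an exact 3-digit form, so any other color has to be rounded to the nearest 3-digit color. That's why I wrote a PHP function.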
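The rounding rule follows from the doubling: a short digit d stands for the channel value 0xdd = 16d + d = 17d, so each 2-digit channel value v is mapped to the nearest multiple of 17:

```latex
d = \operatorname{round}\!\left(\frac{v}{17}\right),
\qquad v \in \{0, \dots, 255\},\quad d \in \{0, \dots, 15\}
```

For example, v = 0x45 = 69 gives 69/17 ≈ 4.06, so d = 4 and the channel becomes 0x44. Because 17 is odd, v/17 is never exactly halfway between two integers, so there are no ties to break.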

See the code below:

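function convert_color($color){
 // Capture the three 2-digit hex channels of "#rrggbb" (case-insensitive).
 // Hex digits are 0-9 and a-f.
 if (!preg_match("|#([\da-f]{2})([\da-f]{2})([\da-f]{2})|i", $color, $match))
  return false; // not a 6-digit color code
 array_shift($match); // drop the full match, keep the channels
 $n=array();
 foreach($match as $m)
  array_push($n, reduce_digit($m));
 return "#". implode("", $n);
}

function reduce_digit($hex){
 // A short digit d stands for 0xdd = 17 * d, so round the
 // 2-digit channel to the nearest multiple of 17.
 return dechex((int) round(hexdec($hex) / 17));
}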

Just call convert_color() with a 6-digit CSS color such as “#bcd465”. The leading # is required. The function returns the 3-digit CSS color.

If you want to test it, run the following code.

$params= array(
array("#aabbcc","#abc"),
array("#112233","#123"),
array("#456789","#468"),
array("#1234fa","#13f"),
array("#000000","#000")
);

foreach($params as $param){
    $f = convert_color($param[0]);
    $e = $param[1];
    echo "Passed={$param[0]}, Expected=", $e, ", Actual=";
    echo $f, ", Status=", (($e==$f)?"SUCCESS":"FAILED"), PHP_EOL;
}

I ran it, and every case passes:

Passed=#aabbcc, Expected=#abc, Actual=#abc, Status=SUCCESS
Passed=#112233, Expected=#123, Actual=#123, Status=SUCCESS
Passed=#456789, Expected=#468, Actual=#468, Status=SUCCESS
Passed=#1234fa, Expected=#13f, Actual=#13f, Status=SUCCESS
Passed=#000000, Expected=#000, Actual=#000, Status=SUCCESS
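For reference, the same rounding can be sketched in JavaScript, for example for a browser-based tool. This ports the idea rather than the PHP code line by line; the name `toShortHex` is hypothetical, and it returns null for anything that isn't a `#rrggbb` color.

```javascript
// Sketch: convert "#rrggbb" to the nearest 3-digit CSS color "#rgb".
// A short digit d expands to "dd" = 17 * d, so each channel is rounded
// to the nearest multiple of 17.
function toShortHex(color) {
  const m = /^#([\da-f]{2})([\da-f]{2})([\da-f]{2})$/i.exec(color);
  if (!m) return null; // not a 6-digit hex color
  const digits = m.slice(1, 4).map((pair) =>
    Math.round(parseInt(pair, 16) / 17).toString(16)
  );
  return "#" + digits.join("");
}

const shortened = ["#aabbcc", "#112233", "#456789", "#1234fa", "#000000"].map(toShortHex);
const example = toShortHex("#bcd465");
```

This reproduces the test results above, and for “#bcd465” it returns “#bc6”: 0xd4 = 212 is closer to 0xcc = 204 than to 0xdd = 221.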

Create Next/Previous JavaScript Bookmarklets for Slideshow, Tutorial and Ebook Sites

Recently I was browsing http://talks.php.net, which has a bunch of slideshows about PHP. I work with PHP most of the time, so they were useful to me.
The problem arose when some slideshows didn’t render correctly in HTML, so there were no Next/Previous buttons. I had to edit the URL by hand to move to the next or previous page.
Most URLs were in the format http://domain.com/path/slideshow/1, where 1 is the page number. Changing it to 2 took me to the 2nd page of the slideshow.
I wanted to automate this, with JavaScript of course (I’m a big fan of JS).

So I made two JavaScript bookmarklet buttons. Clicking Next takes you to the next page, and Prev does the same for the previous page.
Here are the buttons. Just drag them to your browser’s bookmarks bar. That’s it!

Next Page

Prev Page

Note: not every URL will work. Some examples:

  1. http://domain.com/path/slideshow/1 Will work
  2. http://domain.com/path/slideshow/1.html Will not work
  3. http://domain.com/path/slideshow/1.php Will not work
  4. http://domain.com/path/slideshow-1 Will work
  5. http://domain.com/path/slide-2/show Will not work
  6. http://domain.com/path/slide-2/show.php Will not work
  7. http://domain.com/path/slide-2 Will not work
  8. http://domain.com/path/slideshow#1 Will work
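The post doesn't include the code behind the Next Page and Prev Page buttons. A minimal sketch of how such a bookmarklet could work, assuming it simply changes the number at the very end of the URL, is the hypothetical helper `shiftPageNumber` below. This rule is a guess: it accepts every “Will work” URL in the list, but it would also accept item 7 (`slide-2`), which the list marks as not working, so the original buttons’ rule differs somewhere.

```javascript
// Sketch (assumed rule): change the number at the very end of a URL by `step`.
// Returns null when the URL doesn't end in digits, e.g. ".../1.html".
function shiftPageNumber(url, step) {
  if (!/\d+$/.test(url)) return null;
  return url.replace(/\d+$/, (digits) =>
    String(Math.max(1, parseInt(digits, 10) + step)) // never go below page 1
  );
}

const nextUrl = shiftPageNumber("http://domain.com/path/slideshow/1", 1);
const prevUrl = shiftPageNumber("http://domain.com/path/slideshow#3", -1);
const unsupported = shiftPageNumber("http://domain.com/path/slideshow/1.html", 1);
```

As a bookmarklet, a Next button would be this function plus `var u = shiftPageNumber(location.href, 1); if (u) location.href = u;`, all wrapped in `javascript:(function(){ ... })();`.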

Sometimes the buttons above won’t work, e.g. on manuals, ebooks and similar sites. In that case, use the following two buttons.
They are handy on any site where the next page can’t be guessed from the URL, because they follow the page’s <link rel="next"> and <link rel="prev"> elements instead.

Next URL

Prev URL

Here is the code for the Next URL and Prev URL buttons:

Next Button:

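var e=document.getElementsByTagName("link");
var g="getAttribute";
var l=window.location.href; // fallback: the current page
for(var i=0;i<e.length;i++){
    // rel may be missing, so fall back to an empty string
    var r=(e[i][g]("rel")||"").toLowerCase();
    var h=e[i][g]("href");
    if(r=="next"){
        l=h;
        break;
    }
}
window.location.href=l; // go to the <link rel="next"> target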

Previous Button:

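var e=document.getElementsByTagName("link");
var g="getAttribute";
var l=window.location.href; // fallback: the current page
for(var i=0;i<e.length;i++){
    // rel may be missing, so fall back to an empty string
    var r=(e[i][g]("rel")||"").toLowerCase();
    var h=e[i][g]("href");
    if(r=="prev"){
        l=h;
        break;
    }
}
window.location.href=l; // go to the <link rel="prev"> target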
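The two loops above differ only in the rel value they look for. As a sketch, the lookup can be factored into one function that also handles rel attributes listing several space-separated keywords (for example `rel="next prefetch"`). The helper name `findRelHref` is hypothetical, and it takes plain `{ rel, href }` objects so it can be tried outside a browser.

```javascript
// Sketch: return the href of the first link whose rel keywords include
// `wanted` (case-insensitive), or null when there is no such link.
function findRelHref(links, wanted) {
  const target = wanted.toLowerCase();
  for (const link of links) {
    const keywords = (link.rel || "").toLowerCase().split(/\s+/);
    if (keywords.includes(target)) return link.href;
  }
  return null;
}

// Stand-in for the page's <link> elements.
const pageLinks = [
  { rel: "stylesheet", href: "/style.css" },
  { rel: "Prev", href: "/manual/page-1.html" },
  { rel: "next prefetch", href: "/manual/page-3.html" },
];
const nextHref = findRelHref(pageLinks, "next");
const prevHref = findRelHref(pageLinks, "prev");
```

In a browser, the list could be built with `Array.from(document.getElementsByTagName("link"), (l) => ({ rel: l.getAttribute("rel"), href: l.href }))`, navigating only when the result isn't null so that a page without such links isn't just reloaded.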