Page Scraping

This reflection in the mirror as I was about to pre-trip my bus this morning has nothing to do with this post. I just thought I’d share how cool it is to work here :-)

I wanted to show the headlines from Google Hot Searches here on the blog, so I whipped up some web-page scraping code that grabs the links from that page and dumps them into arrays, and I take it from there. Here’s the result: Hot Trends.

Then, I tried the same trick with a Pinterest clone that has photos which link to content, and it works well: Gentlemint. Finally, I tackled the Drudge Report.

These three apps are available on the right under Functions. The content is dynamic and changes regularly. Just three more reasons to make my blog your new home page. :-)

Here’s the PHP global regular expression code that performs the magic:

// Fetch the page, then collect every link: $matches[1] gets the URLs, $matches[2] the link text.
$page = file_get_contents('http://www.google.com/trends/hottrends/atom/hourly');
preg_match_all('(<a href="(.+)">(.*)</a>)siU', $page, $matches);

Sweet huh! (Yea, like you really needed to see that :-)
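
For the curious, here’s a rough sketch of the "take it from there" part, meaning one way to turn those match arrays into something you can loop over. This is not the actual code behind the apps: the extract_links() helper and the sample markup are made up for illustration, and I’m assuming the same pattern as above.

```php
<?php
// Hypothetical helper (not from the original scraper): pairs each captured
// href with its link text, using the same pattern as the snippet above.
function extract_links(string $html): array
{
    $links = [];
    // With the default PREG_PATTERN_ORDER, $matches[1] holds every href
    // and $matches[2] holds every link text, in document order.
    if (preg_match_all('(<a href="(.+)">(.*)</a>)siU', $html, $matches)) {
        foreach ($matches[1] as $i => $url) {
            $links[] = [
                'url'  => $url,
                // Link text may contain nested tags, so strip them.
                'text' => trim(strip_tags($matches[2][$i])),
            ];
        }
    }
    return $links;
}

// Made-up sample markup standing in for a fetched page.
$sample = '<ul><li><a href="http://example.com/one">First story</a></li>'
        . '<li><a href="http://example.com/two"><b>Second</b> story</a></li></ul>';

foreach (extract_links($sample) as $link) {
    // Escape before echoing: scraped text is untrusted input.
    echo htmlspecialchars($link['text']), ' -> ', htmlspecialchars($link['url']), "\n";
}
```

One caveat with this pattern: it only captures a clean URL when href is the sole attribute on the tag. Something like <a href="x" class="y"> would drag the extra attribute into the URL, so for messier pages a real HTML parser is probably the safer bet.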

Update: If you would like to see how this works, check out the Scraper code here…

Categorized as Site Stuff
