PHP cURL: obtaining the final URL after redirects

April 29, 2012

I've been toying around lately with various ways of scraping URLs from a website that deliberately tries to hide them.

I have been using PHP and the cURL library to ascertain the final URL after redirects, whilst also picking up on JavaScript redirects (window.location assignments and the like).

This is what I've come up with so far, and it's working a dream:

function get_final_url( $url, $timeout = 5, $max_depth = 10 )
{
    // Undo HTML-escaped ampersands and any percent-encoding in the scraped URL
    $url = str_replace( "&amp;", "&", urldecode(trim($url)) );

    // Stop recursing if we keep bouncing around (e.g. a page whose JavaScript points back at itself)
    if ( $max_depth <= 0 )
        return $url;

    $agent  = "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1";
    $cookie = tempnam( sys_get_temp_dir(), "CURLCOOKIE" );

    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_USERAGENT, $agent );
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    curl_setopt( $ch, CURLOPT_HEADER, true );           // include response headers in $content
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );   // let cURL follow HTTP 3xx redirects itself
    curl_setopt( $ch, CURLOPT_ENCODING, "" );           // accept any encoding cURL supports
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );   // return the body instead of printing it
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    $content  = (string) curl_exec( $ch );              // false on failure becomes ""
    $response = curl_getinfo( $ch );
    curl_close( $ch );
    unlink( $cookie );

    // If cURL didn't (or couldn't) follow a 301/302 itself, read the Location header by hand
    if ( $response['http_code'] == 301 || $response['http_code'] == 302 ) {
        ini_set( "user_agent", $agent );
        $headers = get_headers( $response['url'] );
        foreach ( $headers ?: [] as $value ) {
            if ( substr( strtolower($value), 0, 9 ) == "location:" )
                return get_final_url( trim( substr( $value, 9 ) ), $timeout, $max_depth - 1 );
        }
    }

    // Look for JavaScript redirects in the returned page
    if ( preg_match( "/window\.location\.replace\('([^']+)'\)/i", $content, $value ) ||
         preg_match( '/window\.location\s*=\s*"([^"]+)"/i', $content, $value ) ||
         preg_match( '/location\.href\s*=\s*"([^"]+)"/i', $content, $value ) )
        return get_final_url( $value[1], $timeout, $max_depth - 1 );

    return $response['url'];
}


// Simply called like so

$myfinalurl = get_final_url('', 10);
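The two parsing steps inside get_final_url() (spotting a JavaScript redirect in the page, and picking a Location header out of get_headers() output) can be pulled out and checked without touching the network. Below is a minimal sketch of that idea. The helper names extract_js_redirect() and last_location_header() are hypothetical, not part of the function above, and the regexes are a slightly loosened variant of the ones it uses (either quote style, optional whitespace around `=`). It assumes PHP 7.1+ for the nullable return types, and relies on get_headers() following redirects by default, which means its array contains the headers of every response in the chain.

```php
<?php
// Hypothetical helpers (not from the original post): the parsing steps of
// get_final_url(), separated out so they can be tested offline.

/**
 * Return the target of the first JavaScript redirect found in $html, or null.
 * Loosely covers window.location.replace('...'), window.location = "..."
 * (optionally .href) and location.href = "...".
 */
function extract_js_redirect(string $html): ?string
{
    $patterns = [
        '~window\.location\.replace\(\s*[\'"]([^\'"]+)[\'"]\s*\)~i',
        '~window\.location(?:\.href)?\s*=\s*[\'"]([^\'"]+)[\'"]~i',
        '~location\.href\s*=\s*[\'"]([^\'"]+)[\'"]~i',
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $html, $m)) {
            return $m[1]; // first capture group is the redirect target
        }
    }
    return null;
}

/**
 * Given an array of raw header lines (as returned by get_headers()), return
 * the value of the LAST Location header, i.e. the final hop, or null.
 */
function last_location_header(array $headers): ?string
{
    $location = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, so match "location:" in any case.
        if (stripos($line, 'location:') === 0) {
            $location = trim(substr($line, strlen('location:')));
        }
    }
    return $location;
}

// Usage with canned input, so no request is made:
$html = '<script>window.location="http://example.com/landing";</script>';
echo extract_js_redirect($html), PHP_EOL; // prints http://example.com/landing

$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/step-1',
    'HTTP/1.1 302 Found',
    'location: http://example.com/final',
    'HTTP/1.1 200 OK',
];
echo last_location_header($headers), PHP_EOL; // prints http://example.com/final
```

Neither helper resolves relative targets such as `/landing`; those would still need joining onto the current URL before being handed back to cURL.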

About me

Hello! I'm David Heward, how are you going? I'm a Senior Devops/Build Engineer, specialising in AWS & Cloud Automation. Based in London. Strong 10+ year background in Software development. Have a read of my blog. Have a look at some of my working projects. Contact me at @davehewy or on Linkedin.