Learn how to scan websites for string values in this example
Posted on March 10th, 2007
PHP Coding
Introduction

In this tutorial we are going to grab HTML from websites and scan it. Rather than just give you a script that does this, I've put it into a practical context by creating a script that scans a page for an affiliate link and updates a MySQL table accordingly.


The SQL

First we have to create the table in which the affiliates are stored.


CREATE TABLE `affiliates` (
  `aff_id` int(5) NOT NULL AUTO_INCREMENT,
  `aff_name` varchar(255) NOT NULL default '',
  `aff_url` varchar(255) NOT NULL default '',
  `aff_in` int(5) NOT NULL default '0',
  `aff_out` int(5) NOT NULL default '0',
  `aff_active` enum('1','0') NOT NULL default '1',
  PRIMARY KEY  (`aff_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

--
-- Dumping data for table `affiliates`
--

INSERT INTO `affiliates` VALUES (1, 'Testing', 'http://arutha.co.uk', 0, 0, '1');


There is currently one example row in the table. It scans my blog for a link, but bear in mind that I update my blog regularly, so the link may not be there forever and you may want to enter your own test data.



Step 1 - Connecting and Retrieving

As in any MySQL and PHP script, you need to connect to the database, so we begin with the all too familiar connection script.
<?php
// Firstly we connect to the database. Fill in your own host, user and password.
$db_host = "";
$db_user = "";
$db_pass = "";
$db_name = "affiliatecheck";
$dba = mysql_connect($db_host,$db_user,$db_pass);
mysql_select_db ($db_name) or die ("Cannot connect to database");


After we've connected, we want to get all the rows from the table so we can check each affiliate. I've also included the link structure in this block of code. Read the tips and tricks box below for details.

Most common affiliate systems record hits in and out, so most of the time they use a gateway script or a page which records the hit and then redirects the visitor to the main page. It's safe to assume that the link to that script will have ?id= on the end, and we want to check that the correct link is on the page and not just a mention of the site in a news post.


 // Now we check all the sites.
$yourlinkstructure = "arutha.co.uk?id=";
$sql = mysql_query("SELECT * FROM affiliates");
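
To make the idea of a link structure concrete, here is a minimal sketch of building the expected link for one affiliate and testing some HTML for it. The helper names expectedLink() and pageHasLink() are made up for this illustration and are not part of the tutorial script. Unlike a plain substring search, this version assumes you do not want ?id=1 to match a page that only carries ?id=12.

```php
<?php
// Hypothetical helper: builds the exact link text we expect an affiliate to carry.
function expectedLink(string $linkStructure, int $affiliateId): string
{
    return $linkStructure . $affiliateId;
}

// Hypothetical helper: true only when the link appears with exactly this ID,
// i.e. it is not immediately followed by another digit.
function pageHasLink(string $html, string $link): bool
{
    $pattern = '/' . preg_quote($link, '/') . '(?!\d)/';
    return preg_match($pattern, $html) === 1;
}

$link = expectedLink('arutha.co.uk?id=', 1);
var_dump(pageHasLink('<a href="http://arutha.co.uk?id=1">Arutha</a>', $link));  // bool(true)
var_dump(pageHasLink('<a href="http://arutha.co.uk?id=12">Arutha</a>', $link)); // bool(false)
var_dump(pageHasLink('News: we like arutha.co.uk', $link));                     // bool(false)
```

The last call shows the case described in the tips box: a plain mention of the site without the ?id= part does not count as a link.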


If you have a lot of affiliates, the script may time out when you use it to update them because of the heavy processing involved, so you may want to think about putting a LIMIT in the SQL code.
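
One way to apply such a limit is to check the affiliates in batches. The sketch below only builds the query string; batchQuery() and its two parameters are hypothetical names, and the batch size of 20 is an arbitrary choice for the example.

```php
<?php
// Hypothetical helper: builds a SELECT that returns one batch of affiliates.
// The int type hints and the clamping keep the numbers safe to place in the SQL string.
function batchQuery(int $batchSize, int $batchNumber): string
{
    $batchSize = max(1, $batchSize);
    $offset = max(0, $batchNumber) * $batchSize;
    // MySQL syntax: LIMIT offset, row_count
    return "SELECT * FROM affiliates ORDER BY aff_id LIMIT $offset, $batchSize";
}

echo batchQuery(20, 0), "\n"; // SELECT * FROM affiliates ORDER BY aff_id LIMIT 0, 20
echo batchQuery(20, 2), "\n"; // SELECT * FROM affiliates ORDER BY aff_id LIMIT 40, 20
```

Ordering by aff_id keeps the batches stable between runs, so each run can pick up where the previous one stopped.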

Step 2 - The loop and assignment of custom variables

 
while ($result = mysql_fetch_array($sql))
{
    // Take Variables from the database
    $url = $result['aff_url'];
    $idretrieved = $result['aff_id'];


When retrieving multiple rows from a MySQL database I like to use a while loop. The condition is basically saying: while there are rows we haven't looked at, carry on.

The function inside the while condition returns the values from the current row as an array. In this case we copy the values we need into variables of our own.
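
If the assignment inside the while condition looks odd, the following sketch shows the same pattern without a database. nextRow() is a made-up stand-in for mysql_fetch_array(), and the second row (example.com) is invented test data.

```php
<?php
// Stand-in for a result set: each call to nextRow() hands back one row,
// then false once the rows run out, which is how mysql_fetch_array() behaves.
$rows = [
    ['aff_id' => 1, 'aff_url' => 'http://arutha.co.uk'],
    ['aff_id' => 2, 'aff_url' => 'http://example.com'], // invented row
];

function nextRow(array &$rows)
{
    $row = array_shift($rows);
    return $row === null ? false : $row;
}

$seen = [];
// The assignment is evaluated first; the loop ends when it yields false.
while ($result = nextRow($rows)) {
    $seen[] = $result['aff_id'] . ' => ' . $result['aff_url'];
}

print_r($seen);
```

After the loop $seen holds one entry per row and $rows is empty, which is the point at which the real loop would stop as well.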



Step 3 - Loading and Searching the HTML

We now want to grab the HTML from our target site. We use the file_get_contents() function to retrieve the HTML and then pass it through htmlspecialchars() to make it easier to find our affiliate link.

We are looking for an instance of that affiliate link anywhere in the code. My previous affiliate policies have been that my link must be on every page of the site; however, random affiliate boxes are becoming more common, so the script will load the page up to 3 times or until the link is found.

 
// Check to see if the site contains the link
    $i = 0;
    $linkfound = false;
    // Try up to 3 times, stopping as soon as the link is found.
    while ($i < 3 && !$linkfound)
    {
        $file = file_get_contents($url);
        $file = htmlspecialchars($file);
        if (strstr($file,$yourlinkstructure.$idretrieved))
        {
            $linkfound = true;
        }
        else
        {
            $linkfound = false;
        }
        $i = $i + 1;
    }
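
The "up to 3 times or until found" rule can also be written as a small function, which makes it easy to try out without hitting a real site. This is only a sketch: findWithRetries() and the fake fetcher are hypothetical, and the fetcher simulates a random affiliate box that shows our link on the third load.

```php
<?php
// Hypothetical helper: calls $fetch up to $maxAttempts times and stops as soon
// as the returned HTML contains $needle. $fetch is any callable returning the
// page as a string, or false on failure (as file_get_contents() does).
function findWithRetries(callable $fetch, string $needle, int $maxAttempts = 3): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = $fetch();
        if ($html !== false && strpos($html, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Simulated page whose affiliate box only shows our link on the third load.
$loads = 0;
$fakeFetch = function () use (&$loads) {
    $loads++;
    return $loads === 3
        ? '<a href="http://arutha.co.uk?id=1">Arutha</a>'
        : '<p>Other affiliate</p>';
};

var_dump(findWithRetries($fakeFetch, 'arutha.co.uk?id=1')); // bool(true)
echo $loads, "\n"; // 3
```

For a real page the callable would simply wrap file_get_contents($url). Checking for false first matters because a failed download should count as "not found this time" rather than stop the script.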




Step 4 - Recording the results

     
// Updating the database
    if ($linkfound == true)
    {
        // Update your database accordingly - Link has been found.
        $update = mysql_query("UPDATE affiliates set aff_active = '1' where aff_id = '$idretrieved'");
    }
    else
    {
        // Update your database accordingly - Link hasn't been found.
        $update = mysql_query("UPDATE affiliates set aff_active = '0' where aff_id = '$idretrieved'");
    }
}
?>


In this code we record the result in our database. For the aff_active field, '1' means yes (the link was found) and '0' means no.
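
The mysql_* functions used in this tutorial were removed in PHP 7, so on a current installation the update would have to go through mysqli or PDO instead. As a rough sketch of the same step with bound parameters, the hypothetical statusUpdate() helper below returns the SQL and the values to bind; the PDO connection mentioned in the final comment is assumed, not created here.

```php
<?php
// Hypothetical helper: returns the SQL and bound values for recording a check.
// aff_active is an enum of '1' and '0', so the boolean is mapped to a string.
function statusUpdate(int $affiliateId, bool $linkFound): array
{
    $sql = 'UPDATE affiliates SET aff_active = ? WHERE aff_id = ?';
    return [$sql, [$linkFound ? '1' : '0', $affiliateId]];
}

list($sql, $params) = statusUpdate(1, false);
echo $sql, "\n";                 // UPDATE affiliates SET aff_active = ? WHERE aff_id = ?
echo json_encode($params), "\n"; // ["0",1]

// With an open PDO connection in $pdo (assumed) this would run as:
// $pdo->prepare($sql)->execute($params);
```

Using one statement for both outcomes also removes the need for the separate if and else branches.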

I hope you enjoyed this tutorial and found it useful. The full code is below.

Arutha



<?php
// Firstly we connect to the database.
$db_host = "localhost";
$db_user = "root";
$db_pass = "";
$db_name = "affiliatecheck";
$dba = mysql_connect($db_host,$db_user,$db_pass);
mysql_select_db ($db_name) or die ("Cannot connect to database");

// Now we check all the sites.
$yourlinkstructure = "arutha.co.uk?id=";
$sql = mysql_query("SELECT * FROM affiliates");
while ($result = mysql_fetch_array($sql))
{
    // Take Variables from the database
    $url = $result['aff_url'];
    $idretrieved = $result['aff_id'];
    // Check to see if the site contains the link
    $i = 0;
    $linkfound = false;
    // Try up to 3 times, stopping as soon as the link is found.
    while ($i < 3 && !$linkfound)
    {
        $file = file_get_contents($url);
        $file = htmlspecialchars($file);
        if (strstr($file,$yourlinkstructure.$idretrieved))
        {
            $linkfound = true;
        }
        else
        {
            $linkfound = false;
        }
        $i = $i + 1;
    }
    // Updating the database
    if ($linkfound == true)
    {
        // Update your database accordingly - Link has been found.
        $update = mysql_query("UPDATE affiliates set aff_active = '1' where aff_id = '$idretrieved'");
    }
    else
    {
        // Update your database accordingly - Link hasn't been found.
        $update = mysql_query("UPDATE affiliates set aff_active = '0' where aff_id = '$idretrieved'");
    }
}
?>