I'm trying to code a bot that, every X minutes, checks the first post of every thread for links to Rapidshare, MegaUpload, EasyShare, FileFactory and SendSpace, and moves the thread to a different forum if Y% of the links are dead.

Thanks in advance for any hints or tips.

This is what I have so far....

This is currently running on a test forum but will be run on a much larger scale once it works.

<?php
///////////////////////////////////////////////////////////
// Script created by: Fatheed for Neopluz.org//////////////
// Use: Checking to see if links are dead or not /////////
//////////////////////////////////////////////////////////
function ss_link_check($_url,$_type="RS") {
  //RS = RapidShare; MU = MegaUpload; ES = EasyShare; FF = FileFactory; SS = SendSpace;
  $_sites=array(
  "RS" => "Error",
  "MU" => "Unfortunately, the link you have clicked is not available.",
  "ES" => "File not found",
  "FF" => "Sorry, this file is no longer available. It may have been deleted by the uploader, or has expired.",
  "SS" => "Sorry, the file you requested is not available.",
  "FFFH" => "Your requested file is not found"
  );

  $_fgc=file_get_contents($_url);
  if (preg_match("/".$_sites[$_type]."/",$_fgc) or $_fgc=="") {
    $_correct=false;
  }
  else {
    $_correct=true;
  }

  return $_correct;
}

//Include the vBulletin configuration file
include ("config.php");

//Search the DB
if ($config['MasterServer']['usepconnect']==1) {
  $_connection=mysql_pconnect(
  $config['MasterServer']['servername'].":".$config['MasterServer']['port'],
  $config['MasterServer']['username'],
  $config['MasterServer']['password']);
}
else {
  $_connection=mysql_connect(
  $config['MasterServer']['servername'].":".$config['MasterServer']['port'],
  $config['MasterServer']['username'],
  $config['MasterServer']['password']);
}

mysql_select_db(
$config['Database']['dbname'],
$_connection);

$_boards_array=array(
1 => "5",
2 => "6",
3 => "7",
4 => "8",
5 => "9",
6 => "10",
7 => "11"
);

foreach ($_boards_array as $_key => $_cboard) {
  echo("<b>Board ID:</b> ".$_cboard."<br />");

  $_query1=mysql_query('SELECT * FROM '.$config['Database']['tableprefix'].'thread WHERE forumid="'.$_cboard.'" ORDER BY threadid DESC');
  while ($_rows1=mysql_fetch_array($_query1)) {
    $_threadid=$_rows1["threadid"];
    $_firstpostid=$_rows1["firstpostid"];

    $_query2=mysql_query('SELECT * FROM '.$config['Database']['tableprefix'].'post WHERE postid="'.$_firstpostid.'"');
    while ($_rows2=mysql_fetch_array($_query2)) {
      $_pagetext=$_rows2["pagetext"];
      $_link=preg_match_all("@\[(?i)url\](.*?)\[/(?i)url\]@si",$_pagetext,$_url,PREG_SET_ORDER);
      $_replace=preg_replace("@\[(?i)url\](.*?)\[/(?i)url\]@si","@\[(?i)url\](.*?)\[/(?i)url\]@si",$_pagetext);
      $_not_working=0;

      foreach ($_url as $_key2 => $_site) {
        if (preg_match("/rapidshare/",$_url[$_key2][1])) {
          $_type="RS";
        }
        else if (preg_match("/megaupload/",$_url[$_key2][1])) {
          $_type="MU";
        }
        else if (preg_match("/easyshare/",$_url[$_key2][1])) {
          $_type="ES";
        }
        else if (preg_match("/filefactory/",$_url[$_key2][1])) {
          $_type="FF";
        }
        else if (preg_match("/sendspace/",$_url[$_key2][1])) {
          $_type="SS";
        }
        else if (preg_match("/fastfreefilehosting/",$_url[$_key2][1])) {
          $_type="FFFH";
        }
        else {
          $_type=0;
        }

        $message=$phper[2];
        echo($_url[$_key2][1]." / ".mysql_error($_connection)." / ".$message."<br />");
        if ($_type!==0) {
          if (ss_link_check($_url[$_key2][1],$_type)) {
            //blah
          }
          else {
            $_not_working+=1;
          }
        }
      }

      $_delete=$_not_working;
      $_delete=$_delete*100;

      if ($_delete > 0) {
        mysql_query('UPDATE '.$config['Database']['tableprefix'].'thread SET forumid="3" WHERE threadid="'.$_threadid.'"');
        mysql_query('UPDATE '.$config['Database']['tableprefix'].'post SET pagetext="'.$_pagetext.'[quote]One or more links were found dead. If you want this moved back please report this post or PM Fatheed.[/quote]" WHERE postid="'.$_firstpostid.'"');
      }
    }
  }
}
?>
    5 days later

    Well, for starters, does it work as you expect in finding "dead" links? If it doesn't, then you know you need to work on the regular expressions. I'm not sure what "(?i)" is in a regular expression; I've never seen it. I'm also not sure what you're doing with the preg_replace. You're just replacing the URLs with gibberish, since the replacement string is the pattern itself.

    I'd also take a closer look at the $url array that preg_match_all returns. I'd make sure that you know what's in it before you start manipulating it. Just a thought 😉
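To make the point about inspecting the array concrete, here is a minimal sketch of what preg_match_all() hands back with PREG_SET_ORDER. The sample post text and the variable names ($pagetext, $matches, $links) are made up for illustration and are not from the original script.

```php
<?php
// Hypothetical sample post text; not taken from the real forum.
$pagetext = "Part 1: [url]http://rapidshare.com/files/1/a.rar[/url]\n"
          . "Part 2: [URL]http://www.megaupload.com/?d=ABC[/URL]";

// The trailing "i" modifier already makes the whole pattern case-insensitive,
// and "s" lets "." match newlines inside the tag.
$count = preg_match_all('@\[url\](.*?)\[/url\]@si', $pagetext, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, $matches[n][0] is the full [url]...[/url] tag
// and $matches[n][1] is the captured address.
$links = array();
foreach ($matches as $match) {
    $links[] = trim($match[1]);
}

// Dump the result while debugging so you know exactly what you are looping over.
print_r($links);
```

Dumping the array like this before looping over it should show quickly whether the pattern is catching every link in the post.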

    Also, you don't need to use preg_match for everything. Sometimes strpos() and eregi() are enough if what you want is simple (like checking whether the word rapidshare or megaupload comes right after the http:// of the URL).
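As a rough sketch of that idea, the chain of preg_match calls could collapse into one lookup table. The function name ss_detect_host is hypothetical, and stripos() is used here in place of eregi() because the ereg functions were removed in PHP 7. The host keywords are the ones from the original script.

```php
<?php
// Hypothetical helper: returns a host code such as "RS", or null if the
// link does not point at one of the known file hosts.
function ss_detect_host($url) {
    // Keyword to look for in the host name => code used by the original script.
    $hosts = array(
        'rapidshare'          => 'RS',
        'megaupload'          => 'MU',
        'easyshare'           => 'ES',
        'filefactory'         => 'FF',
        'sendspace'           => 'SS',
        'fastfreefilehosting' => 'FFFH',
    );

    // Only look at the host part, so a file named "rapidshare.zip" on some
    // other site is not mistaken for a RapidShare link.
    $host = parse_url(trim($url), PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    foreach ($hosts as $needle => $code) {
        // stripos() is case-insensitive; compare against false explicitly
        // because a match at position 0 is also "falsy".
        if (stripos($host, $needle) !== false) {
            return $code;
        }
    }
    return null;
}
```

Inside the loop it would be called as something like `$_type = ss_detect_host($_url[$_key2][1]);`, with a null result meaning "skip this link".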

    Those are just some thoughts, hope they help.

      At the moment it only finds live links for some reason 🙁


        Try using strpos() instead of preg_match in your ss_link_check function. You don't need to match a pattern, just know whether the error text exists in the returned string.
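A minimal sketch of that check, assuming the page body has already been downloaded. The function name ss_body_looks_dead and its parameters are hypothetical; $marker stands for the host's error message from the $_sites array in the original script.

```php
<?php
// Hypothetical helper: $html is the page body already fetched for the link,
// $marker is the host's "file is gone" message.
function ss_body_looks_dead($html, $marker) {
    // A failed or empty download is treated as dead, as in the original script.
    if ($html === false || $html === '') {
        return true;
    }
    // Plain substring search, so characters like "." in the message are
    // taken literally instead of as regex wildcards.
    // Strict comparison matters: strpos() returns 0 for a match at the start.
    return strpos($html, $marker) !== false;
}
```

It could be wired in as `ss_body_looks_dead(file_get_contents($_url), $_sites[$_type])`. Note that it returns true for a dead link, which is the opposite sense of ss_link_check.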

        You might also look into using streams so you can read the headers that are sent. Many of these sites might send a 404 or another HTTP error, which indicates that what you're looking for isn't there.
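One way to get at the headers without setting up a stream by hand is PHP's get_headers(), which returns the response headers as an array of lines. The sketch below only covers reading the status code out of such an array; the function names ss_final_status and ss_status_is_dead are hypothetical, and treating every status of 400 and above as dead is an assumption, not something the hosts document.

```php
<?php
// Hypothetical helper: pull the final HTTP status code out of a header list
// such as the one get_headers() returns. Each redirect adds its own status
// line, so the last "HTTP/..." line is the one that counts.
function ss_final_status($headers) {
    $status = null;
    foreach ((array) $headers as $line) {
        if (is_string($line) && preg_match('@^HTTP/\S+\s+(\d{3})@', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Assumption: no readable status at all, or any 4xx/5xx status, means dead.
function ss_status_is_dead($status) {
    return $status === null || $status >= 400;
}
```

Usage would be along the lines of `ss_status_is_dead(ss_final_status(get_headers($url)))`. Some hosts may well answer 200 and show an error page anyway, so this is probably best combined with the text check rather than used in place of it.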
