Recently, I wrote about the Facebook search limitation and I also, mistakenly, said the limitation also exists in FQL. Today, I am writing to say that is incorrect and have a new script to show how to properly retrieve all posts and comments from a group. The key is keeping the time range to 4 months or less and use multiple queries to retrieve the data. Below is the new import code which works with a few limitations I will discuss below.

Be sure to update the settings section (keep override to 1 until you understand the purpose behind it)

<?php
  //Settings
  $gid = '';
  $host = '';
  $user = '';
  $pass = '';
  $db = '';
  $access_token = 'access_token=';
  $override = 1;

  //Intialize array
  $results = array();
  $time = intval(microtime(true));
  $int = 7776000;
  
  //Connect to MySQL
  $mysqli = new mysqli($host, $user, $pass, $db);
  if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
  }
  
  $result = $mysqli->query("SELECT post_time FROM fb_posts ORDER BY post_time DESC LIMIT 1")->fetch_assoc();
  $last_post = (int) $result['post_time'];
  
  if ($override == 1) $last_post = 1291847525;
  
  do {
    // Run fql query
    $fql_query_url = 'https://graph.facebook.com/'
      . 'fql?q=SELECT+post_id,+actor_id,+created_time,+message,+comments,+likes+FROM+stream+WHERE+source_id='.$gid.'+AND+created_time<'.$time.'+AND+created_time>'.($time - $int).'+LIMIT+5000'
      . '&' . $access_token;
    $fql_query_result = file_get_contents($fql_query_url);
    $fql_query_obj = json_decode(preg_replace('/("\w+"):(\d+)/', '\\1:"\\2"', $fql_query_result), true);
    $results = array_merge($results, $fql_query_obj['data']);
    $time -= $int;
  } while (count($fql_query_obj) != 0 && $time > $last_post);
    
  //Prepare statements for entry
  $update_stmt = $mysqli->prepare("SELECT num_comments FROM fb_posts WHERE post_id=? LIMIT 1");
  $update_stmt->bind_param("s",$post_id);
  $post_stmt = $mysqli->prepare("INSERT INTO fb_posts (post_id, user_id, post_body, post_time, num_comments, likes) VALUES(?,?,?,?,?,?) ON DUPLICATE KEY UPDATE likes = VALUES(likes), num_comments = VALUES(num_comments)");
  $post_stmt->bind_param("ssssss",$post_id,$user,$body,$time,$num_comments,$likes);
  $comment_stmt = $mysqli->prepare("INSERT INTO fb_comments (comment_id, post_id, user_id, comment_body, comment_time, likes) VALUES(?,?,?,?,?,?) ON DUPLICATE KEY UPDATE likes = VALUES(likes)");
  $comment_stmt->bind_param("ssssss",$comment_id,$post_id,$user,$body,$time,$likes);

  //Cycle through each post
  foreach ($results as $post) {
    list(,$post_id) = explode("_",$post["post_id"]);
    $user = $post["actor_id"]; $body = $post["message"]; $time = $post["created_time"]; $num_comments = $post["comments"]["count"]; $likes = $post["likes"]["count"];
    $update_stmt->execute();
    $update_stmt->bind_result($old_comments);
    $update_stmt->fetch();
    $update_stmt->free_result();
    $post_stmt->execute();
    if ((int) $num_comments > $old_comments) {
    var_dump($post_id);
      $query = "https://graph.facebook.com/fql?q=SELECT+likes,+id,+time,+text,+fromid+FROM+comment+WHERE+post_id=%22".$post['post_id']."%22&".$access_token;
      $comments = file_get_contents($query);
      $comments = json_decode(preg_replace('/("\w+"):(\d+)/', '\\1:"\\2"', $comments), true);
      foreach ($comments["data"] as $comment) {
        list(,,$comment_id) = explode("_",$comment["id"]);
	$user = $comment["fromid"]; $body = $comment["text"]; $time = $comment["time"]; $likes = $comment["likes"];
	$comment_stmt->execute();
      }
    }
  }
  
  $post_stmt->close();
  $comment_stmt->close();
  
  $unknowns = $mysqli->query("SELECT DISTINCT p.user_id FROM `fb_posts` AS p LEFT JOIN fb_users AS u ON u.user_id = p.user_id WHERE u.user_id IS NULL");
  $user_stmt = $mysqli->prepare("INSERT INTO fb_users VALUES(?,?)");
  $user_stmt->bind_param("ss",$user,$name);
  while( $row = $unknowns->fetch_assoc() ) {
    $query = "https://graph.facebook.com/fql?q=SELECT+name+FROM+user+WHERE+uid=".$row['user_id']."&".$access_token;
    $result = file_get_contents($query);
    $result = json_decode(preg_replace('/("\w+"):(\d+)/', '\\1:"\\2"', $result), true);
    $user = $row['user_id']; $name = $result['data'][0]['name'];
    $user_stmt->execute();
  }

  $user_stmt->close();
  $mysqli->close();
  exit();

?>

The tables are essentially unchanged since the last post but the structure is as follows:

Table structure for table `fb_comments`

CREATE TABLE IF NOT EXISTS `fb_comments` (
`comment_id` bigint(18) NOT NULL,
`post_id` bigint(18) NOT NULL,
`user_id` int(12) NOT NULL,
`comment_body` longtext NOT NULL,
`comment_time` int(11) NOT NULL,
`likes` smallint(6) NOT NULL,
UNIQUE KEY `comment_id` (`comment_id`),
KEY `user_id` (`user_id`),
FULLTEXT KEY `comment_body` (`comment_body`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Table structure for table `fb_posts`

CREATE TABLE IF NOT EXISTS `fb_posts` (
`post_id` bigint(18) NOT NULL,
`user_id` int(12) NOT NULL,
`post_body` longtext NOT NULL,
`post_time` int(11) NOT NULL,
`num_comments` smallint(6) NOT NULL,
`likes` smallint(6) NOT NULL,
UNIQUE KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`),
FULLTEXT KEY `post_body` (`post_body`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Table structure for table `fb_users`

CREATE TABLE IF NOT EXISTS `fb_users` (
`user_id` int(12) NOT NULL COMMENT 'User ID',
`user_name` varchar(100) NOT NULL COMMENT 'User Full Name',
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Some of you might have noticed the loop for each window and realized that is inefficient since a properly constructed multi query would reduce the http requests made however when using override of 0 (quick import) it only adds posts since last import which will never be outside the time window since it is run every minute on crontab. Therefore, was not worth the effort to code a more efficient multi query. However, a full scan is needed periodically or comments will not properly update. Also note, likes on comments will not update unless the number of comments changes. This will hopefully be fixed but it is a low priority issue at the moment.

This script is being used on http://albahra.com/fb to facilitate search on the Facebook group.


2 Comments

suzie · September 26, 2012 at 2:21 am

Hi!
I’m trying to use this to download my group wall into MAMP. I ran the code and it seemed to work find– no error message– but I don’t see it in MAMP. Any clues?

Thanks!
–Suzie

Samer · October 10, 2012 at 6:05 pm

Not sure what your issue is but if you provide me with some error messages or debug info I can try and help. Feel free to email me for additional assistance.

Leave a Reply

Your email address will not be published. Required fields are marked *