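Finding recurring strings in a TEXT file using PHP

Time: 2014-12-28 09:06:35

Tags: php recurring

I have a text file that contains a large amount of data. So far I have managed to extract the field names that sit between marker strings.

I would like my code to scan the whole text file. For some reason, it stops at the first occurrence of the character/string.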

<?PHP

//First, open the file. Change your filename
$file = "datafile1.txt";
$handle = fopen($file, "r");
$contents = fread($handle, filesize($file));

for ($i=0; $i=100; $i+10){
    $word1='"cat_id';
    $word2='"category"';

    $a = strpos($contents, $word1);
    $b = strpos($contents, $word2);

    $between=substr($contents, $a, ($b - $a));

    echo $between;  

    //////////////////////////////////

    $word1='"category';
    $word2='"name"';

    $c = strpos($contents, $word1);
    $d = strpos($contents, $word2);

    $between=substr($contents, $c, ($d - $c));

    echo $between;  
    ////////////////////////////////////

    $word1='"name';
    $word2='"description"';

    $e = strpos($contents, $word1);
    $f = strpos($contents, $word2);

   $between=substr($contents, $e, ($f - $e));

   echo $between;  
}
fclose($handle);

?>

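I get this response:

"cat_id": "16349", "category": "Adventure", "name": "Assassin's Creed IV Black Flag - Xbox 360",

But it stops there, even though the file contains recurring cat_id and category entries and... well, names of computer games.

I need to scan the whole text file so that the search repeats, hopefully giving me a list of games and categories as output.

*EDIT: Sorry. Here is a sample of the data file that needs parsing.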

"cat_id": "16349",
  "category": "Adventure",
  "name": "Assassin's Creed IV Black Flag - Xbox 360",
  "description": "It is 1715. Pirates rule the Caribbean and have es... (visit site URLs for full      description)",
  "updated_at": 1419672679,
  "width": "139.70",
  "sem3_id": "1AEIvknN7uwqG2GcwSCMK8",
  "created_at": 1374830955,
  "platform": "Xbox 360",
  "height": "12.70",
  "length": "190.50",
  "sitedetails": [
    {
      "sku": "B00BMFIXT2",
      "latestoffers": [
        {
          "seller": "JNJ Shop",
          "lastrecorded_at": 1419672600,
          "currency": "USD",
          "firstrecorded_at": 1419672600,
          "id": "7g2fpY7BOSE0sU2oKkUkeY",
          "price": "11.00",
          "shipping": "3.99",
          "condition": "New"
        },

200 lines later.....

"cat_id": "20923",
  "category": "Games",
  "name": "Disney Infinity Starter Pack - PlayStation 3",
  "description": "Product Description Platform: PlayStation 3 | Edit... (visit site URLs for full                            description)",
  "updated_at": 1419563879,
  "width": "269.24",
  "created_at": 1375817329,
  "sem3_id": "0FIqEyeRf4SMgiYaoKC6yO",
  "platform": "PlayStation 3",
  "height": "90.93",
  "length": "358.39",
  "sitedetails": [
    {
      "sku": "7635065",
      "latestoffers": [
        {
          "seller": "BestBuy",
          "lastrecorded_at": 1419552600,
          "firstrecorded_at": 1419015000,
          "currency": "USD",
          "availability": "In stock",
          "price": "66.98",
          "id": "5EefaVFIhs2UKYA0Q0qIae",
          "condition": "New"
        },

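1 answer:

Answer 0 (score: 0)

It doesn't actually stop. Because you pass the same content on every iteration, strpos returns the same position for the given text each time; see http://php.net/manual/en/function.strpos.php.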
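A small sketch of that behaviour, using a hypothetical sample string in place of the real file contents:

```php
<?php
// Hypothetical sample text standing in for the file contents.
$contents = '"cat_id": "1", "category": "A", "cat_id": "2", "category": "B"';

// Without an offset, every call starts searching at position 0,
// so each call reports the first occurrence again.
$first  = strpos($contents, '"cat_id');
$second = strpos($contents, '"cat_id');

var_dump($first === $second); // both calls return the same position
```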

You probably need to use the third parameter, [, int $offset = 0], to tell strpos where to start searching on the next iteration. Something like:

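$a = 0;
$b = 0;

// Compare with < (not =) and update the counter with +=
for ($i = 0; $i < 100; $i += 10) {
    $word1 = '"cat_id';
    $word2 = '"category"';

    // Search from the offsets left by the previous iteration
    $a = strpos($contents, $word1, $a);
    $b = strpos($contents, $word2, $b);
    if ($a === false || $b === false) break;

    echo substr($contents, $a, $b - $a);
    $a += strlen($word1); // step past this match
    $b += strlen($word2);
}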

If you are going to use the same words "cat_id" and "category" every time, initialize them outside the loop.
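To illustrate the offset parameter on its own, here is a minimal sketch that lists every position of a search word. The helper name findAllPositions and the sample string are assumptions made for this example; they are not part of the original code.

```php
<?php
// Hypothetical helper: returns every position at which $needle occurs in $haystack.
function findAllPositions(string $haystack, string $needle): array
{
    if ($needle === '') {
        return []; // an empty needle would never advance the offset
    }
    $positions = [];
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        // Move past this match; reusing $pos as the offset would find it again.
        $offset = $pos + strlen($needle);
    }
    return $positions;
}

// Hypothetical sample text with two records.
$sample = '"cat_id": "1", "category": "A", "cat_id": "2", "category": "B"';
print_r(findAllPositions($sample, '"cat_id"'));
```

The important detail is that the offset is advanced beyond the match that was just found. Passing the match position itself as the next offset returns the same match again.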

To catch all occurrences, you are better off using a "while" loop:

$catWord = '"cat_id"';
$categoryWord = '"category"';

$a = 0;
$b = 0;
// $contents is the string read from the file in the question
while (($a = strpos($contents, $catWord, $a)) !== false) {
    $b = strpos($contents, $categoryWord, $b);

    $between = ....
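
One way the truncated loop above might be completed is sketched below. It is only an illustration under stated assumptions: the helper name extractAllBetween is hypothetical, the sample text is a shortened stand-in for the data file in the question, and each record is assumed to contain "cat_id" followed later by "description".

```php
<?php
// Hypothetical helper: collects the text from each $startWord up to the next $endWord.
function extractAllBetween(string $contents, string $startWord, string $endWord): array
{
    $results = [];
    if ($startWord === '' || $endWord === '') {
        return $results; // empty markers would never advance the offset
    }
    $offset = 0;
    while (($start = strpos($contents, $startWord, $offset)) !== false) {
        // Look for the end marker only after the start marker.
        $end = strpos($contents, $endWord, $start + strlen($startWord));
        if ($end === false) {
            break; // start marker without a matching end marker: stop
        }
        $results[] = trim(substr($contents, $start, $end - $start));
        $offset = $end; // the next search begins after this record's fields
    }
    return $results;
}

// Hypothetical two-record sample in the shape shown in the question.
$contents = <<<'TXT'
"cat_id": "16349",
  "category": "Adventure",
  "name": "Assassin's Creed IV Black Flag - Xbox 360",
  "description": "It is 1715.",
"cat_id": "20923",
  "category": "Games",
  "name": "Disney Infinity Starter Pack - PlayStation 3",
  "description": "Product Description",
TXT;

foreach (extractAllBetween($contents, '"cat_id"', '"description"') as $record) {
    echo $record, "\n\n";
}
```

Searching for the end marker from just after the start marker keeps the two positions paired within one record, and moving the offset to the end marker guarantees that every iteration makes progress through the file.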