Extracting meaningful data from this complex string in PHP

Time: 2013-10-31 19:33:07

Tags: php

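My PHP application receives some structured data, but the format is somewhat unpredictable and hard to work with. I have no say over the format the data arrives in. What I get is a string (sample below).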

[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]

That is the data for 5 football players. This is what I need to end up with:

[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78]

[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80]

[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64]

[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70]

[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]

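I did this by hand in the example above, but I need PHP to do it reliably. As you can see, each player has one block of data. To split the big string into individual players I can't just split on "],[", because that substring also occurs an unpredictable number of times inside each player's data.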

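Each player has a number of stats (accurate_pass, touches, etc.), but they don't all have the same stats. For example, player #1 has "saves" and the others don't. Player #4 has "won_contest" and the others don't. There is no way to know in advance who will have which stats, which means I can't just count commas until the next player or anything like that.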

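Each player has a number before his name, but that number has an unpredictable number of digits and can't be told apart from the other numbers that may appear in the string.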

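What I think all players have in common is the very end: there are always 3 comma-separated integers right before the final closing bracket. This kind of substring (INT,INT,INT]) doesn't seem to appear anywhere else. Maybe that could be useful?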
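The observation above can be turned into a regular expression. A minimal sketch, assuming (as the question does) that "INT,INT,INT]" only occurs at the end of a player record, and additionally assuming that every record starts with "[<id>,'". The function name splitPlayers is hypothetical and the sample is a shortened copy of the data above:

```php
<?php
// Hypothetical helper: split the feed into player records using two
// assumptions taken from the sample data:
//   - a record starts with "[", a numeric id, a comma and a single quote
//   - a record ends at the first ",INT,INT,INT]" after that
function splitPlayers(string $str): array
{
    // .*? is lazy, so each match stops at the first ",INT,INT,INT]"
    preg_match_all("/\[\d+,'.*?,\d+,\d+,\d+\]/", $str, $matches);
    return $matches[0];
}

// Shortened sample (two players, some stats removed)
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['saves',[4]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],"
     . "[13471,'Adriano',6.72,[[['accurate_pass',[16]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";

foreach (splitPlayers($str) as $player) {
    echo $player, "\n";
}
```

If a stat list ever held several integers (for example [5,1,2,3]), the INT,INT,INT] assumption could match too early, which is why the answers below count brackets or decode the whole string instead.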

5 answers:

Answer 0 (score: 1)

The "hard" way is bracket counting (less common in PHP, more common in text-parsing languages)...

<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
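$line = ',';          // placeholder first character, removed by substr($line, 1) below
$paren_count = 0;     // current bracket depth
$lines = array();
$len = strlen($str);
for($i=0; $i<$len; $i++)
{
    $line .= $str[$i];
    if($str[$i] == '[') $paren_count++;
    elseif($str[$i] == ']')
    {
        $paren_count--;
        if($paren_count == 0)
        {
            // Depth is back to 0, so $line holds one complete player record.
            // Its first character is the placeholder or the comma that
            // separated it from the previous record; strip it.
            $lines[] = substr($line,1);
            $line = '';
        }
    }
}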
print_r($lines);
?>

Answer 1 (score: 1)

It looks like @Boundless's answer is right: you can use json_decode, but you first have to massage the string you receive so that it becomes a valid JSON string.

This works for me:

<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
$str = '[' . $str . ']';
$str = str_replace('\'','"', $str);


//convert string to array
$arr = json_decode($str);

//now it's a php array so you can access any value
//echo '<pre>';
//print_r( $arr );
//echo '</pre>';

echo $arr[0][1]; // prints "Víctor Valdés"
?>
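Once the string decodes, the question's problem of players having different stats becomes easy: turn each player's list of [name, [value]] pairs into an associative array, so a stat a player doesn't have is simply a missing key. A minimal sketch; parsePlayers is a hypothetical name, and the meaning of positions 0 to 3 (id, name, rating, stats) is inferred from the sample data rather than documented anywhere:

```php
<?php
// Sketch: decode the feed (same quote swap + wrapping as the answer above) and
// turn each player's variable-length stats list into name => value pairs.
// Assumed field positions, inferred from the sample: 0 = id, 1 = name,
// 2 = rating, 3 = stats wrapped as [[['accurate_pass',[15]], ...]].
function parsePlayers(string $raw): array
{
    $decoded = json_decode('[' . str_replace("'", '"', $raw) . ']', true);
    if (!is_array($decoded)) {
        throw new RuntimeException('Could not decode feed: ' . json_last_error_msg());
    }

    $players = [];
    foreach ($decoded as $row) {
        $stats = [];
        // $row[3][0] is the list of ['stat_name', [value]] pairs
        foreach ($row[3][0] as [$name, $values]) {
            $stats[$name] = $values[0];
        }
        $players[] = [
            'id'     => $row[0],
            'name'   => $row[1],
            'rating' => $row[2],
            'stats'  => $stats, // only the stats this player actually has
        ];
    }
    return $players;
}

// Two players from the sample, shortened: only Valdés has 'saves'
$raw = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['saves',[4]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],"
     . "[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['won_contest',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70]";

foreach (parsePlayers($raw) as $p) {
    echo $p['name'], ': saves = ', $p['stats']['saves'] ?? 'n/a', "\n";
}
```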

Answer 2 (score: 0)

Try parsing it as JSON, then pull out what you want. Assuming the data comes in blocks of 4, you could try:

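// json_decode() only accepts valid JSON, so swap the quotes and wrap the string first
$arr = json_decode('[' . str_replace("'", '"', $str) . ']');
$chunks = array();
for($i = 0; $i + 3 < count($arr); $i += 4)
{
  $chunks[] = array($arr[$i], $arr[$i + 1], $arr[$i + 2], $arr[$i + 3]);
}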

Answer 3 (score: 0)

Why not count the [ characters in a loop? Here's a quick loop that should get you started.

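// $input holds the raw string from the question
$output = array();
$brackets = 0;
$current = '';
foreach (str_split($input) as $ch) {
    if ($ch == '[') {
        $brackets++;
    }

    // Only keep characters inside a top-level [...]; this skips the commas between players
    if ($brackets > 0) {
        $current .= $ch;
    }

    if ($ch == ']') {
        $brackets--;
        if ($brackets === 0) {
            // Back at depth 0: $current is one complete player record
            $output[] = $current;
            $current = '';
        }
    }
}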

Not very elegant, though...

Answer 4 (score: 0)

Your string looks like JSON, but it isn't valid JSON, so json_decode() won't work on it as-is.

Your particular case can be turned into valid JSON by wrapping the string in a pair of [] and replacing the single quotes with double quotes:

$string = str_replace("'", '"', $your_string);
var_dump(json_decode('[' . $string . ']'));

See this example.

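Of course, the best solution is to make sure you are given valid JSON in the first place, because this approach breaks easily if your text strings contain, for example, double quotes.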
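One way to soften the double-quote problem, offered only as a sketch: instead of a blind str_replace, re-encode each single-quoted string with json_encode(), which adds the double quotes itself and escapes any double quotes or backslashes inside the string. This assumes the feed's strings never contain a single quote themselves; feedToArray and the sample player are hypothetical:

```php
<?php
// Sketch: convert the feed's single-quoted strings into proper JSON strings,
// so a double quote (or backslash) inside a name no longer breaks json_decode().
// Assumption: the feed's strings never contain a single quote themselves.
function feedToArray(string $raw): ?array
{
    $json = preg_replace_callback(
        "/'([^']*)'/",
        // json_encode() wraps the text in double quotes and escapes " and \
        fn(array $m) => json_encode($m[1], JSON_UNESCAPED_UNICODE),
        $raw
    );
    // Wrap the comma-separated player records in one outer JSON array
    return json_decode('[' . $json . ']', true);
}

// Hypothetical record whose name contains double quotes
$raw = "[1,'Nick \"The Wall\" Doe',7.5,[[['saves',[3]]]],1,'GK',1,0,0,'GK',30,190,85]";
$players = feedToArray($raw);
echo $players[0][1], "\n";
```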