How to extract specific substrings from a string and convert them to an array with PHP

Asked: 2014-08-11 00:43:40

Tags: php regex parsing preg-match

How can I extract specific substrings from a string using PHP?

I tried using preg_match, but the result is not what I want.

Sample string:

  

Creator: Adobe InDesign CC (Macintosh) Producer: Adobe PDF Library 10.0.1 CreationDate: Fri Aug 8 10:37:26 2014 ModDate: Fri Aug 8 10:37:29 2014 Tagged: no Form: none Pages: 2 Encrypted: no Page size: 612.283 x 858.898 pts (rotated 0 degrees) MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 BleedBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 ArtBox: 0.00 0.00 612.28 858.90 File size: 28176860 bytes Optimized: no PDF version: 1.6

This is the output of running pdfinfo test.pdf on the command line.

What I want is to extract specific substrings like these:

  

MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 BleedBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 ArtBox: 0.00 0.00 612.28 858.90

and put them into an array. The result would look like this:

[
'MediaBox'  => [0.00, 0.00, 612.28, 858.90],
'CropBox'   => [0.00, 0.00, 612.28, 858.90],
'BleedBox'  => [0.00, 0.00, 612.28, 858.90],
'TrimBox'   => [8.50, 8.50, 603.78, 850.39]
]

3 Answers:

Answer 0 (score: 2)

One approach is to explode the string on spaces:

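$str = "MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 BleedBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 ArtBox: 0.00 0.00 612.28 858.90";
$array = array();
$box = null; // name of the box whose numbers are currently being collected

foreach (explode(" ", $str) as $value)
{
    if (!is_numeric($value)) {
        // A non-numeric token such as "MediaBox:" is a label; drop the trailing colon.
        $box = substr($value, 0, -1);
    } elseif ($box !== null) {
        // Numeric tokens belong to the most recent label.
        $array[$box][] = $value;
    }
}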
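
Note that the loop above works on a string that contains only the box entries. If the whole pdfinfo output is passed in, labels such as "Pages:" would also end up as keys. Below is a minimal sketch of one way to restrict the same token-by-token idea to a list of wanted boxes and to store the numbers as floats. The function name extractBoxes and the $wanted parameter are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): split on whitespace,
// but only collect numbers that directly follow one of the wanted box labels.
function extractBoxes(string $str, array $wanted): array
{
    $result = [];
    $box = null; // label currently being collected, or null when outside a wanted box

    foreach (preg_split('/\s+/', trim($str)) as $token) {
        if (is_numeric($token)) {
            if ($box !== null) {
                $result[$box][] = (float) $token;
            }
            continue;
        }
        // Any non-numeric token ends the current box; a wanted "Label:" starts a new one.
        $label = rtrim($token, ':');
        $isLabel = substr($token, -1) === ':';
        $box = ($isLabel && in_array($label, $wanted, true)) ? $label : null;
    }

    return $result;
}

// Shortened sample input in the same shape as the pdfinfo string from the question.
$str = "Pages: 2 Page size: 612.283 x 858.898 pts (rotated 0 degrees) MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 File size: 28176860 bytes";
$boxes = extractBoxes($str, ['MediaBox', 'CropBox', 'BleedBox', 'TrimBox']);
print_r($boxes);
```

Because every non-numeric token resets the current label, stray numbers such as the page count or the file size are ignored rather than attached to the wrong key.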

Answer 1 (score: 2)

I tested this code, and it appears to produce your expected output:

$str = "Creator: Adobe InDesign CC (Macintosh) Producer: Adobe PDF Library 10.0.1 CreationDate: Fri Aug 8 10:37:26 2014 ModDate: Fri Aug 8 10:37:29 2014 Tagged: no Form: none Pages: 2 Encrypted: no Page size: 612.283 x 858.898 pts (rotated 0 degrees) MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 BleedBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 ArtBox: 0.00 0.00 612.28 858.90 File size: 28176860 bytes Optimized: no PDF version: 1.6";

$matches = array();
$count   = preg_match_all("/(MediaBox|CropBox|BleedBox|TrimBox):\s([0-9]+\.[0-9]+\s[0-9]+\.[0-9]+\s[0-9]+\.[0-9]+\s[0-9]+\.[0-9]+)/", $str, $matches, PREG_PATTERN_ORDER);

header("Content-Type: text/plain;charset=UTF-8");

array_shift($matches);

$str_keys   = $matches[0];
$str_values = $matches[1];
$result     = array();

for ($i = 0; $i < $count; ++$i) {
  $result[$str_keys[$i]] = explode(" ", $str_values[$i]);
}

echo json_encode($result, JSON_PRETTY_PRINT);

Output:

{
    "MediaBox": [
        "0.00",
        "0.00",
        "612.28",
        "858.90"
    ],
    "CropBox": [
        "0.00",
        "0.00",
        "612.28",
        "858.90"
    ],
    "BleedBox": [
        "0.00",
        "0.00",
        "612.28",
        "858.90"
    ],
    "TrimBox": [
        "8.50",
        "8.50",
        "603.78",
        "850.39"
    ]
}

Hope this helps.
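
The values in the output above are strings, while the array requested in the question holds numbers. If numeric values are needed, one possible follow-up step is to cast each coordinate with floatval. This is only a sketch: the $result literal below is sample data shaped like the array built above, and $numeric is a name introduced here for illustration.

```php
<?php
// Sample data shaped like the $result array built above (values are strings).
$result = [
    'MediaBox' => ['0.00', '0.00', '612.28', '858.90'],
    'TrimBox'  => ['8.50', '8.50', '603.78', '850.39'],
];

// Cast every coordinate of every box to float.
// array_map keeps the string keys when it is given a single array.
$numeric = array_map(
    function (array $coords): array {
        return array_map('floatval', $coords);
    },
    $result
);

var_dump($numeric);
```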

Answer 2 (score: 0)

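With the following regular expression, you get two capture groups per match. The first is the key in the associative array, and the second is the value.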

Here is the regular expression to use with preg_match (or with preg_match_all, to collect every box in one call):

(MediaBox|CropBox|BleedBox|TrimBox):((?: (?:\d+(?:\.\d{1,2}))){4})
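
As a rough sketch of how a pattern like this could be applied, the snippet below runs it through preg_match_all with PREG_SET_ORDER and builds the associative array from the two capture groups. It assumes a quantifier of {4}, so that all four coordinates of each box are captured, and uses / as the delimiter. The variable names and the shortened sample string are illustrative only.

```php
<?php
// Shortened sample input; the full pdfinfo string from the question has the same shape.
$str = "Pages: 2 MediaBox: 0.00 0.00 612.28 858.90 CropBox: 0.00 0.00 612.28 858.90 TrimBox: 8.50 8.50 603.78 850.39 File size: 28176860 bytes";

// Group 1 is the box name, group 2 the four space-prefixed numbers.
$pattern = '/(MediaBox|CropBox|BleedBox|TrimBox):((?: (?:\d+(?:\.\d{1,2}))){4})/';

$result = [];
if (preg_match_all($pattern, $str, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the key; $m[2] looks like " 0.00 0.00 612.28 858.90".
        $result[$m[1]] = explode(' ', trim($m[2]));
    }
}

print_r($result);
```

PREG_SET_ORDER returns one entry per match, so the key and its values sit side by side in each $m and no index bookkeeping is needed.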