Text file to conditional PHP array

Date: 2018-07-03 10:42:04

Tags: php

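I have a file with thousands of entries that I am trying to convert into a PHP array. I've hit a stumbling block because one of the fields only appears under a certain condition. The good news is that the data is predictable. There are two types of entries: 1) revoked, and 2) revoked with a reason.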

Example entry #1 (revoked)

    Serial Number: 0E76BE532946EFE890376F0339329A62
        Revocation Date: Jun 27 14:46:26 2018 GMT

Example entry #2 (revoked with a reason)

    Serial Number: 0E17C9648FF25C0FC537D97958E4D449
        Revocation Date: Jun 27 14:48:07 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise

An entry that includes a reason has five lines in total. An entry without one has only two.
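Every entry begins with a Serial Number line. You can therefore group the lines into blocks, starting a new block at each such line. The block length then tells the two entry types apart: two lines versus five. Below is a minimal sketch of that idea. It assumes the data has already been read into a string. The names groupEntries and $sample are illustrative and do not come from the question.

```php
<?php
// Hypothetical helper: split the CRL dump into one array of trimmed lines per entry.
// A new entry begins at every "Serial Number:" line.
function groupEntries(string $text): array
{
    $blocks = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // ignore blank lines
        }
        if (strpos($line, 'Serial Number:') === 0) {
            $blocks[] = []; // start a new entry
        }
        if ($blocks !== []) {
            // append to the most recent entry; lines before the first serial are ignored
            $blocks[count($blocks) - 1][] = $line;
        }
    }
    return $blocks;
}

$sample = "    Serial Number: 0E76BE532946EFE890376F0339329A62\n"
    . "        Revocation Date: Jun 27 14:46:26 2018 GMT\n"
    . "    Serial Number: 0E17C9648FF25C0FC537D97958E4D449\n"
    . "        Revocation Date: Jun 27 14:48:07 2018 GMT\n"
    . "        CRL entry extensions:\n"
    . "            X509v3 CRL Reason Code: \n"
    . "                Key Compromise\n";

foreach (groupEntries($sample) as $block) {
    // 2 lines: plain revocation; 5 lines: revocation with a reason
    echo count($block), "\n";
}
```

Each block could then be parsed on its own. The last line of a five-line block is the reason.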

Example data file data.txt

This is a sample taken from a list of thousands of entries. We can use it as the sample data file.

    Serial Number: 0E76BE532946EFE890376F0339329A62
        Revocation Date: Jun 27 14:46:26 2018 GMT
    Serial Number: 0E17C9648FF25C0FC537D97958E4D449
        Revocation Date: Jun 27 14:48:07 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
        Revocation Date: Jun 27 14:49:12 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 088925C97AC5991CDF5416D07FC5DB00
        Revocation Date: Jun 27 15:50:51 2018 GMT
    Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
        Revocation Date: Jun 27 15:52:31 2018 GMT
    Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
        Revocation Date: Jun 27 15:53:12 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 07852DF7D7DD35080DE3604836408ADE
        Revocation Date: Jun 27 15:53:38 2018 GMT
    Serial Number: 0DEA14237257A6A3049F934840DC2B47
        Revocation Date: Jun 27 15:53:40 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise

Expected output

I want to build an array that looks like this:

Array
(
    [0] => Array
        (
            [serial] => 0E76BE532946EFE890376F0339329A62
            [date] => Jun 27 14:46:26 2018 GMT
        )

    [1] => Array
        (
            [serial] => 0E17C9648FF25C0FC537D97958E4D449
            [date] => Jun 27 14:48:07 2018 GMT
            [reason] => Key Compromise
        )
   ...
   ...
 )

Failed attempt

Here is my attempt. It only handles the first case (#1). Entries of type #2 have extra lines, and I can't figure out how to account for them.

$arr = array();
$lines = file('data.txt', FILE_IGNORE_NEW_LINES);
$x = 0;
foreach ($lines as $line) {
    if (strpos($line, 'Serial Number: ') !== false) {
        $arr[$x]['serial'] = str_replace('Serial Number: ', '', trim($line)) ;
    }
    if (strpos($line, 'Revocation Date: ') !== false) {
        $arr[$x]['date'] = str_replace('Revocation Date: ', '', trim($line)) ;
        $x++;
    }
}
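Here is one possible minimal change to the loop above. It is a sketch, not code from the thread. The entry index now advances on each Serial Number line instead of after the date. When the X509v3 CRL Reason Code label is reached, the reason is read from the next element of $lines. The hard-coded $lines array stands in for the file() call.

```php
<?php
// Sketch: the loop from the question, adjusted so that the index advances on each
// "Serial Number" line and the reason is read from the line after its label.
// $lines stands in for file('data.txt', FILE_IGNORE_NEW_LINES).
$lines = [
    '    Serial Number: 0E76BE532946EFE890376F0339329A62',
    '        Revocation Date: Jun 27 14:46:26 2018 GMT',
    '    Serial Number: 0E17C9648FF25C0FC537D97958E4D449',
    '        Revocation Date: Jun 27 14:48:07 2018 GMT',
    '        CRL entry extensions:',
    '            X509v3 CRL Reason Code: ',
    '                Key Compromise',
    '    Serial Number: 088925C97AC5991CDF5416D07FC5DB00',
    '        Revocation Date: Jun 27 15:50:51 2018 GMT',
];

$arr = [];
$x = -1; // index of the current entry; incremented when a new serial number appears
foreach ($lines as $i => $line) {
    $line = trim($line);
    if (strpos($line, 'Serial Number: ') === 0) {
        $x++;
        $arr[$x]['serial'] = str_replace('Serial Number: ', '', $line);
    } elseif (strpos($line, 'Revocation Date: ') === 0) {
        $arr[$x]['date'] = str_replace('Revocation Date: ', '', $line);
    } elseif (strpos($line, 'X509v3 CRL Reason Code:') === 0 && isset($lines[$i + 1])) {
        // The reason text (e.g. "Key Compromise") is on the following line.
        $arr[$x]['reason'] = trim($lines[$i + 1]);
    }
    // "CRL entry extensions:" and the reason line itself fall through and are ignored.
}

print_r($arr);
```

Reading $lines[$i + 1] relies on $lines being a sequential, zero-indexed list. That is what file() returns.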

3 answers:

Answer 0 (score: 1)

Here is a simple solution based on string manipulation:

Input:

Serial Number: 0E76BE532946EFE890376F0339329A62
    Revocation Date: Jun 27 14:46:26 2018 GMT
Serial Number: 0E17C9648FF25C0FC537D97958E4D449
    Revocation Date: Jun 27 14:48:07 2018 GMT
    CRL entry extensions:
        X509v3 CRL Reason Code: 
            Key Compromise
Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
    Revocation Date: Jun 27 14:49:12 2018 GMT
    CRL entry extensions:
        X509v3 CRL Reason Code: 
            Key Compromise
Serial Number: 088925C97AC5991CDF5416D07FC5DB00
    Revocation Date: Jun 27 15:50:51 2018 GMT
Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
    Revocation Date: Jun 27 15:52:31 2018 GMT
Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
    Revocation Date: Jun 27 15:53:12 2018 GMT
    CRL entry extensions:
        X509v3 CRL Reason Code: 
            Key Compromise
Serial Number: 07852DF7D7DD35080DE3604836408ADE
    Revocation Date: Jun 27 15:53:38 2018 GMT
Serial Number: 0DEA14237257A6A3049F934840DC2B47
    Revocation Date: Jun 27 15:53:40 2018 GMT
    CRL entry extensions:
        X509v3 CRL Reason Code: 
            Key Compromise

PHP code:

<?php
// Read the lines, dropping newline characters and empty lines.
$filename = 'data.txt';
$file = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$output = array();
foreach ($file as $row) {
    $row = trim($row); // remove the indentation
    if ($row === '') {
        continue; // skip whitespace-only lines
    }
    if (strpos($row, "Serial Number") === false) {
        $n = count($output) - 1; // index of the current entry
        if (strpos($row, "Revocation Date") !== false) {
            $output[$n]['date'] = str_replace('Revocation Date: ', '', $row);
        } else if (strpos($row, "CRL entry extensions") !== false) {
            // label line, nothing to store
        } else if (strpos($row, "X509v3 CRL Reason Code") !== false) {
            // label line, nothing to store
        } else {
            // any other line is the reason text, e.g. "Key Compromise"
            $output[$n]['reason'] = $row;
        }
    } else {
        // a "Serial Number" line starts a new entry
        $output[] = array('serial' => str_replace('Serial Number: ', '', $row));
    }
}

print_r($output);
?>

Output:

Array ( 
    [0] => Array ( 
        [serial] => 0E76BE532946EFE890376F0339329A62 
        [date] => Jun 27 14:46:26 2018 GMT 
    ) 
    [1] => Array ( 
        [serial] => 0E17C9648FF25C0FC537D97958E4D449 
        [date] => Jun 27 14:48:07 2018 GMT 
        [reason] => Key Compromise 
    ) 
    [2] => Array ( 
        [serial] => 06BB119BAA2ABC21F92B06ED8E14B113 
        [date] => Jun 27 14:49:12 2018 GMT 
        [reason] => Key Compromise 
    ) 
    [3] => Array ( 
        [serial] => 088925C97AC5991CDF5416D07FC5DB00 
        [date] => Jun 27 15:50:51 2018 GMT 
    ) 
    [4] => Array ( 
        [serial] => 091E2B2090C7F5DBBCC97EA958B110BC 
        [date] => Jun 27 15:52:31 2018 GMT
    ) 
    [5] => Array (
        [serial] => 0E6E9D1E9818221538EA6AF16A279C89 
        [date] => Jun 27 15:53:12 2018 GMT 
        [reason] => Key Compromise
    ) 
    [6] => Array ( 
        [serial] => 07852DF7D7DD35080DE3604836408ADE 
        [date] => Jun 27 15:53:38 2018 GMT
    ) 
    [7] => Array (
        [serial] => 0DEA14237257A6A3049F934840DC2B47 
        [date] => Jun 27 15:53:40 2018 GMT 
        [reason] => Key Compromise
    )
)

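Answer 1 (score: 0)

Depending on the size of the text file and how comfortable you are with regular expressions, you could use a single pattern to extract the different pieces of information you are looking for.

I put together a short proof of concept that works for the example you provided: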

$re = '/\W+Serial Number: (?<serial>.*?)$\n\W+Revocation Date: (?<date>.*?)$((?:(?!Serial Number)[\n]*.)+Code: \n\W+(?<reason>.*?$))?/m';

$str = '    Serial Number: 0E76BE532946EFE890376F0339329A62
        Revocation Date: Jun 27 14:46:26 2018 GMT
    Serial Number: 0E17C9648FF25C0FC537D97958E4D449
        Revocation Date: Jun 27 14:48:07 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 06BB119BAA2ABC21F92B06ED8E14B113
        Revocation Date: Jun 27 14:49:12 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 088925C97AC5991CDF5416D07FC5DB00
        Revocation Date: Jun 27 15:50:51 2018 GMT
    Serial Number: 091E2B2090C7F5DBBCC97EA958B110BC
        Revocation Date: Jun 27 15:52:31 2018 GMT
    Serial Number: 0E6E9D1E9818221538EA6AF16A279C89
        Revocation Date: Jun 27 15:53:12 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise
    Serial Number: 07852DF7D7DD35080DE3604836408ADE
        Revocation Date: Jun 27 15:53:38 2018 GMT
    Serial Number: 0DEA14237257A6A3049F934840DC2B47
        Revocation Date: Jun 27 15:53:40 2018 GMT
        CRL entry extensions:
            X509v3 CRL Reason Code: 
                Key Compromise';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

You can see this example running here: https://regex101.com/r/7iSBrx/1

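This example uses named groups, which make it easier to pull the values you want out of each match. They also show where in the pattern each value is captured. I'm happy to explain why the pattern works if that would help.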
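The named groups can be mapped, match by match, into the array shape from the question. The sketch below uses its own pattern, written for the sample format, rather than the pattern from this answer. It is anchored per line with the /m modifier, uses \h for horizontal whitespace and \R for line breaks, and makes the reason block an optional non-capturing group. The function name parseWithRegex is hypothetical. When the optional group does not participate in a match, the reason key may be absent from that match, so the code checks for it before copying it.

```php
<?php
// Sketch: regex-based parsing with named groups, then mapping to the desired shape.
// The pattern is an assumption written for the sample format, not the answer's pattern.
function parseWithRegex(string $text): array
{
    $re = '/^\h*Serial Number:\h*(?<serial>\S+)\h*\R'
        . '\h*Revocation Date:\h*(?<date>.+?)\h*$'
        . '(?:\R\h*CRL entry extensions:\h*\R'
        . '\h*X509v3 CRL Reason Code:\h*\R'
        . '\h*(?<reason>.+?)\h*$)?/m';

    preg_match_all($re, $text, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        $entry = ['serial' => $m['serial'], 'date' => $m['date']];
        // The optional reason group may be missing or empty when it did not match.
        if (isset($m['reason']) && $m['reason'] !== '') {
            $entry['reason'] = $m['reason'];
        }
        $entries[] = $entry;
    }
    return $entries;
}

$sample = "    Serial Number: 0E76BE532946EFE890376F0339329A62\n"
    . "        Revocation Date: Jun 27 14:46:26 2018 GMT\n"
    . "    Serial Number: 0E17C9648FF25C0FC537D97958E4D449\n"
    . "        Revocation Date: Jun 27 14:48:07 2018 GMT\n"
    . "        CRL entry extensions:\n"
    . "            X509v3 CRL Reason Code: \n"
    . "                Key Compromise\n";

print_r(parseWithRegex($sample));
```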

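One caveat: this requires loading the whole file into a single string, which can use a lot of memory if the file is large. For very large files, your line-by-line (iterative) approach is the better fit.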
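For very large files, the iterative approach can be combined with a PHP generator. That way, only the entry currently being built is held in memory. This is a sketch that assumes the input follows the sample format exactly. The function name readRevocations is hypothetical. The php://memory stream only makes the example self-contained. A real file would be opened with fopen('data.txt', 'rb') instead.

```php
<?php
// Sketch: stream entries one at a time with fgets() and a generator.
function readRevocations($handle): Generator
{
    $entry = null;         // entry currently being built
    $expectReason = false; // true right after the "Reason Code:" label line

    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if ($expectReason) {
            $entry['reason'] = $line; // e.g. "Key Compromise"
            $expectReason = false;
        } elseif (strpos($line, 'Serial Number:') === 0) {
            if ($entry !== null) {
                yield $entry; // the previous entry is complete
            }
            $entry = ['serial' => trim(substr($line, strlen('Serial Number:')))];
        } elseif (strpos($line, 'Revocation Date:') === 0) {
            $entry['date'] = trim(substr($line, strlen('Revocation Date:')));
        } elseif (strpos($line, 'X509v3 CRL Reason Code:') === 0) {
            $expectReason = true;
        }
    }
    if ($entry !== null) {
        yield $entry; // last entry in the file
    }
}

// Self-contained demo: write sample data to an in-memory stream.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "    Serial Number: 0E76BE532946EFE890376F0339329A62\n"
    . "        Revocation Date: Jun 27 14:46:26 2018 GMT\n"
    . "    Serial Number: 0E17C9648FF25C0FC537D97958E4D449\n"
    . "        Revocation Date: Jun 27 14:48:07 2018 GMT\n"
    . "        CRL entry extensions:\n"
    . "            X509v3 CRL Reason Code: \n"
    . "                Key Compromise\n");
rewind($fh);

$all = [];
foreach (readRevocations($fh) as $entry) {
    $all[] = $entry; // in practice, process each entry here instead of collecting them
}
fclose($fh);
print_r($all);
```

Collecting everything into $all, as the demo does, gives back the memory savings. The benefit only appears when each entry is handled inside the loop, for example by writing it to a database.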

Answer 2 (score: 0)

Try this code:

$file_handle = fopen("data.txt", "rb");

// fgets() returns false at end of file, so test its result directly instead of using feof()
while (($line_of_text = fgets($file_handle)) !== false) {
    $name = array($line_of_text);
    print_r($name);
}

fclose($file_handle);