Question

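Using PHP and regular expressions, how can I extract data from a text file, as shown in the highlighted parts (these are only an example; the idea is to extract the whole file)?

I want to put the highlighted parts (short description, LEN, TYPE, description, SAS name, and VALUES, if present) into a multidimensional array: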

$columns = [
    [
        'Provider Category Subtype Code',
        2,
        'VARCHAR2',
        'Identifies the subtype of the provider, with..and SNFs.',
        'PRVDR_CTGRY_SBTYP_CD',
        [
            '01' => 'Short Term',
            '02' => 'Long Term',
        ],
    ],
    [
        'Provider Category Code',
        2,
        'VARCHAR2',
        'Identifies the type of provider participating in..Medicaid program.',
        'PRVDR_CTGRY_CD',
        [
            '01' => 'Hospital',
        ],
    ]
    // rest of the columns..
];

So far, I have this:

// For real file content
$str = file_get_contents('https://data.cms.gov/api/views/i4jy-dtss/files/8331bd77-e02d-42a1-b4a4-b4a3ef31655d?download=true&filename=POS_OTHER_LAYOUT_SEP17.txt');

$fileArray =  explode("\n", $str);

// Prepare columns
$columns = [];
$column = [];

// sets the start of a new column
$startOfNewColumn = false;

foreach ($fileArray as $line) {
    if (preg_match('/^\s{3}\S/m', $line) && !preg_match('/^\s{3}SHORT DESCRIPTION/m', $line)) {
        $column = [];
        $startOfNewColumn = true;
    }
}

This is the regex I am using.

Answer 1

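Since this file has no "fixed" structure or pattern, parsing it with a single regular expression is not practical.

The solution I ended up with was to loop over each line and use a series of if/else statements. It is not the most elegant approach, but that is how I solved the problem.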
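The line-by-line approach described above could look roughly like the sketch below. It is only an illustration: the `$sample` text and its labels (`LEN:`, `TYPE:`, `SAS NAME:`, `VALUES:`) are a hypothetical, simplified layout, not the real CMS file, and `parseColumns()` is a made-up helper name. The only rule borrowed from the question is that a line starting with exactly three spaces followed by a non-space character begins a new column. The sketch also uses named keys rather than the positional arrays shown in the question, and omits the long description field.

```php
<?php
// Hypothetical, simplified sample. The real layout file is less regular,
// so every pattern below would need adjusting against the actual text.
$sample = <<<'TXT'
   Provider Category Subtype Code
      LEN: 2
      TYPE: VARCHAR2
      SAS NAME: PRVDR_CTGRY_SBTYP_CD
      VALUES:
         01=Short Term
         02=Long Term
   Provider Category Code
      LEN: 2
      TYPE: VARCHAR2
      SAS NAME: PRVDR_CTGRY_CD
      VALUES:
         01=Hospital
TXT;

// Walks the text line by line and keeps track of the column being built.
function parseColumns(string $text): array
{
    $columns  = [];
    $current  = null;   // column currently being filled, or null before the first one
    $inValues = false;  // true while reading the lines under "VALUES:"

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match('/^ {3}(\S.*)$/', $line, $m)) {
            // Exactly three leading spaces: a new column starts here.
            if ($current !== null) {
                $columns[] = $current;
            }
            $current = [
                'name'   => rtrim($m[1]),
                'len'    => null,
                'type'   => null,
                'sas'    => null,
                'values' => [],
            ];
            $inValues = false;
        } elseif ($current === null) {
            continue; // ignore anything before the first column
        } elseif (preg_match('/^\s+LEN:\s*(\d+)/', $line, $m)) {
            $current['len'] = (int) $m[1];
        } elseif (preg_match('/^\s+TYPE:\s*(\S+)/', $line, $m)) {
            $current['type'] = $m[1];
        } elseif (preg_match('/^\s+SAS NAME:\s*(\S+)/', $line, $m)) {
            $current['sas'] = $m[1];
        } elseif (preg_match('/^\s+VALUES:/', $line)) {
            $inValues = true;
        } elseif ($inValues && preg_match('/^\s+(\S+)=(.+)$/', $line, $m)) {
            // Value lines are assumed to look like "code=label".
            $current['values'][$m[1]] = trim($m[2]);
        }
    }

    // Flush the last column, which has no following header to trigger it.
    if ($current !== null) {
        $columns[] = $current;
    }

    return $columns;
}

$columns = parseColumns($sample);
```

Keeping one small pattern per kind of line, instead of one large expression for the whole file, makes it easier to add special cases as irregular lines turn up in the real data.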
