I have a string like this:
####################
Section One
####################
Data A
Data B
####################
Section Two
####################
Data C
Data D
etc.
I want to break it down into:
$arr = array(
    'Section One' => array('Data A', 'Data B'),
    'Section Two' => array('Data C', 'Data D')
);
At first I tried this:
$sections = preg_split("/(\r?\n)(\r?\n)#/", $file_content);
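The problem is that the file isn't very clean: sometimes there is a varying number of blank lines between sections, or stray whitespace between the data lines.
The section header pattern itself seems to be relatively consistent: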
####################
Section Title
####################
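The number of #'s is probably consistent, but I don't want to count on it. The whitespace on the title line is pretty random.
Once I have it split into sections I think the rest will be pretty straightforward, but any help writing a killer regex to get there would be greatly appreciated. (Or, if there is a better approach than regex...)
Answer 0 (score: 3)
I'd take a multi-step approach: split the string on the headings, trim each piece, pair each heading with its content, then build the final array.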
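Here is an example, split across several lines so you can follow what is happening.
Note the lack of sanity checking; this assumes nice, tidy heading/content groups.
The regex was written for brevity and may or may not be sufficient for your needs.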
// Split string on a line of text wrapped in lines of only #'s
// A limit of -1 means "no limit" (passing null is deprecated as of PHP 8.1)
$parts = preg_split('/^#+$\R(.+)\R^#+$/m', $subject, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
// Tidy up leading/trailing whitespace for each heading/content-block
$parts = array_map('trim', $parts);
// Chunk into array("heading", "content")
$parts = array_chunk($parts, 2);
// Create the final array
$sections = array();
foreach ($parts as $part) {
    $sections[$part[0]] = explode("\n", $part[1]);
}
// Let's take a look
var_dump($sections);
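If the input is messier, as the question describes, the same steps can be made more forgiving. The following is only a sketch under stated assumptions: the function name `parseSections` is hypothetical, the title is assumed to sit on the line directly between the two lines of #'s, and lines are assumed to end in \n or \r\n. It tolerates stray spaces or tabs around headings and data, and it drops blank data lines.

```php
<?php
// Hypothetical helper, not part of the original answer.
function parseSections(string $subject): array
{
    // Split on: a line of #'s, a title line, another line of #'s.
    // \h* tolerates stray spaces/tabs; \R matches \n and \r\n.
    // (?=\R|\z) is used instead of $ so a trailing \r does not break the match.
    $parts = preg_split(
        '/^\h*#+\h*\R\h*(\S.*?)\h*\R\h*#+\h*(?=\R|\z)/m',
        $subject,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $sections = array();
    // $parts[0] is whatever precedes the first heading; after that the
    // entries alternate: title, content, title, content, ...
    for ($i = 1; $i + 1 < count($parts); $i += 2) {
        $lines = array_map('trim', preg_split('/\R/', $parts[$i + 1]));
        // Drop blank lines and re-index from 0
        $sections[$parts[$i]] = array_values(array_filter($lines, 'strlen'));
    }
    return $sections;
}
```

Calling `parseSections($subject)` returns the same heading => lines shape as the loop above. Because blank lines are filtered out, a heading with no data under it yields an empty array.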
Answer 1 (score: 1)
I was able to write this up quickly:
<?php
$text = <<<EOT
####################
Section One
####################
Data B.Thing=bar#
.##.#%#
####################
Empty Section!
####################
####################
Last section
####################
Blah
Blah C# C# C#
EOT;
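// Split on every run of #'s at the start of a line, then pair title with body.
// A limit of -1 means "no limit" (passing null is deprecated as of PHP 8.1).
$entries = array_chunk(
    preg_split("/^#+/m", $text, -1, PREG_SPLIT_NO_EMPTY),
    2
);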
$sections = array();
foreach ($entries as $entry) {
    $key = trim($entry[0]);
    // Break the body into lines, discarding empty ones
    $value = preg_split("/\n/", $entry[1], -1, PREG_SPLIT_NO_EMPTY);
    $sections[$key] = $value;
}
print_r($sections);
?>
Output (as run on ideone.com):
Array
(
[Section One] => Array
(
[0] => Data B.Thing=bar#
[1] => .##.#%#
)
[Empty Section!] => Array
(
)
[Last section] => Array
(
[0] => Blah
[1] => Blah C# C# C#
)
)
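One caveat worth checking against your data: `/^#+/m` splits at any line that merely starts with a #, not only at lines made up entirely of #'s. The sketch below uses made-up sample text (and assumes \n line endings) to compare that pattern with a stricter one that requires the rest of the line to be empty.

```php
<?php
// Made-up sample: one heading followed by a data line that starts with '#'
$text = "###\nTitle\n###\n#hashtag line\nplain line\n";

// Loose: any run of #'s at the start of a line is a split point
$loose = preg_split('/^#+/m', $text, -1, PREG_SPLIT_NO_EMPTY);

// Strict: the line may contain only #'s (plus optional trailing spaces/tabs)
$strict = preg_split('/^#+\h*$/m', $text, -1, PREG_SPLIT_NO_EMPTY);

// $loose has 3 pieces: the data line lost its leading '#' and the
// title/body pairing done by array_chunk() would be thrown off.
// $strict has 2 pieces: the title and the untouched body.
```

The sample input in this answer never has a data line that begins with a #, so the simpler pattern is enough there.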
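Answer 2 (score: 0)
"any help writing a killer regex to get there would be greatly appreciated"
...I have a killer regex pattern for you. It relies on the \G (continue) metacharacter to match the multiple lines of text that follow each section heading.
This technique is more optimal than the earlier answers because there is only one preg_ call and zero iterated function calls.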
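In case \G is unfamiliar, here is a minimal sketch of what it does, using a made-up input string that has nothing to do with the section data. With \G, each match has to begin exactly where the previous one ended, so the chain stops at the first character that does not fit.

```php
<?php
$input = '1,2,3,x,4';

// Without \G: a digit (and optional comma) is matched anywhere in the string
preg_match_all('/\d,?/', $input, $anywhere);
// $anywhere[0] is ['1,', '2,', '3,', '4']

// With \G: every match must start where the previous match ended,
// so the chain breaks at 'x' and the trailing '4' is never reached
preg_match_all('/\G\d,?/', $input, $chained);
// $chained[0] is ['1,', '2,', '3,']
```

In the pattern for this answer, the heading branch starts a chain and the `\G(?!\A)` branch keeps it going one data line at a time. The `(?!\A)` part stops \G from matching at the very start of the string, before any heading has been seen.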
Sample input:
$fileContents = <<<TEXT
####################
Section One
####################
Data A
Data B
####################
Section Two
####################
Data C
Data D
Data E
####################
Section Three
####################
Data F
TEXT;
Code:
preg_match_all(
'~(?:
^\#{3,}\R
\h*(\S+(?:\h\S+)*)\h*\R
\#{3,}
|
\G(?!\A)
)
\R
(?!\#{3,})(.+)
~mx',
$fileContents,
$out,
PREG_SET_ORDER
);
foreach ($out as $set) {
    $heading = $set[1] ?: $heading;
    $result[$heading][] = $set[2];
}
var_export($result ?? 'No qualifying data');
Output:
array (
'Section One' =>
array (
0 => 'Data A',
1 => 'Data B',
),
'Section Two' =>
array (
0 => 'Data C',
1 => 'Data D',
2 => 'Data E',
),
'Section Three' =>
array (
0 => 'Data F',
),
)
Breakdown:
~ #starting pattern delimiter
(?: #start non-capturing group 1
^ #match the start of a line
\#{3,} #match 3 or more hash symbols
\R #match a newline sequence
\h* #match space or tab, zero or more times
( #start capture group 1
\S+ #match one or more non-whitespace characters
(?: #start non-capturing group 2
\h #match space or tab
\S+ #one or more non-whitespace characters
)* #end non-capturing group 2, permit zero or more occurrences
) #end capture group 1
\h* #match space or tab, zero or more times
\R #match a newline sequence
\#{3,} #match 3 or more hash symbols
| #or
\G(?!\A) #continue matching but disallow starting from start of string
) #end non-capturing group 1
\R #match a newline sequence
(?! #start negative lookahead
\#{3,} #match 3 or more hash symbols
) #end negative lookahead
(.+) #capture group 2: match the whole line excluding the trailing newline characters
~ #ending pattern delimiter
m #pattern modifier: demand that ^ matches start of lines
x #pattern modifier: allow meaningless whitespaces in pattern for improved readability
...this was a fun necropost.