我想使用preg_replace将txt文件解析为HTML以添加格式。 文件的格式如下:
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
这应该被视为一个组并解析成一个表,如:
<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>
目前,我使用两个单独的preg_replacements,一个用于第一行(日期),第二个用于后续行,可以是一个或最多100个左右。但是,此文件也可以包含其他文本,需要忽略(如替换),但如果此行具有或多或少相同的格式(7位数和一些文本),它也会被格式化:
$file = preg_replace('~^\s*((\[.*\]){0,2}\d{1,2}:\d{2}:\d{2}(\[/.*\]){0,2})\s(\d{2}-\d{2}-\d{2}(\[/.*\]){0,2})\s+(?:\d{2}/\d{3}\s+|)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+(.+)$~m', '<table class="file"><tr class="entry"><td class="time">$1 $4</td><td class="day">$6</td><td class="message">$7</td></tr>', $file);
$file = preg_replace('~^\s*(.{0,11}?)\s*((\[.+?\])?\d{7}(\[/.+?\])?)\s+(.+?)$~m', '<tr class="id"><td class="optional">$1</td><td class="id">$2</td><td class="message">$5</td></tr>', $file);
如何改善这个?就像,如果我有这个内容:
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better
所以,我想捕捉和preg_replace只有第一个块和最后一个块,从时间/日期和一些后续行开始,以7位数ID开头。
到目前为止,感谢阅读;)
答案 0 :(得分:2)
我认为这可以完成你想要做的事情。
我不清楚为什么要忽略这一行:
1234570应忽略此行
此行符合7 digits and some text
要求。
我想出的正则表达式是:
/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m
这是一个regex101演示:https://regex101.com/r/qB0gH6/1
并在PHP中使用:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);
输出:
<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>
Liverpool - WBA 2-2
<td>1234570</td><td></td><td>This line should be ignored</td>
<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
好的,根据您的更新,它有点复杂,但我认为这样做:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace_callback('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/',
function ($matches) {
$lines = explode("\n", $matches[0]);
$theoutput = '<table><tr>';
foreach($lines as $line) {
if(preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.*)/', $line, $output)) {
//it is the first date string line;
foreach($output as $key => $values) {
if(!empty($key)) {
$theoutput .= '<td>' . $values . '</td>';
}
}
} else {
if(preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
$theoutput .= '</tr><tr>';
foreach($output as $key => $values) {
if(!empty($key)) {
$theoutput .= '<td>' . $values . '</td>';
}
}
}
}
}
$theoutput .= '</tr></table>';
return $theoutput;
}, $string);
输出:
<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2
1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>