Need some help with a regex (PHP)

Time: 2015-12-13 19:34:46

Tags: php regex

I want to use preg_replace to convert a txt file into HTML so that I can add formatting. The file is structured like this:

09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

This should be treated as one group and converted into a table, like this:

<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>

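At the moment I use two separate preg_replace calls: one for the first line (the date) and one for the following lines, of which there can be anywhere from one to about 100. However, the file can also contain other text that must be ignored (that is, not replaced). With my current approach, any such line that happens to have roughly the same format (7 digits and some text) gets formatted as well: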

$file = preg_replace('~^\s*((\[.*\]){0,2}\d{1,2}:\d{2}:\d{2}(\[/.*\]){0,2})\s(\d{2}-\d{2}-\d{2}(\[/.*\]){0,2})\s+(?:\d{2}/\d{3}\s+|)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+(.+)$~m', '<table class="file"><tr class="entry"><td class="time">$1 $4</td><td class="day">$6</td><td class="message">$7</td></tr>', $file);
$file = preg_replace('~^\s*(.{0,11}?)\s*((\[.+?\])?\d{7}(\[/.+?\])?)\s+(.+?)$~m', '<tr class="id"><td class="optional">$1</td><td class="id">$2</td><td class="message">$5</td></tr>', $file);

How can I improve this? For example, given this content:

09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

Liverpool - WBA 2-2

1234570 This line should be ignored

19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better

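So I want to capture and preg_replace only the first and the last block: the ones that start with a time/date line followed by one or more lines beginning with a 7-digit ID.

Thanks for reading this far ;)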

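1 answer:

Answer 0 (score: 2)

I think this does what you are trying to do.

It is not clear to me why this line should be ignored:

1234570 This line should be ignored

It meets the "7 digits and some text" requirement.

The regex I came up with is: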

/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m

Here is a regex101 demo: https://regex101.com/r/qB0gH6/1

And used in PHP:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

Liverpool - WBA 2-2

1234570 This line should be ignored

19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3}day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);

Output:

<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>

Liverpool - WBA 2-2

<td>1234570</td><td></td><td>This line should be ignored</td>

<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
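
One caveat about the pattern above: `[a-zA-Z]{3}day` only matches weekday names with exactly three letters before "day" (Monday, Friday, Sunday). For Tuesday, Wednesday, Thursday and Saturday the optional group is skipped, so the weekday ends up in the message cell. The following is a minimal sketch of a variant with an explicit weekday list, written with the `x` modifier so the three captures can be commented. The sample input is made up for illustration, and the comments inside the pattern deliberately avoid the `/` delimiter.

```php
<?php
// Hypothetical variant of the answer's pattern: same three captures,
// but with an explicit weekday list instead of [a-zA-Z]{3}day.
$pattern = '/
    ^
    (                                   # $1: timestamp or 7-digit ID
        \d{2}:\d{2}:\d{2} \h* \d{1,2}-\d{1,2}-\d{1,2}
      | \d{7}
    )
    \h*
    ( (?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day )?   # $2: optional weekday
    \h*
    (.+)                                # $3: message
/mx';

// Illustrative input, not taken from the question.
$sample = "09:19:49 15-12-15 Tuesday Hello World\n1234567 Today is a beautiful day";

$result = preg_replace($pattern, '<td>$1</td><td>$2</td><td>$3</td>', $sample);
echo $result, "\n";
```

With this variant the first line yields `Tuesday` in the second cell, while the ID line still gets an empty second cell because "Today" is not in the weekday list.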

OK, based on your update it is a bit more complicated, but I think this does it:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

Liverpool - WBA 2-2

1234570 This line should be ignored

19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace_callback(
    '/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/',
    function ($matches) {
        // $matches[0] is the whole block: the date line plus all following ID lines.
        $lines = explode("\n", $matches[0]);
        $theoutput = '<table><tr>';
        foreach ($lines as $line) {
            if (preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3}day)?\h*(.*)/', $line, $output)) {
                // First line of the block (time/date, weekday, message):
                // one cell per capture group, skipping the full match at index 0.
                foreach ($output as $key => $values) {
                    if (!empty($key)) {
                        $theoutput .= '<td>' . $values . '</td>';
                    }
                }
            } elseif (preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
                // ID line: close the previous row and start a new one.
                $theoutput .= '</tr><tr>';
                foreach ($output as $key => $values) {
                    if (!empty($key)) {
                        $theoutput .= '<td>' . $values . '</td>';
                    }
                }
            }
        }
        $theoutput .= '</tr></table>';
        return $theoutput;
    },
    $string
);

Output:

<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2

1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>
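
Note that the ID rows in this output have two cells, whereas the table requested in the question has three (with an optional middle cell). As an alternative to one large pattern, the same grouping rule can be sketched as a line-by-line scan: a time/date line opens a block, 7-digit ID lines extend it, and anything else closes it. This is only a sketch under stated assumptions. The function name `blocksToTables` is hypothetical, the weekday list is spelled out explicitly, cell contents are passed through `htmlspecialchars`, and a date line with no ID lines after it is left untouched.

```php
<?php
// Hypothetical helper (not from the answer): converts "date line + ID lines"
// blocks into tables and passes every other line through unchanged.
function blocksToTables(string $text): string
{
    $headerRe = '/^(\d{2}:\d{2}:\d{2}\h+\d{1,2}-\d{1,2}-\d{1,2})\h+'
              . '(?:((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)\h+)?(.+)$/';
    $idRe     = '/^(\d{7})\h+(.+)$/';

    $out   = [];    // finished output lines
    $block = null;  // rows of the block being collected, or null outside a block
    $raw   = [];    // original lines of that block, used if it has no ID lines

    // Emit the pending block: as a table if it has ID rows, otherwise untouched.
    $flush = function () use (&$out, &$block, &$raw): void {
        if ($block !== null) {
            if (count($block) > 1) {
                $rows = '';
                foreach ($block as $cells) {
                    $cells = array_map('htmlspecialchars', $cells);
                    $rows .= '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
                }
                $out[] = '<table>' . $rows . '</table>';
            } else {
                $out = array_merge($out, $raw);
            }
        }
        $block = null;
        $raw   = [];
    };

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($headerRe, $line, $m)) {
            $flush();                           // a new date line closes any open block
            $block = [[$m[1], $m[2], $m[3]]];
            $raw   = [$line];
        } elseif ($block !== null && preg_match($idRe, $line, $m)) {
            $block[] = [$m[1], '', $m[2]];      // empty middle cell keeps three columns
            $raw[]   = $line;
        } else {
            $flush();                           // any other line ends the block
            $out[] = $line;
        }
    }
    $flush();

    return implode("\n", $out);
}

// Usage with a shortened version of the sample from the question.
$input = "09:19:49 13-12-15 Sunday Hello World\n"
       . "1234567 Today is a beautiful day\n"
       . "\n"
       . "1234570 This line should be ignored";
echo blocksToTables($input), "\n";
```

Because ID lines are only accepted while a block is open, the stray `1234570` line is passed through as plain text, and blank lines between blocks are preserved.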