I've never really parsed text with PHP (or any other language). I have this text:
1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
As you can see, the first record has a line break in it. I need to turn it into this:
1 (2) ,Yes,5823,"Some Name 801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
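I was thinking of using a regular expression to find a \n inside quotes (or after an opening quote, since that wouldn't create false matches) and then using PHP's preg_replace() to replace it. I'm currently reading up on regular expressions, since I don't know anything about them, so I may be able to figure this out on my own (that's always best), but a solution to my current problem would no doubt help me get a handle on them sooner.
Thanks so much. If I could, I'd put a bounty on this immediately.
Thanks!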
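Answer 0 (score: 3)
If the text has a fixed format, maybe you don't need a regular expression at all. Just count the double quotes on each line: if a line leaves a quote open, start joining the following lines onto it until you find the line that closes it.
This can go wrong if the data may contain escaped quotes, single quotes used as string delimiters, and so on, but as long as there is none of that, you should be fine.
I don't know PHP, so here is some pseudocode: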
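open = False
for line in lines do
    nquotes = line.count("\"")
    if not open then
        if nquotes % 2 == 1 then   # odd count: a quoted field opens here and is not closed
            open = True
            write(line + " ")      # no newline, so the next line is joined on
        else                       # even count (including 0): the line is complete
            writeln(line)
        end
    else
        if nquotes % 2 == 0 then   # even count: still inside the open field
            write(line + " ")
        else                       # odd count: the open field closes on this line
            open = False
            writeln(line)
        end
    end
end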
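The escaped-quote caveat above is where plain quote counting gets fragile. Since the question is about PHP, one hedged alternative is to let PHP's built-in fgetcsv() keep track of the quoting. The sketch below assumes the input is ordinary CSV (comma-separated, double-quoted fields, a literal quote written as ""); flattenCsvFields is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: parse CSV text and replace line breaks inside
// fields with a single space. Assumes standard comma/double-quote CSV.
function flattenCsvFields(string $csv): array
{
    // fgetcsv() reads from a stream, so copy the text into memory first.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    while (($fields = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        // \R matches \r\n, \r or \n; whitespace around the break is collapsed.
        $rows[] = array_map(
            fn($field) => preg_replace('/\s*\R\s*/', ' ', $field),
            $fields
        );
    }
    fclose($stream);

    return $rows;
}
```

This returns each record as an array of fields rather than as a rebuilt line of text, so it fits best when the fields are going to be processed individually anyway.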
Answer 1 (score: 1)
This is basically fortran's answer, in PHP:
<pre>
<?php
$data = <<<DATA
1 (2) ,Yes,5823,"Some Name
801-555-5555",EXEC,,"Mar 16, 2009",0.00,
1 (3) ,,4821,Somebody Else,MBR,,"Mar 11, 2009",,0.00
2 (1) ,,5634,Another Guy,ASSOC,,"Mar 15, 2009",,0.00
DATA;
echo $data, '<hr>';
$lines = preg_split( "/\r\n?|\n/", $data );
$filtered = "";
$open = false;
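foreach ( $lines as $line )
{
    // An odd number of quotes means a quoted field opens or closes on this line.
    if ( substr_count( $line, '"' ) & 1 )
    {
        $open = !$open;
    }
    // Inside an open field, join the next line on with a space; otherwise end the line.
    $filtered .= $line . ( $open ? ' ' : "\n" );
}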
echo $filtered;
?>
</pre>
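The question asked about doing this with a regular expression and preg_replace(). A minimal sketch of that idea follows, assuming quotes are always balanced and any literal quote inside a field is written as ""; joinBreaksInsideQuotes is a hypothetical name chosen for illustration. It uses preg_replace_callback() to find each quoted field and then replaces line breaks only inside that match, so breaks between records are left alone.

```php
<?php
// Hypothetical helper: replace line breaks that fall inside double-quoted
// fields with a single space. Assumes balanced quotes, "" for a literal quote.
function joinBreaksInsideQuotes(string $text): string
{
    // One quoted field: an opening quote, any run of non-quote characters
    // or doubled quotes, then the closing quote. [^"] also matches newlines.
    $quotedField = '/"(?:[^"]|"")*"/';

    return preg_replace_callback(
        $quotedField,
        function (array $m): string {
            // \R matches \r\n, \r or \n; whitespace around the break is collapsed.
            return preg_replace('/\s*\R\s*/', ' ', $m[0]) ?? $m[0];
        },
        $text
    ) ?? $text;
}
```

Calling joinBreaksInsideQuotes($data) on the sample above should give the three single-line records from the question. If the quotes are ever unbalanced, the pairing shifts and the wrong spans get treated as fields, so this is only as reliable as the input format.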