I'm currently trying to process a CSV file in PHP using preg_match(). Here is a sample of the data I want to process:
"SN120187","Aldersr Rd Nr Shops","","STHPTN","50 56.4241N","1 25.7587W","1001077307","2010-05-30 15:29:49","10","","SURRSHLT3x32","BSU243L1","iiipiiipiiipiiipiii",
"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","50 59.6772N","1 23.4412W","","","24","","","","The quick brown fox jumps over the lazy dog The quick brown fox jumps over the lazy dog",
This is the regex I'm trying to use on this data:
if(preg_match('/^"(?P<code>.+)","(?P<description>.+)","(?P<bay>.*)","(?P<area>.+)","(?P<lat>.+)","(?P<lon>.+)","(?P<build>.*)","(?P<msgTime>.*)","(?P<routes>.*)","(?P<simNo>.*)","(?P<displayType>.*)","(?P<version>.*)","(?P<comments>.*)",$/', $line, $matches)){}
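The regex works for about 95% of the data; the lines it fails on all have a non-empty last field.
I started experimenting with the data (mainly the last field) and found that the following lines do not match the regex: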
"SN120187","Aldersr Rd Nr Shops","","STHPTN","50 54.5512N","1 22.9273W","1001077307","2010-05-30 15:29:49","10","","SURRSHLT3x32","BSU243L1","iiiipiiiipiiiipiiii",
"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",
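However, if I remove a single character from the last field of either line above, it matches. From experimenting, I found no consistent pattern behind the failure: the total length of the line doesn't seem to matter (I tested this by adding extra characters to the other fields), and neither does the length of the final field.
I have no idea what's going on. Does anyone have any ideas?
I'm running PHP 5.3.2, and no error message is shown.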
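preg_match() returns 1 for a match, 0 for no match, and false when PCRE aborts the attempt (for example after exceeding pcre.backtrack_limit). PHP prints no warning in that last case, so a check like if (preg_match(...)) treats an aborted match exactly like a non-match. The sketch below is one way to surface the reason. matchOrExplain is a hypothetical helper name, and whether these particular lines hit a PCRE limit is an assumption that this check would confirm or rule out.

```php
<?php
// Hypothetical helper: run preg_match() and distinguish "no match" (0)
// from "PCRE gave up" (false), which PHP reports without any warning.
function matchOrExplain($pattern, $subject)
{
    $result = preg_match($pattern, $subject, $matches);

    if ($result === 1) {
        return $matches;   // matched normally
    }
    if ($result === 0) {
        return null;       // genuinely no match
    }

    // $result === false: the match attempt was aborted, so ask PCRE why.
    $code = preg_last_error();
    if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
        throw new RuntimeException(
            'Backtrack limit exceeded (pcre.backtrack_limit = '
            . ini_get('pcre.backtrack_limit') . ')'
        );
    }
    throw new RuntimeException('preg_match() failed with PCRE error code ' . $code);
}
```

Calling this with the original pattern on one of the failing lines would show whether the result was a plain 0 or an aborted match.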
Answer 0 (score: 2)
If this is CSV data, use the CSV functions instead: str_getcsv to parse a line held in a string, or fgetcsv to read records directly from the file.
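A minimal sketch of that approach, assuming the 13 column names from the question's named groups and the trailing comma at the end of every line. nameStopFields, parseStopLine and readStops are hypothetical names. str_getcsv() requires PHP 5.3 or later, which the asker's 5.3.2 satisfies.

```php
<?php
// Column names, in order, copied from the named groups in the question's regex.
function stopFieldNames()
{
    return array('code', 'description', 'bay', 'area', 'lat', 'lon', 'build',
                 'msgTime', 'routes', 'simNo', 'displayType', 'version', 'comments');
}

// Turn one parsed CSV row into an associative array.
// Each line ends with a trailing comma, so the parser yields one extra empty
// field at the end; array_slice() drops it before the names are applied.
function nameStopFields(array $fields)
{
    $names = stopFieldNames();
    if (count($fields) < count($names)) {
        return null; // too few columns (e.g. a blank or truncated line)
    }
    return array_combine($names, array_slice($fields, 0, count($names)));
}

// Parse a single line held in a string.
// Separator, enclosure and escape character are passed explicitly.
function parseStopLine($line)
{
    return nameStopFields(str_getcsv($line, ',', '"', '\\'));
}

// Read every usable line from an open file handle; short lines are skipped.
function readStops($handle)
{
    $rows = array();
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $row = nameStopFields($fields);
        if ($row !== null) {
            $rows[] = $row;
        }
    }
    return $rows;
}
```

Because the CSV parser handles the quoting itself, no regex backtracking is involved, and a value can never spill into a neighbouring column.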
Answer 1 (score: 0)
I tried it locally and got the same behavior you describe, with PHP 5.2.10-2ubuntu6.
First try: I removed "(?P<comments>.*)", from your pattern:
$line='"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';
$r=preg_match('/^"(?P<code>.+)","(?P<description>.+)","(?P<bay>.*)","(?P<area>.+)","(?P<lat>.+)","(?P<lon>.+)","(?P<build>.*)","(?P<msgTime>.*)","(?P<routes>.*)","(?P<simNo>.*)","(?P<displayType>.*)","(?P<version>.*)",$/', $line, $matches);
var_dump($r, $matches);
Output:
int(1)
array(25) {
[0]=>
string(169) ""HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii","
["code"]=>
string(8) "HA035028"
[1]=>
string(8) "HA035028"
["description"]=>
string(35) "Hursley Road - Leigh House Hospital"
[2]=>
string(35) "Hursley Road - Leigh House Hospital"
["bay"]=>
string(0) ""
[3]=>
string(0) ""
["area"]=>
string(7) "HURSLEY"
[4]=>
string(7) "HURSLEY"
["lat"]=>
string(11) "52 58.3498N"
[5]=>
string(11) "52 58.3498N"
["lon"]=>
string(13) "1 26.5421W",""
[6]=>
string(13) "1 26.5421W",""
["build"]=>
string(0) ""
[7]=>
string(0) ""
["msgTime"]=>
string(2) "24"
[8]=>
string(2) "24"
["routes"]=>
string(0) ""
[9]=>
string(0) ""
["simNo"]=>
string(0) ""
[10]=>
string(0) ""
["displayType"]=>
string(0) ""
[11]=>
string(0) ""
["version"]=>
string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
[12]=>
string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
}
Note that <version> now matches the last field, while <lon> swallows two fields.
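The misalignment happens because . also matches the quote and comma that separate fields. A small illustration with a made-up three-field line checked against a two-field pattern:

```php
<?php
// A three-field line checked against a two-field pattern.
$line = '"x","y","z",';

// With ".", a group can match the "," delimiter itself, so the extra field is
// silently folded into the first group instead of the match failing.
preg_match('/^"(?P<a>.+)","(?P<b>.*)",$/', $line, $loose);

// With [^"], a group cannot cross a quote, so the field count must line up.
$strict = preg_match('/^"(?P<a>[^"]+)","(?P<b>[^"]*)",$/', $line);

echo $loose['a'], "\n"; // x","y
echo $loose['b'], "\n"; // z
var_dump($strict);      // int(0)
```

The greedy .+ grabs as much as it can and then backs off to the last position where "," still fits, which is why whole fields end up inside one group.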
Second try: I replaced every occurrence of . with [^"]:
$line='"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';
$r=preg_match('/^"(?P<code>[^"]+)","(?P<description>[^"]+)","(?P<bay>[^"]*)","(?P<area>[^"]+)","(?P<lat>[^"]+)","(?P<lon>[^"]+)","(?P<build>[^"]*)","(?P<msgTime>[^"]*)","(?P<routes>[^"]*)","(?P<simNo>[^"]*)","(?P<displayType>[^"]*)","(?P<version>[^"]*)","(?P<comments>[^"]*)",$/', $line, $matches);
var_dump($r, $matches);
Output:
int(1)
array(27) {
[0]=>
string(169) ""HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii","
["code"]=>
string(8) "HA035028"
[1]=>
string(8) "HA035028"
["description"]=>
string(35) "Hursley Road - Leigh House Hospital"
[2]=>
string(35) "Hursley Road - Leigh House Hospital"
["bay"]=>
string(0) ""
[3]=>
string(0) ""
["area"]=>
string(7) "HURSLEY"
[4]=>
string(7) "HURSLEY"
["lat"]=>
string(11) "52 58.3498N"
[5]=>
string(11) "52 58.3498N"
["lon"]=>
string(10) "1 26.5421W"
[6]=>
string(10) "1 26.5421W"
["build"]=>
string(0) ""
[7]=>
string(0) ""
["msgTime"]=>
string(0) ""
[8]=>
string(0) ""
["routes"]=>
string(2) "24"
[9]=>
string(2) "24"
["simNo"]=>
string(0) ""
[10]=>
string(0) ""
["displayType"]=>
string(0) ""
[11]=>
string(0) ""
["version"]=>
string(0) ""
[12]=>
string(0) ""
["comments"]=>
string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
[13]=>
string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
}
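Since every group in the working pattern has the same shape, one option is to generate the pattern from the list of field names instead of typing 13 groups by hand. A sketch under that assumption: buildLinePattern is a hypothetical helper, and unlike the pattern above it uses [^"]* for every field, so it accepts empty values where the question used +.

```php
<?php
// Hypothetical helper: build the [^"] version of the pattern from field names.
// Simplification: every field uses [^"]*, so empty values are always accepted.
function buildLinePattern(array $names)
{
    $groups = array();
    foreach ($names as $name) {
        $groups[] = '"(?P<' . $name . '>[^"]*)"';
    }
    // Fields are comma-separated and each line ends with a trailing comma.
    return '/^' . implode(',', $groups) . ',$/';
}

$names = array('code', 'description', 'bay', 'area', 'lat', 'lon', 'build',
               'msgTime', 'routes', 'simNo', 'displayType', 'version', 'comments');
$pattern = buildLinePattern($names);

$line = '"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';
if (preg_match($pattern, $line, $matches)) {
    echo $matches['lon'], "\n";    // 1 26.5421W
    echo $matches['routes'], "\n"; // 24
}
```

Keeping the names in one array also means adding or reordering a column only needs one change.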
Answer 2 (score: 0)
The [^"] answer is fine, but I think you could also make the + and * operators lazy by using +? and *? respectively.
It looks like one of the expressions is grabbing too much. I'm not entirely sure why, but it causes a lot of backtracking.
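A sketch of that suggestion, applied to the question's pattern with every .+ and .* made lazy. Lazy groups stop at the first "," they can, so a well-formed line aligns on the first try. They do not stop . from crossing a delimiter, though, so a line with the wrong number of fields may still misalign or backtrack heavily.

```php
<?php
// The question's pattern with every greedy .+ / .* turned lazy (.+? / .*?).
// Each group now stops at the first "," it can, instead of grabbing as much
// as possible and then giving characters back.
$lazy = '/^"(?P<code>.+?)","(?P<description>.+?)","(?P<bay>.*?)","(?P<area>.+?)",'
      . '"(?P<lat>.+?)","(?P<lon>.+?)","(?P<build>.*?)","(?P<msgTime>.*?)",'
      . '"(?P<routes>.*?)","(?P<simNo>.*?)","(?P<displayType>.*?)",'
      . '"(?P<version>.*?)","(?P<comments>.*?)",$/';

$line = '"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';
$r = preg_match($lazy, $line, $matches);

var_dump($r);                 // int(1)
echo $matches['lon'], "\n";   // 1 26.5421W
echo $matches['routes'], "\n"; // 24
```

Of the regex options, [^"] is the stricter choice, since it makes a miscounted line fail outright instead of matching with shifted fields.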