PHP preg_match() unknown error

Posted: 2010-09-07 12:11:27

Tags: php regex

I'm currently trying to process a CSV file in PHP using preg_match(). Here is a sample of the data I want to process:

  

"SN120187","Aldersr Rd Nr Shops","","STHPTN","50 56.4241N","1 25.7587W","1001077307","2010-05-30 15:29:49","10","","SURRSHLT3x32","BSU243L1","iiipiiipiiipiiipiii",
"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","50 59.6772N","1 23.4412W","","","24","","","","the quick brown fox jumps over the lazy dog the quick brown fox jumps over the lazy dog",

I have a regex that I'm trying to use on this data (below):

if(preg_match('/^"(?P<code>.+)","(?P<description>.+)","(?P<bay>.*)","(?P<area>.+)","(?P<lat>.+)","(?P<lon>.+)","(?P<build>.*)","(?P<msgTime>.*)","(?P<routes>.*)","(?P<simNo>.*)","(?P<displayType>.*)","(?P<version>.*)","(?P<comments>.*)",$/', $line, $matches)){}

The regex works on 95% of the data. The lines it fails on, however, all have a non-empty last field.

I started experimenting with the data (mainly the last field) and found that the following lines do not pass the regex:

  

"SN120187","Aldersr Rd Nr Shops","","STHPTN","50 54.5512N","1 22.9273W","1001077307","2010-05-30 15:29:49","10","","SURRSHLT3x32","BSU243L1","iiiipiiiipiiiipiiii",
"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",

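However, if I remove a single character from the last field of either line above, it passes. From experimenting, I found no consistent pattern that triggers the failure: the total length of the line doesn't seem to matter (adding extra characters to other fields shows this), and neither does the length of the last field.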

I have no idea what's going on. Does anyone have any ideas?

I'm currently running PHP 5.3.2, and no error message is shown.
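One possible reason there is no error message: preg_match() returns 1 for a match, 0 for no match, and false when the PCRE engine itself gives up, for example when pcre.backtrack_limit is exceeded. Code that only tests the result for truthiness treats that failure as an ordinary non-match. A minimal sketch for telling the two apart; the helper name matchOrExplain() is hypothetical, made up for this example:

```php
<?php
// Hypothetical helper: behaves like preg_match(), but throws instead of
// silently returning false when the PCRE engine fails.
function matchOrExplain($pattern, $subject, &$matches = null)
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === false) {
        // preg_last_error() returns a PREG_*_ERROR constant, e.g.
        // PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit was hit.
        throw new RuntimeException('preg_match() failed, PCRE error code ' . preg_last_error());
    }
    return $result; // 1 = matched, 0 = no match
}
```

Used in place of the original call, e.g. `if (matchOrExplain($regex, $line, $matches)) { ... }`, a line that hits an engine limit raises an exception instead of being skipped without comment.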

3 answers:

Answer 0 (score: 2)

If this is CSV data, use a CSV-handling function instead: str_getcsv to parse a string, or fgetcsv to read from a file.
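A minimal sketch of that suggestion, applied to one of the failing lines from the question. The field names are the named groups from the question's regex. It assumes, as in the sample data, that every line ends with a trailing comma, which str_getcsv() reports as one extra empty field:

```php
<?php
// Field names in column order, taken from the question's named groups.
$fieldNames = ['code', 'description', 'bay', 'area', 'lat', 'lon', 'build',
               'msgTime', 'routes', 'simNo', 'displayType', 'version', 'comments'];

$line = '"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';

// Delimiter, enclosure and escape character are passed explicitly rather than relying on defaults.
$fields = str_getcsv($line, ',', '"', '\\');

// Assumption: the trailing comma produces one extra, empty last field; drop it.
$last = count($fields) - 1;
if ($last >= 0 && (string) $fields[$last] === '') {
    array_pop($fields);
}

// Pair each value with its field name (both arrays have 13 entries here).
$record = array_combine($fieldNames, $fields);
echo $record['code'], ' ', $record['lon'], ' ', $record['routes'], PHP_EOL;
```

No regex is involved, so there is no backtracking to go wrong. For a whole file, the same handling would apply to each array returned by fgetcsv() in a read loop.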

Answer 1 (score: 0)

I tried it locally and got the same behavior you describe. I'm on PHP 5.2.10-2ubuntu6.

First attempt: I removed "(?P<comments>.*)", from your pattern:

$line='"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';

$r=preg_match('/^"(?P<code>.+)","(?P<description>.+)","(?P<bay>.*)","(?P<area>.+)","(?P<lat>.+)","(?P<lon>.+)","(?P<build>.*)","(?P<msgTime>.*)","(?P<routes>.*)","(?P<simNo>.*)","(?P<displayType>.*)","(?P<version>.*)",$/', $line, $matches);

var_dump($r, $matches);

Output:

int(1)
array(25) {
  [0]=>
  string(169) ""HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii","
  ["code"]=>
  string(8) "HA035028"
  [1]=>
  string(8) "HA035028"
  ["description"]=>
  string(35) "Hursley Road - Leigh House Hospital"
  [2]=>
  string(35) "Hursley Road - Leigh House Hospital"
  ["bay"]=>
  string(0) ""
  [3]=>
  string(0) ""
  ["area"]=>
  string(7) "HURSLEY"
  [4]=>
  string(7) "HURSLEY"
  ["lat"]=>
  string(11) "52 58.3498N"
  [5]=>
  string(11) "52 58.3498N"
  ["lon"]=>
  string(13) "1 26.5421W",""
  [6]=>
  string(13) "1 26.5421W",""
  ["build"]=>
  string(0) ""
  [7]=>
  string(0) ""
  ["msgTime"]=>
  string(2) "24"
  [8]=>
  string(2) "24"
  ["routes"]=>
  string(0) ""
  [9]=>
  string(0) ""
  ["simNo"]=>
  string(0) ""
  [10]=>
  string(0) ""
  ["displayType"]=>
  string(0) ""
  [11]=>
  string(0) ""
  ["version"]=>
  string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
  [12]=>
  string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
}

Note that <version> now matches the last field, while <lon> matches two fields.
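The same effect can be reproduced on a tiny made-up line. With one group fewer than there are fields, a greedy `.` group absorbs a "," separator plus the following field, just as <lon> did above, whereas a `[^"]` group cannot cross a quote at all. A sketch:

```php
<?php
// Made-up three-field line, matched by a two-group pattern.
$line = '"x","y","z",';

// Greedy ".+" swallows the "," between the first two fields.
preg_match('/^"(?P<first>.+)","(?P<last>.*)",$/', $line, $m);
echo $m['first'], PHP_EOL; // x","y
echo $m['last'], PHP_EOL;  // z

// With [^"] a group cannot contain a quote, so the two-group pattern simply fails.
var_dump(preg_match('/^"(?P<first>[^"]+)","(?P<last>[^"]*)",$/', $line)); // int(0)
```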


Second attempt: I replaced every occurrence of . with [^"]:

$line='"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';

$r=preg_match('/^"(?P<code>[^"]+)","(?P<description>[^"]+)","(?P<bay>[^"]*)","(?P<area>[^"]+)","(?P<lat>[^"]+)","(?P<lon>[^"]+)","(?P<build>[^"]*)","(?P<msgTime>[^"]*)","(?P<routes>[^"]*)","(?P<simNo>[^"]*)","(?P<displayType>[^"]*)","(?P<version>[^"]*)","(?P<comments>[^"]*)",$/', $line, $matches);

Output:

int(1)
array(27) {
  [0]=>
  string(169) ""HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii","
  ["code"]=>
  string(8) "HA035028"
  [1]=>
  string(8) "HA035028"
  ["description"]=>
  string(35) "Hursley Road - Leigh House Hospital"
  [2]=>
  string(35) "Hursley Road - Leigh House Hospital"
  ["bay"]=>
  string(0) ""
  [3]=>
  string(0) ""
  ["area"]=>
  string(7) "HURSLEY"
  [4]=>
  string(7) "HURSLEY"
  ["lat"]=>
  string(11) "52 58.3498N"
  [5]=>
  string(11) "52 58.3498N"
  ["lon"]=>
  string(10) "1 26.5421W"
  [6]=>
  string(10) "1 26.5421W"
  ["build"]=>
  string(0) ""
  [7]=>
  string(0) ""
  ["msgTime"]=>
  string(0) ""
  [8]=>
  string(0) ""
  ["routes"]=>
  string(2) "24"
  [9]=>
  string(2) "24"
  ["simNo"]=>
  string(0) ""
  [10]=>
  string(0) ""
  ["displayType"]=>
  string(0) ""
  [11]=>
  string(0) ""
  ["version"]=>
  string(0) ""
  [12]=>
  string(0) ""
  ["comments"]=>
  string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
  [13]=>
  string(57) "iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii"
}
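Because the pattern is the same `[^"]` group repeated once per field, it can also be generated from the list of field names, which avoids typos when a field is added. A sketch; which fields use `+` (must be non-empty) and which use `*` (may be empty) is copied from the pattern in the second attempt:

```php
<?php
// Field names in column order, as in the named groups above.
$fieldNames = ['code', 'description', 'bay', 'area', 'lat', 'lon', 'build',
               'msgTime', 'routes', 'simNo', 'displayType', 'version', 'comments'];
// Fields matched with [^"]+ ; every other field gets [^"]* .
$required = ['code', 'description', 'area', 'lat', 'lon'];

$groups = [];
foreach ($fieldNames as $name) {
    $quantifier = in_array($name, $required, true) ? '+' : '*';
    $groups[] = '(?P<' . $name . '>[^"]' . $quantifier . ')';
}
// Fields are quoted, separated by "," and the line ends with ",
$pattern = '/^"' . implode('","', $groups) . '",$/';

$line = '"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';
var_dump(preg_match($pattern, $line, $matches)); // int(1)
echo $matches['routes'], PHP_EOL;                // 24
```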

Answer 2 (score: 0)

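The [^"] answer is good, but I think you could also make the + and * operators lazy by writing +? and *? respectively.

It seems one of the expressions is grabbing too much of the line. I'm not entirely sure why (but it causes a great deal of backtracking).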
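A sketch of the lazy-quantifier variant, applied to the failing line from the question. Each group now tries the shortest text first, so on this line the correct split should be found without the long backtracking search. Note that this only changes the order in which splits are tried: unlike [^"], a lazy `.` can still cross a "," boundary if a later part of the pattern fails to match.

```php
<?php
// The failing line from the question.
$line = '"HA035028","Hursley Road - Leigh House Hospital","","HURSLEY","52 58.3498N","1 26.5421W","","","24","","","","iiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipiiiipii",';

// The question's pattern with every + turned into +? and every * into *?.
$pattern = '/^"(?P<code>.+?)","(?P<description>.+?)","(?P<bay>.*?)","(?P<area>.+?)","(?P<lat>.+?)","(?P<lon>.+?)","(?P<build>.*?)","(?P<msgTime>.*?)","(?P<routes>.*?)","(?P<simNo>.*?)","(?P<displayType>.*?)","(?P<version>.*?)","(?P<comments>.*?)",$/';

$result = preg_match($pattern, $line, $matches);
if ($result === false) {
    // The engine gave up (e.g. a backtracking limit); report the PCRE error code.
    echo 'PCRE error ', preg_last_error(), PHP_EOL;
} elseif ($result === 1) {
    echo $matches['lon'], ' / ', $matches['routes'], PHP_EOL;
} else {
    echo 'no match', PHP_EOL;
}
```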