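I need to parse a CSV file to get some information from each row (company code, company description, country). I am using preg_match in PHP to parse the file, but I am running into trouble on some lines.
Below are some lines from the CSV file: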
"ASTA","Aerospace Technologies of Australia Pty Ltd (Australia)"
"ATAC"," American Tactical Aircraft Consultants (United States)"
"ATEC"," ATEC vos (Czech Republic)"
"ATG","Aviation Technology Group Inc (United States)"
"ATLAS","Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)"
"ATR","GIE Avions de Transport Régional (France/Italy)"
"AUSTER","Auster Aircraft Ltd (United Kingdom)"
"AUSTFLIGHT","Austflight ULA Pty Ltd (Australia)"
"AUSTRALIAN AEROSPACE","Australian Aerospace Pty Ltd (Australia)"
"AUSTRALITE","Australite Inc (United States)"
"AUTOGYRO","AutoGyro Europe GmbH (Germany)"
"AVANTAGE","OOO Samoletstroitelynyi Kompaniya Avantazh (Russia)"
"AVCRAFT","AvCraft Aviation LLC (United States)"
"AVEKO","Aveko sro (Czech Republic)"
"AVIA (1)","Azionari Vercellese Industrie Aeronautiche (Italy)"
"AVIA (2)","Avia-Zavody Jirího Dimitrova (Czech Republic)"
The PHP preg_match code is as follows:
preg_match('#^(.+?)\s\((.+?)\)$#',$string,$matches);
This code works fine for lines like the following:
"ASSO AEREI","Asso Aerei Srl (Italy)"
In the example above I successfully get the three pieces of data into the $matches array. But with the following line:
"ATLAS","Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)"
I get, as the company description:
Atlas Aircraft Corporation of South Africa
and as the country:
Pty) Ltd (South Africa
when it should be:
Atlas Aircraft Corporation of South Africa (Pty) Ltd
and
South Africa
Another problem that is driving me crazy: when a line contains no country, like the line below
"AERFER-AERMACCHI","see AERFER and AERMACCHI"
I get an empty array for the company description.
Any help fixing the regex pattern? Thank you very much for your help.
Answer 0 (score: 3)
$csv = <<<'EOD'
"ASTA","Aerospace Technologies of Australia Pty Ltd (Australia)"
"ATAC"," American Tactical Aircraft Consultants (United States)"
"ATEC"," ATEC vos (Czech Republic)"
"ATG","Aviation Technology Group Inc (United States)"
"ATLAS","Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)"
"ATR","GIE Avions de Transport Régional (France/Italy)"
"AUSTER","Auster Aircraft Ltd (United Kingdom)"
"AUSTFLIGHT","Austflight ULA Pty Ltd (Australia)"
"AUSTRALIAN AEROSPACE","Australian Aerospace Pty Ltd (Australia)"
"AUSTRALITE","Australite Inc (United States)"
"AUTOGYRO","AutoGyro Europe GmbH (Germany)"
"AVANTAGE","OOO Samoletstroitelynyi Kompaniya Avantazh (Russia)"
"AVCRAFT","AvCraft Aviation LLC (United States)"
"AVEKO","Aveko sro (Czech Republic)"
"AVIA (1)","Azionari Vercellese Industrie Aeronautiche (Italy)"
"AVIA (2)","Avia-Zavody Jirího Dimitrova (Czech Republic)"
"AERFER-AERMACCHI","see AERFER and AERMACCHI"
EOD;
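// Expose the sample text as a stream so that fgetcsv() can read it like a file.
$url = 'data:text/plain,' . urlencode($csv);

if ( false !== $handle = fopen($url, "r") ) {
    while ( false !== $data = fgetcsv($handle) ) {
        // $data[1] is the "description (country)" column.
        if ( preg_match('~(\S.*?)(?|\h*\(([^)]*)\)|())\h*$~', $data[1], $m) ) {
            printf("%-70s\t%s\n", $m[1], $m[2]);
        }
    }
    fclose($handle);
}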
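Pattern explanation:
Two points in your question matter, and they are why the description part (\S.*?) uses a non-greedy quantifier. Whether or not a country name is present, the description subpattern is forced to stop at the opening parenthesis of the country (which only counts when that parenthesised group sits at the end of the string), or otherwise at the end of the string itself.
The leading \S is only there to trim the description on the left, which is why the pattern has no ^ anchor. One of the \h* trims it on the right (again thanks to the non-greedy quantifier).
For the country part, I chose a branch reset group (?|... (...) ... | ... (...) ...) rather than an optional non-capturing group such as (?:\h*\(([^)]*)\))?, so that capture group 2 exists even when the country is absent. In a branch reset group, the capture groups have the same number in every branch: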
(?|
\h* \( ([^)]*) \) # the country name is present and captured in group 2
| # OR
() # the capture group 2 contains an empty string
)
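To make the difference between the two constructs concrete, here is a minimal sketch that runs both variants on a line with a country and a line without one. The variable names are illustrative only; the two patterns are the ones discussed above. It relies on preg_match() dropping trailing groups that did not take part in the match, which is its default behaviour when no flags are passed.

```php
<?php
// Same description subpattern, two ways of making the country optional.
$branchReset = '~(\S.*?)(?|\h*\(([^)]*)\)|())\h*$~';
$optional    = '~(\S.*?)(?:\h*\(([^)]*)\))?\h*$~';

$withCountry    = 'Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)';
$withoutCountry = 'see AERFER and AERMACCHI';

preg_match($branchReset, $withCountry, $a);    // country present
preg_match($branchReset, $withoutCountry, $b); // no country, branch reset group
preg_match($optional, $withoutCountry, $c);    // no country, optional group

// Description and country for the line that has both.
var_dump($a[1], $a[2]);

// Group 2 exists (as an empty string) only with the branch reset group.
var_dump(array_key_exists(2, $b), array_key_exists(2, $c));
```

With the optional group, reading $c[2] directly would raise an undefined-index notice or warning, which is what the branch reset group avoids.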
Answer 1 (score: 1)
It is better to use the fgetcsv() function instead of preg_match.
$file = fopen("contacts.csv","r");
print_r(fgetcsv($file));
fclose($file);
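Note that fgetcsv() only splits each row into its columns; the second column still holds the description and the country together. As a minimal sketch of finishing the job without a regex, assuming the country is always the last parenthesised group and closes the string, something like the following could work. split_company() is a hypothetical helper, the php://memory stream merely stands in for the real file, and the extra fgetcsv() arguments spell out its usual defaults.

```php
<?php
// Hypothetical helper: split "Description (Country)" without a regex.
function split_company(string $text): array
{
    $text = trim($text);
    $open = strrpos($text, '(');
    // Treat the last "(...)" as the country only if it closes the string.
    if ($open !== false && substr($text, -1) === ')') {
        return [rtrim(substr($text, 0, $open)), substr($text, $open + 1, -1)];
    }
    return [$text, ''];
}

$csv = <<<'EOD'
"ATLAS","Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)"
"ATEC"," ATEC vos (Czech Republic)"
"AERFER-AERMACCHI","see AERFER and AERMACCHI"
EOD;

// In-memory stream standing in for the CSV file on disk.
$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);
rewind($handle);

$rows = [];
while (false !== $data = fgetcsv($handle, 0, ',', '"', '\\')) {
    [$description, $country] = split_company($data[1]);
    $rows[$data[0]] = [$description, $country];
}
fclose($handle);

print_r($rows);
```

The trade-off is that this variant treats any trailing "(...)" as a country, so it is only as reliable as that assumption about the data.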
You can find the reference for this function under fgetcsv().
Answer 2 (score: 1)
This regex captures all the cases:
"/^(.*?)(\(([^(]*?)\))?$/"
I tried it with the following code:
$matches=array();
$re = "/^(.*?)(\(([^(]*?)\))?$/";
preg_match($re, $string, $matches);
foreach( $matches as $match ){
    echo $match."\n";
}
When run with:
$string = "Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)";
the output is:
Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)
Atlas Aircraft Corporation of South Africa (Pty) Ltd
(South Africa)
South Africa
When run with:
$string = "see AERFER and AERMACCHI";
the output is:
see AERFER and AERMACCHI
see AERFER and AERMACCHI
So you get the company description in $matches[1] and, when the line has one, the country in $matches[3].
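As the second output shows, $matches has no index 3 when the line carries no country, and with this pattern the description keeps the space that precedes the country. A minimal sketch of a wrapper that smooths over both points, assuming an empty string is an acceptable stand-in for a missing country (parse_description() is a hypothetical name, not part of the answer):

```php
<?php
// Hypothetical wrapper around the pattern above.
function parse_description(string $string): array
{
    $re = '/^(.*?)(\(([^(]*?)\))?$/';
    if (!preg_match($re, $string, $matches)) {
        return ['', '']; // e.g. input containing a line break
    }
    // Group 3 is absent when there is no country, so fall back to ''.
    return [trim($matches[1]), $matches[3] ?? ''];
}

print_r(parse_description('Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)'));
print_r(parse_description('see AERFER and AERMACCHI'));
```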
Answer 3 (score: 0)
My guess is that this expression might work:
(.*)\s*\((.*?)\)|(.*)
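It uses capture groups () to collect the data we want with
(.*)\s*\((.*?)\)
and picks up the other lines, the ones without a parenthesised country, with
(.*)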
$re = '/(.*)\s*\((.*?)\)|(.*)/m';
$str = 'Aerospace Technologies of Australia Pty Ltd (Australia)
American Tactical Aircraft Consultants (United States)
ATEC vos (Czech Republic)
Aviation Technology Group Inc (United States)
Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)
GIE Avions de Transport Régional (France/Italy)
Auster Aircraft Ltd (United Kingdom)
Austflight ULA Pty Ltd (Australia)
Australian Aerospace Pty Ltd (Australia)
Australite Inc (United States)
AutoGyro Europe GmbH (Germany)
OOO Samoletstroitelynyi Kompaniya Avantazh (Russia)
AvCraft Aviation LLC (United States)
Aveko sro (Czech Republic)
Azionari Vercellese Industrie Aeronautiche (Italy)
Avia-Zavody Jirího Dimitrova (Czech Republic)
see AERFER and AERMACCHI';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
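The var_dump() output still has to be turned into description/country pairs. A minimal sketch of one way to do that, assuming the lines are separated by "\n" and using a shortened sample string: lines with a country fill groups 1 and 2, lines without one fill group 3, and since every part of the pattern is optional it can also match an empty string, so empty matches are skipped. The $rows array is an illustrative name only.

```php
<?php
$re  = '/(.*)\s*\((.*?)\)|(.*)/m';
$str = "Atlas Aircraft Corporation of South Africa (Pty) Ltd (South Africa)\n"
     . "ATEC vos (Czech Republic)\n"
     . "see AERFER and AERMACCHI";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    if ($m[0] === '') {
        continue; // the pattern can also match an empty string; ignore those
    }
    if (($m[3] ?? '') !== '') {
        // Second alternative: the whole line is the description.
        $rows[] = [trim($m[3]), ''];
    } else {
        // First alternative: group 1 may end with a space, so trim it.
        $rows[] = [trim($m[1]), $m[2]];
    }
}

print_r($rows);
```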