I have a series of transaction description strings from an American Express card that I want to parse with PHP's preg_split():
[
"THE DISNEY STORE #90DANBURY CT",
"CHRISTMAS TREE SHOPSDANBURY CT",
"BATH & BODY WORKS 07DANBURY CT",
"CITGO DODGINGTOWN GANEWTOWN CT",
"DUNKIN #344944 Q35 3MONROE CT",
"DUNKIN #344944 Q35 3MONROE CT",
"DUNKIN #344944 Q35 3MONROE CT",
"DUNKIN #344944 Q35 3MONROE CT",
"AT&T RECURR BILL PAYDALLAS TX",
"SHELL OIL 5754389960NEWTOWN CT",
"POSTAGE REFILL STAMFORD CT",
"SHELL OIL 5754389960NEWTOWN CT",
"ONLINE PAYMENT - THANK YOU",
"SHELL OIL 5754389960NEWTOWN CT",
"AOL SERVICE 800-827-6364 VA",
"SHELL OIL 5754389960NEWTOWN CT",
"EBAY INC. 0000 866-779-3229 CA",
"WWW.ITUNES.COM/BILL CUPERTINO CA",
"THE HOME DEPOT TRUMBULL CT",
"THE HOME DEPOT TRUMBULL CT",
"AMEX GIFT CARDS 866-268-0582 NY",
"APPLE ONLINE STORE CUPERTINO CA",
"APPLE ONLINE STORE CUPERTINO CA",
"AMAZON MKTPLACE PMTSAMZN.COM/BILL WA",
"THE HOME DEPOT BRIDGEPORT CT",
"AT&T RECURR BILL PAYDALLAS TX",
"SHELL OIL 5754389960NEWTOWN CT",
"AT&T RECURR BILL PAYDALLAS TX",
"SHELL OIL 5754389960NEWTOWN CT",
"WALGREENS NEWTOWN CT",
"THE HOME DEPOT TRUMBULL CT",
"ONLINE PAYMENT - THANK YOU",
"AOL SERVICE 800-827-6364 VA"
]
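What I am trying to do is parse the vendor, city and state out of each description string. The data is uploaded to a PHP script in CSV format. Using the online tool regexr.com I have been able to get close with this expression: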
([A-Z&0-9 ./#\*\-]{0,19})\w
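I was able to infer that the description is at most 20 characters long, except for payments, where the text simply runs on. The city starts right at the 20-character limit and may contain spaces in some cases. The state is 2 characters and is preceded by a space.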
CHRISTMAS TREE SHOPSDANBURY CT
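should parse to
Vendor: CHRISTMAS TREE SHOPS
City: DANBURY
State: CT
Payments:
ONLINE PAYMENT - THANK YOU
should stay unchanged.
Edge cases:
AOL SERVICE 800-827-6364 VA
should parse to
Vendor: AOL SERVICE
Detail: 800-827-6364
State: VA
(labels added for clarity)
If you look at my saved result at https://regexr.com/3j39m, you will see that lines such as ONLINE PAYMENT - THANK YOU and AOL SERVICE 800-827-6364 VA are not parsed as expected.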
Answer 0 (score: 1)
You can split a fixed-width string with a regular expression, like this:
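<?php
// Fixed-width layout: 20-character store, 20-character city, 2-character state.
$re = '/(?<Store>.{20})(?<City>.{20})(?<State>.{2})/m';
// The pattern only matches when the column padding is intact, so it is rebuilt here
// (the whitespace in the listing above appears to have been collapsed).
$str = str_pad('THE DISNEY STORE #90', 20) . str_pad('DANBURY', 20) . 'CT';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // rtrim() strips the trailing padding from each field.
    echo rtrim($match['Store']) . "\t=>\t" . rtrim($match['City']) . "\t=>\t" . $match['State'] . "\n";
}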
You can achieve the same thing with substr().
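A minimal sketch of the substr() variant, assuming the same 20 + 20 + 2 column layout with the padding intact. The function name splitFixedWidth and the array keys are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: slices one fixed-width description into its fields.
// Assumes a 20-character store column, a 20-character city column
// and a 2-character state column, as in the regex version.
function splitFixedWidth(string $line): array
{
    return [
        'Store' => rtrim((string) substr($line, 0, 20)),
        'City'  => rtrim((string) substr($line, 20, 20)),
        'State' => rtrim((string) substr($line, 40, 2)),
    ];
}

// Sample line rebuilt with explicit padding (an assumption about the raw data).
$line = str_pad('CHRISTMAS TREE SHOPS', 20) . str_pad('DANBURY', 20) . 'CT';
print_r(splitFixedWidth($line));
```

A line shorter than 40 characters simply yields empty City and State fields, so the caller could use that to detect lines that do not follow the layout and leave them unchanged.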
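Answer 1 (score: -1)
It looks to me as though the list is tab-delimited, so this should do the trick:
/\t([A-Za-z ]+)\t+[A-Za-z]{2}$/
Explanation
\t matches a tab
([A-Za-z ]+) matches a run of letters and spaces, representing the town name
\t+ matches one or more tabs (it looks like there may be more than one in your dataset)
[A-Za-z]{2} matches 2 letters, representing the state abbreviation
$ matches the end of the string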
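A rough sketch of how this pattern might be applied with preg_match(), assuming the input really is tab-delimited (which the question does not confirm). The function name parseTabbed, the sample lines and the second capture group around the state are additions made for this illustration.

```php
<?php
// Hypothetical helper built on the pattern above.
// A second capture group around the state is added for this sketch.
function parseTabbed(string $line): ?array
{
    $re = '/\t([A-Za-z ]+)\t+([A-Za-z]{2})$/';
    if (preg_match($re, $line, $m) !== 1) {
        return null; // no tab-separated city and state found
    }
    return ['City' => $m[1], 'State' => $m[2]];
}

// Made-up tab-delimited samples; the real upload may look different.
$lines = [
    "THE HOME DEPOT\tTRUMBULL\tCT",
    "ONLINE PAYMENT - THANK YOU",
];
foreach ($lines as $line) {
    $parsed = parseTabbed($line);
    // Lines that do not match, such as payments, are printed unchanged.
    echo $parsed === null ? $line : "{$parsed['City']}, {$parsed['State']}", "\n";
}
```

Because the city class is limited to letters and spaces, an entry such as AMZN.COM/BILL would not match and would fall through to the unchanged branch.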