Regex to extract only one word

Time: 2019-04-16 20:14:56

Tags: python regex

I have a complex file like this:

  

"start_nm":"BOSTON","bus_num":"1","bus_num":"2","dest_nm":"NEW YorK"

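I want to get Boston, 1, 2, newyork. The problem is that some cities have 1, 2, 3, 5 and others have only 1, 2. 1) How can I get them in one simple statement instead of multiple if statements? 2) Since the number of bus_num entries is dynamic, how do I loop over them?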

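import re

# Extract the start city (text between the quotes after start_nm)
match1 = re.search(r'start_nm":"([^"]*)', line)
if match1:
    print("The start is: " + match1.group(1))
# re.search only returns the first bus_num, which is the limitation in question
match2 = re.search(r'bus_num":"(\d+)', line)
if match2:
    print("The bus number is: " + match2.group(1))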

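I am able to extract them, but I am looking for a simple way to: 1) handle everything in one statement, using any package, instead of this large set of conditionals; 2) loop when the number of bus_num entries is dynamic.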

3 answers:

Answer 0: (score: 0)

A quick and dirty approach is to find everything between each :" and the next ".
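The idea above can be sketched with `re.findall`. This is only an illustration, assuming every value is wrapped in plain double quotes and contains no escaped quotes; `line` is the sample string from the question and `values` is a name chosen here.

```python
import re

# Sample input taken from the question.
line = '"start_nm":"BOSTON","bus_num":"1","bus_num":"2","dest_nm":"NEW YorK"'

# Capture whatever sits between a `:"` and the next `"`.
# findall returns every match, so any number of bus_num entries is handled
# without an explicit loop or a chain of if statements.
values = re.findall(r':\s*"([^"]*)"', line)
print(values)  # ['BOSTON', '1', '2', 'NEW YorK']
```

Because `findall` collects all matches in one call, the same statement works whether a line has two bus numbers or five.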

Answer 1: (score: 0)

This format looks extremely similar to JSON. One solution:

>>> import json
>>> line = '"start_nm":"BOSTON","bus_num":"1", "bus_num":"2","dest_nm":"NEW YorK"'
>>> json.loads(f"{{ {line} }}").values()
dict_values(['BOSTON', '2', 'NEW YorK'])

Note the duplicate key "bus_num", which prevents this solution from fully working: the later value overwrites the earlier one, so '1' is lost.

Another solution:
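One possible way around the duplicate-key problem, sketched here under the assumption that each line really is valid JSON once wrapped in braces, is the `object_pairs_hook` parameter of `json.loads`. The names `pairs` and `values` are chosen for illustration.

```python
import json

# Sample input taken from the answer above.
line = '"start_nm":"BOSTON","bus_num":"1", "bus_num":"2","dest_nm":"NEW YorK"'

# object_pairs_hook receives the (key, value) pairs in document order
# instead of a dict, so repeated keys such as "bus_num" are not collapsed.
pairs = json.loads("{" + line + "}", object_pairs_hook=list)
values = [value for _, value in pairs]
print(values)  # ['BOSTON', '1', '2', 'NEW YorK']
```

Keeping the pairs as a list trades dictionary lookup for completeness, which seems to fit a case where the number of `bus_num` entries varies from line to line.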

>>> line = '"start_nm":"BOSTON","bus_num":"1", "bus_num":"2","dest_nm":"NEW YorK"'
>>> [v.split(",")[0][1:-1] for v in line.split(":")[1:]]
['BOSTON', '1', '2', 'NEW YorK']

Answer 2: (score: 0)

Here's a solution that creates a dictionary from your string
(intentionally avoided using comprehensions, etc. to keep it simple):

line = '"start_nm":"BOSTON","bus_num":"1", "bus_num":"2","dest_nm":"NEW YorK"'
line = (line.replace('\"','')).split(',')
d = {}
for l in line:
    k = l.strip().split(':')[0]
    v = l.strip().split(':')[1]
    if k in d:
        d[k] += ' ' + v
    else:
        d[k] = v

print(d)
print(d['start_nm'], '\t', d['bus_num'], '\t', d['dest_nm'])  

## {'start_nm': 'BOSTON', 'bus_num': '1 2', 'dest_nm': 'NEW YorK'}
## BOSTON    1 2     NEW YorK
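A variant of the same dictionary idea, offered only as a sketch: it keeps repeated keys as a list rather than a space-joined string, and uses a regular expression so that commas or colons inside a quoted value do not break the split. It assumes keys and values are both double-quoted with no escaped quotes; `grouped` is a name chosen here.

```python
import re
from collections import defaultdict

# Sample input taken from the answer above.
line = '"start_nm":"BOSTON","bus_num":"1", "bus_num":"2","dest_nm":"NEW YorK"'

# Each match yields one (key, value) tuple; values for a repeated key
# are appended to the same list, however many times the key appears.
grouped = defaultdict(list)
for key, value in re.findall(r'"([^"]*)"\s*:\s*"([^"]*)"', line):
    grouped[key].append(value)

print(dict(grouped))
# {'start_nm': ['BOSTON'], 'bus_num': ['1', '2'], 'dest_nm': ['NEW YorK']}
```

With this shape, `grouped['bus_num']` can be looped over directly regardless of how many bus numbers a city has.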