Python: filter data from a given string

Date: 2019-05-13 12:02:17

Tags: python python-3.x

I have the following data:

data = """
item: apple
store name: USA_1
store id: 1000
total: 200

item: apple
store name: USA_2
store id: 1001
total: 230

item: apple
store name: USA_3
store id: 1002
total: 330

item: apple
store name: UK1
store id: 2000
total: 20

item: apple
store name: UK_2
store id: 1021
total: 230
"""

I need to build a dictionary of stores (store name mapped to store id) in the following format:

{' USA_1': ' 1000', ' USA_2': ' 1001', ' USA_3': ' 1002', ' UK1': ' 2000', ' UK_2': ' 1021'}

I wrote the code below, which produces the output above:

STORE_NAME_GATHERED = []
STORE_IDS_GATHERED = []
STORE_info = {}
for line in data.split("\n"):
    line = line.strip()
    if line.startswith("store name:"):
        name = line.split(":")[1]
        if not name in  STORE_NAME_GATHERED:
            STORE_NAME_GATHERED.append(name)
    elif line.startswith("store id:"):
        id = line.split(":")[1]
        if not id in STORE_IDS_GATHERED:
            STORE_IDS_GATHERED.append(id)
            STORE_info[name] = id
print(STORE_info)

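The code above gives me the expected result, but I don't think it is the best or most reliable way to get there. Can someone help me achieve the same result with cleaner, more reliable code?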

1 answer:

Answer 0: (score: 5)

Use a regular expression (regex).

For example:

import re


data = """
item: apple
store name: USA_1
store id: 1000
total: 200

item: apple
store name: USA_2
store id: 1001
total: 230

item: apple
store name: USA_3
store id: 1002
total: 330

item: apple
store name: UK1
store id: 2000
total: 20

item: apple
store name: UK_2
store id: 1021
total: 230
"""

name = re.findall(r"store name: (.*)", data)   #Get Store Name
store = re.findall(r"store id: (.*)", data)    #Get Store ID

print(dict(zip(name, store)))

Output:

{'UK1': '2000',
 'UK_2': '1021',
 'USA_1': '1000',
 'USA_2': '1001',
 'USA_3': '1002'}
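
As a possible variation, here is a minimal sketch that matches each store name together with the store id on the line right after it. This way a record missing one of the two fields cannot shift the pairing, which can happen when two separate findall lists are zipped. It assumes that "store id:" always directly follows "store name:", as in the sample data. The names PAIR_PATTERN and parse_stores are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern name. Assumes the "store id:" line comes
# immediately after the "store name:" line, as in the question's data.
PAIR_PATTERN = re.compile(
    r"^store name:[ \t]*(?P<name>.*?)[ \t]*\n"
    r"store id:[ \t]*(?P<id>.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_stores(text):
    """Return {store name: store id}, with surrounding spaces removed."""
    return {m.group("name"): m.group("id") for m in PAIR_PATTERN.finditer(text)}


sample = """
item: apple
store name: USA_1
store id: 1000
total: 200

item: apple
store name: UK1
store id: 2000
total: 20
"""

print(parse_stores(sample))
```

Because the pattern consumes the spaces around each value, the keys and values come back without the leading space that line.split(":")[1] leaves in the question's output.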