Question

I have been struggling with this problem for a while and would appreciate some help. I have plain-text data that looks like this:

(1-3) Apple  

(1) Pear (2) Apple (3) Cherry

(1) Banana (2) Apple

(1-2) Apple  

(1-4) Pear  
...

I am reading the data in a loop, and I am trying to find an elegant and efficient way to turn each line into a "basket" (holding at most 6 different fruits):

Basket 1

Fruit1=Apple, Fruit2=Apple, Fruit3=Apple

Basket 2

Fruit1=Pear, Fruit2=Apple, Fruit3=Cherry

Basket 3

Fruit1=Banana, Fruit2=Apple

The question is: is there a Pythonic way to do this without complicated if statements? The expected result would be a list of dictionaries.

Answer 1

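Assuming the format is what you imply (a sequence of token pairs, where the first token of each pair is (N) or (N-M) and the second is any word containing no spaces):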

import re

list_of_dicts = []
with open('thefile.txt') as f:
    for line in f:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines so they do not produce empty dicts
        d = dict()
        list_of_dicts.append(d)
        # tokens come in pairs: a position such as (2) or (1-3), then a fruit
        for i in range(0, len(tokens), 2):
            where, what = tokens[i:i+2]
            mo = re.search(r'(\d+)-(\d+)', where)
            if mo:
                # range form (N-M): fill positions N through M inclusive
                start = int(mo.group(1))
                end = int(mo.group(2)) + 1
            else:
                mo = re.search(r'(\d+)', where)
                if mo:
                    # single form (N): fill only position N
                    start = int(mo.group(1))
                    end = start + 1
                else:
                    msg = 'cannot parse {} in {}'.format(where, tokens)
                    raise ValueError(msg)
            for n in range(start, end):
                d['Fruit{}'.format(n)] = what
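
If the nested if/else still feels heavy, both position forms can probably be handled by one regular expression with an optional group. The following is only a sketch under the same format assumptions as above. The names SLOT and parse_baskets are made up for illustration, and unlike the code above it silently skips lines it cannot parse instead of raising ValueError.

```python
import re

# One pattern covers both "(N) Word" and "(N-M) Word"; the "-M" part is optional.
# SLOT and parse_baskets are hypothetical names, not part of the original answer.
SLOT = re.compile(r'\((\d+)(?:-(\d+))?\)\s*(\S+)')

def parse_baskets(lines):
    """Return one dict per parsable line, mapping 'FruitN' to the fruit name."""
    baskets = []
    for line in lines:
        # findall yields (start, end, fruit) tuples; end is '' for the (N) form
        matches = SLOT.findall(line)
        if not matches:
            continue  # blank line or nothing recognizable: skip it
        basket = {}
        for start, end, fruit in matches:
            first = int(start)
            last = int(end) if end else first
            for n in range(first, last + 1):
                basket['Fruit{}'.format(n)] = fruit
        baskets.append(basket)
    return baskets

sample = [
    "(1-3) Apple",
    "",
    "(1) Pear (2) Apple (3) Cherry",
    "",
    "(1) Banana (2) Apple",
]
result = parse_baskets(sample)
print(result)
```

An open file object can be passed in place of the sample list, since the function only iterates over lines.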
