Question

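I am working with a large CSV file in Python, and I am trying to build a dictionary from lists of text tied to a unique identifier. In the CSV, the value in each cell of the Items column was originally free text and is now a comma-separated list. The data looks like this: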

ID      Items
123     'A', 'B', 'C'
234     'A', 'C', 'D'
567     'A', 'D', 'E', 'F'

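I am trying to count the number of unique identifiers for each element in the Items column (i.e., how many unique IDs have A, how many have B). Is there a way to create a dictionary with the items as keys? Something like this: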

{'A': 123, 234, 567 'B': 123 'C': 123, 234 'D': 234, 567}

I am trying to use a for loop. First, I identified the CSV column I want to use, i.e. Items (10). Then I want to loop over each element in the list.

dict = {}        
reader = csv.reader(inF)
for row in reader:
    items = row[10]
        for x in items:
            if x not in dict:
                  dict[x] += x

Answer 1

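This will work for the file format you provided, but depending on the edge cases you have, you may need to adjust the regular expressions. I did not use the csv reader because regular expressions seemed easy enough in this case.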

# import regular expressions
import re

itemLookup = dict()
file = 'data.csv'
with open(file, 'r') as f:
    for line in f:
        # split rows on either ', ' or ' '
        columns = re.split(r',? +', line)

        # only process row if it starts with a number
        # (raw string so that \d is not treated as a string escape)
        id_mo = re.search(r'^\d+$', columns[0])
        if id_mo:
            # get the id number (first column)
            # and convert it from a string to an integer
            id = int(id_mo.group(0))

            # for the rest of the columns in this row
            for col in columns[1:]:
                # search for the item name in the column
                # (without quotes or new lines)
                # i.e. I'm assuming the item name matches this regex
                item_mo = re.search(r'\w+', col)

                # ignore empty columns
                if item_mo:
                    # get the item name that we just searched for
                    item = item_mo.group(0)

                    # if we have not come across this item name before
                    if item not in itemLookup:
                        # then create it, and assign it an empty list
                        itemLookup[item] = []
                    # add the id to the list referenced by the item name
                    itemLookup[item].append(id)

print(itemLookup)

Output:

{   'A': [123, 234, 567], 
    'C': [123, 234], 
    'B': [123], 
    'E': [567], 
    'D': [234, 567], 
    'F': [567]
}
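
Since the question asks how many unique IDs each item has, a variation that stores the IDs in sets and then takes their sizes may be closer to the goal. The following is only a minimal sketch, not the code from the answer: it assumes the same whitespace-separated layout as the sample data, reads from an in-memory string rather than data.csv, and the names build_item_lookup and SAMPLE are made up for illustration.

```python
import re
from collections import defaultdict

# Sample rows copied from the question; a real script would read a file instead.
SAMPLE = """\
ID      Items
123     'A', 'B', 'C'
234     'A', 'C', 'D'
567     'A', 'D', 'E', 'F'
"""


def build_item_lookup(lines):
    """Map each item name to the set of IDs whose row lists it."""
    lookup = defaultdict(set)
    for line in lines:
        # Split on ', ' or on runs of spaces, as in the answer above.
        columns = re.split(r",? +", line.strip())
        # Skip the header and any row that does not start with a numeric ID.
        if not re.fullmatch(r"\d+", columns[0]):
            continue
        row_id = int(columns[0])
        for col in columns[1:]:
            # Assumption: an item name is a run of word characters.
            match = re.search(r"\w+", col)
            if match:
                # A set ignores repeats, so each ID is counted once per item.
                lookup[match.group(0)].add(row_id)
    return lookup


item_lookup = build_item_lookup(SAMPLE.splitlines())
# Number of unique IDs per item.
item_counts = {item: len(ids) for item, ids in item_lookup.items()}
print(sorted(item_counts.items()))
# [('A', 3), ('B', 1), ('C', 2), ('D', 2), ('E', 1), ('F', 1)]
```

Using a set instead of a list means an ID that appears on more than one row, or an item repeated within a row, is still counted only once. If the real file is a proper CSV with the list in a single quoted cell, the csv module would likely be a better fit than splitting on spaces.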

Using list items as keys in a python dictionary
