Question

I wrote some code that takes in some data, and I end up with a csv file that looks like the following:

1,Steak,Martins
2,Fish,Martins
2,Steak,Johnsons
4,Veggie,Smiths
3,Chicken,Johnsons
1,Veggie,Johnsons

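where the first column is the quantity, the second column is the type of item (in this case the meal), and the third column is an identifier (in this case the family name). I need to print this information to a text file in a specific way: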

Martins
1 Steak
2 Fish
Johnsons
2 Steak
3 Chicken
1 Veggie
Smiths
4 Veggie

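So what I want is the family name, followed by what that family ordered. I wrote the following code to accomplish this, but it doesn't quite seem to get there.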

import csv
orders = "orders.txt"
messy_orders = "needs_sorting.csv"

    with open(messy_orders, 'rb') as orders_for_sorting, open(orders, 'a') as final_orders_file:
        comp = []
        reader_sorting = csv.reader(orders_for_sorting)
        for row in reader_sorting:
            test_bit = [row[2]]
            if test_bit not in comp:
                comp.append(test_bit)
                final_orders_file.write(row[2])
                for row in reader_sorting:
                    if [row[2]] == test_bit:
                        final_orders_file.write(row[0], row[1])

            else:
                print "already here"
                continue

What I end up with is the following:

Martins
2 Fish

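Also, I never see it print "already here", though I think I should if it were working properly. What I suspect is happening is that the program goes through the second for loop and then exits without continuing the first loop. Unfortunately, I'm not sure how to get it to go back to the original loop once it has identified all instances of a given family name and printed them to the file. The reason I set it up this way is so that I can write the family name as a header; otherwise I would just sort the file by family name. Note that after running the orders through my first program, I did manage to consolidate everything so that each row represents the full quantity of that kind of food for that family (there are no duplicate rows containing both Steak and Martins).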

Answer 1

This is a problem that you solve with a dictionary; it will accumulate your items by the last name (family name) from your file.

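The second thing you have to do is accumulate a total for each type of meal. Keep in mind that the data you are reading is a string, not an integer that you can add, so you'll have to do some conversion.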

To put all of that together, try this snippet:

import csv

d = dict()

with open(r'd:/file.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # if the family name doesn't
        # exist in our dictionary,
        # set it with a default value of a blank dictionary
        if row[2] not in d:
            d[row[2]] = dict()

        # If the meal type doesn't exist for this
        # family, set it up as a key in their dictionary
        # and set the value to int value of the count
        if row[1] not in d[row[2]]:
            d[row[2]][row[1]] = int(row[0])
        else:
            # Both the family and the meal already
            # exist in the dictionary, so just add the
            # count to the total
            d[row[2]][row[1]] += int(row[0])

Once that loop finishes, d looks like this:

{'Johnsons': {'Chicken': 3, 'Steak': 2, 'Veggie': 1},
 'Martins': {'Fish': 2, 'Steak': 1},
 'Smiths': {'Veggie': 4}}

Now it's just a matter of printing it out:

# .items() works on both Python 2 and Python 3 (iteritems() is Python 2 only)
for family, data in d.items():
    print('{}'.format(family))
    for meal, total in data.items():
        print('{} {}'.format(total, meal))

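At the end of the loop, you will have the following (the order of families and meals may differ, since a plain dict in Python 2 does not guarantee any order):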

Johnsons
3 Chicken
2 Steak
1 Veggie
Smiths
4 Veggie
Martins
2 Fish
1 Steak

You can later improve this snippet by using defaultdict.
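As a rough sketch of that defaultdict idea, assuming Python 3 and using an in-memory sample in place of the real file (the names `SAMPLE`, `tally_orders`, and `format_orders` are made up for this illustration and are not part of the answer above):

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question, used here instead of a real file.
SAMPLE = """1,Steak,Martins
2,Fish,Martins
2,Steak,Johnsons
4,Veggie,Smiths
3,Chicken,Johnsons
1,Veggie,Johnsons
"""


def tally_orders(lines):
    """Return {family: {meal: total}} built from an iterable of CSV lines."""
    # Missing families get an empty inner dict; missing meals start at 0,
    # so no explicit "is the key already there?" checks are needed.
    totals = defaultdict(lambda: defaultdict(int))
    for amount, meal, family in csv.reader(lines):
        totals[family][meal] += int(amount)  # csv yields strings, so convert
    return totals


def format_orders(totals):
    """Return the report text: a family name, then one 'count meal' line per meal."""
    out = []
    for family, meals in totals.items():
        out.append(family)
        for meal, total in meals.items():
            out.append('{} {}'.format(total, meal))
    return '\n'.join(out) + '\n'


totals = tally_orders(io.StringIO(SAMPLE))
print(format_orders(totals), end='')
```

On Python 3.7 and later, dicts (including defaultdict) keep insertion order, so the families come out in order of first appearance, which matches the layout asked for in the question. To produce the text file, the string returned by `format_orders` could be written with an ordinary `open('orders.txt', 'w')`.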

Answer 2

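First-time answerer, so here goes. Have you considered keeping track of the orders first and then writing them to the file? I tried an approach based on a dict, and it seems to work fine. The idea is to index by family name and store a list of pairs containing the order quantity and type.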

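You might also want to think about the readability of your code; it is hard to follow and debug. However, I think what is happening is that the inner line

for row in reader_sorting:

iterates through the same reader_sorting as the outer loop. You read the first row, extract the family name, and then go on iterating over reader_sorting again. This time it starts from the second row, the family name matches, and it prints successfully. The remaining rows don't match, but you still iterate through them. Now you have finished iterating over reader_sorting, and the loop ends, even though the outer loop has only read a single row.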
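The exhaustion described above can be reproduced in a few lines. This is only an illustrative sketch, assuming Python 3 and an in-memory stand-in for needs_sorting.csv; `outer_rows` and `inner_rows` are hypothetical names used to record what each loop saw:

```python
import csv
import io

# Three sample rows standing in for needs_sorting.csv.
data = io.StringIO("1,Steak,Martins\n2,Fish,Martins\n2,Steak,Johnsons\n")
reader = csv.reader(data)

outer_rows = []
inner_rows = []
for row in reader:            # the outer loop takes the first row
    outer_rows.append(row)
    for inner in reader:      # the inner loop consumes every remaining row
        inner_rows.append(inner)

# The outer loop body ran only once: by the time control returned to it,
# the shared reader had nothing left to yield.
print(len(outer_rows))  # 1
print(len(inner_rows))  # 2
```

A csv.reader is a one-pass iterator, so both loops draw from the same stream of rows rather than each getting its own copy.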

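One solution might be to create another iterator inside the outer for loop, rather than using up the iterator that the outer loop is traversing. However, you would still need to deal with the possibility of double counting, or keep track of indices. Another approach might be to keep the orders by family as you iterate.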

import csv

orders = {}

with open('needs_sorting.csv') as file:
    needs_sorting = csv.reader(file)
    for amount, meal, family in needs_sorting:
        if family not in orders:
            orders[family] = []
        orders[family].append((amount, meal))

with open('orders.txt', 'a') as file:
    for family in orders:
        file.write('%s\n' % family)
        for amount, meal in orders[family]:
            file.write('%s %s\n' % (amount, meal))
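For comparison, the first alternative mentioned earlier (a fresh pass over the rows for each family) could look roughly like the sketch below. It assumes Python 3, reads all rows into a list so they can be traversed repeatedly, and uses an in-memory sample instead of the real file; `SAMPLE` and `group_by_family` are hypothetical names introduced for the example:

```python
import csv
import io

# Sample rows from the question, used here instead of needs_sorting.csv.
SAMPLE = """1,Steak,Martins
2,Fish,Martins
2,Steak,Johnsons
4,Veggie,Smiths
3,Chicken,Johnsons
1,Veggie,Johnsons
"""


def group_by_family(lines):
    """Return report lines: each family name followed by its 'amount meal' lines."""
    rows = list(csv.reader(lines))  # a list, unlike the reader, can be looped over many times
    seen = []
    out = []
    for _, _, family in rows:
        if family in seen:
            continue                # this family was already written out
        seen.append(family)
        out.append(family)
        for amount, meal, other in rows:  # fresh pass over the full list
            if other == family:
                out.append('{} {}'.format(amount, meal))
    return out


report = group_by_family(io.StringIO(SAMPLE))
print('\n'.join(report))
```

This rescans the whole list once per family, which is fine for a small order file; the dict-based version above gets the same grouping in a single pass.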

Organizing and printing information by a specific row in a csv file
