Question

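I need to read a csv file from bottom to top and write the data to a text file. The file contains information for different combinations of customer, product and location; however, it does not have all the required information: the rows where the quantity is 0 are missing. The file may be large, which is why I don't want to rewrite it or use additional lists, since at some point I split it.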

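What I want to do is, while reading the file backwards, compare the required Period_ids in my list with all the IDs of each combination in the csv file. If an id is missing, I want to read the previous line again (and again) until the id in the file equals the required id in the list. (P.S. I know I can't do that with a for loop, but then I don't know how to still read the file in reverse order and do what I need.) Please see the attached image with the given data and the desired result (green marks the beginning of each combination). The method below (which I have shortened for this example) is not quite right, because I get all the rows from the csv file but not the missing ones. Any help with this logic is appreciated. (I would also prefer to somehow modify this existing method without using libraries like pandas :) Thank you!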

def read_file_in_reverse():
    # ... some code

# Required ids.
all_required_ids = [412, 411, 410, 409, 408, 407, 406, 405]

# Needed to count period ids.
count_index_for_periodid = 0

# Read csv file.
with open(('.\myFile.csv'), 'rb') as f:       
    time_csv = csv.reader(f)

    # Read the file in reversed order.
    for line in reversed(list(time_csv)):
        # ... some code

            ###### Get quantities from the file.
            for col_num in range(5, 7):
                # ... code to get items

                ### quantity
                # If next id is not equal to the next required id.
                if str(next_id) != str(all_required_ids[count_index_for_periodid]):
                    list_qty.append(0) 
                else:
                    qty = line[col_num]
                    list_quantity.append(qty)

        # Should add another condition here      
        count_index_for_periodid += 1

Answer 1

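If the file is large, it is best to avoid reading the whole file into memory at once, which is what you would have to do in order to read it backwards. Instead, rethink the problem so that the file is parsed forwards. In effect, you are trying to write blocks of rows that contain all of the required Period_ids. So keep reading rows until you find a row whose ID is <= the ID of the previous row. At that point you have a block that needs to be expanded to include any missing rows and then written to the file. For example: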

import csv

def write_block(block):
    if block:
        # Columns 1-3 of the first row are reused for any row that has to be created.
        fill = block[0][1:4]
        # Index the rows that were found by their period ID (column 4).
        block_dict = {int(row[4]): row for row in block}

        for period_id in range(405, 413):
            try:
                csv_output.writerow(block_dict[period_id])
            except KeyError:
                # The ID is missing from this block: write a row with zero quantities.
                csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

# In Python 3 the csv module expects text-mode files opened with newline=''.
with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    block = [next(csv_input)]

    for row in csv_input:
        # Is the period ID <= the last one that was read?
        if int(row[4]) <= int(block[-1][4]):
            write_block(block)
            # Start a new block
            block = [row]
        else:
            block.append(row)

    # Write any remaining entries when the end of file is reached
    write_block(block)

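write_block() works by taking all of the entries found for a block and converting them into a dictionary keyed by ID. It then tries to look up each required ID in the dictionary. If the ID is present, the row is written to the output file as is. If it is missing, a suitable row is created using the other values.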
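The fill step described above can also be tried out in isolation, without touching any files. The following is only a minimal sketch: the function name fill_block, its parameters and the sample rows are made up for illustration, and it assumes the same column layout as the code above (period ID in column 4, followed by three quantity columns).

```python
def fill_block(block, required_ids, id_col=4, n_qty=3):
    """Return one row per required ID, creating zero-quantity rows where needed."""
    if not block:
        return []
    # Columns 1..id_col-1 of the first row are reused for created rows.
    fill = block[0][1:id_col]
    # Index the rows that were found by their period ID.
    by_id = {int(row[id_col]): row for row in block}
    result = []
    for period_id in required_ids:
        if period_id in by_id:
            result.append(by_id[period_id])
        else:
            # "999" mirrors the placeholder first column used in the answer.
            result.append(["999"] + fill + [str(period_id)] + ["0"] * n_qty)
    return result


# Hypothetical sample block: IDs 406 and 408 are present, 405 and 407 are missing.
block = [
    ["1", "custA", "prodX", "loc1", "406", "5", "0", "2"],
    ["1", "custA", "prodX", "loc1", "408", "7", "1", "0"],
]
filled = fill_block(block, range(405, 409))
for row in filled:
    print(row)
```

Returning the rows instead of writing them directly keeps the logic easy to check; the caller can pass the result to csv_output.writerows().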

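If you really do want to work backwards, simply read the whole file in (using list(csv_input)) and iterate over the entries backwards using [::-1]. The logic then needs to be changed to look for an ID that is >= the ID of the previously read row, e.g.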

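import csv

def write_block(block):
    if block:
        fill = block[0][1:4]
        block_dict = {int(row[4]): row for row in block}

        for period_id in range(405, 413):
            try:
                csv_output.writerow(block_dict[period_id])
            except KeyError:
                csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

# In Python 3 the csv module expects text-mode files opened with newline=''.
with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)

    # Read the remaining rows into memory and reverse them.
    rows = list(csv_input)[::-1]
    # Start the first block with the last row of the file, so that the
    # first data row is not mixed into the wrong block.
    block = rows[:1]

    for row in rows[1:]:
        # Is the period ID >= the last one that was read?
        if int(row[4]) >= int(block[-1][4]):
            write_block(block)
            block = [row]
        else:
            block.append(row)

    write_block(block)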

If you add a print(row) statement after the for statement, you can see that it is working backwards.
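To see the backwards block detection on its own, here is a small sketch that works on an in-memory list instead of a file. The name split_blocks_reversed and the sample rows are hypothetical; the only assumption taken from the answer is that the period ID is in column 4 and ascends within each combination.

```python
def split_blocks_reversed(rows, id_col=4):
    """Yield blocks of rows while walking `rows` from last to first.

    A new block starts whenever the period ID stops decreasing.
    """
    block = []
    for row in reversed(rows):
        if block and int(row[id_col]) >= int(block[-1][id_col]):
            yield block
            block = []
        block.append(row)
    if block:
        yield block


# Hypothetical rows: two combinations, IDs ascending within each.
rows = [
    ["1", "custA", "prodX", "loc1", "405", "3", "0", "0"],
    ["1", "custA", "prodX", "loc1", "407", "4", "0", "0"],
    ["1", "custB", "prodY", "loc2", "406", "1", "0", "0"],
    ["1", "custB", "prodY", "loc2", "412", "2", "0", "0"],
]
blocks = list(split_blocks_reversed(rows))
for block in blocks:
    print([row[4] for row in block])
```

Note that comparing IDs only detects a boundary when the ID does not drop at the point where one combination ends and the next begins. If your data can violate that, comparing the combination columns themselves would be one way around it.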

Read the previous line while reading a csv file in reverse order (Python)
