Reading the previous row while reading a csv file in reverse order (Python)

Time: 2017-03-24 01:48:30

Tags: python csv

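I need to read a csv file from the bottom up and write the data to a text file. The file contains information for different combinations of customers, products and locations; however, it does not contain all the required information: rows are missing wherever the quantity is 0. The file can be large, which is why I do not want to rewrite it or use additional lists, because at some point I split it.

What I want to do, while reading the file backwards, is compare the required Period_ids from my list against all the IDs of each combination in the csv file. If an ID is missing, I want to read the previous row again (and again) until the ID in the file equals the required ID from the list. (PS: I know I cannot do this with a for loop, but then I do not know how to keep reading the file in reverse order and still do what I need.) Please see the attached image with the given data and the desired result (green marks the start of each combination). The method below (shortened for this example) is not quite right, because I get all the rows from the csv file but none of the missing ones. Any help with this logic is appreciated (I would also prefer to modify this existing method somehow, without using a library like pandas :). Thank you!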

import csv

def read_file_in_reverse():
    # ... some code

    # Required ids.
    all_required_ids = [412, 411, 410, 409, 408, 407, 406, 405]

    # Needed to count period ids.
    count_index_for_periodid = 0

    # Read csv file (Python 3: text mode with newline='').
    with open(r'.\myFile.csv', newline='') as f:
        time_csv = csv.reader(f)

        # Read the file in reversed order.
        for line in reversed(list(time_csv)):
            # ... some code

                ###### Get quantities from the file.
                for col_num in range(5, 7):
                    # ... code to get items

                    ### quantity
                    # If next id is not equal to the next required id.
                    if str(next_id) != str(all_required_ids[count_index_for_periodid]):
                        list_quantity.append(0)
                    else:
                        qty = line[col_num]
                        list_quantity.append(qty)

            # Should add another condition here
            count_index_for_periodid += 1

[Image not reproduced: given data and desired result]

1 Answer:

Answer 0 (score: 0)

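If the file is large, it is best not to read the whole file into memory at once, which is what reading it backwards would require. Instead, rethink the problem so that the file is parsed forwards. In effect, you are trying to write blocks of rows that contain all of the required Period_ids. So keep reading rows until you find one whose ID is <= that of the previous row. At that point you have a block, which needs to be expanded to include any missing rows and then written to the file. For example: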

import csv

def write_block(block):
    # Write one row per required period ID, filling in any that are missing.
    if block:
        fill = block[0][1:4]
        # Index the rows of this block by period ID (column index 4).
        block_dict = {int(row[4]): row for row in block}

        for period_id in range(405, 413):
            try:
                csv_output.writerow(block_dict[period_id])
            except KeyError:
                # Missing ID: write a placeholder row with zero quantities.
                csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

# Python 3: open csv files in text mode with newline=''.
with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    block = [next(csv_input)]

    for row in csv_input:
        # Is the period ID <= the last one that was read?
        if int(row[4]) <= int(block[-1][4]):
            write_block(block)
            # Start a new block
            block = [row]
        else:
            block.append(row)

    # Write any remaining entries when the end of file is reached
    write_block(block)

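write_block() works by taking all of the entries found for a block and converting them into a dictionary keyed on the ID. It then tries to look up each required ID in the dictionary. If the ID is present, the row is written to the output file as is. If it is missing, a suitable row is created using filler values.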
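
The dictionary lookup described above can be tried in isolation. The following is a minimal sketch rather than the code from the answer: `fill_block` and `REQUIRED_IDS` are hypothetical names, the sample rows are invented, the column layout (period ID in column index 4, followed by three quantity columns) is assumed from the code above, and the placeholder value 999 is copied from it. It returns the rows instead of writing them, so it runs without any files.

```python
# Required period IDs, assumed to be 405..412 as in the question.
REQUIRED_IDS = range(405, 413)

def fill_block(block, required_ids=REQUIRED_IDS):
    """Return one row per required ID, adding zero-quantity rows for missing IDs."""
    if not block:
        return []
    # Customer, product and location columns, reused for the filler rows.
    fill = block[0][1:4]
    # Index the rows that do exist by their period ID (column index 4).
    by_id = {int(row[4]): row for row in block}
    filled = []
    for period_id in required_ids:
        if period_id in by_id:
            filled.append(by_id[period_id])
        else:
            filled.append([999] + fill + [period_id, 0, 0, 0])
    return filled

# Hypothetical block in which only two of the eight required IDs are present.
block = [
    ['1', 'custA', 'prodX', 'loc1', '406', '5', '7', '2'],
    ['2', 'custA', 'prodX', 'loc1', '409', '3', '1', '4'],
]
for row in fill_block(block):
    print(row)
```

Passing a descending sequence such as `range(412, 404, -1)` as `required_ids` would emit the rows in the reverse order, if that is what the output needs.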

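If you really do want to work backwards, simply read the whole file in (using list(csv_input)) and iterate over the entries backwards using [::-1]. The logic then needs to change to look for an ID that is >= that of the previous row read, e.g.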

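import csv

def write_block(block):
    # Write one row per required period ID, filling in any that are missing.
    if block:
        fill = block[0][1:4]
        # Index the rows of this block by period ID (column index 4).
        block_dict = {int(row[4]): row for row in block}

        for period_id in range(405, 413):
            try:
                csv_output.writerow(block_dict[period_id])
            except KeyError:
                # Missing ID: write a placeholder row with zero quantities.
                csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

# Python 3: open csv files in text mode with newline=''.
with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    # Start with an empty block so that the last row of the file is read first.
    block = []

    # Read all remaining rows into memory and iterate over them backwards.
    for row in list(csv_input)[::-1]:
        # Is the period ID >= the last one that was read?
        if block and int(row[4]) >= int(block[-1][4]):
            write_block(block)
            # Start a new block
            block = [row]
        else:
            block.append(row)

    # Write any remaining entries once the start of the file is reached
    write_block(block)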

If you add a print(row) statement after the for line, you can see that it is working backwards.
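
To check the reversed comparison without a file, the block splitting can be sketched on in-memory rows. This is an illustrative sketch only: `split_blocks_reversed` is a hypothetical helper, and the sample rows are invented, assuming the period ID is in column index 4 and ascends within each combination.

```python
def split_blocks_reversed(rows):
    """Walk the rows from last to first and group them into blocks.

    A new block starts when the period ID (column index 4) is >= the ID
    of the row read just before it, i.e. when the IDs stop decreasing.
    """
    blocks = []
    block = []
    for row in reversed(rows):
        if block and int(row[4]) >= int(block[-1][4]):
            blocks.append(block)
            block = [row]
        else:
            block.append(row)
    if block:
        blocks.append(block)
    return blocks

# Hypothetical data: two combinations, each with ascending period IDs.
rows = [
    ['1', 'custA', 'prodX', 'loc1', '406', '5', '7', '2'],
    ['2', 'custA', 'prodX', 'loc1', '409', '3', '1', '4'],
    ['3', 'custB', 'prodY', 'loc2', '405', '8', '0', '6'],
    ['4', 'custB', 'prodY', 'loc2', '412', '1', '1', '1'],
]
for block in split_blocks_reversed(rows):
    print([row[4] for row in block])
```

One caveat applies to both directions: comparing IDs alone cannot detect a boundary if one combination's IDs happen to continue the sequence of its neighbour (for example, one combination ending at 409 followed by another starting at 410). If that can occur in the data, comparing the customer, product and location columns as well would be a safer test.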