Question

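I am working on a project that reads colon-delimited files, merges them into a single file, and removes every record that has a field matching certain criteria. Each input file has 4 lines at the top that I need to discard. The final file should be written as a colon-delimited file without the unwanted records.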

Here is the code:

#!/usr/bin/python
import csv
import glob
import os
import sys
import datetime
import itertools
from itertools import islice


input_path = "c:\\data\\customer files\\project roo\\printer accounting\\data files\\"
output_path = "c:\\data\\customer files\\project roo\\printer accounting\\data files\\output\\"


input_files = os.path.join(input_path, '*.dat')
output_file = os.path.join(output_path,'{:%Y%m%d}-    summary.csv'.format(datetime.datetime.now()))

filewriter = csv.writer(open (output_file, 'w', newline= ''), delimiter= ':')

look_for = set(['Document Name = Microsoft Word - T.DOC'])

for input_file in glob.glob(input_files):
    with open(input_file) as csvfile:
        filereader = csv.reader(csvfile, delimiter= ':')
        for line in itertools.islice(csvfile,4,None):

            for row in filereader:
                #if row[3] in look_for:
                    #filewriter.writerow(none)
                #else:
                    #filewriter.writerow(row)
                print(row[0])

Input file:

Ignore 1
Ignore 2
Ignore 3
Ignore 4
Document Id= 123456 :Container ID=123123 :record status = complete : Document Name = T.DOC : Sender name = george:
Document Id= 789101 :Container ID=123123 :record status = complete : Document Name = form25 : Sender name = george:
Document Id= 121314 :Container ID=123123 :record status = complete : Document Name = ian.doc : Sender name = george:

The output file should be:

 Document ID= 121314 : Container ID=123123: record status = complete : Document Name= ian.doc : Sender Name = george

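I put the print statement in only to see whether I could print a field from the input file, but I get an "index out of range" error. This tells me the columns of the input file are not being indexed. I know there are many questions on this topic, but I can't seem to pin down a solution. Any help would be greatly appreciated.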

Answer 1

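Your for row in filereader loop is the one that reads from filereader, but it sits inside for line in itertools.islice(csvfile,4,None):, whose line variable you never actually use.

That is why everything that gets printed comes from filereader and not from the lines that islice yields.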

    filereader = csv.reader(csvfile, delimiter= ':')
    for line in itertools.islice(csvfile,4,None):

        for row in filereader:
            #if row[3] in look_for:
                #filewriter.writerow(none)
            #else:
                #filewriter.writerow(row)
            print(row[0])
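
To make the effect of the nested loops concrete, here is a minimal sketch that reproduces the structure on an in-memory file. The sample data and the names sample, consumed_by_islice and seen_by_reader are assumptions made for illustration; they are not part of the original code. Both loops pull from the same underlying file, so the outer loop takes the first record and the inner loop drains the rest.

```python
import csv
import io
import itertools

# Hypothetical stand-in for one input file: 4 header lines, then 3 records.
sample = io.StringIO(
    "Ignore 1\nIgnore 2\nIgnore 3\nIgnore 4\n"
    "Document Id= 123456 :Container ID=123123\n"
    "Document Id= 789101 :Container ID=123123\n"
    "Document Id= 121314 :Container ID=123123\n"
)

reader = csv.reader(sample, delimiter=":")
consumed_by_islice = []  # lines the outer loop takes (and then ignores)
seen_by_reader = []      # first field of each row the inner loop parses

# Same loop structure as in the question: both iterators share `sample`.
for line in itertools.islice(sample, 4, None):
    consumed_by_islice.append(line.strip())
    for row in reader:
        seen_by_reader.append(row[0].strip())

print(consumed_by_islice)
print(seen_by_reader)
```

In this sketch the outer loop runs once and swallows the first record, so the reader only ever sees the second and third. A possible source of the index error, depending on the real files, is a blank line: csv.reader returns an empty list for it, and row[0] then fails.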

Answer 2

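I am working on a project that reads colon-delimited files

[...] Each input file has 4 lines at the top that I need to discard.

Your example code approaches the problem the wrong way. You should chain your iterators: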

import csv
import itertools

input_file = "test.txt"

with open(input_file) as csvfile:
    src = itertools.islice(csvfile, 4, None)
    for row in csv.reader(src, delimiter= ':'):
        print(row[0])

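First, open the file. This returns a "line iterator".
Then, use itertools.islice to discard whatever you need from that iterator. This returns another iterator that contains only the lines you want to keep.
Finally, use that latter iterator as the data source for the CSV parser (which returns a third iterator, over the "parsed CSV rows").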

Given the input file:

Ignore 1
Ignore 2
Ignore 3
Ignore 4
Document Id= 123456 :Container ID=123123 :record status = complete : Document Name = T.DOC : Sender name = george:
Document Id= 789101 :Container ID=123123 :record status = complete : Document Name = form25 : Sender name = george:
Document Id= 121314 :Container ID=123123 :record status = complete : Document Name = ian.doc : Sender name = george:

The program above will produce:

sh$ python r.py
Document Id= 123456 
Document Id= 789101 
Document Id= 121314
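
Building on the chained iterators, the whole task from the question (merge several files, skip the first 4 lines of each, drop unwanted records, write a colon-delimited result) could be sketched roughly as follows. The function name merge_filtered, its parameters, and the choice to compare the stripped fourth field against a set of unwanted values are assumptions for illustration; adjust them to the real filtering criteria.

```python
import csv
import glob
import itertools


def merge_filtered(input_pattern, output_file, unwanted, skip=4, field=3):
    """Merge colon-delimited files into one, dropping unwanted records.

    Hypothetical helper: `unwanted` is a set of field values to reject,
    `skip` is the number of leading lines to discard per input file, and
    `field` is the index of the column that is checked.
    Returns the number of rows written.
    """
    written = 0
    with open(output_file, "w", newline="") as out:
        writer = csv.writer(out, delimiter=":")
        for path in sorted(glob.glob(input_pattern)):
            with open(path, newline="") as src:
                # Chain the iterators: file -> islice -> csv.reader.
                lines = itertools.islice(src, skip, None)
                for row in csv.reader(lines, delimiter=":"):
                    # Blank lines parse to [], so guard before indexing.
                    if len(row) <= field:
                        continue
                    if row[field].strip() in unwanted:
                        continue
                    writer.writerow(row)
                    written += 1
    return written
```

A call might look like merge_filtered("data/*.dat", "summary.csv", {"Document Name = T.DOC", "Document Name = form25"}), where the paths are placeholders. Keep the output file outside the input pattern, otherwise a later run would read its own earlier output.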

Skipping the header when reading a CSV file
