Question

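As you may or may not be aware, ASCII delimited text has the nice advantage of using non-keyboard characters to separate fields and lines.

Writing it out is easy: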

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

And, sure enough, things get dumped out correctly. However, on reading, lineterminator does nothing, and if I try:

open('ascii_delim.adt', newline=chr(30))

it raises ValueError: illegal newline value:

So how do I read my ASCII delimited file? Am I relegated to doing line.split(chr(30))?

Answer 1

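You can do it by effectively translating the end-of-line characters in the file into the newline characters that csv.reader is hard-coded to recognize: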

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

def readlines(f, newline='\n'):
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)

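# Open in text mode so f.read(1) returns str and the '' end-of-file check works;
# newline='' keeps Python from translating any line endings in the data.
with open('ascii_delim.adt', 'r', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)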

Output:

['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
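
If reading one character at a time turns out to be too slow, the same idea can be sketched with chunked reads. This is only an illustration, not part of the answer above: the name iter_records and the chunk_size parameter are made up here, and it assumes fields never contain the record separator or an embedded newline.

```python
import csv
import io

def iter_records(f, record_sep=chr(30), chunk_size=4096):
    """Yield records from a text-mode file object, split on record_sep.

    Hypothetical helper: reads chunk_size characters at a time instead of one.
    """
    buffer = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # end of file
            break
        buffer += chunk
        # Everything before the last separator is complete; keep the tail.
        *complete, buffer = buffer.split(record_sep)
        yield from complete
    if buffer:  # final record without a trailing separator
        yield buffer

# In-memory stand-in for the file written above.
sample = io.StringIO('a' + chr(31) + 'b' + chr(30) + 'c' + chr(31) + 'd' + chr(30))
rows = list(csv.reader(iter_records(sample, chunk_size=3), delimiter=chr(31)))
print(rows)
```

csv.reader accepts any iterable of strings, so it does not care that the records come from a generator rather than a file.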

Answer 2

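The documentation says:

The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.

So the csv module cannot read CSV files that use a custom line terminator.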
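
A quick way to see this for yourself is below. It is a small sketch on made-up in-memory data (the sample string is an assumption, not the asker's file): passing lineterminator to csv.reader is accepted but has no effect, so both records come back as a single row.

```python
import csv
import io

# Made-up sample: two records of two fields each,
# using chr(31) between fields and chr(30) after each record.
data = 'a' + chr(31) + 'b' + chr(30) + 'c' + chr(31) + 'd' + chr(30)

reader = csv.reader(io.StringIO(data), delimiter=chr(31), lineterminator=chr(30))
rows = list(reader)

# The record separator is treated as ordinary field data,
# so everything lands in one row.
print(len(rows))
print(rows)
```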

Answer 3

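Hey, I have been struggling with a similar problem all day. I wrote a function, heavily inspired by @martineau, that should solve it for you. My function is slower, but it can parse files delimited by any kind of string. Hope it helps!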

import csv

def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
    """Parse a file whose rows and columns are separated by arbitrary strings."""
    # Text mode so read(1) returns str; newline='' leaves line endings untouched.
    with open(csv_file, 'r', newline='') as f:
        result = []
        row = []
        field = ''

        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch

            if field.endswith(row_delimiter):
                # End of row: store the last field, then the finished row.
                row.append(field[:-len(row_delimiter)])
                result.append(row)
                row = []
                field = ''
            elif field.endswith(col_delimiter):
                # End of column: store the field and keep filling the row.
                row.append(field[:-len(col_delimiter)])
                field = ''

        # Keep a final row that has no trailing row delimiter.
        if field or row:
            row.append(field)
            result.append(row)
    return result

Answer 4

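Per the docs for open:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.

So open won't handle your file. Per the csv docs:

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.

So that won't do it either. I also looked into whether str.splitlines is configurable, but it uses a fixed set of boundaries.

Am I relegated to doing line.split(chr(30))?

Looks that way, sorry!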
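
For what it's worth, the split approach does not have to mean giving up the csv module. A minimal sketch, assuming the whole file fits in memory and that no field contains a record separator or an embedded newline (parse_adt is a made-up name for illustration):

```python
import csv

def parse_adt(text, record_sep=chr(30), unit_sep=chr(31)):
    """Split text into records by hand, then let csv.reader split the fields.

    Hypothetical helper, not part of the csv module.
    """
    records = text.split(record_sep)
    # A trailing separator leaves one empty string at the end; drop it.
    if records and records[-1] == '':
        records.pop()
    return list(csv.reader(records, delimiter=unit_sep))

sample = 'a' + chr(31) + 'b' + chr(30) + 'c' + chr(31) + 'd' + chr(30)
print(parse_adt(sample))
```

With a real file you would pass in the result of open('ascii_delim.adt', newline='').read().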

Reading ASCII delimited text with the csv module?
