Reading ASCII-delimited text with the csv module?

Posted: 2015-05-13 20:13:27

Tags: python csv newline

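You may or may not be aware of ASCII delimited text, which has the nice advantage of using non-keyboard characters to separate fields and lines.

Writing this out is easy: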

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

And, sure enough, the right data gets written out. On reading, however, lineterminator does nothing, and if I try:

open('ascii_delim.adt', newline=chr(30))

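it raises "ValueError: illegal newline value".

So how can I read my ASCII-delimited file? Am I relegated to doing line.split(chr(30))?

4 Answers:

Answer 0: (score: 4)

You can do it by effectively translating the end-of-line characters in the file into the newline characters that csv.reader is hard-coded to recognize: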

import csv

with open('ascii_delim.adt', 'w') as f:
    writer = csv.writer(f, delimiter=chr(31), lineterminator=chr(30))
    writer.writerow(('Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue'))
    writer.writerow(('Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!'))

def readlines(f, newline='\n'):
    while True:
        line = []
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                if line:  # emit a final record that has no terminator
                    yield ''.join(line)
                return
            elif ch == newline:  # end of line?
                line.append('\n')
                break
            line.append(ch)
        yield ''.join(line)

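# Text mode with newline='' so Python performs no newline translation;
# in Python 3, 'rb' would yield bytes and the '' comparisons would never match.
with open('ascii_delim.adt', 'r', newline='') as f:
    reader = csv.reader(readlines(f, newline=chr(30)), delimiter=chr(31))
    for row in reader:
        print(row)

Output: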

['Sir Lancelot of Camelot', 'To seek the Holy Grail', 'blue']
['Sir Galahad of Camelot', 'I seek the Grail', 'blue... no yellow!']
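
Reading one character at a time can be slow on large files. A possible variation, sketched here assuming Python 3 and a file opened in text mode, reads in blocks and splits each block on the record separator. The names iter_records and chunk_size are illustrative and not part of the csv module.

```python
import csv
import io

def iter_records(f, newline=chr(30), chunk_size=4096):
    """Yield records from f, each ending in a newline for csv.reader.

    iter_records and chunk_size are hypothetical names for this sketch.
    """
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if chunk == '':  # end of file
            break
        buf += chunk
        # Everything before the last separator is a complete record;
        # the remainder stays in buf until more data arrives.
        *records, buf = buf.split(newline)
        for record in records:
            yield record + '\n'
    if buf:  # final record without a trailing separator
        yield buf + '\n'

# In-memory stand-in for a file opened with open(path, newline='').
sample = io.StringIO('a' + chr(31) + 'b' + chr(30) + 'c' + chr(31) + 'd' + chr(30))
rows = list(csv.reader(iter_records(sample), delimiter=chr(31)))
print(rows)
```

As with the character-by-character version, records that themselves contain '\r' or '\n' may still confuse csv.reader, so this is only suitable when the data is free of those characters.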

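Answer 1: (score: 2)

The documentation says:

The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.

So the csv module cannot, on its own, read CSV files that use a custom line terminator.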
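
If the csv module is left out entirely, the fallback mentioned in the question can be sketched with plain str.split. This is a minimal illustration, assuming the whole file fits in memory; parse_ascii_delimited is a hypothetical helper name, and the function does not undo any quoting that csv.writer may have applied when the file was written.

```python
def parse_ascii_delimited(text, record_sep=chr(30), unit_sep=chr(31)):
    """Split text into rows and fields without the csv module.

    parse_ascii_delimited is a hypothetical helper for this sketch.
    """
    records = text.split(record_sep)
    if records and records[-1] == '':
        records.pop()  # drop the empty piece after a trailing separator
    return [record.split(unit_sep) for record in records]

text = ('Sir Lancelot of Camelot' + chr(31)
        + 'To seek the Holy Grail' + chr(31) + 'blue' + chr(30))
print(parse_ascii_delimited(text))
```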

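Answer 2: (score: 0)

Hey, I struggled with a similar problem all day. I wrote a function, heavily inspired by @martineau's, that should solve it for you. My function is slower, but it can parse files delimited by any kind of string. Hope it helps!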

import csv

def custom_CSV_reader(csv_file, row_delimiter, col_delimiter):
    """Parse a file whose rows and columns are separated by arbitrary strings."""
    result = []  # all completed rows
    row = []     # fields of the row currently being built
    field = ''   # characters of the field currently being built

    # Text mode with newline='' so no newline translation takes place.
    with open(csv_file, 'r', newline='') as f:
        while True:
            ch = f.read(1)
            if ch == '':  # end of file?
                break
            field += ch

            if field.endswith(row_delimiter):
                # End of a row: store the last field, then the whole row.
                row.append(field[:-len(row_delimiter)])
                result.append(row)
                row = []
                field = ''
            elif field.endswith(col_delimiter):
                # End of a column: store the field in the current row.
                row.append(field[:-len(col_delimiter)])
                field = ''

    # Keep a final row that is not followed by a row delimiter.
    if field or row:
        row.append(field)
        result.append(row)
    return result

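Answer 3: (score: -1)

From the docs for open:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'.

So open won't handle your file. Per the csv docs:

Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator.

So that won't do it either. I also looked into whether str.splitlines is configurable, but it uses a fixed set of boundaries.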

  

Am I relegated to doing line.split(chr(30))?

Looks that way, sorry!
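
On the str.splitlines point: in Python 3 the fixed set of boundaries it uses for str happens to include the record separator chr(30) (along with chr(28) and chr(29)), so it can stand in for the manual split. The caveat is that it also splits on '\n', '\r', and the other listed boundaries, and this does not apply to bytes. A small sketch, assuming the whole file fits in memory and the fields contain none of those characters:

```python
import csv

data = ('Sir Lancelot of Camelot' + chr(31) + 'blue' + chr(30)
        + 'Sir Galahad of Camelot' + chr(31) + 'yellow' + chr(30))

# str.splitlines treats chr(30) as a line boundary (str only, not bytes).
lines = data.splitlines()
rows = list(csv.reader(lines, delimiter=chr(31)))
print(rows)
```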