Python: How to import a CSV-like .dat file with a control-character delimiter

Time: 2019-03-08 12:31:23

Tags: python python-import delimited-text control-characters

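I have a data file that uses the DC4 control character as its delimiter. Here is the code I have right now (I copied it from someone else; it is not my own code).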

import csv

with open('Test.dat', newline='') as csv_file:
    # Fields are wrapped in þ and separated by the DC4 control character (ASCII 20, 0x14)
    csv_reader = csv.reader(csv_file, quotechar='þ', delimiter='\x14')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # The first row holds the column names
            print(f'Column names are {", ".join(row)}')
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
        line_count += 1
    print(f'Processed {line_count} lines.')

The character displays as a box, and so far only Notepad++ can read it. I found curses.ascii.isctrl(c), which seems to let Python recognize that character and then show it as a caret sequence? (https://docs.python.org/3.2/library/curses.ascii.html)
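For reference, a minimal sketch of what the "box" character is from Python's point of view. It assumes the delimiter really is DC4 (code point 0x14). The helper name `describe` is hypothetical. Python can name the character directly with the escape `"\x14"`; classifying it or showing it in caret form (`^T`) is only for display and is not needed for parsing. The caret computation below matches what `curses.ascii.unctrl()` returns for C0 control characters. The sketch avoids importing `curses`, which standard Windows builds of Python do not ship.

```python
import unicodedata

DC4 = "\x14"  # ASCII 20 (0x14), "Device Control 4"

def describe(ch: str) -> str:
    """Show a character as its code point, Unicode category and caret form."""
    code = ord(ch)
    # Caret notation (^@ .. ^_) is how many editors and terminals display
    # C0 control characters; curses.ascii.unctrl() produces the same form.
    caret = "^" + chr(code + 0x40) if code < 0x20 else ch
    return f"{code:#04x} {unicodedata.category(ch)} {caret}"

print(describe(DC4))                           # 0x14 Cc ^T
print(repr("Identifier" + DC4 + "Column 2"))   # 'Identifier\x14Column 2'
```

`repr()` is often the quickest way to see invisible delimiters while debugging, because it prints control characters as escape sequences instead of boxes.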

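I'm new to coding and not sure how to implement this, or whether it would even work for me. Below is a sample of the .dat file I'm trying to read, both as text and as a screenshot.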

þIdentifierþþColumn 2þþColumn 3þ
þXX_0012345þþRandom Data 1þþRandom Data 1þ
þXX_0012346þþRandom Data 6þþRandom Data 2þ
þXX_0012347þþRandom Data 1þþRandom Data 3þ
þXX_0012348þþRandom Data 8þþRandom Data 4þ
þXX_0012349þþRandom Data 1þþRandom Data 5þ
þXX_0012345þþRandom Data 9þþRandom Data 1þ

(Screenshot: the text file with the DC4 control characters visible)

This is the output when I run this code on Python 3.6.1. Everything looks fine except for the ¾ characters (which is how the DC4 character is being read).

Column names are þIdentifierþ, þColumn 2þ, þColumn 3þ
    þXX_0012345þ works in the þRandom Data 1þ department, and was born in þRandom Data 1þ.
    þXX_0012346þ works in the þRandom Data 6þ department, and was born in þRandom Data 2þ.
    þXX_0012347þ works in the þRandom Data 1þ department, and was born in þRandom Data 3þ.
    þXX_0012348þ works in the þRandom Data 8þ department, and was born in þRandom Data 4þ.
    þXX_0012349þ works in the þRandom Data 1þ department, and was born in þRandom Data 5þ.
    þXX_0012345þ works in the þRandom Data 9þ department, and was born in þRandom Data 1þ.
Processed 7 lines.

Any help with this would be greatly appreciated. Thanks!

1 answer:

Answer 0 (score: 0)

You can use an escape sequence for this. DC4 is ASCII 20 (0x14), so it can be written as '\x14'.
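A minimal, self-contained sketch of that suggestion: pass `delimiter='\x14'` and `quotechar='þ'` to `csv.reader`. It assumes the file layout is `þfieldþ`, DC4, `þfieldþ`, and so on. The control characters did not survive in the question's sample, so the `SAMPLE` text below is a hypothetical reconstruction, read from `io.StringIO` instead of `Test.dat`. The helper name `read_dat` is also hypothetical.

```python
import csv
import io

DC4 = "\x14"  # ASCII 20 (0x14): the delimiter, written as an escape sequence

# Hypothetical sample mirroring Test.dat (assumed layout): each field is
# wrapped in þ and fields are separated by DC4.
SAMPLE = (
    f"þIdentifierþ{DC4}þColumn 2þ{DC4}þColumn 3þ\n"
    f"þXX_0012345þ{DC4}þRandom Data 1þ{DC4}þRandom Data 1þ\n"
    f"þXX_0012346þ{DC4}þRandom Data 6þ{DC4}þRandom Data 2þ\n"
)

def read_dat(f):
    """Parse a þ-quoted, DC4-delimited file object into a header and rows."""
    reader = csv.reader(f, delimiter=DC4, quotechar="þ")
    header = next(reader)  # first row: column names
    return header, list(reader)

header, rows = read_dat(io.StringIO(SAMPLE))
print(header)  # ['Identifier', 'Column 2', 'Column 3']
for row in rows:
    print(f"{row[0]} works in the {row[1]} department, and was born in {row[2]}.")

# For the real file (the encoding is an assumption; use whatever Test.dat was saved in):
# with open("Test.dat", newline="", encoding="utf-8") as f:
#     header, rows = read_dat(f)
```

If þ still appears in the parsed fields, one thing worth checking (an assumption, not part of the answer) is whether `open()` decodes the file with the same encoding it was written in. For example, a UTF-8 encoded þ decoded as cp1252 comes out as `Ã¾`.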
