Question

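After creating a CSV file with Python, I am trying to open it. My goal is to read the file back without editing it, and my problem is that I cannot get the delimiter to work. The file is created with the Python csv writer, and I then try to read the data back with the reader; this is where I am stuck. The CSV file is saved in the same location as my Python program, so I know it is not an access issue. The file is created with a special-character delimiter: I am using a semicolon ; because the raw data already contains commas ,, colons :, plus signs +, ampersands &, periods ., and possibly underscores _ and/or dashes -. This is the code I am using to read the CSV file: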

with open('Cool.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=';', dialect=csv.excel_tab)
    for row in csv_reader:
        print row[0]
csv_file.close()

This is my CSV file (Cool.csv):

"Sat, 20 Apr 2019 00:17:05 +0000;Need to go to store;Eggs & Milk are needed ;Store: Grocery;Full Name: Safeway;Email: safewayiscool@gmail.com;Safeway <safewayiscool@gmail.com>, ;"
"Tue, 5 Mar 2019 05:54:24 +0000;Need to buy ham;Green eggs and Ham are needed for dinner ;Username: Dr.Seuss;Full Name: Theodor Seuss Geisel;Email: greeneggs+ham@seuss.com;"

So I would expect the output to look like this when I run the code:

Sat, 20 Apr 2019 00:17:05 +0000
Tue, 5 Mar 2019 05:54:24 +0000

Instead I either get some kind of null error, or the entire line is printed. How can I get the data split into the columns I want to define, delimited by ;?

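I am not sure whether the problem is the semicolon I am trying to use or something else. If it is just the semicolon, I can change it if necessary, but many other characters are already taken by the incoming data.

Also, please do not suggest that I simply read from the original file. It is a massive file containing a lot of other data, and I want to trim it down before processing it with a second program.

Update: this is the code that builds the file: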

Answer 1

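The file appears to have been created incorrectly. The sample data shows each whole line wrapped in double quotes, which makes the reader treat it as one long single column. Below is correct code that writes and then reads a semicolon-delimited file: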

import csv

with open('Cool.csv','w',newline='',encoding='utf-8-sig') as csv_file:
    csv_writer = csv.writer(csv_file,delimiter=';')
    csv_writer.writerow(['data,data','data;data','data+-":_'])

with open('Cool.csv','r',newline='',encoding='utf-8-sig') as csv_file:
    csv_reader = csv.reader(csv_file,delimiter=';')
    for row in csv_reader:
        print(row)

Output (matches the data written):

['data,data', 'data;data', 'data+-":_']

Cool.csv:

data,data;"data;data";"data+-"":_"

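Notes:

utf-8-sig is the encoding most compatible with Excel. Any Unicode characters you put in the file will work and look correct when the CSV is opened in Excel.
newline='' is passed to open() because the csv module handles its own newlines according to the dialect in use (default 'excel').
; delimiter. The default , would work as well. Note that the second entry contains a semicolon, so that field was quoted. If the delimiter were a comma, the first field (which contains a comma) would be quoted instead, and it would still work.
csv_writer.writerow takes a sequence containing the column data.
csv_reader returns each row as a list of column data.
A column in the .CSV is wrapped in double quotes if it contains the delimiter, and any quotes present in the data are doubled to escape them. Note that the third field contains a doubled quote.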
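The quoting rules in the notes above can be checked without touching the disk. The following is a minimal sketch, assuming the same three sample fields as the answer; the helper name write_line and the use of io.StringIO are illustrative choices, not part of the original answer.

```python
import csv
import io

fields = ['data,data', 'data;data', 'data+-":_']

def write_line(delimiter):
    """Write one row to an in-memory buffer and return the raw CSV text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    return buf.getvalue().rstrip('\r\n')  # drop the dialect's line terminator

semicolon_line = write_line(';')  # only the field containing ';' is quoted
comma_line = write_line(',')      # only the field containing ',' is quoted
print(semicolon_line)
print(comma_line)

# Reading back with the matching delimiter restores the original fields.
assert next(csv.reader([semicolon_line], delimiter=';')) == fields
assert next(csv.reader([comma_line], delimiter=',')) == fields
```

In both cases the field containing the double quote is quoted and its quote is doubled, which is why the choice of delimiter does not matter as long as writer and reader agree.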

csv_writer.close() and csv_reader.close() are not needed when using with.
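To see why the file from the question reads back as a single column, and one possible way to recover the columns without rebuilding it, here is a small sketch. It assumes lines shaped like the question's Cool.csv (shortened here) and that the data contains no embedded double quotes; the two-pass approach is a suggestion, not something the original answer proposes.

```python
import csv

# Two lines shaped like the question's Cool.csv (shortened): each whole line is quoted.
lines = [
    '"Sat, 20 Apr 2019 00:17:05 +0000;Need to go to store;Eggs & Milk are needed ;"',
    '"Tue, 5 Mar 2019 05:54:24 +0000;Need to buy ham;Green eggs and Ham are needed for dinner ;"',
]

# Pass 1: the surrounding quotes make each line a single field.
outer_rows = list(csv.reader(lines, delimiter=';'))
print([len(row) for row in outer_rows])

# Pass 2: re-parse that single field, now splitting on the semicolons.
rows = [next(csv.reader([row[0]], delimiter=';')) for row in outer_rows]
for row in rows:
    print(row[0])
```

Because every line ends with a semicolon, each parsed row ends with an empty string as its last field. Fixing the code that writes the file remains the cleaner solution.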

Answer 2

RTFM.

From help(csv):

    DIALECT REGISTRATION:

    Readers and writers support a dialect argument, which is a convenient
    handle on a group of settings.  When the dialect argument is a string,
    it identifies one of the dialects previously registered with the module.
    If it is a class or instance, the attributes of the argument are used as
    the settings for the reader or writer:

        class excel:
            delimiter = ','
            quotechar = '"'
            escapechar = None
            doublequote = True
            skipinitialspace = False
            lineterminator = '\r\n'
            quoting = QUOTE_MINIMAL

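And then you use dialect=csv.excel_tab.

That dialect brings its own delimiter (a tab), which is at odds with the delimiter=';' you pass. The csv documentation says explicit keyword arguments take precedence over the dialect, so the dialect is redundant here at best. Just don't use the dialect option.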
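How a dialect and an explicit delimiter interact is easy to check with a short experiment. This is a sketch under the assumption of a made-up sample line; SemicolonDialect is a hypothetical class name showing how the same setting could be packaged as a dialect instead of a keyword argument.

```python
import csv

line = 'a;b\tc'  # made-up sample containing both a semicolon and a tab

# The excel_tab dialect defines a tab as its delimiter.
print(repr(csv.excel_tab.delimiter))

# Dialect alone: the line is split on the tab.
tab_only = next(csv.reader([line], dialect=csv.excel_tab))

# Dialect plus explicit delimiter: the keyword argument takes precedence.
both = next(csv.reader([line], dialect=csv.excel_tab, delimiter=';'))

# Hypothetical custom dialect: excel settings with a semicolon delimiter.
class SemicolonDialect(csv.excel):
    delimiter = ';'

custom = next(csv.reader([line], dialect=SemicolonDialect))

print(tab_only, both, custom)
```

Passing only delimiter=';' gives the same result as the custom dialect and is the simplest option for a one-off script.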

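Side note: with handles closing the file handle for you, so the explicit csv_file.close() is unnecessary.

Second side note: every line of your CSV file is wrapped in double quotes. Get rid of them, or disable quoting, i.e.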

import csv

with open('b.txt') as csv_file:
    # QUOTE_NONE makes the reader treat double quotes as ordinary characters
    csv_reader = csv.reader(csv_file, delimiter=';', quoting=csv.QUOTE_NONE)
    for row in csv_reader:
        print(row[0])
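One caveat with disabling quoting: the quotes that wrap each line are then kept as literal characters in the first and last fields. The sketch below, which assumes a shortened line shaped like the question's data, shows the effect and one possible cleanup with str.strip; the cleanup step is an addition, not part of the original answer.

```python
import csv

# Shortened line shaped like the question's data: the whole line is quoted.
line = '"Sat, 20 Apr 2019 00:17:05 +0000;Need to go to store;"'

# With QUOTE_NONE the reader splits on ';' and leaves the quotes in place.
row = next(csv.reader([line], delimiter=';', quoting=csv.QUOTE_NONE))
print(row)

# Strip the leftover quote characters from the ends of each field.
cleaned = [field.strip('"') for field in row]
print(cleaned[0])
```

Stripping is only safe if the fields never legitimately begin or end with a double quote.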

How do I delimit a CSV file by semicolon?
