Text qualifiers end up misplaced when removing extra delimiters from a CSV file with Python

Posted: 2018-07-05 07:32:58

Tags: python csv

I'm trying to use a Python script to remove extra delimiters inside the data. I usually work with large datasets. For example:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","df,s","adfadf","AAAA111"

When I run the script, it removes the extra delimiter in "df,s" in the second data row:

"abc","def","ghi","jkl","mno","pqr"
"","","fds","dfs","adfadf","AAAA111"
"","","fds","dfs","adfadf","AAAA111"

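The script works correctly for one kind of data, but I noticed that for a few files that use text qualifiers, the qualifiers end up in the wrong place, and the result looks like this: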

"abc","def","ghi","jkl","mno","pqr"
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""
"""","""""""""","""""fds""""","""""dfs""""","""""adfadf""""","AAAA111""""

The script:

# Export the data.
# Assumes the input was not exported with correct quoting, so we work with what we have.
import csv
from csv import DictWriter

with open("big-12.csv", newline='') as people_file:
    next(people_file)  # skip the header row
    corrected_people = []
    for person_line in people_file:
        chomped_person_line = person_line.rstrip()
        person_tokens = chomped_person_line.split(",")

        # Rebuild the row: every token between the third field and the
        # last two fields belongs to "jkl", so join them back together.
        try:
            corrected_person = {
                "abc": person_tokens[0],
                "def": person_tokens[1],
                "ghi": person_tokens[2],
                "jkl": "".join(person_tokens[3:-2]),
                "mno": person_tokens[-2],
                "pqr": person_tokens[-1],
            }
            corrected_people.append(corrected_person)
        except IndexError:
            # Print the ignored lines so they can be corrected manually later.
            print("Could not parse line: " + chomped_person_line)

with open("corrected_people.txt", "w", newline='') as corrected_people_file:
    writer = DictWriter(
        corrected_people_file,
        fieldnames=["abc", "def", "ghi", "jkl", "mno", "pqr"],
        delimiter=',',
        quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(corrected_people)

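This script removes the extra delimiters inside the fields, but I'm having trouble with the text qualifiers. Solving the text-qualifier problem would help a lot. Python version: Python 3.6.0 :: Anaconda 4.3.1 (64-bit)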

1 answer

Answer 0 (score: 0)

writer = DictWriter(
    corrected_people_file,
    fieldnames=[
        "abc", "def", "ghi", "jkl", "mno", "pqr"
    ],delimiter=',',quoting=csv.QUOTE_ALL)

QUOTE_ALL forces every field to be quoted, and any double quotes already inside a field are escaped by doubling them.

So try QUOTE_NONE or QUOTE_MINIMAL, or strip the quotes from the fields before writing.
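To make this concrete, here is a minimal sketch, assuming tokens shaped like the ones the question's split(",") produces (surrounding quotes still attached). It writes the same row with QUOTE_ALL and QUOTE_MINIMAL, and then again after stripping the leftover quotes. The helper name write_row is hypothetical. Note that QUOTE_MINIMAL on its own still quotes any field that contains the quote character, so with the quotes still attached it doubles them too; stripping them is what fixes the output.

```python
import csv
import io

# Tokens as the question's split(",") leaves them: surrounding quotes still attached.
tokens = ['""', '""', '"fds"', '"dfs"', '"adfadf"', '"AAAA111"']


def write_row(fields, quoting):
    """Hypothetical helper: write one row to an in-memory buffer and return the text."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator="\n").writerow(fields)
    return buf.getvalue()


# QUOTE_ALL wraps every field in quotes and doubles the quotes already inside it.
doubled = write_row(tokens, csv.QUOTE_ALL)

# QUOTE_MINIMAL also quotes fields that contain the quote character,
# so the leftover quotes are doubled here as well.
minimal = write_row(tokens, csv.QUOTE_MINIMAL)

# Removing the leftover quotes first gives the intended output.
cleaned = write_row([t.strip('"') for t in tokens], csv.QUOTE_ALL)

print(cleaned, end="")  # "","","fds","dfs","adfadf","AAAA111"
```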

  

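As for "I'm having trouble with the text qualifiers": note also that a quoted field is not marked as text rather than a number. Quotes exist only so that a field can contain the delimiter, and numeric fields may be quoted too.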
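A quick check of that point with the standard csv module, using an in-memory line rather than the question's file: quoted or not, every field comes back as a plain string, and the quotes around "7,8" only serve to let the field hold a comma.

```python
import csv

# A quoted number, an unquoted number, and a quoted field containing the delimiter.
row = next(csv.reader(['"123",456,"7,8"']))
print(row)  # ['123', '456', '7,8']
```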


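In general, using a csv reader is better and safer than split(). With a csv reader, the field "df,s" is read correctly as a single field because it is quoted. You can then remove the , from that one field.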
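A minimal sketch of that approach, assuming the input is otherwise well-formed quoted CSV with a header row, as in the question's sample. It parses each row with csv.reader, removes the delimiter from inside every field, and writes the result with QUOTE_ALL. The function name remove_embedded_delimiters is hypothetical.

```python
import csv
import io


def remove_embedded_delimiters(src, dst, delimiter=","):
    """Hypothetical helper: copy CSV rows from src to dst, dropping the
    delimiter from inside each field. src must be properly quoted CSV
    with a header row; both arguments are text file objects."""
    reader = csv.reader(src, delimiter=delimiter)
    # lineterminator="\n" keeps the output predictable; the csv default is "\r\n".
    writer = csv.writer(dst, delimiter=delimiter, quoting=csv.QUOTE_ALL,
                        lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        if len(row) != len(header):
            # Report rows that can't be fixed automatically, as the original script did.
            print("Could not parse line %d: %r" % (reader.line_num, row))
            continue
        # The reader has already removed the quotes, so QUOTE_ALL adds exactly one pair.
        writer.writerow([field.replace(delimiter, "") for field in row])


sample = (
    '"abc","def","ghi","jkl","mno","pqr"\n'
    '"","","fds","dfs","adfadf","AAAA111"\n'
    '"","","fds","df,s","adfadf","AAAA111"\n'
)
out = io.StringIO()
remove_embedded_delimiters(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For the question's files, src would be big-12.csv opened with newline='' and dst would be corrected_people.txt opened for writing with newline=''. Because the quotes are parsed away before writing, no field ever reaches the writer with stray quote characters, so nothing gets doubled.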