I have a CSV file in which every value is wrapped in triple quotes ("""). I want to remove them for further processing.
Here is my CSV file:
Name,age,class,place
""""ishika""","""21""","""B"""","""Whitefield"""
"""anju""","""23""","""C""","""ITPL"""
I want the output to be:
Name,age,class,place
ishika,21,B,Whitefield
anju,23,C,ITPL
I am getting the CSV from a Postgres table:
import csv
import numpy as np
import pandas as pd
import psycopg2
import config as cfg

conn = cfg.DATABASE_CONNECT
cur = conn.cursor()

tablename = "sf_paymentprofile_error_log"
query = "SELECT * from {} ".format(tablename)
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)

with open(cfg.PG_EXTRACT_PATH + 'sf_paymentprofile_error_log.csv', 'w') as f:
    # copy_expert streams the COPY output into the open file object f
    cur.copy_expert(outputquery, f)

conn.commit()
conn.close()
I want to produce the output above using Python.
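A likely cause, offered as an assumption rather than something the question confirms: the stored values themselves contain literal double quotes (for example, the text "ishika" including its quotes). In CSV mode, Postgres by default escapes an embedded quote by doubling it and wraps the field in quotes. Python's csv module follows the same convention, so each such value comes out as """ishika""". The sketch below uses the standard csv module with hypothetical values to show the round trip:

```python
import csv
import io

# Hypothetical stored values that literally contain double quotes.
row = ['"ishika"', '"21"']

# Writing: embedded quotes are doubled and the field is wrapped in quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
encoded = buf.getvalue()
print(repr(encoded))  # '"""ishika""","""21"""\r\n'

# Reading back undoes the doubling, but the literal quotes remain in the data.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)  # ['"ishika"', '"21"']
```

If this is the cause, parsing the file as CSV first and then removing the remaining " characters from each field recovers the plain values.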
Answer 0 (score: 0)
Using pandas:
import pandas as pd

df = pd.read_csv("your_file.csv")
for i in df.columns:
    df[i] = df[i].apply(lambda x: str(x).replace('"', ''))
df.to_csv("output.csv", index=False)
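Here is a self-contained variant of the pandas approach that you can try without a file. io.StringIO stands in for your_file.csv, and the sample is the question's data with the extra quotes in the ishika row regularized (an assumption). The per-value lambda is replaced with a vectorized Series.str.replace(..., regex=False) call:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for your_file.csv (quotes in the ishika row regularized).
SAMPLE = '''Name,age,class,place
"""ishika""","""21""","""B""","""Whitefield"""
"""anju""","""23""","""C""","""ITPL"""
'''


def strip_quotes(df: pd.DataFrame) -> pd.DataFrame:
    """Remove every '"' character from every column, one vectorized call per column."""
    return df.apply(lambda col: col.astype(str).str.replace('"', '', regex=False))


df = strip_quotes(pd.read_csv(io.StringIO(SAMPLE)))
print(df.to_csv(index=False))
```

read_csv already turns each """x""" into "x" (the doubled quotes are unescaped), so only one literal quote remains on each side for str.replace to delete.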
If your data is a list of rows:
output = []
for row in your_data:
    b = []
    for val in row:
        b.append(val.replace('"', ''))
    output.append(b)
print(output)
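You can combine the list approach with the standard csv module to read and write files directly. In this sketch, clean_csv is a hypothetical helper name and io.StringIO stands in for real files:

```python
import csv
import io


def clean_csv(src, dst):
    """Copy CSV rows from src to dst, removing every '"' left inside each parsed field."""
    writer = csv.writer(dst, lineterminator='\n')
    for row in csv.reader(src):
        writer.writerow([val.replace('"', '') for val in row])


raw = io.StringIO('Name,age,class,place\n"""anju""","""23""","""C""","""ITPL"""\n')
out = io.StringIO()
clean_csv(raw, out)
print(out.getvalue())
# Name,age,class,place
# anju,23,C,ITPL
```

With real files, open both with newline='', as the csv module documentation recommends.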
Answer 1 (score: 0)
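You can remove them by treating them as quote characters, but csv only accepts a one-character quotechar, so first collapse each run of quotes into a single one: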
import csv
import re

with open('data.csv', newline='') as f:
    # replace each run of " (e.g. """) with a single "
    data = (re.sub(r'"+', '"', line) for line in f)
    # now treat it as normal csv
    rd = csv.reader(data, delimiter=',', quotechar='"')
    # print each cleaned row
    for row in rd:
        print(','.join(row))
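Alternatively, if you are sure it is safe (no value legitimately needs quoting, e.g. none contains a comma), run re.sub('"', '', f.read()) over the whole file.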
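The "if you are sure it is safe" condition matters when a value contains a comma. This sketch compares the two approaches on one hypothetical line. The value "Bangalore, KA" is invented for illustration:

```python
import csv
import io
import re

line = '"""anju""","""Bangalore, KA"""\n'  # hypothetical value containing a comma

# Approach 1: collapse quote runs, then let csv handle the quoting.
collapsed = re.sub(r'"+', '"', line)
first = next(csv.reader(io.StringIO(collapsed)))
print(first)  # ['anju', 'Bangalore, KA']

# Approach 2: delete every quote from the raw text, then parse.
stripped = re.sub('"', '', line)
second = next(csv.reader(io.StringIO(stripped)))
print(second)  # ['anju', 'Bangalore', ' KA']
```

Once the quotes are gone, nothing protects the comma inside the value, so the second approach splits it into an extra field.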
Answer 2 (score: 0)
pandas' str.replace and str.strip (Series.str.replace and Series.str.strip) can both help, for example:
df.apply(lambda x: x.str.strip('"'))
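str.strip and str.replace behave differently when a quote sits inside a value. strip only trims leading and trailing characters, while replace removes every occurrence. Here is a small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(['"B"', '""C""', 'IT"PL'])  # made-up sample values

stripped = s.str.strip('"').tolist()                  # trims the ends only
replaced = s.str.replace('"', '', regex=False).tolist()  # removes every '"'
print(stripped)  # ['B', 'C', 'IT"PL']
print(replaced)  # ['B', 'C', 'ITPL']
```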
However, some rows of your CSV contain runs of " characters that hide some of the , separators, so if I just apply the strip function:
import pandas as pd
df = pd.read_csv("my.csv")
df = df.apply(lambda x: x.str.strip('"'))
print(df)
Name age class place
0 ishika 21 B"","Whitefield NaN
1 anju 23 C ITPL
The first workaround I found is to change the quotechar parameter:
import pandas as pd
df = pd.read_csv("my.csv", quotechar="'")
df = df.apply(lambda x: x.str.strip('"'))
print(df)
Name age class place
0 ishika 21 B Whitefield
1 anju 23 C ITPL
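Another option, swapped in here rather than taken from the answer, is to tell the parser not to treat quotes specially at all with quoting=csv.QUOTE_NONE. This has the same effect as the quotechar="'" trick as long as the data contains no single quotes. Here is a sketch on the question's sample data, held in a string for illustration:

```python
import csv
import io

import pandas as pd

# The question's sample, including the irregular quotes in the ishika row.
raw = (
    'Name,age,class,place\n'
    '""""ishika""","""21""","""B"""","""Whitefield"""\n'
    '"""anju""","""23""","""C""","""ITPL"""\n'
)

# QUOTE_NONE: '"' is ordinary data, so commas are the only separators.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE, dtype=str)
# Then trim the leftover quotes from both ends of every value.
df = df.apply(lambda col: col.str.strip('"'))
print(df.values.tolist())
# [['ishika', '21', 'B', 'Whitefield'], ['anju', '23', 'C', 'ITPL']]
```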