Cleaning a CSV by delimiter

Time: 2019-04-15 15:14:05

Tags: python pandas csv data-cleaning

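I have a CSV file in which each row's columns are all packed into a single field: the values are enclosed in quotes and separated by commas.

The values within a row are separated by commas, and two consecutive commas mean a value is missing. I want to split the columns according to these rules. If a value is in quotes, the commas inside the quotes should not be treated as delimiters, because the value is an address.

Here is a sample of the data (it is a CSV; I converted it to a dictionary to show the sample):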

{'Store code,"Biz","Add","Labels","TotalSe","DirectSe","DSe","TotalVe","SeVe","MaVe","Totalac","Webact","Dions","Ps"': {0: ',,,,"Numsearching","Numsearchingbusiness","Numcatprod","Numview","Numviewed","Numviewed2","Numaction","Numwebsite","Numreques","Numcall"',
  1: 'Nora,"Ora","Sgo, Mp, 2000",,111,44,33,121,1232,53411,4,5,,3',
  2: 'mc11,"21 old","tjis that place, somewher, Netherlands, 2434",,3245,325,52454,3432,243,4353,343,23,23,18'}}

This is what I have tried so far, but I am somewhat stuck:

disc = pd.read_csv('/content/gdrive/My Drive/blank/blank.csv',delimiter='",')


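1 Answer:

Answer 0 (score: 1)

I use plain string operations to remove the " at both ends of every line, and then convert each doubled "" into a single ".

This way I get a CSV that can be loaded with read_csv().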

import pandas as pd

# Each line of the file is one quoted field; unwrap it into a normal CSV line
with open('Sample - Sheet1.csv') as f1, open('temp.csv', 'w') as f2:
    for row in f1:
        row = row.strip()             # remove the trailing "\n"
        row = row[1:-1]               # remove the " on both ends
        row = row.replace('""', '"')  # convert "" into "
        f2.write(row + '\n')

df = pd.read_csv('temp.csv')

print(len(df.columns))
print(df)
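To see what this cleanup does to individual lines, here is a minimal in-memory sketch. The raw lines are an assumption reconstructed from the sample in the question (each row wrapped in one pair of quotes, with the inner quotes doubled) and shortened to five columns; `unwrap` is a hypothetical helper name, not part of pandas.

```python
import io

import pandas as pd


def unwrap(line):
    """Strip the outer quotes and collapse doubled quotes (hypothetical helper)."""
    line = line.strip()
    return line[1:-1].replace('""', '"')


# Assumed raw file content: every row is stored as a single quoted field
raw_lines = [
    '"Store code,""Biz"",""Add"",""Labels"",""Ps"""',
    '"Nora,""Ora"",""Sgo, Mp, 2000"",,3"',
]

cleaned = "\n".join(unwrap(line) for line in raw_lines)
df = pd.read_csv(io.StringIO(cleaned))

print(cleaned)                # ordinary CSV text after unwrapping
print(df.columns.tolist())    # five separate columns
print(df.loc[0, "Add"])       # the quoted address stays in one field
print(df["Labels"].isna())    # the empty field between two commas is missing
```

After unwrapping, the default `read_csv()` settings already keep the commas inside a quoted address in one field and treat the empty field between two commas as a missing value, so no custom delimiter should be needed.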

Another method: read the file as CSV and save the single field of each row as a plain string

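import csv
import pandas as pd

# newline='' lets the csv module handle line endings itself
with open('Sample - Sheet1.csv', newline='') as f1, open('temp.csv', 'w') as f2:
    reader = csv.reader(f1)
    for row in reader:
        if row:  # skip blank lines, which csv.reader returns as empty lists
            f2.write(row[0] + '\n')  # the whole line is the first (only) field

df = pd.read_csv('temp.csv')

print(len(df.columns))
print(df)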
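If the temporary file is not needed, the same idea can be sketched in memory. This is only a sketch, assuming the data fits in memory and that every non-blank line is a single quoted field as in the question; `load_wrapped_csv` is a hypothetical helper name and the sample rows are shortened to five columns.

```python
import csv
import io

import pandas as pd


def load_wrapped_csv(source):
    """Unwrap rows stored as one quoted field, then parse them with pandas.

    `source` is any iterable of lines, such as an open file.
    `load_wrapped_csv` is a hypothetical helper name.
    """
    reader = csv.reader(source)
    # row[0] is the unwrapped line; blank lines come back as empty lists
    inner = "\n".join(row[0] for row in reader if row)
    return pd.read_csv(io.StringIO(inner))


# Assumed raw content, reconstructed from the sample in the question
sample = io.StringIO(
    '"Store code,""Biz"",""Add"",""Labels"",""Ps"""\n'
    '"mc11,""21 old"",""tjis that place, somewher, Netherlands, 2434"",,18"\n'
)

df = load_wrapped_csv(sample)
print(len(df.columns))
print(df)
```

With a real file, the same helper could be called on a handle opened with `open('Sample - Sheet1.csv', newline='')`.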