My CSV looks like this:
F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,
F04300621,"Parker,Helen",CERT,"Yu,Betty",IOUS,,,
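I want to delete every row where column 2 equals column 4 (for example, when Smith,Andy = Smith,Andy). I tried to do this in Python by using " as the delimiter, which splits a row into: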
F02303521,
Smith,Andy
,GHI,
Smith,Andy
,GHI,,,
I tried this Python code:
testCSV = 'test.csv'
deletionText = 'linestodelete.txt'
correct = 'correctone.csv'
i = 0
j = 0 #where i & j keep track of line number

with open(deletionText,'w') as outfile:
    with open(testCSV, 'r') as csv:
        for line in csv:
            i = i + 1 #on the first line, i will equal 1.
            PI = line.split('"')[1]
            investigator = line.split('"')[3]
            #if they equal each other, write that line number into the text file as to be deleted.
            if PI == investigator:
                outfile.write(i)
#From the TXT, create a list of line numbers you do not want to include in output
with open(deletionText, 'r') as txt:
    lines_to_be_removed_list = []
    # for each line number in the TXT
    # remove the return character at the end of line
    # and add the line number to list domains-to-be-removed list
    for lineNum in txt:
        lineNum = lineNum.rstrip()
        lines_to_be_removed_list.append(lineNum)
with open(correct, 'w') as outfile:
    with open(deletionText, 'r') as csv:
        # for each line in csv
        # extract the line number
        for line in csv:
            j = j + 1 # so for the first line, the line number will be 1
            # if csv line number is not in lines-to-be-removed list,
            # then write that to outfile
            if (j not in lines_to_be_removed_list):
                outfile.write(line)
But for this line:
PI = line.split('"')[1]
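I get:
Traceback (most recent call last):
  File "C:/Users/sskadamb/PycharmProjects/vastDeleteLine/manipulation.py", line 11, in <module>
    PI = line.split('"')[1]
IndexError: list index out of range
I expected it to give
PI = Smith,Andy
investigator = Smith,Andy
Why isn't that happening?
Any help is much appreciated, thanks!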
Answer 0 (score: 1)
When you think CSV, think pandas, a great Python data analysis library. Here is how to achieve your goal:
import pandas as pd
fields = ['field{}'.format(i) for i in range(8)]
df = pd.read_csv("data.csv", header=None, names=fields)
df = df[df['field1'] != df['field3']]
print(df)
This prints:
      field0        field1 field2    field3 field4  field5  field6  field7
1  F04300621  Parker,Helen   CERT  Yu,Betty   IOUS     NaN     NaN     NaN
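If pandas is not an option, the same filter can be sketched with the standard-library csv module, which also keeps the quoted commas intact. This is only a minimal sketch: the helper name drop_matching_rows and its parameters are assumptions made for illustration, not something from the question.

```python
import csv


def drop_matching_rows(src_path, dst_path, col_a=1, col_b=3):
    """Copy src_path to dst_path, skipping rows where two columns match.

    Column indices are zero-based, so 1 and 3 are the 2nd and 4th columns.
    Returns the number of rows written. (Hypothetical helper.)
    """
    kept = 0
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                # Blank lines parse to an empty list; skip them.
                continue
            # Only compare when both columns exist; short rows are kept.
            if len(row) > max(col_a, col_b) and row[col_a] == row[col_b]:
                continue
            writer.writerow(row)
            kept += 1
    return kept
```

With the file names from the question, the call would be drop_matching_rows('test.csv', 'correctone.csv'). Note that csv.writer only re-quotes fields that need it, so the output may not be byte-for-byte identical to the input.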
Answer 1 (score: -2)
Try splitting on the comma instead of the quote.
x.split(",")
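As a rough sketch of how the different splits behave on the sample row from the question: a plain comma split also cuts inside the quoted names, while csv.reader keeps them whole. The last part assumes the file contains a line with no double quotes (a blank trailing line, say), which the question does not confirm; on such a line, splitting on the quote yields a single piece, so index [1] would raise the IndexError shown. The variable names are illustrative only.

```python
import csv

line = 'F02303521,"Smith,Andy",GHI,"Smith,Andy",GHI,,,'

# Plain comma split: each quoted name is cut in two,
# so the row comes back as 10 pieces instead of 8 columns.
naive = line.split(',')

# csv.reader respects the quotes and returns the 8 real columns.
parsed = next(csv.reader([line]))

# Assumed case: a line without any double quote, e.g. a blank line.
blank = '\n'
pieces = blank.split('"')

print(len(naive), naive[1], naive[2])
print(len(parsed), parsed[1], parsed[3])
print(len(pieces))  # only one piece, so pieces[1] does not exist
```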