This is my input CSV file:
    column1          column2
    abc city town    efg town
    abc town city    efg city
    efg town         abc city town
    efg city         abc town city
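The values in my CSV file should be cleaned according to these rules:
1) If only "city" is present (without "town"), "city" should be removed.
2) If only "town" is present (without "city"), "town" should be removed.
3) If "city town" is present, only "town" should be removed.
4) If "town city" is present, only "city" should be removed.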
My desired output should look like this:
    column1     column2
    abc city    efg
    abc town    efg
    efg         abc city
    efg         abc town
I am trying to do this in Python. This is what I have tried so far:
    import pandas as pd
    df = {"A": ['abc town', "abc city", 'abc town city', "abc city town"]}
    for i in df['A']:
        if i == 'town':
            df['b'] == 'yes'
            print(df)
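If a row contains only "city" or only "town", I have to remove it. I know there is something called "contains" that I could use here, but I am not sure how to apply it.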
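Answer 0: (score: 0)
This does not use the pandas module, but I believe it does what you need. There may well be a shorter way to accomplish this task, though.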
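    import csv

    filename = 'file location and name'  # path to the input CSV

    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)  # keep the header row out of the trimming below
        data = list(reader)

    # Trim each cell by position: keep the first two words when the cell has
    # three or more words, otherwise keep only the first word.
    list1 = []
    for x in data:
        for i in x:
            if i.count(' ') > 1:
                i = ' '.join(i.split(' ', 2)[:2])
            else:
                i = i.split(' ')[0]
            list1.append(i)

    # The cells were flattened row by row, so even indexes belong to the
    # first column and odd indexes to the second.
    list2 = list1[::2]
    list3 = list1[1::2]
    zipped_list = zip(list2, list3)

    with open("output.csv", "w", newline="") as csv_save:
        cw = csv.writer(csv_save)
        cw.writerow(headers)
        cw.writerows(zipped_list)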
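The code above trims cells by word position, which works for the sample data but never checks that the dropped word is actually "city" or "town". As a hedged alternative, here is a minimal sketch that removes the last word only when it is one of those two. The helper names strip_trailing_place and clean_rows are made up for this illustration, and the in-memory sample assumes a comma-delimited file.

```python
import csv
import io

PLACE_WORDS = {"city", "town"}  # words to drop when they end a cell


def strip_trailing_place(cell):
    """Return the cell without its last word if that word is 'city' or 'town'."""
    words = cell.split()  # note: this also collapses repeated whitespace
    if words and words[-1] in PLACE_WORDS:
        words = words[:-1]
    return " ".join(words)


def clean_rows(rows):
    """Yield the header row unchanged, then every data row with cleaned cells."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    yield header
    for row in rows:
        yield [strip_trailing_place(cell) for cell in row]


# In-memory stand-in for the input file (assumption: comma-separated columns).
sample = io.StringIO(
    "column1,column2\n"
    "abc city town,efg town\n"
    "abc town city,efg city\n"
    "efg town,abc city town\n"
    "efg city,abc town city\n"
)
cleaned = list(clean_rows(csv.reader(sample)))
for row in cleaned:
    print(row)
```

To use this on a real file, pass csv.reader(f) for an opened file to clean_rows and hand the result to csv.writer(...).writerows.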
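Answer 1: (score: 0)
Here is a simple solution that I tried:
    # regex=True is required on pandas 2.0 and later, where patterns are literal by default
    df['column1'] = df['column1'].str.replace(r'town$', '', regex=True)
    df['column1'] = df['column1'].str.replace(r'city$', '', regex=True)
The same goes for column2:
    df['column2'] = df['column2'].str.replace(r'town$', '', regex=True)
    df['column2'] = df['column2'].str.replace(r'city$', '', regex=True)
The output will look like this: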
    column1     column2
    abc city    efg
    abc town    efg
    efg         abc city
    efg         abc town
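For reference, the two replacements per column can be folded into a single pattern that also drops the space left in front of the removed word. This is only a sketch: the DataFrame is built inline from the question's sample instead of being read from a file, and TRAILING_PLACE is a name chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question; built inline rather than read from a file.
df = pd.DataFrame({
    "column1": ["abc city town", "abc town city", "efg town", "efg city"],
    "column2": ["efg town", "efg city", "abc city town", "abc town city"],
})

# Match one trailing "city" or "town" plus any whitespace before it.
# \b leaves words that merely end in those letters (e.g. "downtown") intact.
TRAILING_PLACE = r"\s*\b(?:city|town)$"

for col in ["column1", "column2"]:
    # regex=True because pandas 2.0+ treats the pattern as a literal string by default.
    df[col] = df[col].str.replace(TRAILING_PLACE, "", regex=True)

print(df)
```

Using one alternation matters once the whitespace is removed as well: with two separate passes, "abc city town" would first become "abc city" and then lose "city" too.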