I have two TSV files, shown below.
TSV file 1
id ingredients recipe
code1 egg, butter beat eggs. add butter
code2 tim tam, butter beat tim tam. add butter
code3 coffee, sugar add coffee and sugar and mix
code4 sugar, milk beat sugar and milk together
TSV file 2
id ingredients recipe
c009 apple, milk add apples to milk
c110 coffee, sugar add coffee and sugar and mix
c111 egg, butter add egg, butter and sugar
c112 tim tam, sugar beat tim tam. add butter
I want to remove entries from the TSV files if,
For the example above, the output for the two TSV files should look like this.
TSV file 1
id ingredients recipe
code4 sugar, milk beat sugar and milk together
TSV file 2
id ingredients recipe
c009 apple, milk add apples to milk
Can we do this with pandas? Please help me!
Answer 0 (score: 1)
You can read the TSV files with pd.read_csv:
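import pandas as pd

# tsv_file_1 and tsv_file_2 are the paths to the two files.
# The regex separator (two or more whitespace characters) needs the python engine;
# use sep='\t' instead if the columns are separated by real tabs.
df1 = pd.read_csv(tsv_file_1, sep=r'\s\s+', engine='python')
df2 = pd.read_csv(tsv_file_2, sep=r'\s\s+', engine='python')
# Strip stray spaces from the column names
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()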
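As a self-contained illustration of the loading step, the sketch below reads the first sample table from an in-memory string rather than a file. It assumes the columns are separated by real tab characters, so it uses sep="\t"; the name tsv_text_1 is made up for this example.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of TSV file 1 (assumes real tab separators).
tsv_text_1 = (
    "id\tingredients\trecipe\n"
    "code1\tegg, butter\tbeat eggs. add butter\n"
    "code2\ttim tam, butter\tbeat tim tam. add butter\n"
    "code3\tcoffee, sugar\tadd coffee and sugar and mix\n"
    "code4\tsugar, milk\tbeat sugar and milk together\n"
)

# io.StringIO lets read_csv treat the string like an open file.
df1 = pd.read_csv(io.StringIO(tsv_text_1), sep="\t")

# Strip stray spaces from the column names, as in the answer.
df1.columns = df1.columns.str.strip()

print(df1.shape)
print(list(df1.columns))
```

With a tab separator, the commas inside the ingredients column stay within a single field, so no regex separator is needed.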
Next, use isin and ~ (the NOT operator):
df1_new = df1[~df1.ingredients.isin(df2.ingredients)]
df2_new = df2[~df2.ingredients.isin(df1.ingredients)]
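print(df1_new)
id ingredients recipe
1 code2 tim tam, butter beat tim tam. add butter
3 code4 sugar, milk beat sugar and milk together
print(df2_new)
id ingredients recipe
0 c009 apple, milk add apples to milk
3 c112 tim tam, sugar beat tim tam. add butter
Note that this compares the ingredients column only, so code2 and c112 are kept even though their recipe text also appears in the other file.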
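The expected output in the question keeps only code4 and c009, which suggests a row should be dropped when either its ingredients or its recipe appears in the other file. That reading is an assumption, since the condition in the question is cut off. Under it, the two isin checks can be combined as in this sketch, which rebuilds the sample data inline so it runs on its own:

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the sketch is runnable.
df1 = pd.DataFrame({
    "id": ["code1", "code2", "code3", "code4"],
    "ingredients": ["egg, butter", "tim tam, butter", "coffee, sugar", "sugar, milk"],
    "recipe": [
        "beat eggs. add butter",
        "beat tim tam. add butter",
        "add coffee and sugar and mix",
        "beat sugar and milk together",
    ],
})
df2 = pd.DataFrame({
    "id": ["c009", "c110", "c111", "c112"],
    "ingredients": ["apple, milk", "coffee, sugar", "egg, butter", "tim tam, sugar"],
    "recipe": [
        "add apples to milk",
        "add coffee and sugar and mix",
        "add egg, butter and sugar",
        "beat tim tam. add butter",
    ],
})

# Assumed rule: drop a row when its ingredients OR its recipe
# also appears in the other file.
drop1 = df1["ingredients"].isin(df2["ingredients"]) | df1["recipe"].isin(df2["recipe"])
drop2 = df2["ingredients"].isin(df1["ingredients"]) | df2["recipe"].isin(df1["recipe"])

df1_new = df1[~drop1]
df2_new = df2[~drop2]

print(df1_new)
print(df2_new)
```

Both masks are computed from the original frames before any filtering, so the result does not depend on which file is processed first. The filtered frames could then be written back with to_csv(path, sep="\t", index=False).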