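I am trying to compare three dataframes to find the differences between them. I have a master dataframe that contains everything, and two other dataframes that each contain part of the master. I am trying to write Python code that identifies what is missing from each of the other two. The master looks like this: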
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
6 Sandy
The second dataframe looks like this:
ID Name
1 Mike
2 Dani
3 Scott
6 Sandy
The third dataframe looks like this:
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
So there will be two output dataframes. The desired output for the second dataframe looks like this:
ID Name
4 Josh
5 Nate
The desired output for the third dataframe looks like this:
ID Name
6 Sandy
I could not find anything similar on Google. I tried:
for i in second['ID'], third['ID']:
    if i not in master['ID']:
        print(i)
It returns all of the data in the master.
Also, if I try this code:
import pandas as pd
names = ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]
ids = [1, 2, 3, 4, 5, 6]
master = pd.DataFrame({"ID": ids, "Name": names})
# print(master)
names_second = ["Mike", "Dani", "Scott", "Sandy"]
ids_second = [1, 2, 3, 6]
second = pd.DataFrame({"ID": ids_second, "Name": names_second})
# print(second)
names_third = ["Mike", "Dani", "Scott", "Josh", "Nate"]
ids_third = [1, 2, 3, 4, 5]
third = pd.DataFrame({"ID": ids_third, "Name": names_third})
# print(third)
for i in master['ID']:
    if i not in second["ID"]:
        print("NOT IN SECOND", i)
    if i not in third["ID"]:
        print("NOT IN THIRD", i)
Output:
NOT IN SECOND 4
NOT IN SECOND 5
NOT IN THIRD 5
NOT IN SECOND 6
NOT IN THIRD 6
Why does it say NOT IN SECOND 6 and NOT IN THIRD 5?
Any suggestions? Thanks in advance.
Answer 0 (score: 2)
You can combine .isin with ~ to filter the dataframes. To compare against the second one, use master[~master.ID.isin(second.ID)], and do the same for the third:
# Rows of master whose ID does not appear in the other dataframe
cmp_master_second = master[~master.ID.isin(second.ID)]
cmp_master_third = master[~master.ID.isin(third.ID)]
print(cmp_master_second)
print('\n-------- Separate dataframes -----------\n')
print(cmp_master_third)
Result:
Name
ID
4 Josh
5 Nate
-------- Separate dataframes -----------
Name
ID
6 Sandy
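As for why the loop in the question reports NOT IN SECOND 6 and NOT IN THIRD 5: the in operator on a pandas Series tests the index labels (0, 1, 2, ...), not the values. The second dataframe has index 0 to 3 and the third has index 0 to 4, so 6 and 5 are reported as missing even though they are stored as values. The following is a minimal sketch that reuses the question's data to contrast the two checks and then applies the .isin filter; the helper name missing_from is made up for illustration and is not part of pandas.

```python
import pandas as pd

master = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                       "Name": ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]})
second = pd.DataFrame({"ID": [1, 2, 3, 6],
                       "Name": ["Mike", "Dani", "Scott", "Sandy"]})
third = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                      "Name": ["Mike", "Dani", "Scott", "Josh", "Nate"]})

# `in` on a Series looks at the index labels, not the values.
# second has index labels 0..3, so 6 is "not in" it although 6 is a value.
index_check = 6 in second["ID"]          # membership among index labels
value_check = 6 in second["ID"].values   # membership among the stored values


def missing_from(master_df, other_df, key="ID"):
    """Hypothetical helper: rows of master_df whose key is absent from other_df."""
    return master_df[~master_df[key].isin(other_df[key])]


missing_in_second = missing_from(master, second)
missing_in_third = missing_from(master, third)
print(missing_in_second)
print(missing_in_third)
```

Comparing on the ID column rather than on Name avoids false matches if two different people happen to share a name.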
Answer 1 (score: 0)
You can use set operations on the master and the other DataFrames:
In [315]: set(d1[0]) - set(d2[0])
Out[315]: {'Josh', 'Nate'}
In [316]: set(d1[0]) - set(d3[0])
Out[316]: {'Sandy'}
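The session above does not show how d1, d2, and d3 were built. As a sketch, under the assumption that they correspond to the master, second, and third dataframes from the question, the same set difference can be taken on the ID column and then mapped back to full rows:

```python
import pandas as pd

# Assumption: these mirror the question's dataframes, not the answer's d1/d2/d3.
master = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                       "Name": ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]})
second = pd.DataFrame({"ID": [1, 2, 3, 6],
                       "Name": ["Mike", "Dani", "Scott", "Sandy"]})
third = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                      "Name": ["Mike", "Dani", "Scott", "Josh", "Nate"]})

# Set difference: IDs present in master but absent from the other dataframe.
ids_missing_in_second = set(master["ID"]) - set(second["ID"])
ids_missing_in_third = set(master["ID"]) - set(third["ID"])

# A set only holds the IDs, so look the full rows up again in master.
rows_missing_in_second = master[master["ID"].isin(ids_missing_in_second)]
rows_missing_in_third = master[master["ID"].isin(ids_missing_in_third)]

print(rows_missing_in_second)
print(rows_missing_in_third)
```

Sets are unordered and discard duplicates, so this fits best when only the distinct missing keys matter; filtering master afterwards restores the original row order.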