I have a DataFrame like this:
Event_Id Investigation_Type Accident_Number Event_Date
0 20180922X71035 ACCIDENT DCA18CA289 09/10/2018
1 20180507X00658 ACCIDENT DCA18CA169 05/07/2018
4 20171212X50255 ACCIDENT DCA18CA043B 12/03/2017
Then I tried iterating over it like this:
n1col = 0
n2col = 1
for i in df.index:
    Node1=df.Event_Id
    for j in df.index:
        Node2=df.Event_Id
        if (Node1 != Node2):
            new_df.loc[j,n1col] = Node1
            new_df.loc[j,n2col] = Node2
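I don't know whether my approach is correct (as far as I can tell), and I need some help to get a result like the following.
I'm new to this kind of thing, so I'd appreciate your help.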
Node_1 Node_2
0 20180922X71035 20180507X00658
1 20180922X71035 20171212X50255
2 20180507X00658 20180922X71035
3 20180507X00658 20171212X50255
4 20171212X50255 20180922X71035
6 20171212X50255 20180507X00658
Thanks.
Answer 0 (score: 2)
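I know you've already accepted an answer. However, if you're not looking for combinations but instead want the Cartesian product, filtered so that the two columns are not equal: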
>>> df
Event_Id Accident_Number Event_Date Investigation_Type
0 20180922X71035 DCA18CA289 09/10/2018 ACCIDENT
1 20180507X00658 DCA18CA169 05/07/2018 ACCIDENT
2 20171212X50255 DCA18CA043B 12/03/2017 ACCIDENT
Get the Cartesian product (see the answer from this other StackOverflow post):
>>> df['key'] = 0
>>> df
Event_Id Accident_Number Event_Date Investigation_Type key
0 20180922X71035 DCA18CA289 09/10/2018 ACCIDENT 0
1 20180507X00658 DCA18CA169 05/07/2018 ACCIDENT 0
2 20171212X50255 DCA18CA043B 12/03/2017 ACCIDENT 0
>>> df2 = df.merge(df, on='key').filter(items=['Event_Id_x', 'Event_Id_y'])
>>> df2
Event_Id_x Event_Id_y
0 20180922X71035 20180922X71035
1 20180922X71035 20180507X00658
2 20180922X71035 20171212X50255
3 20180507X00658 20180922X71035
4 20180507X00658 20180507X00658
5 20180507X00658 20171212X50255
6 20171212X50255 20180922X71035
7 20171212X50255 20180507X00658
8 20171212X50255 20171212X50255
Filter your DataFrame using .loc / boolean indexing:
>>> df2.loc[df2['Event_Id_x'] != df2['Event_Id_y']]
Event_Id_x Event_Id_y
1 20180922X71035 20180507X00658
2 20180922X71035 20171212X50255
3 20180507X00658 20180922X71035
5 20180507X00658 20171212X50255
6 20171212X50255 20180922X71035
7 20171212X50255 20180507X00658
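On pandas 1.2 or newer, the helper key column should not be needed, because merge accepts how='cross'. A minimal sketch, assuming the same three Event_Id values as above; the Node_1 / Node_2 column names follow the question's desired output, and the variable names ids and pairs are just illustrative:

```python
import pandas as pd

# Sample data copied from the question (only the column needed here).
df = pd.DataFrame({
    "Event_Id": ["20180922X71035", "20180507X00658", "20171212X50255"],
})

ids = df[["Event_Id"]]

# how="cross" (pandas >= 1.2) builds the Cartesian product directly;
# the overlapping column name gets the _x / _y suffixes.
pairs = ids.merge(ids, how="cross", suffixes=("_x", "_y"))

# Drop rows that pair an event with itself, rename, and renumber the rows.
pairs = (
    pairs.loc[pairs["Event_Id_x"] != pairs["Event_Id_y"]]
    .rename(columns={"Event_Id_x": "Node_1", "Event_Id_y": "Node_2"})
    .reset_index(drop=True)
)
print(pairs)
```

The reset_index(drop=True) call is optional; it only replaces the gapped index left by the filter (1, 2, 3, 5, 6, 7) with a consecutive one.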
This is similar to Josh's answer using itertools, but this time using product instead of combinations:
>>> from itertools import product
>>> df = df.set_index('Event_Id')
>>> df3 = pd.DataFrame(list(product(df.index.tolist(), df.index.tolist())), columns=['Node1', 'Node2'])
>>> df3.loc[df3['Node1'] != df3['Node2']]
Node1 Node2
1 20180922X71035 20180507X00658
2 20180922X71035 20171212X50255
3 20180507X00658 20180922X71035
5 20180507X00658 20171212X50255
6 20171212X50255 20180922X71035
7 20171212X50255 20180507X00658
Answer 1 (score: 1)
You can do this in one line using itertools.combinations (this pairs up the values of df.index):
from itertools import combinations
pd.DataFrame(list(combinations(df.index.tolist(), 2)), columns=['Node1', 'Node2'])
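Note that combinations yields each unordered pair only once, whereas the desired output in the question lists both orders. One possible way to get the ordered pairs is itertools.permutations applied to the Event_Id column instead of the index. This is a sketch rather than part of the original answer; it assumes the Event_Id values are unique (permutations works by position, so duplicated values would still produce rows where both columns match), and the sample data and Node_1 / Node_2 names are taken from the question:

```python
from itertools import permutations

import pandas as pd

# Sample data copied from the question (only the column needed here).
df = pd.DataFrame({
    "Event_Id": ["20180922X71035", "20180507X00658", "20171212X50255"],
})

# permutations(..., 2) yields every ordered pair of two different positions,
# so (a, b) and (b, a) both appear and (a, a) never does.
pairs = pd.DataFrame(
    list(permutations(df["Event_Id"].tolist(), 2)),
    columns=["Node_1", "Node_2"],
)
print(pairs)
```

For n unique IDs this produces n * (n - 1) rows, already numbered consecutively from 0, so no extra filtering step is needed.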