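Iterating over a pandas DataFrame, checking each column value with an if statement, and writing that value to a chosen column of an empty df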

Time: 2019-01-08 14:11:05

Tags: python pandas

I have a DataFrame like this:

    Event_Id       Investigation_Type Accident_Number   Event_Date      
0   20180922X71035  ACCIDENT          DCA18CA289        09/10/2018  
1   20180507X00658  ACCIDENT          DCA18CA169        05/07/2018  
4   20171212X50255  ACCIDENT          DCA18CA043B       12/03/2017

Then I tried to iterate over it like this:

n1col = 0
n2col = 1

for i in df.index:
    Node1=df.Event_Id
    for j in df.index:
        Node2=df.Event_Id
        if (Node1 != Node2):
            new_df.loc[j,n1col] = Node1
            new_df.loc[j,n2col] = Node2

I'm new to this, so I don't know whether my approach is correct. I need some help getting a result like the following:

    Node_1          Node_2 
0   20180922X71035  20180507X00658
1   20180922X71035  20171212X50255
2   20180507X00658  20180922X71035
3   20180507X00658  20171212X50255
4   20171212X50255  20180922X71035
6   20171212X50255  20180507X00658

Thanks.

2 answers:

Answer 0: (score: 2)

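I know you've already accepted an answer. But in case you aren't looking for combinations and instead want the Cartesian product, filtered so that the two columns are not equal: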

>>> df
         Event_Id Accident_Number  Event_Date Investigation_Type
0  20180922X71035      DCA18CA289  09/10/2018           ACCIDENT
1  20180507X00658      DCA18CA169  05/07/2018           ACCIDENT
2  20171212X50255     DCA18CA043B  12/03/2017           ACCIDENT

Get the Cartesian product (answer from this other StackOverflow post):

>>> df['key'] = 0
>>> df
         Event_Id Accident_Number  Event_Date Investigation_Type  key
0  20180922X71035      DCA18CA289  09/10/2018           ACCIDENT    0
1  20180507X00658      DCA18CA169  05/07/2018           ACCIDENT    0
2  20171212X50255     DCA18CA043B  12/03/2017           ACCIDENT    0
>>> df2 = df.merge(df, on='key').filter(items=['Event_Id_x', 'Event_Id_y'])
>>> df2
       Event_Id_x      Event_Id_y
0  20180922X71035  20180922X71035
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
4  20180507X00658  20180507X00658
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658
8  20171212X50255  20171212X50255

Filter your DataFrame using .loc / boolean indexing:

>>> df2.loc[df2['Event_Id_x'] != df2['Event_Id_y']]
       Event_Id_x      Event_Id_y
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658
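As a side note, newer pandas versions (1.2 and later) accept how='cross' in merge, which removes the need for the temporary key column. The following is only a minimal sketch under that version assumption; the small DataFrame and the final Node_1/Node_2 column names are rebuilt here from the question for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (only the column that matters here).
df = pd.DataFrame({
    "Event_Id": ["20180922X71035", "20180507X00658", "20171212X50255"],
})

# how="cross" (pandas >= 1.2) pairs every row with every row,
# so no helper 'key' column is required.
ids = df[["Event_Id"]]
pairs = ids.merge(ids, how="cross", suffixes=("_x", "_y"))

# Drop the rows where an event is paired with itself.
pairs = pairs.loc[pairs["Event_Id_x"] != pairs["Event_Id_y"]].reset_index(drop=True)
pairs.columns = ["Node_1", "Node_2"]

print(len(pairs))  # 6 ordered pairs for 3 distinct IDs
```

Calling reset_index(drop=True) renumbers the rows 0 to 5; leave it out if you want to keep the index labels of the unfiltered product.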

This is similar to Josh's answer using itertools, but this time with product instead of combinations:

>>> from itertools import product
>>> df = df.set_index('Event_Id')
>>> df3 = pd.DataFrame(list(product(df.index.tolist(), df.index.tolist())), columns=['Node1', 'Node2'])
>>> df3.loc[df3['Node1'] != df3['Node2']]
            Node1           Node2
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658
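If you would rather not change the index of df, the same idea can be wrapped in a small function that reads the column directly. This is a sketch, not part of the original answer: node_pairs is a hypothetical helper name, and the sample DataFrame is rebuilt from the question.

```python
from itertools import product

import pandas as pd


def node_pairs(frame, column="Event_Id"):
    """Return all ordered pairs of distinct values from `column`.

    Hypothetical helper for illustration; it leaves `frame` unchanged.
    """
    values = frame[column].tolist()
    # product(values, repeat=2) is the Cartesian product of the list with itself;
    # the condition drops pairs whose two members are equal.
    rows = [(a, b) for a, b in product(values, repeat=2) if a != b]
    return pd.DataFrame(rows, columns=["Node_1", "Node_2"])


df = pd.DataFrame({
    "Event_Id": ["20180922X71035", "20180507X00658", "20171212X50255"],
})
result = node_pairs(df)
print(result)
```

Because the filter runs before the DataFrame is built, the result is numbered 0 to 5 without a separate reset_index step.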

Answer 1: (score: 1)

You can do this in one line using itertools.combinations:

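import pandas as pd
from itertools import combinations

# This pairs up the index labels of df, so Event_Id must be the index
# (for example after df = df.set_index('Event_Id')).
pd.DataFrame(list(combinations(df.index.tolist(), 2)), columns=['Node1', 'Node2'])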
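Note that combinations yields each unordered pair only once, while the expected output in the question lists every pair in both orders. If that is what you need, itertools.permutations may be a closer fit. A minimal sketch, with the sample DataFrame rebuilt from the question and Event_Id assumed to be the index:

```python
from itertools import combinations, permutations

import pandas as pd

# Sample data rebuilt from the question, with Event_Id as the index (an assumption).
df = pd.DataFrame(
    {"Investigation_Type": ["ACCIDENT", "ACCIDENT", "ACCIDENT"]},
    index=pd.Index(
        ["20180922X71035", "20180507X00658", "20171212X50255"], name="Event_Id"
    ),
)
ids = df.index.tolist()

# combinations: each unordered pair once, e.g. (A, B) but not (B, A).
unordered = pd.DataFrame(list(combinations(ids, 2)), columns=["Node_1", "Node_2"])

# permutations: every ordered pair of distinct positions, e.g. both (A, B) and (B, A).
ordered = pd.DataFrame(list(permutations(ids, 2)), columns=["Node_1", "Node_2"])

print(len(unordered))  # 3
print(len(ordered))    # 6
```

Both functions pair elements by position, not by value, so if the same Event_Id appears more than once you would still get rows where Node_1 equals Node_2 and might want to de-duplicate first.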