Python Pandas - Find all unique combinations of DataFrame rows without repeating values in the columns

Time: 2019-05-13 03:33:17

Tags: python pandas

I have a DataFrame similar to this one:

import pandas as pd

df = pd.DataFrame({'A': {0: 1, 1: 1, 2: 1, 3: 10, 4: 10, 5: 10, 6: 13, 7: 13, 8: 13},
 'B': {0: 17, 1: 20, 2: 25, 3: 17, 4: 20, 5: 25, 6: 17, 7: 20, 8: 25},
 'distance': {0: 304.0,
  1: 326.0,
  2: 426.0,
  3: 124.0,
  4: 146.0,
  5: 246.0,
  6: 69.0,
  7: 91.0,
  8: 191.0}})
    A  B      distance
0   1  17     304.0
1   1  20     326.0
2   1  25     426.0
3  10  17     124.0
4  10  20     146.0
5  10  25     246.0
6  13  17      69.0
7  13  20      91.0
8  13  25     191.0

I am trying to get all possible combinations of the DataFrame's rows in which no value is repeated in column A or in column B.

I tried looping over all the entries, but that becomes inefficient as the number of rows grows.
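Whatever method is used, the size of the output itself grows quickly. If every (A, B) pair is present, with n distinct values in A and m ≤ n distinct values in B, each maximum combination has m rows and corresponds to choosing m of the A values and pairing them with the B values in some order, so the number of combinations N is:

```latex
N = \binom{n}{m}\, m! = \frac{n!}{(n-m)!}, \qquad m \le n
```

For the first example (n = m = 3) this gives N = 6, while a naive check of every 3-row subset of the 9 rows already visits C(9, 3) = 84 candidates; the gap widens as rows are added.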

For every possible combination, I would like the output to be a new DataFrame with the maximum possible number of rows. For example:

A  B  distance
1  17 304.0
10 20 146.0
13 25 191.0

A  B  distance
1  20 326.0
10 17 124.0
13 25 191.0
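To pin down the requirement, here is a minimal exhaustive-search sketch. The helper name brute_force_combinations is hypothetical, and it assumes that a subset as large as the smaller number of distinct A or B values actually exists (true for both examples here). It is useful as a reference to check faster methods against, not as a fast method itself.

```python
import itertools

import pandas as pd


def brute_force_combinations(df, a="A", b="B"):
    """Return every maximum-size row subset with no repeated value in columns a and b.

    Exhaustive search over row subsets (hypothetical helper): simple, but the
    number of subsets grows combinatorially with the number of rows.
    """
    # Assumed maximum size: the smaller number of distinct values in a or b
    k = min(df[a].nunique(), df[b].nunique())
    results = []
    for rows in itertools.combinations(df.index, k):
        sub = df.loc[list(rows)]
        # Keep the subset only if no A value and no B value repeats
        if sub[a].is_unique and sub[b].is_unique:
            results.append(sub.reset_index(drop=True))
    return results


df = pd.DataFrame({'A': [1, 1, 1, 10, 10, 10, 13, 13, 13],
                   'B': [17, 20, 25, 17, 20, 25, 17, 20, 25],
                   'distance': [304.0, 326.0, 426.0, 124.0, 146.0,
                                246.0, 69.0, 91.0, 191.0]})
combos = brute_force_combinations(df)
print(len(combos))  # 6
print(combos[0])
```

The first subset found is rows 0, 4 and 8, which is the first example output above.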

Another example:

df = pd.DataFrame({'A': {0: 0,
  1: 0,
  2: 0,
  3: 2,
  4: 2,
  5: 2,
  6: 3,
  7: 3,
  8: 3,
  9: 5,
  10: 5,
  11: 5,
  12: 7,
  13: 7,
  14: 7,
  15: 9,
  16: 9,
  17: 9,
  18: 12,
  19: 12,
  20: 12,
  21: 14,
  22: 14,
  23: 14,
  24: 15,
  25: 15,
  26: 15,
  27: 18,
  28: 18},
 'B': {0: 17,
  1: 20,
  2: 25,
  3: 17,
  4: 20,
  5: 25,
  6: 17,
  7: 20,
  8: 25,
  9: 17,
  10: 20,
  11: 25,
  12: 17,
  13: 20,
  14: 25,
  15: 17,
  16: 20,
  17: 25,
  18: 17,
  19: 20,
  20: 25,
  21: 17,
  22: 20,
  23: 25,
  24: 17,
  25: 20,
  26: 25,
  27: 20,
  28: 25},
 'distance': {0: 408.0,
  1: 430.0,
  2: 530.0,
  3: 293.0,
  4: 315.0,
  5: 415.0,
  6: 281.0,
  7: 303.0,
  8: 403.0,
  9: 242.0,
  10: 264.0,
  11: 364.0,
  12: 208.0,
  13: 230.0,
  14: 330.0,
  15: 170.0,
  16: 192.0,
  17: 292.0,
  18: 74.0,
  19: 96.0,
  20: 196.0,
  21: 48.0,
  22: 70.0,
  23: 170.0,
  24: 27.0,
  25: 49.0,
  26: 149.0,
  27: 17.0,
  28: 117.0}})

Out[377]: 
     A   B  distance
0    0  17     408.0
1    0  20     430.0
2    0  25     530.0
3    2  17     293.0
4    2  20     315.0
5    2  25     415.0
6    3  17     281.0
7    3  20     303.0
8    3  25     403.0
9    5  17     242.0
10   5  20     264.0
11   5  25     364.0
12   7  17     208.0
13   7  20     230.0
14   7  25     330.0
15   9  17     170.0
16   9  20     192.0
17   9  25     292.0
18  12  17      74.0
19  12  20      96.0
20  12  25     196.0
21  14  17      48.0
22  14  20      70.0
23  14  25     170.0
24  15  17      27.0
25  15  20      49.0
26  15  25     149.0
27  18  20      17.0
28  18  25     117.0

Expected output (sample):

A  B  distance
0  17 408.0
2  20 315.0
3  25 403.0

A  B  distance
0  20 430.0
2  17 293.0
3  25 403.0


A  B  distance
0  25 530.0
2  17 293.0
3  20 303.0


A  B  distance
0  25 530.0
2  17 293.0
5  20 264.0
.
.
.

1 Answer:

Answer 0 (score: 3)

I think you may need permutations from itertools; then we just need to look up the values in df after a pivot.

Update

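import itertools
import numpy as np
import pandas as pd

s = df.pivot(index='A', columns='B', values='distance')  # one row per A value, one column per B value
l = list(itertools.permutations(range(s.shape[1])))      # every ordering of the B columns (square grid assumed)
# For permutation x, the i-th A value is paired with the x[i]-th B value
list_of_df = [pd.DataFrame({'A': s.index,
                            'B': s.columns.values[list(x)],
                            'distance': s.values[np.arange(len(s)), list(x)]}) for x in l]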
list_of_df[0]
Out[725]: 
    A   B  distance
0   1  17     304.0
1  10  20     146.0
2  13  25     191.0
list_of_df[1]
Out[726]: 
    A   B  distance
0   1  17     304.0
1  10  25     246.0
2  13  20      91.0
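The pivot-and-permutations idea above assumes a square grid (as many A values as B values, every pair present), so it does not directly cover the second example, which has 10 A values, 3 B values and no (18, 17) row. A minimal sketch of one way to extend it, under the assumption that each (A, B) pair appears at most once (df.pivot raises otherwise): choose which A values take part with itertools.combinations, pair them with the B values via itertools.permutations, and skip any pairing that lands on a missing (NaN) cell. The function name pivot_combinations is hypothetical.

```python
import itertools

import numpy as np
import pandas as pd


def pivot_combinations(df, a="A", b="B", value="distance"):
    """All maximum-size row combinations with no repeated A or B value (hypothetical helper).

    Pick k = min(#A, #B) rows and k columns of the pivoted grid, pair them
    position by position, and drop pairings that hit a missing (a, b) pair.
    """
    s = df.pivot(index=a, columns=b, values=value)  # NaN where an (a, b) pair is absent
    mat = s.to_numpy()
    n_rows, n_cols = mat.shape
    k = min(n_rows, n_cols)
    results = []
    # Sorted row choice + ordered column choice: each pairing is produced exactly once
    for rows in itertools.combinations(range(n_rows), k):
        for cols in itertools.permutations(range(n_cols), k):
            vals = mat[list(rows), list(cols)]  # element-wise (row, col) lookup
            if np.isnan(vals).any():  # pairing uses a pair that does not exist in df
                continue
            results.append(pd.DataFrame({a: s.index.values[list(rows)],
                                         b: s.columns.values[list(cols)],
                                         value: vals}))
    return results


# The second example from the question, written as flat lists
df2 = pd.DataFrame({
    'A': [x for x in (0, 2, 3, 5, 7, 9, 12, 14, 15) for _ in range(3)] + [18, 18],
    'B': [17, 20, 25] * 9 + [20, 25],
    'distance': [408.0, 430.0, 530.0, 293.0, 315.0, 415.0, 281.0, 303.0, 403.0,
                 242.0, 264.0, 364.0, 208.0, 230.0, 330.0, 170.0, 192.0, 292.0,
                 74.0, 96.0, 196.0, 48.0, 70.0, 170.0, 27.0, 49.0, 149.0,
                 17.0, 117.0]})
combos2 = pivot_combinations(df2)
print(len(combos2))  # 648
print(combos2[0])
```

The count is 10 · 9 · 8 = 720 orderings of three distinct A values, minus the 72 that would pair 18 with 17; the first result (0/17, 2/20, 3/25) is the first expected sample in the question. On a square grid such as the first example, the loop over rows runs once and the output order matches list_of_df above.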