Question

Here is my sample data. It contains the columns ID, North and East, plus further columns whose headers are written as tuples.

ID,North,East,"(6640.83, 679.0)","(6648.84, 673.37)","(6649.83, 674.3)","(6647.0, 200.0)"
1,6642.83,679.37,2.0,8.4,8.6,479.38
1,6648.84,673.37,9.7,0.0,1.3,473.3
2,6649.83,674.3,10.1,1.3,1.4,474.3
2,6647.0,200.0,3.03,473.3,474.30,5.0

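My goal is to look, in every row, at all columns except 'ID', 'North' and 'East' and find which column holds the minimum value. Once I have found the minimum, I want to append that row's ID (only the ID) to a list that belongs to that column.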

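For example, the minimum of row 1 falls in column "(6640.83, 679.0)", so I want to create the list

6640.83_679.0 = [1] # the value 1 is the ID of that row

and then continue. Row 4, for example, again has its minimum in "(6640.83, 679.0)", so I want to reuse the list that was already created and append ID == 2 instead of creating a separate list for that column. In short: if a list for the column already exists, I do not want to create another one; if no list exists yet, I want to create it so that I can store the value. The lists would then look like this: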

6640.83_679.0 = [1, 2] # first and fourth rows, which have IDs 1 and 2
6648.84_673.37 = [1] # second row, which has ID 1
6649.83_674.3 = [2] # third row, which has ID 2
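
The bookkeeping described above (reuse a column's list if it already exists, otherwise create it) can be sketched in plain Python with `collections.defaultdict`. This is only an illustration under assumptions: the names `RAW`, `value_columns` and `ids_by_column` are made up here, the sample data is pasted in as a string, and ties are resolved in favour of the first column.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question (assumed to be the complete input).
RAW = '''ID,North,East,"(6640.83, 679.0)","(6648.84, 673.37)","(6649.83, 674.3)","(6647.0, 200.0)"
1,6642.83,679.37,2.0,8.4,8.6,479.38
1,6648.84,673.37,9.7,0.0,1.3,473.3
2,6649.83,674.3,10.1,1.3,1.4,474.3
2,6647.0,200.0,3.03,473.3,474.30,5.0
'''

reader = csv.reader(io.StringIO(RAW))
header = next(reader)
value_columns = header[3:]  # everything after ID, North, East

# A missing key starts out as an empty list, so no "does it exist?" check is needed.
ids_by_column = defaultdict(list)
for row in reader:
    row_id = int(row[0])
    values = [float(v) for v in row[3:]]
    # Header of the column holding this row's smallest value (first one wins on ties).
    min_column = value_columns[values.index(min(values))]
    ids_by_column[min_column].append(row_id)

ids_by_column = dict(ids_by_column)
print(ids_by_column)
```

Note that with the sample data as given, the minimum of the third row (1.3) lies in "(6648.84, 673.37)", so the computed lists differ from the hand-written example for that row.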

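I do not want to use np.where and check each column individually, because there may be more than 50 columns to check.

Is it possible to achieve this with pandas?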

Answer 1

I suggest creating a dictionary instead of separate lists:

import pandas as pd

# df is assumed to hold the sample data from the question
# keep only the tuple-named columns
df1 = df.iloc[:, 3:]
print (df1)
   (6640.83, 679.0)  (6648.84, 673.37)  (6649.83, 674.3)  (6647.0, 200.0)
0              2.00                8.4               8.6           479.38
1              9.70                0.0               1.3           473.30
2             10.10                1.3               1.4           474.30
3              3.03              473.3             474.3             5.00

# get the 1-based position of the column holding each row's minimum
s = pd.Series(df1.values.argmin(axis=1) + 1, index=df1.index)
print (s)
0    1
1    2
2    2
3    1
dtype: int64

# get the column name (tuple) holding each row's minimum
m = df1.idxmin(axis=1)
print (m)
0     (6640.83, 679.0)
1    (6648.84, 673.37)
2    (6648.84, 673.37)
3     (6640.83, 679.0)
dtype: object

# create a dictionary of lists
d = s.groupby(m).apply(list).to_dict()
print (d)
{'(6640.83, 679.0)': [1, 1], '(6648.84, 673.37)': [2, 2]}

# select one value from the dict (if the keys are real tuples, omit the quotes)
print (d['(6640.83, 679.0)'])
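
The `s` above holds the 1-based position of the minimum column, which is why the dictionary shows `[1, 1]` and `[2, 2]`. If the row's `ID` is wanted instead, as the question describes, one possible variation is to group the `ID` column by the same `idxmin` result. This is a sketch under assumptions rather than part of the original answer: the sample data is read from a string, and `RAW` and `ids_by_column` are names introduced here for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question (assumed to be the complete input).
RAW = '''ID,North,East,"(6640.83, 679.0)","(6648.84, 673.37)","(6649.83, 674.3)","(6647.0, 200.0)"
1,6642.83,679.37,2.0,8.4,8.6,479.38
1,6648.84,673.37,9.7,0.0,1.3,473.3
2,6649.83,674.3,10.1,1.3,1.4,474.3
2,6647.0,200.0,3.03,473.3,474.30,5.0
'''

df = pd.read_csv(io.StringIO(RAW))

# Keep every column except the three identifying ones,
# so this works no matter how many tuple-named columns there are.
df1 = df.drop(columns=["ID", "North", "East"])

# Column name holding each row's minimum value.
m = df1.idxmin(axis=1)

# Collect the IDs of all rows that share the same minimum column.
ids_by_column = df["ID"].groupby(m).apply(list).to_dict()
print(ids_by_column)
```

Because the column names are read from the CSV header, the dictionary keys are strings such as `'(6640.83, 679.0)'`, not real tuples.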

Check the values of certain columns in pandas and list the IDs that belong to each column
