Question

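I have a pandas.DataFrame with 3 columns of type str and n other columns of type float64.

I need to group the rows by one of the three str columns and apply a function myComplexFunc() that reduces N rows to a single row.

myComplexFunc() only accepts rows of type float64.

This could be done with some for loops, but that would be inefficient, so I tried pandas' {{3}}, but it seems to run the heavy code of myComplexFunc() twice!

To be clearer, here is a minimal example.

Let "df" be a dataframe like this: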

df
>>
     A      B         C         D
0  foo    one  0.406157  0.735223
1  bar    one  1.020493 -1.167256
2  foo    two -0.314192 -0.883087
3  bar  three  0.271705 -0.215049
4  foo    two  0.535290  0.185872
5  bar    two  0.178926 -0.459890
6  foo    one -1.939673 -0.523396
7  foo  three -2.125591 -0.689809

myComplexFunc()

def myComplexFunc(rows):
  # Some transformations that will return 1 row
  result = some_transformations(rows)
  return result

What I want:

# wanted apply is the name of the wanted method
df.groupby("A").wanted_apply(myComplexFunc)

>> 
    A    C            D
0  foo   new_c0_foo   new_d0_foo
1  bar   new_c0_bar   new_d0_bar

Column B has been dropped because it is not of type float64.

Thanks in advance.

Answer 1

You can filter the DataFrame by dtype with select_dtypes, but then you need to group by the Series df.A:

import numpy as np

def myComplexFunc(rows):
    return rows + 10

df = df.select_dtypes(include=[np.float64]).groupby([df.A]).apply(myComplexFunc)
print (df)
           C          D
0  10.406157  10.735223
1  11.020493   8.832744
2   9.685808   9.116913
3  10.271705   9.784951
4  10.535290  10.185872
5  10.178926   9.540110
6   8.060327   9.476604
7   7.874409   9.310191

This is because if you group by the column name only:

df = df.select_dtypes(include=[np.float64]).groupby('A').apply(myComplexFunc)

you get

KeyError: 'A'

and that is correct: all string columns (A and B) have been excluded:

print (df.select_dtypes(include=[np.float64]))
          C         D
0  0.406157  0.735223
1  1.020493 -1.167256
2 -0.314192 -0.883087
3  0.271705 -0.215049
4  0.535290  0.185872
5  0.178926 -0.459890
6 -1.939673 -0.523396
7 -2.125591 -0.689809
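
The myComplexFunc in the answer above returns as many rows as it receives, so it does not show the reduction the question asks for. Below is a minimal sketch of the same select_dtypes plus groupby-by-Series approach with a function that collapses each group to one row. The function my_complex_func and its max-minus-min logic are hypothetical placeholders for the real heavy computation; only the data comes from the question.

```python
import numpy as np
import pandas as pd

# Same data as the question's frame: two str columns, two float64 columns.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [0.406157, 1.020493, -0.314192, 0.271705,
          0.535290, 0.178926, -1.939673, -2.125591],
    "D": [0.735223, -1.167256, -0.883087, -0.215049,
          0.185872, -0.459890, -0.523396, -0.689809],
})


def my_complex_func(rows: pd.DataFrame) -> pd.Series:
    # Hypothetical stand-in for the real reduction: per-column value range.
    # Returning a Series makes apply() emit exactly one row per group.
    return rows.max() - rows.min()


# Keep only the float64 columns; "A" and "B" are dropped here.
floats = df.select_dtypes(include=[np.float64])

# Group by the Series df["A"], because "A" is no longer a column of `floats`.
result = floats.groupby(df["A"]).apply(my_complex_func).reset_index()
print(result)
```

The result has one row per value of A and only the columns A, C and D, which matches the layout requested in the question. Note that groupby sorts the keys by default, so bar comes before foo; passing sort=False to groupby should keep the order of first appearance instead.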


Pandas DataFrame, apply a complex function smartly to groupby results
