Is there an equivalent in a pandas DataFrame to using 'by' in R data.table?
For example, in R I can do this:
DT = data.table(x = c('a', 'a', 'a', 'b', 'b', 'b'), y = rnorm(6))
DT[, z := mean(y[1:2]), by = x]
Is there something similar in pandas?
Answer 0: (score: 5)
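To get output similar to the data.table version, that is, to take the mean of the first two elements of 'y' within each group of 'x' and store it in a new column 'z', use groupby with transform: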
mean1 = lambda x: x.head(2).mean()
df['z'] = df['y'].groupby(df['x']).transform(mean1)
print(df)
# x y z
#0 a 1.329212 0.279589
#1 a -0.770033 0.279589
#2 a -0.316280 0.279589
#3 b -0.990810 -1.030813
#4 b -1.070816 -1.030813
#5 b -1.438713 -1.030813
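For a version that does not depend on the random seed, here is a minimal self-contained sketch. It assumes the six 'y' values listed in the R `structure(...)` data of this answer, and it spells the grouping as `df.groupby("x")["y"]`, which is an alternative form of `df['y'].groupby(df['x'])`. `transform` broadcasts each group's single value back to every row of that group, which is what `:=` with `by` does in data.table.

```python
import pandas as pd

# Fixed values copied from the R structure(...) data in this answer,
# so the result is reproducible without a random seed.
df = pd.DataFrame({
    "x": ["a", "a", "a", "b", "b", "b"],
    "y": [1.329212, -0.770033, -0.31628, -0.99081, -1.070816, -1.438713],
})

# Mean of the first two 'y' values in each 'x' group,
# repeated on every row of that group.
df["z"] = df.groupby("x")["y"].transform(lambda s: s.head(2).mean())

print(df)
```

If a group has fewer than two rows, `head(2)` simply returns what is there, so the mean is taken over the available values.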
The corresponding data.table code in R:
library(data.table)
DT[, z := mean(y[1:2]), by = x]
DT
# x y z
#1: a 1.329212 0.2795895
#2: a -0.770033 0.2795895
#3: a -0.316280 0.2795895
#4: b -0.990810 -1.0308130
#5: b -1.070816 -1.0308130
#6: b -1.438713 -1.0308130
Data (pandas):
import pandas as pd
import numpy as np
from numpy import random
np.random.seed(seed=24)
df = pd.DataFrame({'x': ['a', 'a', 'a', 'b', 'b', 'b'],
'y': random.randn(6)})
Data (R):
DT <- structure(list(x = c("a", "a", "a", "b", "b", "b"),
y = c(1.329212,
-0.770033, -0.31628, -0.99081, -1.070816, -1.438713)), .Names = c("x",
"y"), class = c("data.table", "data.frame"),
row.names = c(NA, -6L))