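I am trying to compute z-scores for a subset of the columns in my DataFrame (combo) and then add them to that DataFrame as new columns. The problem is that after I pd.concat zscores onto the DataFrame, the new columns are all NaN. That is what I need help with.
I suspected this had to do with how concat adds new columns, since there is no unique key to match on. But keeping the email column in the intermediate zscores table did not fix it, so the cause may be something else.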
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x:colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)
combo.iloc[4]:
email msilveira66@brandeis.edu
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
Name: 4, dtype: object
zscores.iloc[4]:
zscore PA01 -0.562523
zscore PA02 -0.418858
zscore PA03 -1.722308
zscore PA04 -1.378762
zscore PA05 -2.291849
zscore PA06 -0.503729
zscore PA07 -0.343543
zscore PA08 -2.037249
zscore PA09 -0.064932
zscore PA10 -0.428859
zscore PA11 -0.735842
Name: 5, dtype: float64
newcombo:
email msilveira66@brandeis.edu
all pas 54.84
all partic 92.21
course 60.39
pa grade PA01 67.7
pa grade PA02 82
pa grade PA03 21
pa grade PA04 0
pa grade PA05 43
pa grade PA06 29
pa grade PA07 61
pa grade PA08 63
pa grade PA09 NaN
pa grade PA10 72
pa grade PA11 0
resub PA01 NaN
resub PA02 NaN
resub PA03 NaN
resub PA04 NaN
resub PA05 NaN
resub PA06 NaN
resub PA07 NaN
resub PA08 NaN
resub PA09 NaN
resub PA10 NaN
resub PA11 NaN
initial PA01 56
initial PA02 83.3333
initial PA03 30
initial PA04 0
initial PA05 61
initial PA06 42
initial PA07 80
initial PA08 90
initial PA09 NaN
initial PA10 97
initial PA11 0
resubmits 0
resub mean NaN
initial mean 53.9333
pa grade mean 43.87
zscore PA01 NaN
zscore PA02 NaN
zscore PA03 NaN
zscore PA04 NaN
zscore PA05 NaN
zscore PA06 NaN
zscore PA07 NaN
zscore PA08 NaN
zscore PA09 NaN
zscore PA10 NaN
zscore PA11 NaN
Name: 4, dtype: object
Answer 0 (score: 1)
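This is expected behavior. dropna filters out every row that has a NaN in the selected subset of columns, so zscores contains only the surviving rows, each with its original index label. concat with axis=1 aligns on the index, so it fills the new columns only for those surviving rows; every row that was dropped gets NaN in the new columns.
In the question, row 4 has NaN in pa grade PA09, so dropna removes it. That is why zscores.iloc[4] reports Name: 5 and why row 4 of newcombo has NaN in every zscore column.
Details: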
import numpy as np
import pandas as pd

# Sample data with NaNs in the columns to be scored
combos = pd.DataFrame({'A':list('abcdef'),
                       'B':[np.nan,5,4,5,5,4],
                       'C':[7,8,9,np.nan,2,3],
                       'D':[1,3,5,np.nan,1,0],
                       'E':[5,3,6,9,2,4],
                       'F':list('aaabbb')})
print (combos)
   A    B    C    D  E  F
0  a  NaN  7.0  1.0  5  a
1  b  5.0  8.0  3.0  3  a
2  c  4.0  9.0  5.0  6  a
3  d  5.0  NaN  NaN  9  b
4  e  5.0  2.0  1.0  2  b
5  f  4.0  3.0  0.0  4  b
# Sample function standing in for a real z-score
def zscore(x):
    return x * 100

pa_grade_cols = ['B','C','D']
# dropna removes rows 0 and 3 but keeps the original index labels
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zscore_')
# concat aligns on the index, so the dropped rows get NaN
newcombo = pd.concat([combos, zscores], axis=1)
print (newcombo)
   A    B    C    D  E  F  zscore_B  zscore_C  zscore_D
0  a  NaN  7.0  1.0  5  a       NaN       NaN       NaN
1  b  5.0  8.0  3.0  3  a     500.0     800.0     300.0
2  c  4.0  9.0  5.0  6  a     400.0     900.0     500.0
3  d  5.0  NaN  NaN  9  b       NaN       NaN       NaN
4  e  5.0  2.0  1.0  2  b     500.0     200.0     100.0
5  f  4.0  3.0  0.0  4  b     400.0     300.0       0.0
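If the goal is to avoid losing whole rows, one possible workaround is to skip dropna and compute the z-scores per column with plain pandas arithmetic. The sketch below is an assumption about what is wanted, not the asker's method: it reuses the sample combos frame from above (without columns E and F), relies on mean() and std() skipping NaN by default, and uses ddof=0, which I believe matches the default of scipy.stats.zscore. Note that it behaves differently from the dropna version: each column is standardized over its own available values, and a row is NaN only in the z-score columns whose source value is missing.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above (columns E and F omitted)
combos = pd.DataFrame({
    'A': list('abcdef'),
    'B': [np.nan, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, np.nan, 2, 3],
    'D': [1, 3, 5, np.nan, 1, 0],
})
pa_grade_cols = ['B', 'C', 'D']

subset = combos[pa_grade_cols]

# mean() and std() skip NaN by default, so no rows have to be dropped.
# ddof=0 gives the population standard deviation.
zscores = (subset - subset.mean()) / subset.std(ddof=0)
zscores = zscores.add_prefix('zscore_')

# The index is unchanged, so concat lines the rows up one to one.
newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)
```

With this version only the cells that were NaN in the source columns stay NaN in the matching zscore_ columns; every other row gets a value.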