pd.concat in pandas is not behaving as expected

Date: 2018-05-12 17:57:25

Tags: pandas

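I am trying to compute z-scores for a subset of the columns in my DataFrame (combo) and then add those z-scores to the same DataFrame as new columns. Note that after the z-scores are combined with pd.concat, the new columns contain only NaN. That is the problem I need help with.

I thought this might be related to how concat adds the new columns, since there is no unique key to match on. But keeping the email column in the intermediate zscores table did not fix it, so the cause may be something else.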

zscores = combo.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.rename(lambda x:colrename(x, "zscore "), axis=1)
newcombo = pd.concat([combo, zscores], axis=1)

combo.iloc[4]: 

email            msilveira66@brandeis.edu
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
Name: 4, dtype: object

zscores.iloc[4]:

zscore PA01   -0.562523
zscore PA02   -0.418858
zscore PA03   -1.722308
zscore PA04   -1.378762
zscore PA05   -2.291849
zscore PA06   -0.503729
zscore PA07   -0.343543
zscore PA08   -2.037249
zscore PA09   -0.064932
zscore PA10   -0.428859
zscore PA11   -0.735842
Name: 5, dtype: float64

newcombo:

email            msilveira66@brandeis.edu
all pas                             54.84
all partic                          92.21
course                              60.39
pa grade PA01                        67.7
pa grade PA02                          82
pa grade PA03                          21
pa grade PA04                           0
pa grade PA05                          43
pa grade PA06                          29
pa grade PA07                          61
pa grade PA08                          63
pa grade PA09                         NaN
pa grade PA10                          72
pa grade PA11                           0
resub PA01                            NaN
resub PA02                            NaN
resub PA03                            NaN
resub PA04                            NaN
resub PA05                            NaN
resub PA06                            NaN
resub PA07                            NaN
resub PA08                            NaN
resub PA09                            NaN
resub PA10                            NaN
resub PA11                            NaN
initial PA01                           56
initial PA02                      83.3333
initial PA03                           30
initial PA04                            0
initial PA05                           61
initial PA06                           42
initial PA07                           80
initial PA08                           90
initial PA09                          NaN
initial PA10                           97
initial PA11                            0
resubmits                               0
resub mean                            NaN
initial mean                      53.9333
pa grade mean                       43.87
zscore PA01                           NaN
zscore PA02                           NaN
zscore PA03                           NaN
zscore PA04                           NaN
zscore PA05                           NaN
zscore PA06                           NaN
zscore PA07                           NaN
zscore PA08                           NaN
zscore PA09                           NaN
zscore PA10                           NaN
zscore PA11                           NaN
Name: 4, dtype: object

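1 answer:

Answer 0 (score: 1)

This is expected behavior: dropna removes every row that has a NaN in the selected subset of columns, so the final concat only has new values for the rows that survived the filter. concat aligns on the index, and the rows with no match are filled with NaN.

Detail: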

import numpy as np
import pandas as pd

combos = pd.DataFrame({'A':list('abcdef'),
                       'B':[np.nan,5,4,5,5,4],
                       'C':[7,8,9,np.nan,2,3],
                       'D':[1,3,5,np.nan,1,0],
                       'E':[5,3,6,9,2,4],
                       'F':list('aaabbb')})

print(combos)
   A    B    C    D  E  F
0  a  NaN  7.0  1.0  5  a
1  b  5.0  8.0  3.0  3  a
2  c  4.0  9.0  5.0  6  a
3  d  5.0  NaN  NaN  9  b
4  e  5.0  2.0  1.0  2  b
5  f  4.0  3.0  0.0  4  b

# sample function standing in for a real z-score
def zscore(x):
    return x * 100

pa_grade_cols = ['B','C','D']
# rows 0 and 3 contain NaN in B, C or D, so dropna removes them
zscores = combos.loc[:,pa_grade_cols].dropna(axis=0)
zscores = zscores.apply(zscore)
zscores = zscores.add_prefix('zscores_')
# concat aligns on the index; the dropped rows 0 and 3 get NaN
newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)
   A    B    C    D  E  F  zscores_B  zscores_C  zscores_D
0  a  NaN  7.0  1.0  5  a        NaN        NaN        NaN
1  b  5.0  8.0  3.0  3  a      500.0      800.0      300.0
2  c  4.0  9.0  5.0  6  a      400.0      900.0      500.0
3  d  5.0  NaN  NaN  9  b        NaN        NaN        NaN
4  e  5.0  2.0  1.0  2  b      500.0      200.0      100.0
5  f  4.0  3.0  0.0  4  b      400.0      300.0        0.0
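
This also explains the output in the question: row 4 of combo has a NaN in pa grade PA09, so dropna removed it, and zscores.iloc[4] is a different row (its Name is 5, not 4). The sketch below first shows that shift between position and label on a cut-down version of the sample frame, then shows one possible way to avoid the all-NaN rows. It assumes that a population z-score computed column by column, skipping missing values, is acceptable. That is an assumption and not what the question's code does, since the original drops incomplete rows before scoring. The prefix zscore_ and the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

combos = pd.DataFrame({
    'A': list('abcdef'),
    'B': [np.nan, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, np.nan, 2, 3],
    'D': [1, 3, 5, np.nan, 1, 0],
})
pa_grade_cols = ['B', 'C', 'D']

# dropna keeps the original index labels, so positions shift:
# rows 0 and 3 are removed, leaving labels [1, 2, 4, 5].
filtered = combos[pa_grade_cols].dropna(axis=0)
kept_labels = filtered.index.tolist()
label_at_position_2 = filtered.iloc[2].name  # this is label 4, not 2

# Assumed alternative: a population z-score per column.
# mean() and std() skip NaN by default, so no rows are dropped
# and the result keeps the full original index.
grades = combos[pa_grade_cols]
zscores = (grades - grades.mean()) / grades.std(ddof=0)
zscores = zscores.add_prefix('zscore_')

# Both frames share the same index, so concat lines up row by row.
# A cell is NaN only where the underlying grade itself is NaN.
newcombo = pd.concat([combos, zscores], axis=1)
print(newcombo)
```

With this approach a row such as row 0 keeps its z-scores for C and D and has NaN only for B. Note that the means and standard deviations are then computed from all available values in each column, so the numbers differ from z-scores computed on complete rows only.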