Creating a DataFrame using append

Date: 2018-04-15 17:43:17

Tags: python pandas loops dataframe append

I am trying to build a DataFrame by appending rows in a loop:

col_stats= ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(columns=[col_stats])

for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    new_row = pd.Series(new_row)
    stats = stats.append(new_row, ignore_index=True)

stats

It works when I remove this line:

    stats = stats.append(new_row, ignore_index=True)

With that line in place, I get this error:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

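The 'Attribute' column is a string (the name of the variable). The other columns (Mean, Var, Std) are numeric (int, float).

Why can't I use pd.df.append here?

1 answer:

Answer 0 (score: 1):

For a loop-based solution, append each row to a list and then pass the list to the DataFrame constructor: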

L = []
for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    L.append(new_row)

col_stats = ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(L, columns=col_stats)
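
As for why the original attempt may fail: one thing worth checking is that `columns=[col_stats]` wraps the list in a second list, which pandas interprets as a one-level MultiIndex rather than four plain column labels, while the appended Series carries the integer labels 0 to 3. The sketch below only illustrates that difference in column types; whether it is the sole cause of the ValueError on your pandas version is an assumption. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so collecting rows in a list is the pattern that keeps working.

```python
import pandas as pd

col_stats = ['Attribute', 'Mean', 'Var', 'Std']

# Nested list, as in the question: pandas builds a MultiIndex from it
nested = pd.DataFrame(columns=[col_stats])

# Flat list: four ordinary column labels
flat = pd.DataFrame(columns=col_stats)

print(isinstance(nested.columns, pd.MultiIndex))  # True
print(isinstance(flat.columns, pd.MultiIndex))    # False
```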

Sample:

import numpy as np
import pandas as pd

train = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3],
                      'D': [1, 3, 5, 7, 1, 0]})

L = []
for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    L.append(new_row)

col_stats = ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(L, columns=col_stats)

print(stats)
  Attribute      Mean       Var       Std
0         B  4.500000  0.250000  0.500000
1         C  5.500000  6.916667  2.629956
2         D  2.833333  6.138889  2.477678
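
A non-loop alternative uses DataFrame.agg:

# ddof=0 gives the population variance/std, matching the np.var and np.nanstd defaults
f1 = lambda x: x.var(ddof=0)
f2 = lambda x: x.std(ddof=0)
stats = train.agg(['mean', f1, f2]).T.reset_index()
stats.columns = ['Attribute', 'Mean', 'Var', 'Std']
print(stats)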
  Attribute      Mean       Var       Std
0         B  4.500000  0.250000  0.500000
1         C  5.500000  6.916667  2.629956
2         D  2.833333  6.138889  2.477678
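
The same table can also be assembled from the vectorized DataFrame methods directly, without agg or lambdas. This is a minimal sketch, assuming the same `train` frame as in the sample and `ddof=0` so that the results match the np.var and np.nanstd defaults:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3],
                      'D': [1, 3, 5, 7, 1, 0]})

# Each method returns one value per column; to_numpy() drops the
# column-name index so the rows get a default 0..n-1 index.
stats = pd.DataFrame({
    'Attribute': train.columns,
    'Mean': train.mean().to_numpy(),
    'Var': train.var(ddof=0).to_numpy(),   # population variance
    'Std': train.std(ddof=0).to_numpy(),   # population standard deviation
})
print(stats)
```

pandas defaults to `ddof=1` for var and std, so the explicit `ddof=0` is what keeps these numbers in line with the NumPy-based loop.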