I am trying to build a DataFrame using append:
col_stats = ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(columns=[col_stats])
for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    new_row = pd.Series(new_row)
    stats = stats.append(new_row, ignore_index=True)
stats
It works when I remove this line:
stats = stats.append(new_row, ignore_index=True)
With that line in place, it gives me this error:
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
The 'Attribute' column is a string (the name of the variable). The other columns (Mean, Var, Std) are numeric (int, float).
Why can't I use DataFrame.append here?
Answer 0 (score: 1)
For a loop solution, append each row to a list and then pass the list to the DataFrame constructor:
L = []
for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    # collect plain Python lists; no DataFrame is touched inside the loop
    L.append(new_row)
col_stats = ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(L, columns=col_stats)
Sample:
import numpy as np
import pandas as pd

train = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3],
                      'D': [1, 3, 5, 7, 1, 0]})
L = []
for i in train:
    new_row = [
        i,
        train[i].mean(),
        np.var(train[i]),
        np.nanstd(train[i])
    ]
    L.append(new_row)
col_stats = ['Attribute', 'Mean', 'Var', 'Std']
stats = pd.DataFrame(L, columns=col_stats)
print(stats)
  Attribute      Mean       Var       Std
0         B  4.500000  0.250000  0.500000
1         C  5.500000  6.916667  2.629956
2         D  2.833333  6.138889  2.477678
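For readers on pandas 2.0 or later, where `DataFrame.append` has been removed, the row-by-row idea from the question can be sketched with `pd.concat` instead. This is only an illustration under a few assumptions: `build_stats` is a hypothetical helper name that does not appear in the thread, the sample `train` frame from above is reused, and `ddof=0` is passed explicitly to match the defaults of `np.var` and `np.nanstd`.

```python
import pandas as pd


def build_stats(train):
    """Build one single-row frame per column, then concatenate them once.

    `build_stats` is a hypothetical helper introduced for this sketch.
    """
    frames = []
    for name in train:
        col = train[name]
        frames.append(pd.DataFrame({
            'Attribute': [name],
            'Mean': [col.mean()],
            'Var': [col.var(ddof=0)],   # population variance, like np.var
            'Std': [col.std(ddof=0)],   # population std, like np.nanstd
        }))
    # a single concat at the end avoids re-copying the frame on every iteration
    return pd.concat(frames, ignore_index=True)


train = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3],
                      'D': [1, 3, 5, 7, 1, 0]})
stats = build_stats(train)
print(stats)
```

Collecting the pieces first and combining them once is the same design choice as the list approach above; growing a DataFrame inside the loop copies the data on every iteration.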
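A non-loop alternative is DataFrame.agg:
# ddof=0 matches the NumPy defaults used in the loop version
f1 = lambda x: x.var(ddof=0)
f2 = lambda x: x.std(ddof=0)
stats = train.agg(['mean', f1, f2]).T.reset_index()
stats.columns = ['Attribute', 'Mean', 'Var', 'Std']
print(stats)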
  Attribute      Mean       Var       Std
0         B  4.500000  0.250000  0.500000
1         C  5.500000  6.916667  2.629956
2         D  2.833333  6.138889  2.477678
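As for why the original attempt failed, the thread does not confirm the cause, but two details in the question's code look like probable contributors: `columns=[col_stats]` wraps the list of names in another list, which makes pandas build MultiIndex columns, and `pd.Series(new_row)` is labelled 0 to 3 rather than by column name. The sketch below only demonstrates those two facts; it does not reproduce the error itself.

```python
import pandas as pd

col_stats = ['Attribute', 'Mean', 'Var', 'Std']

nested = pd.DataFrame(columns=[col_stats])  # list wrapped in another list
flat = pd.DataFrame(columns=col_stats)      # plain list of names

print(isinstance(nested.columns, pd.MultiIndex))  # True
print(isinstance(flat.columns, pd.MultiIndex))    # False

# A Series built from a bare list gets integer labels, not column names
row = pd.Series(['B', 4.5, 0.25, 0.5])
print(list(row.index))  # [0, 1, 2, 3]

# Passing index=col_stats labels each value with the column it belongs to
labelled = pd.Series(['B', 4.5, 0.25, 0.5], index=col_stats)
print(list(labelled.index))  # ['Attribute', 'Mean', 'Var', 'Std']
```

Both solutions above sidestep the issue by passing `col_stats` as a flat list and by never aligning an integer-labelled Series against the frame's columns.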