Suppose I have the following pandas DataFrame:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
This produces a df whose cells hold lists (built from NumPy arrays via .tolist()):
df
Out[16]:
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
I want to compute the mean of the DataFrame, but it doesn't work because each cell holds a list object rather than a number. For example,
type(df.iloc[0, 0])
Out[19]: list
So if I compute the row mean, it returns NaN:
df["Average"]= df.mean(axis=1)
df
Out[21]:
                 A                B                C  Average
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      NaN
My question is: how can I convert df back into numeric values that I can work with?
Answer 0 (score: 1)
I think converting the list values into separate columns is a really good idea, because you can then use pandas' vectorized functions:
df1 = pd.concat([pd.DataFrame(df[c].values.tolist()) for c in df.columns],
                axis=1,
                keys=df.columns)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
print(df1)
   A0  A1  A2  A3  A4  B0  B1  B2  B3  B4  C0  C1  C2  C3  C4
0   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
1   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
2   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
3   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
4   0   1   2   3   4   0   1   2   3   4   0   1   2   3   4
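Once the lists are expanded into numeric columns, the row average the question asks for is a single vectorized call. A minimal sketch, assuming every list has the same length (the names `wide` and `row_average` are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: every cell holds the list [0, 1, 2, 3, 4].
df = pd.DataFrame({c: [np.arange(5).tolist() for _ in range(5)] for c in "ABC"})

# Expand each list column into its own block of numeric columns,
# then place the blocks side by side under the original column names.
wide = pd.concat(
    [pd.DataFrame(df[c].tolist(), index=df.index) for c in df.columns],
    axis=1,
    keys=df.columns,
)

# Every column is numeric now, so the row mean is vectorized.
row_average = wide.mean(axis=1)
print(row_average)
```

If the lists have different lengths, the expanded frame is padded with NaN, which `mean` skips by default.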
But if you need the mean over all the lists in each row at once:
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(i + 1).tolist()
print(df)
                 A                B                C
0              [0]              [0]              [0]
1           [0, 1]           [0, 1]           [0, 1]
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
from itertools import chain
from statistics import mean
df['Average'] = [mean(list(chain.from_iterable(x))) for x in df.values.tolist()]
print (df)
                 A                B                C  Average
0              [0]              [0]              [0]      0.0
1           [0, 1]           [0, 1]           [0, 1]      0.5
2        [0, 1, 2]        [0, 1, 2]        [0, 1, 2]      1.0
3     [0, 1, 2, 3]     [0, 1, 2, 3]     [0, 1, 2, 3]      1.5
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]      2.0
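The same pooled row mean can also be sketched with NumPy instead of itertools and statistics. This is an alternative under the assumption that every cell holds a non-empty list of numbers. It is not the answer's own code, and `pooled_mean` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Ragged lists, as in the example above: row i holds [0, ..., i] in each column.
df = pd.DataFrame({c: [np.arange(i + 1).tolist() for i in range(5)] for c in "ABC"})

def pooled_mean(row):
    # Join all lists of one row into a single 1-D array, then average it.
    return float(np.mean(np.concatenate(row.tolist())))

# axis=1 passes each row to the function as a Series of lists.
df["Average"] = df.apply(pooled_mean, axis=1)
print(df)
```

Like the list comprehension above, this loops over rows in Python, so it is a matter of readability rather than speed.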
Edit:
If the values are strings:
df = pd.DataFrame(np.nan, columns=["A", "B", "C"], index=np.arange(5))
df = df.astype(object)
for c in list(df):
    for i in df.index.values:
        df.at[i, c] = np.arange(5).tolist()
df = df.astype(str)
print(df)
                 A                B                C
0  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
1  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
2  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
3  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
4  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]  [0, 1, 2, 3, 4]
df1 = pd.concat([df[c].str.strip('[]').str.split(', ', expand=True) for c in df.columns],
                axis=1,
                keys=df.columns).astype(float)
df1.columns = ['{}{}'.format(i, j) for i, j in df1.columns]
df1["Average"] = df1.mean(axis=1)
print(df1)
    A0   A1   A2   A3   A4   B0   B1   B2   B3   B4   C0   C1   C2   C3   C4  \
0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
1  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
2  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
3  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0
4  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0  0.0  1.0  2.0  3.0  4.0

   Average
0      2.0
1      2.0
2      2.0
3      2.0
4      2.0
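When the strings are valid Python list literals, another option is to parse them back into real lists with the standard library's ast.literal_eval rather than stripping and splitting. A hedged sketch, assuming every cell is a string such as "[0, 1, 2, 3, 4]" (`parsed` is an illustrative name):

```python
import ast
import pandas as pd

# Cells hold the string form of a list, as after df.astype(str).
df = pd.DataFrame({c: ["[0, 1, 2, 3, 4]"] * 5 for c in "ABC"})

# literal_eval parses Python literals only; it does not execute arbitrary code.
parsed = df.apply(lambda col: col.map(ast.literal_eval))

# Pool the lists of each row: total of all values divided by their count.
parsed["Average"] = [
    sum(sum(cell) for cell in row) / sum(len(cell) for cell in row)
    for row in parsed[["A", "B", "C"]].itertuples(index=False)
]
print(parsed)
```

This keeps the list structure intact, so it also copes with lists of different lengths, whereas the split-based version produces NaN padding.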
Answer 1 (score: 1)
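You may want to restructure the DataFrame as described above. However, to work with what you already have, and assuming you want the mean of each individual cell in the DataFrame, you can try the applymap method (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):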
df.applymap(np.mean)
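To get from per-cell means to one average per row, the cell means can be averaged again. A minimal sketch that uses Series.map column by column so it does not depend on applymap (`cell_means` and `row_average` are illustrative names):

```python
import numpy as np
import pandas as pd

# Row i holds the list [0, ..., i] in every column.
df = pd.DataFrame({c: [np.arange(i + 1).tolist() for i in range(5)] for c in "ABC"})

# Replace each list with its mean, one column at a time.
cell_means = df.apply(lambda col: col.map(np.mean)).astype(float)

# Average the per-cell means across each row.
row_average = cell_means.mean(axis=1)
print(row_average)
```

Note that a mean of means equals the pooled mean only when all lists in a row have the same length, as they do here. With unequal lengths, the pooled calculation from the first answer gives a different result.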