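My goal is to build a DataFrame by randomly sampling from other DataFrames, collect summary statistics on the new DataFrame, and then append those statistics to a list. Ideally, I could repeat this process many times (e.g., a bootstrap).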
dfposlist = [OFdf, Firstdf, Seconddf, Thirddf, CFdf, RFdf, Cdf, SSdf]
OFdf.head()
      playerID        OPW  POS    salary
87   bondsba01  62.061290   OF   8541667
785  ramirma02  35.785630   OF  13050000
966  walkela01  30.644305   OF   6050000
859  sheffga01  29.090699   OF   9916667
357  gilesbr02  28.160054   OF   7666666
All the DataFrames in the list have the same headers. What I'm trying to do looks like this:
teamdist = []
for df in dfposlist:
    frames = [df.sample(n=1)]
    team = pd.concat(frames)
    teamopw = team['OPW'].sum()
    teamsal = team['salary'].sum()
    teamplayers = team['playerID'].tolist()
    teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
    teamdist.append(teamdic)
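The output I'm looking for is something like this:
teamdist = [{'Salary': 4900000, 'OPW': 78.452, 'Players': ['bondsba01', ...]}]
But for some reason, sum operations such as teamopw = team['OPW'].sum()
don't work; they just seem to return team['OPW'] (one value per sampled row):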
print(teamopw)
0.17118131814601256
38.10700006434629
1.5699939126695253
32.9068837019903
16.990760776263674
18.22428871113601
13.447706356730897
Any suggestions on how to make this work? Thanks!
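The printed values look like individual players' OPW because, in the question's loop, pd.concat is called on a one-element list on every iteration. So team only ever holds a single sampled row, .sum() returns that row's value, and teamdist ends up with one dict per position table instead of one dict per team. A minimal sketch reproducing the symptom, using two hypothetical position tables with made-up values:

```python
import pandas as pd

# Two hypothetical position tables with the question's column layout
# (values are made up for illustration only).
OFdf = pd.DataFrame({'playerID': ['p1', 'p2'], 'OPW': [10.0, 20.0],
                     'POS': ['OF', 'OF'], 'salary': [100, 200]})
Cdf = pd.DataFrame({'playerID': ['c1', 'c2'], 'OPW': [1.0, 2.0],
                    'POS': ['C', 'C'], 'salary': [10, 20]})
dfposlist = [OFdf, Cdf]

# The question's loop: pd.concat runs on a one-element list each time,
# so `team` holds a single row and .sum() just returns that row's value.
teamdist = []
for df in dfposlist:
    team = pd.concat([df.sample(n=1)])
    teamdist.append({'Salary': team['salary'].sum(),
                     'OPW': team['OPW'].sum(),
                     'Players': team['playerID'].tolist()})

print(len(teamdist))                          # one dict per position table
print([len(d['Players']) for d in teamdist])  # each "team" has one player
```

Moving the pd.concat call out of the loop, so it receives every sampled row at once, is what makes the sums cover the whole team.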
Edit: working solution below. Not sure it's the most Pythonic way, but it works.
teamdist = []
team = pd.concat([df.sample(n=1) for df in dfposlist])
teamopw = team[['OPW']].values.sum()
teamsal = team[['salary']].values.sum()
teamplayers = team['playerID'].tolist()
teamdic = {'Salary':teamsal, 'OPW':teamopw, 'Players':teamplayers}
teamdist.append(teamdic)
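For the bootstrap mentioned at the top, the working solution can be wrapped in a function and repeated. A minimal sketch, assuming hypothetical helper names sample_team and bootstrap_teams and made-up stand-in tables; on the concatenated frame, plain team['OPW'].sum() already returns a scalar, so the [['OPW']].values step is optional:

```python
import numpy as np
import pandas as pd


def sample_team(dfposlist, rng):
    """Draw one row from each position table and summarize the combined team."""
    # Concatenate only after sampling every table, so the sums span the whole team.
    team = pd.concat([df.sample(n=1, random_state=rng) for df in dfposlist])
    return {'Salary': team['salary'].sum(),
            'OPW': team['OPW'].sum(),
            'Players': team['playerID'].tolist()}


def bootstrap_teams(dfposlist, n_iter, seed=None):
    """Repeat sample_team n_iter times and return the list of summary dicts."""
    rng = np.random.RandomState(seed)  # one shared state makes runs reproducible
    return [sample_team(dfposlist, rng) for _ in range(n_iter)]


# Hypothetical stand-in tables with the question's columns (made-up values).
dfposlist = [
    pd.DataFrame({'playerID': [f'{pos}{i}' for i in range(5)],
                  'OPW': np.arange(5, dtype=float),
                  'POS': pos,
                  'salary': np.arange(5) * 1000})
    for pos in ['OF', 'C', 'SS']
]

teamdist = bootstrap_teams(dfposlist, n_iter=100, seed=0)
print(len(teamdist), teamdist[0])
```

Passing one RandomState object to every sample call keeps draws independent across tables while still letting a fixed seed reproduce the whole run.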
Answer (score: 2)
Here it is (with random data):
import pandas as pd
import numpy as np
dfposlist = dict(zip(range(10),
[pd.DataFrame(np.random.randn(10, 5),
columns=list('abcde'))
for i in range(10)]))
for df in dfposlist.values():
df['f'] = list('qrstuvwxyz')
teamdist = []
team = pd.concat([df.sample(n=1) for df in dfposlist.values()])
print(team.info())
teamdic = team[['a', 'c', 'e']].sum().to_dict()
teamdic['f'] = team['f'].tolist()
teamdist.append(teamdic)
print(teamdist)
# Output:
## team.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 1 to 6
Data columns (total 6 columns):
a 10 non-null float64
b 10 non-null float64
c 10 non-null float64
d 10 non-null float64
e 10 non-null float64
f 10 non-null object
dtypes: float64(5), object(1)
memory usage: 560.0+ bytes
None
## teamdist:
[{'a': -3.5380097363724601,
'c': 2.0951152809401776,
'e': 3.1439230427971863,
'f': ['r', 'w', 'z', 'v', 'x', 'q', 't', 'q', 'v', 'w']}]
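To turn the answer's single draw into many replicates and then collect summary statistics, one option is to run it in a loop and load the resulting list of dicts into a DataFrame. This sketch reuses the answer's random setup; the replicate count of 200 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Same random setup as the answer: 10 tables with float columns a-e and a label column f.
dfposlist = {i: pd.DataFrame(np.random.randn(10, 5), columns=list('abcde'))
             for i in range(10)}
for df in dfposlist.values():
    df['f'] = list('qrstuvwxyz')

# Repeat the answer's single draw many times (200 is an arbitrary replicate count).
teamdist = []
for _ in range(200):
    team = pd.concat([df.sample(n=1) for df in dfposlist.values()])
    teamdic = team[['a', 'c', 'e']].sum().to_dict()
    teamdic['f'] = team['f'].tolist()
    teamdist.append(teamdic)

# One row per replicate; describe() reports count/mean/std/quantiles per statistic.
summary = pd.DataFrame(teamdist)[['a', 'c', 'e']].describe()
print(summary)
```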