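For my assignment, I need to import baseball salary data into a pandas DataFrame.
From there, one of my goals is to get the total salary of every team for each year.
I managed that, but to move on to the next task I need a pandas DataFrame. sumOfSalaries.dtype is returning int64.
Questions:
1. How do I convert the data in the code below into a DataFrame?
2. How do I remove the index in sumOfSalaries?
Code: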
import pandas as pd
salariesData = pd.read_csv('Salaries.csv')
#sum salaries by year and team
sumOfSalaries = salariesData.groupby(by=['yearID','teamID'])['salary'].sum()
del sumOfSalaries.index.names #line giving me errors
#create DataFrame from grouped data
df = pd.DataFrame(sumOfSalaries, columns = ['yearID', 'teamID', 'salary'])
df
_____________________________________________________________________________
sumOfSalaries:
yearID  teamID
1985    ATL       14807000
        BAL       11560712
        BOS       10897560
        CAL       14427894
        CHA        9846178
...and so on
_____________________________________________________________________________
df:
               yearID  teamID    salary
yearID teamID
1985   ATL        NaN     NaN  14807000
       BAL        NaN     NaN  11560712
       BOS        NaN     NaN  10897560
       CAL        NaN     NaN  14427894
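Answer 0 (score: 1)
del has a very specific meaning in Python (it unbinds a name or deletes an attribute or item), and it is not useful on a DataFrame like this.
You want to use reset_index to get rid of the MultiIndex after the groupby, if getting rid of the MultiIndex is what you want, that is.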
import pandas as pd
salariesData = pd.read_csv('Salaries.csv')
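# sum salaries by year and team
# reset_index() on the grouped Series already returns a DataFrame,
# so no extra pd.DataFrame(...) wrapper is needed
sumOfSalaries = (
    salariesData.groupby(by=['yearID', 'teamID'])['salary'].sum()
    .reset_index()
)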
Read the groupby docs and the MultiIndex docs for details.
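To make the effect of reset_index concrete, here is a minimal sketch on a tiny in-memory frame. The toy values below are made up for illustration and merely stand in for Salaries.csv; the name grouped is also just an illustrative variable.

```python
import pandas as pd

# Toy stand-in for Salaries.csv (values are invented for illustration)
salariesData = pd.DataFrame({
    'yearID': [1985, 1985, 1985, 1986],
    'teamID': ['ATL', 'ATL', 'BAL', 'ATL'],
    'salary': [10, 20, 30, 40],
})

# Summing one column of a groupby yields a Series whose index is a
# (yearID, teamID) MultiIndex
grouped = salariesData.groupby(['yearID', 'teamID'])['salary'].sum()

# reset_index() moves both index levels back into ordinary columns,
# which turns the Series into a DataFrame with a default integer index
sumOfSalaries = grouped.reset_index()

print(type(grouped).__name__, '->', type(sumOfSalaries).__name__)
print(sumOfSalaries)
```

The result has the three columns yearID, teamID and salary, which is the shape the question asks for.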
Answer 1 (score: 0)
I think you only need to add the parameter as_index=False to groupby; the output is then a DataFrame without a MultiIndex:
sumOfSalaries = salariesData.groupby(by=['yearID','teamID'], as_index=False)['salary'].sum()
Sample:
import pandas as pd
salariesData = pd.DataFrame({
'yearID': {0: 1985, 1: 1985, 2: 1985, 3: 1985, 4: 1985, 5: 1986, 6: 1986, 7: 1986, 8: 1987, 9: 1987},
'teamID': {0: 'ATL', 1: 'ATL', 2: 'ATL', 3: 'CAL', 4: 'CAL', 5: 'CAL', 6: 'CAL', 7: 'BOS', 8: 'BOS', 9: 'BOS'},
'salary': {0: 10, 1: 20, 2: 30, 3: 40, 4: 50, 5: 10, 6: 20, 7: 30, 8: 40, 9: 50}
},
columns = ['yearID','teamID','salary']
)
print (salariesData)
yearID teamID salary
0 1985 ATL 10
1 1985 ATL 20
2 1985 ATL 30
3 1985 CAL 40
4 1985 CAL 50
5 1986 CAL 10
6 1986 CAL 20
7 1986 BOS 30
8 1987 BOS 40
9 1987 BOS 50
sumOfSalaries = salariesData.groupby(by=['yearID','teamID'], as_index=False)['salary'].sum()
print (sumOfSalaries)
yearID teamID salary
0 1985 ATL 60
1 1985 CAL 90
2 1986 BOS 30
3 1986 CAL 30
4 1987 BOS 90
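As a quick cross-check, the sketch below compares the as_index=False route with calling reset_index() on the grouped Series. The toy data and the names keys, viaAsIndex and viaReset are assumptions made for this illustration only.

```python
import pandas as pd

# Toy data, invented for illustration
salariesData = pd.DataFrame({
    'yearID': [1985, 1985, 1985, 1986, 1986],
    'teamID': ['ATL', 'ATL', 'CAL', 'CAL', 'BOS'],
    'salary': [10, 20, 40, 10, 30],
})
keys = ['yearID', 'teamID']

# Route 1: keep the group keys as columns from the start
viaAsIndex = salariesData.groupby(keys, as_index=False)['salary'].sum()

# Route 2: group into a MultiIndex Series, then flatten it
viaReset = salariesData.groupby(keys)['salary'].sum().reset_index()

# Both routes should give the same flat DataFrame
print(viaAsIndex.equals(viaReset))
print(viaAsIndex)
```

Which one to use is mostly a matter of taste; as_index=False just saves the extra call.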
Also, if you need to remove the index names, assign (None, None) to them, but this is not necessary if you use the solution above:
sumOfSalaries.index.names = (None, None)
Sample:
sumOfSalaries = salariesData.groupby(by=['yearID','teamID'])['salary'].sum()
sumOfSalaries.index.names = (None, None)
print (sumOfSalaries)
1985  ATL    60
      CAL    90
1986  BOS    30
      CAL    30
1987  BOS    90
Name: salary, dtype: int64
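One thing worth noting about clearing the names: it only removes the labels, not the index itself. A minimal sketch, again on made-up toy data, suggests the object is still a Series with a two-level MultiIndex, and that a later reset_index() has to fall back on generic column names.

```python
import pandas as pd

# Toy data, invented for illustration
salariesData = pd.DataFrame({
    'yearID': [1985, 1985, 1986],
    'teamID': ['ATL', 'CAL', 'BOS'],
    'salary': [60, 90, 30],
})

sumOfSalaries = salariesData.groupby(['yearID', 'teamID'])['salary'].sum()
sumOfSalaries.index.names = [None, None]

# Still a Series with two index levels; only the level names are gone
print(type(sumOfSalaries).__name__, sumOfSalaries.index.nlevels)

# With no names to reuse, reset_index() labels the former index levels
# level_0 and level_1
flat = sumOfSalaries.reset_index()
print(flat.columns.tolist())
```

So if the goal is a DataFrame with yearID and teamID columns, it is probably easier to leave the names in place and use reset_index() or as_index=False.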