Why does pct_change raise "unsupported operand type" after I add a string column?

Asked: 2019-07-03 05:49:21

Tags: python pandas dataframe

A sample dataset is below. This version of the code works:

import pandas as pd
w = pd.Series(['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY' ])
x = pd.Series([2020,2019,2018,2020,2019,2018,2020,2019,2018])
y = pd.Series([10000, 10000, 20000, 25000, 50000, 10000, 100000, 50500, 120000])
z = pd.Series([100000, 500000, 1000000, 50000, 100000, 40000, 1000, 500, 4000])
# aa = pd.Series(['Data', 'Data', 'Data', 'Legal', 'Legal', 'Legal', 'Finance', 'Finance', 'Finance'])
# df = pd.DataFrame({'consultant': w, 'fiscal_year':x, 'budgeted_cost':y, 'actual_cost':z, 'department':aa})
df = pd.DataFrame({'consultant': w, 'fiscal_year':x, 'budgeted_cost':y, 'actual_cost':z})

indexer_consultant_fy = ['consultant', 'fiscal_year']
df = df.set_index(indexer_consultant_fy).sort_index(ascending=True)
df['budgeted_percent_change_by_year'] = df.groupby(level=['consultant'])['budgeted_cost'].pct_change(fill_method='ffill')
df['actual_percent_change_by_year'] = df.groupby(level=['consultant'])['actual_cost'].pct_change(fill_method='ffill')
df = df.sort_values(by = ['consultant', 'fiscal_year'], ascending=False)
df['actual_budget_pct_diff'] = df.pct_change(axis='columns',fill_method='ffill')['actual_cost']

However, when I add one more column, named department, that contains strings, it stops working. I get this TypeError:

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Here is the code that fails:

import pandas as pd
w = pd.Series(['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY' ])
x = pd.Series([2020,2019,2018,2020,2019,2018,2020,2019,2018])
y = pd.Series([10000, 10000, 20000, 25000, 50000, 10000, 100000, 50500, 120000])
z = pd.Series([100000, 500000, 1000000, 50000, 100000, 40000, 1000, 500, 4000])
aa = pd.Series(['Data', 'Data', 'Data', 'Legal', 'Legal', 'Legal', 'Finance', 'Finance', 'Finance'])
df = pd.DataFrame({'consultant': w, 'fiscal_year':x, 'budgeted_cost':y, 'actual_cost':z, 'department':aa})

indexer_consultant_fy = ['consultant', 'fiscal_year']
df = df.set_index(indexer_consultant_fy).sort_index(ascending=True)
df['budgeted_percent_change_by_year'] = df.groupby(level=['consultant'])['budgeted_cost'].pct_change(fill_method='ffill')
df['actual_percent_change_by_year'] = df.groupby(level=['consultant'])['actual_cost'].pct_change(fill_method='ffill')
df = df.sort_values(by = ['consultant', 'fiscal_year'], ascending=False)
df['actual_budget_pct_diff'] = df.pct_change(axis='columns',fill_method='ffill')['actual_cost']

1 Answer:

Answer 0 (score: 0)

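The problem is that once you add the department column, the last line tries to compute the percent change across all columns, department included. With axis='columns', pct_change divides each column's value by the value in the column before it, so it ends up attempting something like ('Data' - 100000) / 100000, which makes no sense. This did not happen with the consultant column because you set it as the index, so it is left out of the calculation.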
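To see which columns actually take part in the calculation, it can help to list the frame's columns and pick out the non-numeric ones. This is a minimal sketch on a two-row subset of the question's data. The `non_numeric` name is made up for illustration, and `is_numeric_dtype` is simply one convenient way to do the check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Two-row subset of the question's data, with the same index as in the question
df = pd.DataFrame({
    'consultant': ['BAIN', 'EY'],
    'fiscal_year': [2020, 2020],
    'budgeted_cost': [10000, 100000],
    'actual_cost': [100000, 1000],
    'department': ['Data', 'Finance'],
}).set_index(['consultant', 'fiscal_year'])

# Index levels are not part of df.columns, so column-wise operations skip them
print(list(df.columns))   # ['budgeted_cost', 'actual_cost', 'department']

# Columns that cannot take part in a division (hypothetical helper name)
non_numeric = [c for c in df.columns if not is_numeric_dtype(df[c])]
print(non_numeric)        # ['department']
```

Here consultant and fiscal_year live in the index, while department remains an ordinary column and therefore gets pulled into the column-wise arithmetic.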

Assuming you only want the percent change of the actual_cost column, change the last line to:

df['actual_budget_pct_diff'] = df['actual_cost'].pct_change(fill_method='ffill')
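Note that the line above computes the row-to-row change of actual_cost, which is not the same quantity as the original column-wise call. If the intent was instead the difference between actual and budgeted cost within each row (an assumption based on the column name actual_budget_pct_diff), one possible sketch is to restrict the arithmetic to the two cost columns so the string column never enters it. The `costs` name and the three-row sample are illustrative only.

```python
import pandas as pd

# Three-row subset of the question's data, including the string column
df = pd.DataFrame({
    'consultant': ['BAIN', 'BAIN', 'EY'],
    'fiscal_year': [2019, 2020, 2020],
    'budgeted_cost': [10000, 10000, 100000],
    'actual_cost': [500000, 100000, 1000],
    'department': ['Data', 'Data', 'Finance'],
}).set_index(['consultant', 'fiscal_year'])

# Select only the numeric cost columns (hypothetical intermediate name)
costs = df[['budgeted_cost', 'actual_cost']]

# Per-row relative difference of actual vs. budgeted cost
df['actual_budget_pct_diff'] = (
    (costs['actual_cost'] - costs['budgeted_cost']) / costs['budgeted_cost']
)
print(df['actual_budget_pct_diff'])
```

Writing the division out explicitly gives the same quantity that a column-wise pct_change would produce for actual_cost when budgeted_cost is the column immediately before it, without depending on column order.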