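I have a DataFrame with a MultiIndex of state and town names. The columns are quarterly housing data, created via a PeriodIndex. I want to store the ratio of two of those columns in a new column: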
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
Whenever I try to create this new column, I get the error:
DateParseError: Unknown datetime string format, unable to parse: P Ratio
Full code:
# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv' #from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file,header=0,usecols=[1,2,*range(51,251)],index_col=[1,0]).dropna(how='all')
# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)
housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
rec_start = '2000Q1'
rec_bottom = '2001Q1'
#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
housing_data_compact_df = housing_data_df[[start_col,end_col]]
#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
Here is some additional data that may or may not be helpful:
[In]: print(housing_data_compact_df.head())
                                  2000Q1         2001Q1
State        RegionName
New York     New York      503933.333333  465833.333333
California   Los Angeles   502000.000000  413633.333333
Illinois     Chicago       237966.666667  219633.333333
Pennsylvania Philadelphia  118233.333333  116166.666667
Arizona      Phoenix       205300.000000  168200.000000
[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']
[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
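For anyone who wants to reproduce this structure without downloading the Zillow file, here is a minimal sketch. It assumes the five rows printed above as toy data, and the name toy_df is hypothetical. It also shows that selecting a column with a pd.Period key and dividing works on its own, which suggests the trouble lies in the assignment of the string-labelled column rather than in the division.

```python
import pandas as pd

# Toy data copied from the printed head above; toy_df is a hypothetical name.
index = pd.MultiIndex.from_tuples(
    [
        ("New York", "New York"),
        ("California", "Los Angeles"),
        ("Illinois", "Chicago"),
        ("Pennsylvania", "Philadelphia"),
        ("Arizona", "Phoenix"),
    ],
    names=["State", "RegionName"],
)
columns = pd.PeriodIndex(["2000Q1", "2001Q1"], freq="Q")
toy_df = pd.DataFrame(
    [
        [503933.333333, 465833.333333],
        [502000.000000, 413633.333333],
        [237966.666667, 219633.333333],
        [118233.333333, 116166.666667],
        [205300.000000, 168200.000000],
    ],
    index=index,
    columns=columns,
)

# Column lookup with a pd.Period key returns a Series, so the ratio
# can be computed without touching the column labels.
ratio = toy_df[pd.Period("2000Q1")].div(toy_df[pd.Period("2001Q1")])
print(ratio)
```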
Things I've tried:
It seems my problem is related to the PeriodIndex columns. I tried converting the data with a direct cast:
[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))
TypeError: cannot convert the series to <class 'float'>
I also tried using .astype(), but I get the same error as without the conversion:
[In]: housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(start_col_name)].astype(float).div(housing_data_compact_df[pd.Period(end_col_name)].astype(float))
DateParseError: Unknown datetime string format, unable to parse: P Ratio
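As a side note on the TypeError from the float() attempt above: the built-in float() expects a single scalar, whereas .astype(float) converts element by element. A small sketch with made-up values (the names prices and as_float are hypothetical):

```python
import pandas as pd

# Made-up values purely for illustration
prices = pd.Series([1, 2, 3])

try:
    float(prices)  # float() expects one scalar, not a whole Series
except TypeError as exc:
    print("TypeError:", exc)

# astype converts each element and returns a new Series
as_float = prices.astype(float)
print(as_float.dtype)
```

Since .astype(float) only changes the values and not the column labels, it is consistent with the observation that the DateParseError stays the same.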
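I also reset the keys to try to break up the PeriodIndex, and then reindexed once the operation was done. However, this did not work on every system I tested it on, and it seems like a roundabout way to fix something that I think should have a simple solution.
Question:
How can I create the new column as a ratio of the data in these PeriodIndex columns?
Thanks in advance for any help.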
Answer 0 (score: 1)
You need strftime to convert the PeriodIndex column labels to strings:
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
and add copy when selecting the columns.
All code (only small changes to make it work for me; I used your code, which is nice ;)):
zillow_file = 'http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv'
zillow_df = pd.read_csv(zillow_file,header=0,
usecols=[1,2] + list(range(51,251)), #changed for python 3
index_col=[1,0]).dropna(how='all')
# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
#no states in question, so commented
#zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)
housing_data_df=zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()
rec_start = '2000Q1'
rec_bottom = '2001Q1'
#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
#add copy
#http://stackoverflow.com/q/42438987/2901002
#start_col and end_col are integer positions, so select with iloc
housing_data_compact_df = housing_data_df.iloc[:, [start_col,end_col]].copy()
print (housing_data_compact_df.head())
                      2016Q3         2001Q1
State RegionName
NY    New York      599850.0            NaN
CA    Los Angeles   588750.0  233000.000000
IL    Chicago       207600.0  156933.333333
PA    Philadelphia  129950.0   55333.333333
AZ    Phoenix       197800.0  119600.000000
anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'
#compute the ratio while the columns are still Periods
a = (housing_data_compact_df[pd.Period(anal_start_col_name)]
     .div(housing_data_compact_df[pd.Period(anal_end_col_name)]))
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = a
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
#start_col and end_col are integer positions, so select with iloc
housing_data_compact_df = housing_data_df.iloc[:, [start_col,end_col]].copy()
print (housing_data_compact_df.head())
anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
#columns are strings now, so plain string keys work
housing_data_compact_df['P Ratio'] = (housing_data_compact_df[anal_start_col_name]
                                      .div(housing_data_compact_df[anal_end_col_name]))
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
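The same idea can be tried without the Zillow download. The following is only a sketch under stated assumptions: the frame wide_df, its values, and the name compact_df are made up for illustration. It selects two columns by position, copies them, converts the Period labels to strings with strftime, and only then adds the string-named column.

```python
import pandas as pd

# Hypothetical toy frame with quarterly PeriodIndex columns (values are made up)
wide_df = pd.DataFrame(
    [[100.0, 80.0, 50.0], [90.0, 60.0, 45.0]],
    index=pd.MultiIndex.from_tuples(
        [("AZ", "Phoenix"), ("IL", "Chicago")], names=["State", "RegionName"]
    ),
    columns=pd.period_range("2000Q1", periods=3, freq="Q"),
)

# Select two columns by position and copy, so later assignments
# modify an independent frame rather than a slice of wide_df
compact_df = wide_df.iloc[:, [0, 2]].copy()

# Turn the Period labels into plain strings such as '2000Q1'
compact_df.columns = compact_df.columns.strftime("%YQ%q")

# With string labels, a new string-named column can sit next to the old ones
compact_df["P Ratio"] = compact_df["2000Q1"].div(compact_df["2000Q3"])
print(compact_df)
```

The trade-off is that the columns are no longer a PeriodIndex afterwards, so period-based operations on the labels would need the strings converted back first.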
Another possible solution is:
[y for x in [old_list[slice(*a)] for a in ((0,1),(3,201),(201,None,3))] for y in x]