So far I have the following code, which EdChum provided:
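In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]})
df['c'] = np.nan
df.loc[0, 'c'] = 1
df.loc[2, 'c'] = 3

def func(x):
    # Keep known values; otherwise multiply the previous row's c by this row's b
    if pd.notnull(x['c']):
        return x['c']
    else:
        return df.iloc[x.name - 1]['c'] * x['b']

# Each pass only fills rows whose previous row is already known, so repeat
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df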
Out[1]:
      a   b    c
0  None   2    1
1  None   3    3
2  None  10    3
3  None   3    9
4  None   5   45
5  None   8  360
This works well, but as soon as I change the index of the DataFrame df as follows:
rng = pd.date_range('1/1/2011', periods=6, freq='D')
df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]},index=rng)
I get the following error: TypeError: ("cannot do label indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2011-01-01 00:00:00] of <class 'pandas.tslib.Timestamp'>", u'occurred at index 2011-01-02 00:00:00')
What is the problem here? How can I adjust the code so that it works with a DatetimeIndex?
Answer 0 (score: 5)
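The following works. The difference is that I use get_loc to get the integer position of the datetime value in the index, since x.name is now a Timestamp label rather than an integer: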
In [48]:
rng = pd.date_range('1/1/2011', periods=6, freq='D')
df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]}, index=rng)
df['c'] = np.nan
df.loc[df.index[0], 'c'] = 1
df.loc[df.index[2], 'c'] = 3

def func(x):
    if pd.notnull(x['c']):
        return x['c']
    else:
        # x.name is a Timestamp here, so convert it to an integer position first
        return df.iloc[df.index.get_loc(x.name) - 1]['c'] * x['b']

# Each pass only fills rows whose previous row is already known, so repeat
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df
Out[48]:
               a   b    c
2011-01-01  None   2    1
2011-01-02  None   3    3
2011-01-03  None  10    3
2011-01-04  None   3    9
2011-01-05  None   5   45
2011-01-06  None   8  360
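As a side note, the repeated apply calls could probably be replaced by a single pass over the rows, which also avoids depending on the index type. The sketch below is not part of the original answer: fill_forward_product is a hypothetical helper name, and it assumes the first value of c is already set. It also shows what get_loc returns for a label in a unique DatetimeIndex.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=6, freq="D")
df = pd.DataFrame({"b": [2, 3, 10, 3, 5, 8]}, index=rng)
df["c"] = np.nan
df.loc[rng[0], "c"] = 1
df.loc[rng[2], "c"] = 3

# get_loc maps an index label to its integer position
pos = df.index.get_loc(pd.Timestamp("2011-01-03"))

def fill_forward_product(frame):
    """Hypothetical helper: fill each missing c with previous c * current b."""
    c = frame["c"].to_numpy(dtype=float, copy=True)
    b = frame["b"].to_numpy()
    # Work purely on positions, so the index type does not matter
    for i in range(1, len(c)):
        if np.isnan(c[i]):
            c[i] = c[i - 1] * b[i]
    return pd.Series(c, index=frame.index, name="c")

df["c"] = fill_forward_product(df)
```

Because the loop reads values it has just written, one pass is enough no matter how many consecutive rows are missing.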
Answer 1 (score: 0)
Just an FYI: I solved this problem by updating pandas to the latest stable version.