I have a dataset like this:
print(portfolio_all[1])
Date Open High ... Close Adj Close Volume
0 2010-01-04 4.840000 4.940000 ... 4.770000 4.513494 9837300
1 2010-01-05 4.790000 5.370000 ... 5.310000 5.024457 25212000
2 2010-01-06 5.190000 5.380000 ... 5.090000 4.816288 16597900
3 2010-01-07 5.060000 5.430000 ... 5.240000 4.958220 14033400
4 2010-01-08 5.270000 5.430000 ... 5.140000 4.863598 12760000
5 2010-01-11 5.130000 5.230000 ... 5.040000 4.768975 10952900
6 2010-01-12 5.060000 5.150000 ... 5.080000 4.806825 7870300
7 2010-01-13 5.120000 5.500000 ... 5.480000 5.185314 16400500
8 2010-01-14 5.460000 5.710000 ... 5.590000 5.289400 12767100
9 2010-01-15 5.640000 5.840000 ... 5.500000 5.204239 10985300
10 2010-01-19 5.500000 5.730000 ... 5.640000 5.336711 7807700
11 2010-01-20 5.650000 5.890000 ... 5.740000 5.431333 13289100
I want to count how many days had a positive return (i.e. Close_day_t > Close_day_t-1).
I tried the following function:
def positive_return_days(portfolio):
    positive_returns = pd.DataFrame(
        columns=['ticker', 'name', 'total positive', 'total days'])
    for asset in portfolio:
        for index, row in asset.iterrows():
            try:
                this_day_close = asset.iloc[[index]]['Close']
                previous_day_close = asset.iloc[[index-1]]['Close']
                asset.loc[index, 'positive_days'] = np.where((this_day_close > previous_day_close))
            except IndexError:
                print("I get out of bounds")
        total_positive_days = asset['positive_days'].sum()
        new_row = {'ticker':asset.name, 'name':asset.name, 'total positive':total_positive_days, 'total days':len(asset.index)}
        positive_returns = positive_returns.append(new_row, ignore_index=True)
        print("Asset: ", "total positive days: ", total_positive_days, "total days:",len(asset.index))
    return positive_returns
But I get an error:
ValueError: Can only compare identically-labeled Series objects
How can I fix this?
Answer 0 (score: 1)
The .shift function shifts the values of a column by one row, so each row can be compared with the previous one.
import pandas as pd
df = pd.DataFrame({'Close':[1,2,3,2,1,3]})
print(df)
print("count",(df.Close - df.Close.shift(1) > 0).sum())
Output:
   Close
0      1
1      2
2      3
3      2
4      1
5      3
count 3
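For context on the error in the question: selecting a row with `iloc[[index]]` returns a one-row Series that keeps its original label, so the two values being compared carry different labels and pandas refuses to compare them. The following is a minimal sketch of that failure and of how `shift` avoids it. The three closing prices are illustrative values taken from the sample data, and the variable names are made up for this example.

```python
import pandas as pd

# Illustrative closing prices (first three rows of the sample data).
close = pd.Series([4.77, 5.31, 5.09], name="Close")

# One-row selections keep their original labels (1 and 0 here).
this_day = close.iloc[[1]]
previous_day = close.iloc[[0]]

try:
    this_day > previous_day
    mismatch_error = None
except ValueError as exc:
    # pandas rejects the comparison because the labels differ.
    mismatch_error = str(exc)
print(mismatch_error)

# shift(1) moves the values down one row but keeps the same index,
# so the comparison is aligned row by row. The first row is compared
# against NaN, which evaluates to False.
positive = close > close.shift(1)
print(positive.tolist())
```

Because the shifted Series shares the index of the original, no loop or label handling is needed.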
Answer 1 (score: 1)
You can use pd.Series.diff to compute the differences, then count the positive ones:
(df['Close'].diff() > 0).sum()
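As a rough sketch, the same expression could replace the row-by-row loop in the question's function. This version assumes the portfolio is a dict mapping a ticker string to a DataFrame with a `Close` column. That layout and the ticker `"AAA"` are hypothetical, since the question does not show how `portfolio_all` is built. It also collects rows in a list instead of calling `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

def positive_return_days(portfolio):
    """Count days with Close above the previous day's Close, per asset.

    portfolio: dict of ticker -> DataFrame with a 'Close' column
    (an assumed layout, not taken from the question).
    """
    rows = []
    for ticker, asset in portfolio.items():
        # diff() is NaN on the first row, so that row is never counted.
        total_positive = int((asset["Close"].diff() > 0).sum())
        rows.append({
            "ticker": ticker,
            "total positive": total_positive,
            "total days": len(asset.index),
        })
    return pd.DataFrame(rows, columns=["ticker", "total positive", "total days"])

# Usage with the first five closing prices from the sample data.
portfolio = {"AAA": pd.DataFrame({"Close": [4.77, 5.31, 5.09, 5.24, 5.14]})}
summary = positive_return_days(portfolio)
print(summary)
```

For these five prices the function reports 2 positive days out of 5, since only the moves to 5.31 and 5.24 are increases.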