Python DataFrame: counting days with a positive return

Time: 2019-11-20 16:04:12

Tags: python pandas

I have a dataset like this:

print(portfolio_all[1])
            Date       Open       High  ...      Close  Adj Close    Volume
0     2010-01-04   4.840000   4.940000  ...   4.770000   4.513494   9837300
1     2010-01-05   4.790000   5.370000  ...   5.310000   5.024457  25212000
2     2010-01-06   5.190000   5.380000  ...   5.090000   4.816288  16597900
3     2010-01-07   5.060000   5.430000  ...   5.240000   4.958220  14033400
4     2010-01-08   5.270000   5.430000  ...   5.140000   4.863598  12760000
5     2010-01-11   5.130000   5.230000  ...   5.040000   4.768975  10952900
6     2010-01-12   5.060000   5.150000  ...   5.080000   4.806825   7870300
7     2010-01-13   5.120000   5.500000  ...   5.480000   5.185314  16400500
8     2010-01-14   5.460000   5.710000  ...   5.590000   5.289400  12767100
9     2010-01-15   5.640000   5.840000  ...   5.500000   5.204239  10985300
10    2010-01-19   5.500000   5.730000  ...   5.640000   5.336711   7807700
11    2010-01-20   5.650000   5.890000  ...   5.740000   5.431333  13289100

I want to count how many days had a positive return (i.e. Close_day_t > Close_day_t-1).

I tried the following function:

def positive_return_days(portfolio):
    positive_returns = pd.DataFrame(
    columns=['ticker', 'name', 'total positive', 'total days'])
    for asset in portfolio:
        for index, row in asset.iterrows():
            try:
                this_day_close = asset.iloc[[index]]['Close']
                previous_day_close = asset.iloc[[index-1]]['Close']
                asset.loc[index, 'positive_days'] = np.where((this_day_close > previous_day_close))
            except IndexError:
             print("I get out of bounds")
    total_positive_days = asset['positive_days'].sum()
    new_row = {'ticker':asset.name, 'name':asset.name, 'total positive':total_positive_days, 'total days':len(asset.index)}
    positive_returns = positive_returns.append(new_row, ignore_index=True)
    print("Asset: ", "total positive days: ", total_positive_days, "total days:",len(asset.index))
    return positive_returns

But I get an error:

ValueError: Can only compare identically-labeled Series objects

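How can I fix this?

2 answers:

Answer 0: (score: 1)

  • You can simply use the .shift function to shift the column by one row and compare it with the original.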
import pandas as pd

df = pd.DataFrame({'Close':[1,2,3,2,1,3]})

print(df)
print("count",(df.Close - df.Close.shift(1) > 0).sum())

Output:

   Close
0      1
1      2
2      3
3      2
4      1
5      3
count 3
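
As background on why the question's code fails: asset.iloc[[index]]['Close'] and asset.iloc[[index-1]]['Close'] are both one-element Series, but they carry different index labels, and pandas refuses to compare Series whose labels differ. shift moves the values while keeping the labels, so the comparison lines up. A minimal sketch of both behaviours, reusing the toy df from this answer (the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'Close': [1, 2, 3, 2, 1, 3]})

# Double brackets return one-row Series that keep their original labels.
this_day = df.iloc[[1]]['Close']      # index label 1
previous_day = df.iloc[[0]]['Close']  # index label 0

try:
    this_day > previous_day           # labels differ, so pandas raises
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# shift(1) moves each value down one row but leaves the labels alone,
# so the two Series are identically labeled and can be compared.
is_positive = df['Close'] > df['Close'].shift(1)
positive_days = int(is_positive.sum())
print(positive_days)
```

The first row has no previous day, so its shifted value is NaN; a comparison against NaN is False, which means that row is simply not counted.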

Answer 1: (score: 1)

You can use pd.Series.diff to compute the day-over-day differences and then count the positive ones:

(df['Close'].diff() > 0).sum()
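
To get the per-asset summary table the question was aiming for, the same expression can be applied to each DataFrame in turn. The following is only a sketch under assumptions: the portfolio is taken to be a dict mapping a ticker label to a DataFrame with a 'Close' column (the question does not show how tickers are stored), and the tickers 'AAA' and 'BBB' are made up. Rows are collected in a list because DataFrame.append, used in the question, was removed in pandas 2.0.

```python
import pandas as pd

def positive_return_days(portfolio):
    """Summarise positive-return days per asset.

    portfolio: dict of {ticker: DataFrame with a 'Close' column}
    (an assumed layout, not the one from the question).
    """
    rows = []
    for ticker, asset in portfolio.items():
        # diff() gives Close_t - Close_(t-1); the first row is NaN and not counted.
        total_positive = int((asset['Close'].diff() > 0).sum())
        rows.append({
            'ticker': ticker,
            'total positive': total_positive,
            'total days': len(asset),
        })
    return pd.DataFrame(rows, columns=['ticker', 'total positive', 'total days'])

# Hypothetical tickers; the 'AAA' prices are the first five Close values from the question.
portfolio = {
    'AAA': pd.DataFrame({'Close': [4.77, 5.31, 5.09, 5.24, 5.14]}),
    'BBB': pd.DataFrame({'Close': [1, 2, 3, 2, 1, 3]}),
}
summary = positive_return_days(portfolio)
print(summary)
```

This avoids iterrows entirely, so there is no per-row indexing and no out-of-bounds case to handle.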