Question

Part one

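I have a dataframe with financial data (33023 rows; here is the link to the data: https://mab.to/Ssy3TelRs); df.open is the opening price of the security and df.close is its closing price.

I have been trying to see how many times in a row the security closed with a gain and with a loss.

The result I am looking for should tell me that the security closed with a gain 2 days in a row x times, 3 days in a row y times, 4 days in a row z times, and so on.

I started with a for loop:
for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
followed by a series of if statements that fail...

Thanks for your help.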

CronosVirus00

Edit:
>>> df.head(7)
       data  ora     open      max      min    close  Unnamed: 6
0  20160801    0  1.11781  1.11781  1.11772  1.11773           0
1  20160801  100  1.11774  1.11779  1.11773  1.11777           0
2  20160801  200  1.11779  1.11800  1.11779  1.11795           0
3  20160801  300  1.11794  1.11801  1.11771  1.11771           0
4  20160801  400  1.11766  1.11772  1.11763  1.11772           0
5  20160801  500  1.11774  1.11798  1.11774  1.11796           0
6  20160801  600  1.11796  1.11796  1.11783  1.11783           0
The ifs:
for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
    if y > 0 :
        green += 1
        y = df.close[x+1] - df.close[x+1]
        twotimes += 1
        if y > 0 :
            green += 1
            y = df.close[x+2] - df.close[x+2]
            threetimes += 1
            if y > 0 :
                green += 1
                y = df.close[x+3] - df.close[x+3]
                fourtimes += 1

Final solution

Thanks, everyone! In the end I did this:
df['test'] = df.close - df.open >0
green = df.test #days that it was positive

def gg(z):
    tot =green.count()
    giorni = range (1,z+1) # days in a row i wanna check
    for x in giorni:
        y = (green.rolling(x).sum()>x-1).sum()
        print(x," ",y, " ", round((y/tot)*100,1),"%")

gg(5)
1   14850   45.0 %
2   6647   20.1 %
3   2980   9.0 %
4   1346   4.1 %
5   607   1.8 %

Answer 1

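It sounds like what you want to do is:

Compute the difference of the two series (open and close), e.g. diff = df.open - df.close
Apply a condition to the result to get a boolean series, diff > 0
Pass the resulting boolean series to the DataFrame to get the subset of the DataFrame where the condition is true, df[diff > 0]
Apply a column-wise function to identify and count the consecutive runs

I need to board a plane, but I will provide an example of the last step.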
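
The last step above could be sketched roughly as follows. This is only one possible approach, not necessarily what the answerer had in mind: the toy data is made up, and the names gain, run_id, run_lengths and counts are illustrative. The difference is taken as close minus open here so that a positive value means a gain.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "open":  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "close": [1.1, 1.2, 0.9, 1.1, 1.1, 1.3, 0.8, 1.2],
})

# True on rows that closed above the open.
gain = df["close"] - df["open"] > 0

# A new run starts whenever the flag differs from the previous row,
# so the cumulative sum of those change points labels each run.
run_id = gain.ne(gain.shift()).cumsum()

# Keep only the gain rows and measure the length of each run of gains.
run_lengths = gain[gain].groupby(run_id[gain]).size()

# Map run length -> number of runs of exactly that length.
counts = run_lengths.value_counts().sort_index()
print(counts)
```

Each run is counted once at its full length here, so a three-day run is not also counted as a two-day run.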

Answer 2

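If I understand you correctly, you want the number of days that were preceded by at least n positive days in a row, counting the day itself.

Similar to what @Thang suggested, you can use rolling: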

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10, 2), columns=["open", "close"])
# This just sets up random test data, for example:
#       open     close
# 0  0.997986  0.594789
# 1  0.052712  0.401275
# 2  0.895179  0.842259
# 3  0.747268  0.919169
# 4  0.113408  0.253440
# 5  0.199062  0.399003
# 6  0.436424  0.514781
# 7  0.180154  0.235816
# 8  0.750042  0.558278
# 9  0.840404  0.139869

positiveDays = df["close"]-df["open"] > 0
# This will give you a series that is True for positive days:
# 0    False
# 1     True
# 2    False
# 3     True
# 4     True
# 5     True
# 6     True
# 7     True
# 8    False
# 9    False
# dtype: bool

daysToCheck = 3
positiveDays.rolling(daysToCheck).sum()>daysToCheck-1

This now gives you a series that shows, for each day, whether the last daysToCheck days up to and including that day were all positive:

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8    False
9    False
dtype: bool

Now you can use (positiveDays.rolling(daysToCheck).sum()>daysToCheck-1).sum() to get the number of days that satisfy this rule (3 in the example), which is what you want as far as I can tell.
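
One point worth keeping in mind with the rolling approach is that windows overlap, so a single run of five positive days contributes three windows of length three. The sketch below is only an illustration: it reuses the True/False pattern from the example output above, and the helper name count_windows is made up here.

```python
import pandas as pd

# Same True/False pattern as the positiveDays example output above.
positive_days = pd.Series(
    [False, True, False, True, True, True, True, True, False, False]
)

def count_windows(flags, max_days):
    """Return {n: number of days ending a window of n positive days}."""
    out = {}
    for n in range(1, max_days + 1):
        # Sum of the last n flags; the first n-1 entries are NaN and never match.
        window_sums = flags.astype(int).rolling(n).sum()
        out[n] = int((window_sums > n - 1).sum())
    return out

window_counts = count_windows(positive_days, 5)
print(window_counts)
```

The one run of five days shows up under every window size from 1 to 5, which is why these counts are larger than a count of distinct runs would be.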

Answer 3

If I understand your question correctly, you can do it this way:

In [76]: df.groupby((df.close.diff() < 0).cumsum()).cumcount()
Out[76]:
0    0
1    1
2    2
3    0
4    1
5    2
6    0
7    0
dtype: int64

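The result I am looking for should tell me that the security closed with a gain 2 days in a row x times, 3 days in a row y times, 4 days in a row z times, and so on.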

In [114]: df.groupby((df.close.diff() < 0).cumsum()).cumcount().value_counts().to_frame('count')
Out[114]:
   count
0      4
2      2
1      2

Data set:

In [78]: df
Out[78]:
       data  ora     open      max      min    close
0  20160801    0  1.11781  1.11781  1.11772  1.11773
1  20160801  100  1.11774  1.11779  1.11773  1.11777
2  20160801  200  1.11779  1.11800  1.11779  1.11795
3  20160801  300  1.11794  1.11801  1.11771  1.11771
4  20160801  400  1.11766  1.11772  1.11763  1.11772
5  20160801  500  1.11774  1.11798  1.11774  1.11796
6  20160801  600  1.11796  1.11796  1.11783  1.11783
7  20160801  700  1.11783  1.11799  1.11783  1.11780

In [80]: df.close.diff()
Out[80]:
0        NaN
1    0.00004
2    0.00018
3   -0.00024
4    0.00001
5    0.00024
6   -0.00013
7   -0.00003
Name: close, dtype: float64
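
One way to read the grouping trick in this answer, spelled out step by step: every falling close starts a new group, so the size of a group minus its first row is the number of consecutive rises that follow. The sketch below is an assumption-laden illustration that uses the eight close values from the data set above; the names group_key, rises_per_run and counts are not from the answer. Note that it compares each close with the previous close, as the answer does, not with the open of the same row.

```python
import pandas as pd

# The eight close values from the data set above.
close = pd.Series([1.11773, 1.11777, 1.11795, 1.11771,
                   1.11772, 1.11796, 1.11783, 1.11780])

# Each falling close bumps the counter, so rows between two falls share a key.
group_key = (close.diff() < 0).cumsum()

# Group size minus the row that opened the group = consecutive rises in it.
rises_per_run = close.groupby(group_key).size() - 1

# Map number of consecutive rises -> how many runs had exactly that many.
counts = rises_per_run.value_counts().sort_index()
print(counts)
```

Unlike cumcount, which numbers every row inside a group, this keeps a single value per run.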

Answer 4

This should work:

import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])

test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0

for i in test.index[1:]:
    if test['gain?'][i]:
        test['cumulative'][i] = test['cumulative'][i-1] + 1
        test['cumulative'][i-1] = 0

results = test['cumulative'].value_counts()

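Ignore the '0' row in the results. If you want the two-day stretches inside a longer run to be counted as runs of their own, it can be modified without much effort.

Edit: without warnings -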

import pandas as pd
import numpy as np

test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0

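for i in test.index:
    if test.loc[i,'gain?']:
        # Row 0 has no predecessor, so a run that starts there begins at 1.
        previous = test.loc[i-1,'cumulative'] if i > 0 else 0
        test.loc[i,'cumulative'] = previous + 1
        if i > 0:
            test.loc[i-1,'cumulative'] = 0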

results = test['cumulative'].value_counts()

Count how many times in a row the result is positive (or negative)
