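Part one
I have a dataframe with financial data in it (33023 rows; here is the link to the data: https://mab.to/Ssy3TelRs); df.open is the opening price of the security and df.close is the closing price.
I have been trying to see how many times in a row the security closed with a gain and with a loss.
The result I am after should tell me that the security was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times, and so on.
I started with a for loop: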
for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
and then an unsuccessful series of if statements...
Thank you for your help.
CronosVirus00
Edit:
>>> df.head(7)
       data  ora     open      max      min    close  Unnamed: 6
0  20160801    0  1.11781  1.11781  1.11772  1.11773           0
1  20160801  100  1.11774  1.11779  1.11773  1.11777           0
2  20160801  200  1.11779  1.11800  1.11779  1.11795           0
3  20160801  300  1.11794  1.11801  1.11771  1.11771           0
4  20160801  400  1.11766  1.11772  1.11763  1.11772           0
5  20160801  500  1.11774  1.11798  1.11774  1.11796           0
6  20160801  600  1.11796  1.11796  1.11783  1.11783           0
The ifs:
for x in range(1,df.close.count()):
    y = df.close[x]-df.open[x]
    if y > 0 :
        green += 1
        y = df.close[x+1] - df.close[x+1]
        twotimes += 1
        if y > 0 :
            green += 1
            y = df.close[x+2] - df.close[x+2]
            threetimes += 1
            if y > 0 :
                green += 1
                y = df.close[x+3] - df.close[x+3]
                fourtimes += 1
Final solution
Thank you, everyone! In the end I did this:
df['test'] = df.close - df.open > 0
green = df.test  # days that it was positive

def gg(z):
    tot = green.count()
    giorni = range(1, z+1)  # days in a row I want to check
    for x in giorni:
        y = (green.rolling(x).sum() > x-1).sum()
        print(x, " ", y, " ", round((y/tot)*100, 1), "%")

gg(5)
1   14850   45.0 %
2   6647   20.1 %
3   2980   9.0 %
4   1346   4.1 %
5   607   1.8 %
Answer 0 (score: 2)
It sounds like what you want to do is:
diff = df.open - df.close
diff > 0
df[diff > 0]
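I need to board a plane, but I will provide an example of the last step later.
Answer 1 (score: 2)
If I understand correctly, you want the number of days for which the previous days, including the day itself, form at least n positive days in a row.
Similar to what @Thang suggested, you can use rolling: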
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 2), columns=["open", "close"])
# This just sets up random test data, for example:
# open close
# 0 0.997986 0.594789
# 1 0.052712 0.401275
# 2 0.895179 0.842259
# 3 0.747268 0.919169
# 4 0.113408 0.253440
# 5 0.199062 0.399003
# 6 0.436424 0.514781
# 7 0.180154 0.235816
# 8 0.750042 0.558278
# 9 0.840404 0.139869
positiveDays = df["close"]-df["open"] > 0
# This will give you a series that is True for positive days:
# 0 False
# 1 True
# 2 False
# 3 True
# 4 True
# 5 True
# 6 True
# 7 True
# 8 False
# 9 False
# dtype: bool
daysToCheck = 3
positiveDays.rolling(daysToCheck).sum()>daysToCheck-1
This gives you a series that tells you, for each day, whether it and the days before it form daysToCheck positive days in a row:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 False
9 False
dtype: bool
Now you can use (positiveDays.rolling(daysToCheck).sum()>daysToCheck-1).sum()
to get the number of days that satisfy this (3 in the example), which is what you want, as far as I understand.
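To make the rolling count reproducible, here is a minimal sketch that hard-codes the ten random rows printed in the comments above instead of calling np.random.rand. The helper name count_windows is made up for illustration, and the cast to int is only a precaution so the rolling sum works on numeric data.

```python
import pandas as pd

# The ten rows shown in the answer's comments, hard-coded for reproducibility.
df = pd.DataFrame({
    "open":  [0.997986, 0.052712, 0.895179, 0.747268, 0.113408,
              0.199062, 0.436424, 0.180154, 0.750042, 0.840404],
    "close": [0.594789, 0.401275, 0.842259, 0.919169, 0.253440,
              0.399003, 0.514781, 0.235816, 0.558278, 0.139869],
})

def count_windows(frame, days_to_check):
    """Count days that end a window of `days_to_check` consecutive positive days."""
    positive = (frame["close"] > frame["open"]).astype(int)
    # The first days_to_check - 1 rolling sums are NaN, which never equals the target.
    return int((positive.rolling(days_to_check).sum() == days_to_check).sum())

# One count per window length, 1 through 5 days.
counts = {n: count_windows(df, n) for n in range(1, 6)}
print(counts)
```

Note that the windows overlap: the single run of five positive days (rows 3 to 7) contributes four windows of length 2, three of length 3, and so on, so these are not counts of separate runs.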
Answer 2 (score: 2)
If I understand your question correctly, you can do it this way:
In [76]: df.groupby((df.close.diff() < 0).cumsum()).cumcount()
Out[76]:
0 0
1 1
2 2
3 0
4 1
5 2
6 0
7 0
dtype: int64
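"The result I am after should tell me that the security was positive 2 days in a row x times, 3 days in a row y times, 4 days in a row z times, and so on."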
In [114]: df.groupby((df.close.diff() < 0).cumsum()).cumcount().value_counts().to_frame('count')
Out[114]:
count
0 4
2 2
1 2
Data set:
In [78]: df
Out[78]:
data ora open max min close
0 20160801 0 1.11781 1.11781 1.11772 1.11773
1 20160801 100 1.11774 1.11779 1.11773 1.11777
2 20160801 200 1.11779 1.11800 1.11779 1.11795
3 20160801 300 1.11794 1.11801 1.11771 1.11771
4 20160801 400 1.11766 1.11772 1.11763 1.11772
5 20160801 500 1.11774 1.11798 1.11774 1.11796
6 20160801 600 1.11796 1.11796 1.11783 1.11783
7 20160801 700 1.11783 1.11799 1.11783 1.11780
In [80]: df.close.diff()
Out[80]:
0 NaN
1 0.00004
2 0.00018
3 -0.00024
4 0.00001
5 0.00024
6 -0.00013
7 -0.00003
Name: close, dtype: float64
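The cumcount approach above numbers each row by its position inside a run of non-falling closes, so its value_counts tallies positions rather than whole runs. If what is wanted is how many runs have exactly 2, 3, 4... gaining days, one possible sketch is below. It is a different technique from the answer's, it defines a gain as close > open (as in the question) rather than via close.diff(), and the names gain, run_id, run_lengths and length_counts are chosen here for illustration.

```python
import pandas as pd

# The eight rows from the data set above (open and close columns only).
df = pd.DataFrame({
    "open":  [1.11781, 1.11774, 1.11779, 1.11794, 1.11766, 1.11774, 1.11796, 1.11783],
    "close": [1.11773, 1.11777, 1.11795, 1.11771, 1.11772, 1.11796, 1.11783, 1.11780],
})

gain = df["close"] > df["open"]           # True on rows that closed above the open

# A new run starts whenever the gain flag differs from the previous row.
run_id = (gain != gain.shift()).cumsum()

# Keep only the gaining rows and measure the size of each run.
run_lengths = gain[gain].groupby(run_id[gain]).size()

# How many runs have each exact length.
length_counts = run_lengths.value_counts().sort_index()
print(length_counts)
```

On these eight rows the gaining rows are 1, 2, 4 and 5, which gives two separate runs of length 2. Inverting the flag (~gain) in the same way would give the losing runs.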
Answer 3 (score: 0)
This should work:
import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0
for i in test.index[1:]:
    if test['gain?'][i]:
        test['cumulative'][i] = test['cumulative'][i-1] + 1
        test['cumulative'][i-1] = 0
results = test['cumulative'].value_counts()
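Ignore the '0' row in the results. It can be modified without much trouble if, for example, you also want to count both days in a run of two as runs of one.
Edit: without the warnings: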
import pandas as pd
import numpy as np
test = pd.DataFrame(np.random.randn(100,2), columns = ['open','close'])
test['gain?'] = (test['open']-test['close'] < 0)
test['cumulative'] = 0
for i in test.index[1:]:
    if test['gain?'][i]:
        test.loc[i,'cumulative'] = test.loc[i-1,'cumulative'] + 1
        test.loc[i-1,'cumulative'] = 0
results = test['cumulative'].value_counts()
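The same streak counter can probably be built without the Python loop. The sketch below is an assumption-laden variant, not the answer's code: it uses a small hand-written gain series instead of random data, and the names streak and is_run_end are illustrative. One difference in behavior is that it also counts a gain on the very first row, which the loop above skips because it starts at test.index[1:].

```python
import pandas as pd

# Hypothetical gain flags (True = close above open), chosen so the result is easy to check.
gain = pd.Series([True, True, False, True, True, True, False, True])

# Every non-gaining row starts a new group; a cumulative sum inside each group
# counts consecutive gains and falls back to 0 on the non-gaining row.
streak = gain.astype(int).groupby((~gain).cumsum()).cumsum()

# A run ends on a gaining row whose next row is not a gain (or does not exist).
is_run_end = gain & ~gain.shift(-1, fill_value=False)

# Length of each completed run, then how often each length occurs.
results = streak[is_run_end].value_counts().sort_index()
print(results)
```

Because only run-end rows are kept, there is no '0' row to ignore in the result.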