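The charts below show my basic challenge: subtract the number of stocks whose data has ended from the number of stocks whose data has started. The difficulty is that the date ranges of the two series do not match, so I need to merge both sets onto a common date range, perform the subtraction, and save the result to a new comma-separated values file.
The input data, in a file named 'meta.csv', contains 3187 rows. The fields in each row are the ticker symbol and the start and end dates of its data. The head and tail look like this: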
ticker,start,end
A,1999-11-18,2016-12-27
AA,2016-11-01,2016-12-27
AAL,2005-09-27,2016-12-27
AAMC,2012-12-13,2016-12-27
AAN,1984-09-07,2016-12-27
...
ZNGA,2011-12-16,2016-12-27
ZOES,2014-04-11,2016-12-27
ZQK,1990-03-26,2015-09-09
ZTS,2013-02-01,2016-12-27
ZUMZ,2005-05-06,2016-12-27
Python code and console output:
import pandas as pd
df = pd.read_csv('meta.csv')
s = df.groupby('start').size().cumsum()
e = df.groupby('end').size().cumsum()
#s.plot(title='NUMBER OF STOCKS WITH DATA START',
# grid=True,style='k.')
#e.plot(title='NUMBER OF STOCKS WITH DATA END',
# grid=True,style='k.')
print(s.head(5))
print(s.tail(5))
print(e.tail(5))
OUT:
start
1962-01-02 11
1962-11-19 12
1970-01-02 30
1971-08-06 31
1972-06-01 54
dtype: int64
start
2016-07-05 3182
2016-10-04 3183
2016-11-01 3184
2016-12-05 3185
2016-12-08 3186
end
2016-12-08 544
2016-12-15 545
2016-12-16 546
2016-12-21 547
2016-12-27 3186
dtype: int64
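Chart output when the plotting lines in the code above are uncommented:
I want to create a population file containing each date and the number of stocks with valid data on that date. Its head and tail should look like this: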
date,num_stocks
1962-01-02,11
1962-11-19,12
1970-01-02,30
1971-08-06,31
1972-06-01,54
...
2016-12-08,2642
2016-12-15,2641
2016-12-16,2640
2016-12-21,2639
2016-12-27,2639
The end goal is to be able to plot the number of stocks with data over any specified date range by reading the population file.
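As a rough sketch of that end goal, the snippet below reads a population file in the desired `date,num_stocks` layout and selects an arbitrary date window. The in-memory `sample` text (built from the head and tail rows listed above) and the helper name `stocks_in_range` are assumptions made for illustration; a real run would pass the path of the actual file instead.

```python
import io
import pandas as pd

# Stand-in for the population file, using only the head/tail rows shown above.
# Assumption: the real file has the same "date,num_stocks" header.
sample = io.StringIO(
    "date,num_stocks\n"
    "1962-01-02,11\n"
    "1962-11-19,12\n"
    "1970-01-02,30\n"
    "1971-08-06,31\n"
    "1972-06-01,54\n"
    "2016-12-08,2642\n"
    "2016-12-15,2641\n"
    "2016-12-16,2640\n"
    "2016-12-21,2639\n"
    "2016-12-27,2639\n"
)

# Parse the date column into a DatetimeIndex so date strings can be used as slice bounds.
pop = pd.read_csv(sample, index_col="date", parse_dates=True)

def stocks_in_range(pop, first, last):
    """Return the num_stocks values whose date lies between first and last (inclusive)."""
    return pop.loc[first:last, "num_stocks"]

window = stocks_in_range(pop, "1970-01-01", "1972-12-31")
print(window)
# To plot the window in the style used above:
# window.plot(title="NUMBER OF STOCKS WITH DATA", grid=True, style="k.")
```

Parsing the dates up front is what makes the string-based slice work; with a plain string index the same `.loc` slice would compare text rather than dates.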
Answer 0 (score: 1)
Align the dates with their respective counts. I would use pd.Series.value_counts:
df.start.value_counts().sub(df.end.value_counts(), fill_value=0)
1984-09-07 1.0
1990-03-26 1.0
1999-11-18 1.0
2005-05-06 1.0
2005-09-27 1.0
2011-12-16 1.0
2012-12-13 1.0
2013-02-01 1.0
2014-04-11 1.0
2015-09-09 -1.0
2016-11-01 1.0
2016-12-27 -9.0
dtype: float64
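To make the alignment step easier to follow, here is a small self-contained sketch on a made-up three-row frame (the tickers and dates are hypothetical, not taken from 'meta.csv'). `sub` aligns the two count series on the union of their dates, and `fill_value=0` stands in for a date that appears in only one of them.

```python
import pandas as pd

# Hypothetical miniature of meta.csv, for illustration only.
df = pd.DataFrame({
    "ticker": ["X", "Y", "Z"],
    "start": ["2000-01-03", "2000-01-03", "2001-06-01"],
    "end": ["2001-06-01", "2002-12-31", "2002-12-31"],
})

starts = df["start"].value_counts()  # how many stocks begin on each date
ends = df["end"].value_counts()      # how many stocks end on each date

# Net change per date; dates missing from one side count as 0 on that side.
net = starts.sub(ends, fill_value=0).sort_index()
print(net)
```

The result is a float series because the alignment introduces missing values before `fill_value` is applied; a running total of `net` then gives the number of stocks with data on each date.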
Answer 1 (score: 0)
Thanks to the valuable hint from piRSquared, I solved the challenge with this code:
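import pandas as pd
df = pd.read_csv('meta.csv')
# net change per date: stocks starting minus stocks ending
x = df.start.value_counts().sub(df.end.value_counts(), fill_value=0)
x = x.sort_index()  # make sure the last row really is the latest date
# zero the net change on the final date so stocks whose data runs to the end are not subtracted
x.iloc[-1] = 0
r = x.cumsum()
# header=False keeps the file free of a header row, matching header=None in the read below
r.to_csv('pop.csv', header=False)
z = pd.read_csv('pop.csv', index_col=0, header=None)
z.plot(title='NUMBER OF STOCKS WITH DATA',legend=None,
       grid=True,style='k.')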
Head/tail of the 'pop.csv' file:
1962-01-02 11.0
1962-11-19 12.0
1970-01-02 30.0
1971-08-06 31.0
1972-06-01 54.0
...
2016-12-08 2642.0
2016-12-15 2641.0
2016-12-16 2640.0
2016-12-21 2639.0
2016-12-27 2639.0
Chart:
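For comparison, the merge-then-subtract route described in the question can also be sketched directly from the two cumulative series. This is only an illustration on a hypothetical three-row frame, not the code used for the chart above: both series are reindexed onto the union of their dates, gaps are forward-filled with the last known cumulative count, and the difference is taken. Unlike the code above, it does not zero the change on the final date.

```python
import pandas as pd

# Hypothetical miniature of meta.csv, for illustration only.
df = pd.DataFrame({
    "ticker": ["X", "Y", "Z"],
    "start": ["2000-01-03", "2000-01-03", "2001-06-01"],
    "end": ["2001-06-01", "2002-12-31", "2002-12-31"],
})

s = df.groupby("start").size().cumsum()  # cumulative number of stocks started
e = df.groupby("end").size().cumsum()    # cumulative number of stocks ended

# Common date range: every date that appears in either series, in sorted order.
dates = s.index.union(e.index)

# Carry the last cumulative count forward; before the first event the count is 0.
s_all = s.reindex(dates).ffill().fillna(0)
e_all = e.reindex(dates).ffill().fillna(0)

alive = (s_all - e_all).astype(int)
print(alive)
```

On the same input this gives the same numbers as taking the running total of the per-date net change, so the choice between the two is mostly a matter of which intermediate series is more useful to keep.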