Suppose I have two columns in a pandas DataFrame, as follows:
df.high
Out[11]:
date
2004-01-14 NaN
2004-01-15 1.2675
2004-01-16 1.2609
2004-01-19 1.2426
2004-01-20 NaN
2004-01-21 NaN
2004-01-22 NaN
2004-01-23 1.2778
2004-01-26 1.2616
df.low
Out[12]:
date
2004-01-14 NaN
2004-01-15 1.2558
2004-01-16 1.2349
2004-01-19 1.2334
2004-01-20 NaN
2004-01-21 NaN
2004-01-22 NaN
2004-01-23 1.2564
2004-01-26 1.2457
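How can I draw one straight line for each group of values, going from the first value of the group in df.high to the last value of the group in df.low and ignoring the values in between?
For example, here the first line must go from df.high at 2004-01-15 to df.low at 2004-01-19, and the second line from df.high at 01-23 to df.low at 01-26.
Beyond this example, I have much larger DataFrames in which groups of values alternate with groups of NaN, and I need to keep the datetime index in the same order.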
Answer 0 (score: 2)
First, you can build a function that splits a DataFrame at its NaN entries:
import numpy as np

def mysplit(df):
    # split the DataFrame at every row whose value is NaN
    parts = np.split(df, np.where(np.isnan(df.value))[0])
    # removing NaN entries
    parts = [part[~np.isnan(part.value)] for part in parts
             if not isinstance(part, np.ndarray)]
    # removing empty DataFrames
    parts = [part for part in parts if not part.empty]
    return parts
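As an alternative sketch (not part of the original answer), the same split can probably be done directly on the question's datetime-indexed columns with a pandas groupby, which avoids passing a DataFrame to np.split. The helper name split_on_nan is hypothetical, and the sample values are copied from the question's df.high.

```python
import numpy as np
import pandas as pd

def split_on_nan(series):
    """Return the consecutive non-NaN runs of `series` as a list of Series."""
    is_nan = series.isna()
    # The cumulative NaN count stays constant inside a run of valid values,
    # so it can serve as a group label once the NaN rows are dropped.
    run_id = is_nan.cumsum()[~is_nan].to_numpy()
    valid = series[~is_nan]
    return [group for _, group in valid.groupby(run_id)]

# Sample data copied from the question's df.high
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
high = pd.Series(
    [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
    index=dates, name="high",
)

runs = split_on_nan(high)
for run in runs:
    # first and last date of each run, in the original index order
    print(run.index[0].date(), run.index[-1].date())
```

Because each run keeps its slice of the datetime index, the original date order is preserved without any extra bookkeeping.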
Then you can run this function on each of your DataFrames:
parts1 = mysplit(df1)
#[ date value
#1 2004-01-15 00:00:00 1.2675
#2 2004-01-16 00:00:00 1.2609
#3 2004-01-19 00:00:00 1.2426,
# date value
#7 2004-01-23 00:00:00 1.2778
#8 2004-01-26 00:00:00 1.2616]
parts2 = mysplit(df2)
#[ date value
#1 2004-01-15 00:00:00 1.2558
#2 2004-01-16 00:00:00 1.2349
#3 2004-01-19 00:00:00 1.2334,
# date value
#7 2004-01-23 00:00:00 1.2564
#8 2004-01-26 00:00:00 1.2457]
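The calls above assume that df1 and df2 each have an integer index with a date column and a value column, which is not the layout shown in the question. A minimal sketch of how they might be derived from the question's df follows; the construction of df here is only an assumption made to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: a datetime index named 'date'
# and two float columns, 'high' and 'low'.
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
df = pd.DataFrame(
    {
        "high": [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
        "low": [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    },
    index=dates,
)
df.index.name = "date"

# Move the index into a 'date' column and name the data column 'value',
# which is the layout the answer's mysplit function appears to expect.
df1 = df["high"].reset_index(name="value")
df2 = df["low"].reset_index(name="value")

print(df1.head())
```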
A simplified plot:
import matplotlib.pyplot as plt
# first value of each high group, last value of the matching low group
values = [[i.values[0,1], j.values[-1,1]] for i,j in zip(parts1, parts2)]
for value in values:
    plt.plot([0,1], value)
Edit: to get what you suggested in the comments, you can change the last part slightly:
for i,j in zip(parts1, parts2):
    plt.plot([i.index[0], j.index[-1]], [i.values[0,1], j.values[-1,1]])
plt.show()
This draws one straight line per group.
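The snippet in the edit puts the integer row positions on the x-axis. Since the question asks to keep the datetime index, here is a hedged sketch that plots each segment against the actual dates instead. It assumes high and low are NaN on the same rows, as in the sample data, and the names valid, run_id and segments are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data copied from the question
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
df = pd.DataFrame(
    {
        "high": [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
        "low": [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    },
    index=dates,
)

# Rows where both columns hold a value (assumption: NaNs line up in both)
valid = df["high"].notna() & df["low"].notna()
# The cumulative count of invalid rows labels each run of valid rows
run_id = (~valid).cumsum()[valid].to_numpy()

fig, ax = plt.subplots()
segments = []
for _, run in df[valid].groupby(run_id):
    start = (run.index[0], run["high"].iloc[0])   # first high of the run
    end = (run.index[-1], run["low"].iloc[-1])    # last low of the run
    segments.append((start, end))
    ax.plot([start[0], end[0]], [start[1], end[1]])

fig.autofmt_xdate()  # tilt the date labels so they stay readable
```

Call plt.show() or fig.savefig(...) afterwards to view the result. For the sample data, segments holds two entries, one per group of values.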