Pandas: plotting lines from different columns, ignoring values

Time: 2014-01-27 18:44:07

Tags: python matplotlib plot split pandas

Suppose I have two columns in a pandas DataFrame, as follows:

df.high
Out[11]: 
date
2004-01-14       NaN
2004-01-15    1.2675
2004-01-16    1.2609
2004-01-19    1.2426
2004-01-20       NaN
2004-01-21       NaN
2004-01-22       NaN
2004-01-23    1.2778
2004-01-26    1.2616  

df.low
Out[12]: 
date
2004-01-14       NaN
2004-01-15    1.2558
2004-01-16    1.2349
2004-01-19    1.2334
2004-01-20       NaN
2004-01-21       NaN
2004-01-22       NaN
2004-01-23    1.2564
2004-01-26    1.2457 

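How can I draw one straight line per group of values, going from the first value of the group in df.high to the last value of the group in df.low and ignoring the values in between?

E.g., in this example the first line must go from df.high at 2004-01-15 to df.low at 2004-01-19, and the second line must go from df.high at 01-23 to df.low at 01-26.

Beyond this example, I have much larger DataFrames in which groups of values alternate with groups of NaN, and I need to keep the datetime index in the same order.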

1 answer:

Answer 0: (score: 2)

First, you can build a function that splits a DataFrame wherever its value column is NaN:

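import numpy as np

def mysplit(df):
    # split at every row whose value is NaN (each NaN row starts a new chunk)
    parts = np.split(df, np.where(np.isnan(df.value))[0])
    # removing NaN entries (and skipping any chunk that is a bare ndarray)
    parts = [part[~np.isnan(part.value)] for part in parts
              if not isinstance(part, np.ndarray)]
    # removing empty DataFrames
    parts = [part for part in parts if not part.empty]
    return parts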

Then you can run this function on each DataFrame you have:

parts1 = mysplit(df1)
#[                 date   value
#1 2004-01-15 00:00:00  1.2675
#2 2004-01-16 00:00:00  1.2609
#3 2004-01-19 00:00:00  1.2426,
#                 date   value
#7 2004-01-23 00:00:00  1.2778
#8 2004-01-26 00:00:00  1.2616]

parts2 = mysplit(df2)
#[                 date   value
#1 2004-01-15 00:00:00  1.2558
#2 2004-01-16 00:00:00  1.2349
#3 2004-01-19 00:00:00  1.2334,
#                 date   value
#7 2004-01-23 00:00:00  1.2564
#8 2004-01-26 00:00:00  1.2457]
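
The printed output suggests that df1 and df2 each have a date column and a value column on a default integer index, whereas the question shows date-indexed df.high and df.low. A minimal sketch of one way to get from one layout to the other is given below. It assumes the question's columns are ordinary Series; the sample data is copied from the question, and the variable names `dates`, `high` and `low` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (date-indexed Series).
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
dates.name = "date"
high = pd.Series(
    [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
    index=dates, name="high",
)
low = pd.Series(
    [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    index=dates, name="low",
)

# rename() gives the column name `value` that mysplit looks up;
# reset_index() moves the dates into a regular `date` column.
df1 = high.rename("value").reset_index()
df2 = low.rename("value").reset_index()
```

With this layout, `df1.value` is the column that mysplit tests for NaN, and the integer index matches the row labels in the printed output.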

A simplified plot:

import matplotlib.pyplot as plt

# first value of each group in parts1, last value of the matching group in parts2
values = [[i.values[0,1], j.values[-1,1]] for i,j in zip(parts1, parts2)]
for value in values:
    plt.plot([0,1], value)


Edit: to get what you suggested in the comments, you can change the last part slightly:

for i,j in zip(parts1, parts2):
    plt.plot([i.index[0], j.index[-1]], [i.values[0,1], j.values[-1,1]])
plt.show()

This gives one straight line per group, running from the group's first value in parts1 to its last value in parts2.
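
For the date-indexed layout shown in the question, the same idea can be sketched without splitting into separate DataFrames. This is not the method from the answer above but a hedged alternative. It labels each run of valid rows with the running count of NaN rows before it, then takes the first high and the last low of every run. The function name `group_segments` and the variables `missing` and `run_id` are made up for illustration, and the sketch assumes that high and low are NaN on the same rows, as they are in the sample.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data copied from the question.
dates = pd.to_datetime([
    "2004-01-14", "2004-01-15", "2004-01-16", "2004-01-19", "2004-01-20",
    "2004-01-21", "2004-01-22", "2004-01-23", "2004-01-26",
])
df = pd.DataFrame(
    {
        "high": [np.nan, 1.2675, 1.2609, 1.2426, np.nan, np.nan, np.nan, 1.2778, 1.2616],
        "low": [np.nan, 1.2558, 1.2349, 1.2334, np.nan, np.nan, np.nan, 1.2564, 1.2457],
    },
    index=dates,
)
df.index.name = "date"


def group_segments(df):
    """Return one (x_start, x_end, y_start, y_end) tuple per run of valid rows."""
    missing = df["high"].isna() | df["low"].isna()
    # The running count of missing rows stays constant inside a run of valid
    # rows and increases at every NaN row, so it works as a run label.
    run_id = missing.cumsum()
    valid = df[~missing]
    segments = []
    for _, run in valid.groupby(run_id[~missing]):
        segments.append(
            (run.index[0], run.index[-1], run["high"].iloc[0], run["low"].iloc[-1])
        )
    return segments


segments = group_segments(df)

if __name__ == "__main__":
    fig, ax = plt.subplots()
    for x_start, x_end, y_start, y_end in segments:
        # one straight line per run, with the dates on the x axis
        ax.plot([x_start, x_end], [y_start, y_end])
    fig.autofmt_xdate()
    plt.show()
```

Because the runs are taken from the original frame in index order, the datetime index keeps its order, and the x coordinates are the dates themselves rather than integer row positions.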