IPython: forcing pandas to plot

Date: 2015-05-09 11:55:54

Tags: python pandas plot

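I have a loop that generates a plot for each column of a pandas DataFrame. I use IPython, but the plots all appear at the end of the loop instead of at the point in my code where I want them to show up.

How can I force IPython/pandas to display each plot at the exact point in the code where I call the plot function?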

def explore(file, sep=";", top = 5, k='Code Agence'):
    """Print per-column statistics and a plot for each column of a CSV file."""
    %matplotlib inline
    import time
    import matplotlib.pyplot as plt
    import pandas as pd
    import sys
    dataframes_top = []
    start = time.time()
    #print "Exploring :", get_file_name(file), "with %s lines"%(top)

    to_explore = pd.read_csv(file, sep=";", error_bad_lines=False)
    cols = to_explore.columns
    i = -1
    for col in cols:
        i +=1
        serie = to_explore[col]
        try:
            print"plotting %s"%(col)
            serie.plot().show()
            time.sleep(2)
        except Exception as e:
            print "plotting issue :%s"%(e)
        #serie.index = index

        null  = serie.isnull()
        not_null = len([x for x in null if not x])
        r = not_null/len(serie)

        s = serie.value_counts()#return value as index, count as value
        pct_top = s.values[:top]/not_null
        serie_top_n = pd.Series(s.values[:top],index=s.index[:top])
        local_df = pd.DataFrame()
        local_df[col]=serie_top_n
        local_df['pct']=pct_top
        somme = local_df['pct'].sum()

        pct_2_top= s.values[:top*2]/not_null
        serie_2_top_n = pd.Series(s.values[:top*2],index=s.index[:top*2])
        local_df_2_top = pd.DataFrame()
        local_df_2_top[col]=serie_2_top_n
        local_df_2_top['pct']=pct_2_top
        somme_2_top = local_df_2_top['pct'].sum()


        print
        print "%s : [col %s = %s ]  "%(get_file_name(file), i,col)  
        print 
        print "%.2f"%(r), " pct not null"
        print "%.2f pct on the first %s "%(somme, top)
        print "%.2f pct on the first %s "%(somme_2_top, 2*top)
        print "plot :"
        print pd.DataFrame(serie.describe()).T

        print
        print local_df.T
        print "plot :"
        local_df.plot()

        print "="*100

        dataframes_top.append(local_df)
    elapsed = time.time()-start
    print "="*20, elapsed, "for %s lines"%(len(serie)),"="*20
    sys.stdout.flush()
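For comparison, the two main per-column statistics the loop computes (share of non-null values, and share of non-null values covered by the `top` most frequent values) can be written more compactly. The sketch below uses a hypothetical helper, summarize_column, that is not part of the original code; it assumes pandas 0.21 or later for Series.notna() (older versions can use notnull()). Taking .mean() of a boolean mask always yields a float, which sidesteps the Python 2 integer division in not_null/len(serie).

```python
import pandas as pd


def summarize_column(serie, top=5):
    """Hypothetical helper: (share of non-null values, share covered by the top values)."""
    # Mean of a boolean mask = fraction of non-null entries, always a float
    share_not_null = serie.notna().mean()
    # value_counts() drops NaN by default, so normalize=True divides by the
    # non-null count, like s.values[:top] / not_null in the question
    freqs = serie.value_counts(normalize=True)
    top_share = freqs.iloc[:top].sum()
    return share_not_null, top_share


share_not_null, top_share = summarize_column(pd.Series(["a", "a", "b", None]), top=1)
print("%.2f pct not null, %.2f pct on the first 1" % (share_not_null, top_share))
```

With the sample Series, three of four values are non-null (0.75) and "a" accounts for two of those three (about 0.67).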

1 answer:

Answer 0 (score: 0)

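Make sure you call plt.show() every time you draw a new chart. If you don't, IPython buffers every figure and displays them all when execution reaches the end of the cell. I think you forgot to do this at the end of each loop iteration. Note also that serie.plot() returns a matplotlib Axes object, which has no show() method, so serie.plot().show() raises an AttributeError; your except branch only builds a string without printing it, so that error goes unnoticed.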
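A quick way to confirm why serie.plot().show() in the question never displays anything: pandas' Series.plot() returns a matplotlib Axes, and calling show() on it raises AttributeError. This sketch assumes the non-interactive Agg backend so it can run outside a notebook; the printed class name may be AxesSubplot or Axes depending on the matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the check runs headless
import matplotlib.pyplot as plt
import pandas as pd

serie = pd.Series([1, 3, 2])
ax = serie.plot()  # returns a matplotlib Axes, not a Figure

try:
    ax.show()  # Axes has no show() method
    error = None
except AttributeError as e:
    error = e

print(type(ax).__name__, "->", repr(error))
plt.close(ax.figure)  # release the figure created by serie.plot()
```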

Here is an example that draws each plot inside the loop instead of waiting until the end:

%matplotlib inline

import matplotlib.pyplot as plt
from pandas import Series, date_range  # date_range is used below
from numpy.random import randn

for i in range(5):
    print("Before graph {0}".format(i))
    ts = Series(randn(1000), index=date_range('1/1/2000', periods=1000))
    ts = ts.cumsum()
    ts.plot()

    plt.show()
    print("After graph {0}".format(i))

If I run this, each plot is displayed between the "Before" and "After" print outputs, as intended.

I am using IPython Notebook version 3.0.0-f75fda4 with Python 3.
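Applied to the question's loop, a minimal sketch might look like the following. Assumptions: the data come from a small in-memory CSV instead of a file, the hypothetical explore_columns function keeps only a slice of the original statistics, each column is drawn as a bar chart of its most frequent values (so text columns can be plotted too), and the Agg backend is used so it runs headless; in a notebook you would use %matplotlib inline instead of matplotlib.use("Agg").

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless for this sketch; in a notebook use %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical in-memory stand-in for the question's semicolon-separated file
CSV_TEXT = "code;city\n1;Paris\n2;Lyon\n2;Paris\n3;\n"


def explore_columns(df, top=5):
    """Hypothetical helper: plot and summarize each column, one column at a time."""
    for i, col in enumerate(df.columns):
        serie = df[col]
        print("plotting %s" % col)
        fig, ax = plt.subplots()  # a fresh figure for this column
        # value_counts() works for numeric and text columns alike
        serie.value_counts().iloc[:top].plot(kind="bar", ax=ax, title=col)
        plt.show()  # render this figure now, not at the end of the cell
        plt.close(fig)  # free it; it has already been rendered
        print("[col %s = %s] %.2f pct not null" % (i, col, serie.notna().mean()))
        print("=" * 40)


df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")
explore_columns(df)
```

Closing each figure after plt.show() keeps figures from accumulating when a DataFrame has many columns; the important part for the question is that plt.show() is called inside the loop, right after each plot is drawn.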