DataFrame is empty in Python?

Time: 2015-12-01 18:52:53

Tags: python pandas dataframe

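I am trying to load log files into a DataFrame using pandas. I have 2 files that I am trying to merge into 1. What happens is that the DataFrame ends up empty, which is strange because the same code works with other log files of the same type.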

Here is the output I get:

rows of df1 146299.000000
columns of df1 6.000000
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame

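It reports the correct number of rows and columns, but the printed DataFrame contains no data. When does this happen? Here are the code and a data sample.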

Code:

    trace_path = '/Users/ramapriyasridharan/Documents/new_exp/new_trace/m3xlarge/01'

    client_path = os.path.join(trace_path,'client')
    middleware_path = os.path.join(trace_path,'middleware')
    df = pd.DataFrame(columns=['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time'])
    #df = None
    for root, _,files in os.walk(middleware_path):
        for f in files:
            if 'server' not in f : continue
            print 'current file name %s:' %f

            #df.columns = ['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time']
            f1 = os.path.join(middleware_path,f)
            df1 = pd.read_csv(f1,header=None,sep=',')
            df1.columns = ['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time']
            #df1 = refine(df1)
            print ' rows of df1 %f' %df1.shape[0]
            print 'columns of df1 %f'%df1.shape[1]
            print 'len of df1 %f' %len(df1)
            df1 = refine(df1)
            print df1
            if df.shape[0] == 0:
                df = df1
                print df
            else:
                df = pd.concat([df,df1],axis=0)
                print df
    print df
    print ' rows of df %f' %df.shape[0]
    print 'columns of df %f'%df.shape[1]

Full output:

 python find_service_time.py 
current file name rsridhar-serverworker-1448992797827.log:
 rows of df1 146299.000000
columns of df1 6.000000
len of df1 146299.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
current file name rsridhar-serverworker-1448992805710.log:
 rows of df1 194827.000000
columns of df1 6.000000
len of df1 194827.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
 rows of df 0.000000
columns of df 6.000000
 len of refined df 0.000000
min timestamp : nan
done
Traceback (most recent call last):
  File "find_service_time.py", line 170, in <module>
    main()
  File "find_service_time.py", line 94, in main
    t_per_sec = map(lambda x: len(df[df['timestamp']==x]), range(1,int(np.max(df['timestamp']))))
ValueError: cannot convert float NaN to integer

Sample data:

1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805978,SEND_MSG,26,153,0,159
1448992805979,SEND_MSG,20,149,1,163
1448992805979,GET_QUEUE,1,3,1,4
1448992805980,GET_QUEUE,1,3,0,3
1448992805981,GET_QUEUE,2,3,1,4
1448992805981,GET_QUEUE,1,3,1,4
1448992805982,SEND_MSG,5,129,0,133
1448992805983,GET_QUEUE,1,8,0,8
1448992805983,GET_QUEUE,3,5,1,6
1448992805983,GET_QUEUE,0,1,5,6
1448992805984,GET_QUEUE,3,5,2,7
1448992805984,GET_QUEUE,2,5,1,7
1448992805985,GET_QUEUE,0,5,3,8
1448992805985,GET_QUEUE,5,10,0,10
1448992805986,GET_QUEUE,4,9,1,10
1448992805986,GET_QUEUE,9,10,0,10
1448992805987,GET_QUEUE,0,7,3,10
1448992805987,GET_QUEUE,4,5,5,10
1448992805988,GET_QUEUE,5,6,5,11
1448992805989,GET_QUEUE,2,6,6,12
1448992805990,GET_QUEUE,1,4,7,11
1448992805990,GET_QUEUE,0,2,8,10
1448992805991,GET_QUEUE,5,10,4,14
1448992805991,GET_QUEUE,2,4,8,12
1448992805991,GET_QUEUE,0,6,7,13
1448992805992,GET_QUEUE,11,16,0,16
1448992805992,GET_QUEUE,0,4,9,13
1448992805993,GET_QUEUE,4,6,8,14
1448992805992,GET_QUEUE,8,15,0,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,0,10,6,16
1448992805993,GET_QUEUE,6,9,7,16
1448992805994,GET_QUEUE,1,6,8,14
1448992805994,GET_LATEST_MSG_DELETE,1,8,7,15
1448992805995,GET_QUEUE,2,7,9,16
1448992805995,GET_QUEUE,4,6,6,12
1448992805996,GET_QUEUE,10,20,0,20
1448992805996,GET_QUEUE,12,13,6,19

Any suggestions are welcome. This is only part of the code.

1 Answer:

Answer 0: (score: 1)

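refine() is not removing just some rows from your DataFrame; it is removing all of them. The row and column counts you print come from df1 before refine() is called. You have a print df1 right after the call, and every time the output shows Empty DataFrame. The most immediate problem seems to lie in whatever filtering you are doing there.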
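Since the body of refine() is not shown, one way to find the offending filter is to apply its conditions one at a time and print how many rows survive each step. The following is only a minimal sketch in Python 3: trace_filters and both filter conditions are hypothetical stand-ins for whatever refine() really does, and the four data rows are copied from the sample in the question.

```python
import io

import pandas as pd

COLUMNS = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']

# A few rows copied from the sample data in the question.
SAMPLE = """1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805979,GET_QUEUE,1,3,1,4
1448992805982,SEND_MSG,5,129,0,133
"""


def trace_filters(df, steps):
    """Apply each (label, mask_function) in order and record surviving row counts.

    Hypothetical helper, not part of pandas.
    """
    counts = [('start', len(df))]
    for label, make_mask in steps:
        df = df[make_mask(df)]
        counts.append((label, len(df)))
    return df, counts


df1 = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)

# Assumed filters, for illustration only. The second one compares
# epoch-millisecond timestamps against a small number, so it drops every row.
steps = [
    ('type is GET_QUEUE', lambda d: d['type'] == 'GET_QUEUE'),
    ('timestamp < 1000', lambda d: d['timestamp'] < 1000),
]

refined, counts = trace_filters(df1, steps)
for label, n in counts:
    print('%-20s %d rows' % (label, n))

# Guard the later statistics instead of letting them run on an empty frame.
if refined.empty:
    print('filtering removed every row; skipping per-second statistics')
```

The step whose count drops to 0 is the one to inspect. A check like refined.empty before the later computation would also likely avoid the ValueError in the traceback, which appears to come from taking the maximum of an empty timestamp column (NaN) and passing it to int().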