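I am trying to load log files into a DataFrame using pandas. I have 2 files that I am trying to merge into one. The DataFrame ends up empty, which is strange because the same code works with other log files of the same type.
Here is the output I get: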
rows of df1 146299.000000
columns of df1 6.000000
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
It reports the correct number of rows and columns, but there is no data inside. When does this happen? Here are the code and a data sample.
Code:
trace_path = '/Users/ramapriyasridharan/Documents/new_exp/new_trace/m3xlarge/01'
client_path = os.path.join(trace_path,'client')
middleware_path = os.path.join(trace_path,'middleware')
df = pd.DataFrame(columns=['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time'])
#df = None
for root, _,files in os.walk(middleware_path):
    for f in files:
        if 'server' not in f : continue
        print 'current file name %s:' %f
        #df.columns = ['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time']
        f1 = os.path.join(middleware_path,f)
        df1 = pd.read_csv(f1,header=None,sep=',')
        df1.columns = ['timestamp','type','wait_at_db_queue','db_response_time','wait_server_queue','server_response_time']
        #df1 = refine(df1)
        print ' rows of df1 %f' %df1.shape[0]
        print 'columns of df1 %f'%df1.shape[1]
        print 'len of df1 %f' %len(df1)
        df1 = refine(df1)
        print df1
        if df.shape[0] == 0:
            df = df1
            print df
        else:
            df = pd.concat([df,df1],axis=0)
            print df
print df
print ' rows of df %f' %df.shape[0]
print 'columns of df %f'%df.shape[1]
Full output:
python find_service_time.py
current file name rsridhar-serverworker-1448992797827.log:
rows of df1 146299.000000
columns of df1 6.000000
len of df1 146299.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
current file name rsridhar-serverworker-1448992805710.log:
rows of df1 194827.000000
columns of df1 6.000000
len of df1 194827.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
rows of df 0.000000
columns of df 6.000000
len of refined df 0.000000
min timestamp : nan
done
Traceback (most recent call last):
File "find_service_time.py", line 170, in <module>
main()
File "find_service_time.py", line 94, in main
t_per_sec = map(lambda x: len(df[df['timestamp']==x]), range(1,int(np.max(df['timestamp']))))
ValueError: cannot convert float NaN to integer
Sample data:
1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805978,SEND_MSG,26,153,0,159
1448992805979,SEND_MSG,20,149,1,163
1448992805979,GET_QUEUE,1,3,1,4
1448992805980,GET_QUEUE,1,3,0,3
1448992805981,GET_QUEUE,2,3,1,4
1448992805981,GET_QUEUE,1,3,1,4
1448992805982,SEND_MSG,5,129,0,133
1448992805983,GET_QUEUE,1,8,0,8
1448992805983,GET_QUEUE,3,5,1,6
1448992805983,GET_QUEUE,0,1,5,6
1448992805984,GET_QUEUE,3,5,2,7
1448992805984,GET_QUEUE,2,5,1,7
1448992805985,GET_QUEUE,0,5,3,8
1448992805985,GET_QUEUE,5,10,0,10
1448992805986,GET_QUEUE,4,9,1,10
1448992805986,GET_QUEUE,9,10,0,10
1448992805987,GET_QUEUE,0,7,3,10
1448992805987,GET_QUEUE,4,5,5,10
1448992805988,GET_QUEUE,5,6,5,11
1448992805989,GET_QUEUE,2,6,6,12
1448992805990,GET_QUEUE,1,4,7,11
1448992805990,GET_QUEUE,0,2,8,10
1448992805991,GET_QUEUE,5,10,4,14
1448992805991,GET_QUEUE,2,4,8,12
1448992805991,GET_QUEUE,0,6,7,13
1448992805992,GET_QUEUE,11,16,0,16
1448992805992,GET_QUEUE,0,4,9,13
1448992805993,GET_QUEUE,4,6,8,14
1448992805992,GET_QUEUE,8,15,0,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,0,10,6,16
1448992805993,GET_QUEUE,6,9,7,16
1448992805994,GET_QUEUE,1,6,8,14
1448992805994,GET_LATEST_MSG_DELETE,1,8,7,15
1448992805995,GET_QUEUE,2,7,9,16
1448992805995,GET_QUEUE,4,6,6,12
1448992805996,GET_QUEUE,10,20,0,20
1448992805996,GET_QUEUE,12,13,6,19
Any suggestions are welcome; this is only part of the code.
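Answer 0 (score: 1)
refine() is not removing some of the rows from your DataFrame; it is removing all of them. You call print df1 right after it, and the output shows Empty DataFrame every time. The row and column counts you see are printed before refine() runs, so they describe the file as it was read, not the filtered result. The most immediate problem seems to lie in whatever filtering you are doing in there.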
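One way to narrow this down is to apply the filters one at a time and print how many rows survive each step. The sketch below is only an illustration: the body of refine() is not shown in the question, so the helper count_surviving_rows and both conditions are hypothetical stand-ins, and the rows are copied from the sample data. It is written for Python 3, so print is a function here.

```python
import pandas as pd

COLUMNS = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']


def count_surviving_rows(df, conditions):
    """Apply named boolean masks in order and record the rows left after each.

    Hypothetical helper, not part of pandas or of the asker's script.
    """
    counts = []
    for name, make_mask in conditions:
        df = df[make_mask(df)]
        counts.append((name, len(df)))
    return df, counts


# Four rows copied from the sample data in the question.
sample = pd.DataFrame([
    [1448992805978, 'GET_QUEUE', 1, 2, 0, 2],
    [1448992805978, 'SEND_MSG', 18, 147, 1, 157],
    [1448992805979, 'SEND_MSG', 20, 149, 1, 163],
    [1448992805994, 'GET_LATEST_MSG_DELETE', 1, 8, 7, 15],
], columns=COLUMNS)

# Assumed conditions for illustration only; substitute the real ones from refine().
conditions = [
    ('type is SEND_MSG', lambda d: d['type'] == 'SEND_MSG'),
    ('timestamp below 1000', lambda d: d['timestamp'] < 1000),
]

refined, counts = count_surviving_rows(sample, conditions)
for name, remaining in counts:
    print('%-22s -> %d rows left' % (name, remaining))
```

The step at which the count drops to zero is the condition to inspect. In this made-up case it is the timestamp comparison, since the logged timestamps are epoch milliseconds and none of them is below 1000.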