I have a log file with a timestamp and two columns. I now want to resample and "pivot" the dataframe created from the log file.
Example original dataframe / log file:
timestamp            colA  colB
2015-01-01 00:10:01  a     x
2014-01-01 00:10:01  b     y
2015-01-01 00:10:03  a     x
2015-01-01 00:10:03  a     x
2015-01-01 00:10:03  a     y
2015-01-01 00:10:04  b     x
2014-01-01 00:10:04  b     y
2014-01-01 00:10:04  b     y
2014-01-01 00:10:04  a     x
2014-01-01 00:10:05  a     x
2014-01-01 00:10:05  a     x
2014-01-01 00:10:07  a     y
2014-01-01 00:10:08  a     x
Example result, resampled by second:
                     a     b
timestamp            x  y  x  y
2015-01-01 00:10:01  1  0  0  1
2015-01-01 00:10:02  0  0  0  0
2015-01-01 00:10:03  2  1  0  0
2015-01-01 00:10:04  1  0  1  2
2014-01-01 00:10:05  2  0  0  0
2014-01-01 00:10:06  0  0  0  0
2014-01-01 00:10:07  0  1  0  0
2014-01-01 00:10:08  1  0  0  0
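How would I achieve this? Resample first and then groupby / pivot, or the other way round? More specifically, each cell should contain the count of a colA / colB combination for a given resampling interval. The example uses seconds, but it could be minutes, hours, etc.
I am not fixed on this format. I could also imagine getting the resampled result grouped by timestamp / colA, like this: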
                           colB
timestamp            colA  x  y
2015-01-01 00:10:01  a     1  0
                     b     0  1
2015-01-01 00:10:02  a     0  0
                     b     0  0
2015-01-01 00:10:03  a     2  1
                     b     0  0
2015-01-01 00:10:04  a     1  0
                     b     1  2
2014-01-01 00:10:05  a     2  0
                     b     0  0
2014-01-01 00:10:06  a     0  0
                     b     0  0
2014-01-01 00:10:07  a     0  1
                     b     0  0
2014-01-01 00:10:08  a     1  0
                     b     0  0
The end use is plotting the different count values.
Thanks.
Answer 0 (score: 1)
You can use pd.crosstab:
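import pandas as pd

# Fields are separated by two or more spaces, so the single space inside
# the timestamp does not split it; a regex separator needs the python engine.
df = pd.read_table('data', sep=r'\s{2,}', engine='python', parse_dates=[0])
# Count each colA / colB combination per distinct timestamp.
table = pd.crosstab(index=[df['timestamp']], columns=[df['colA'], df['colB']])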
yields
colA                 a     b
colB                 x  y  x  y
timestamp
2014-01-01 00:10:01  0  0  0  1
2014-01-01 00:10:04  1  0  0  2
2014-01-01 00:10:05  2  0  0  0
2014-01-01 00:10:07  0  1  0  0
2014-01-01 00:10:08  1  0  0  0
2015-01-01 00:10:01  1  0  0  0
2015-01-01 00:10:03  2  1  0  0
2015-01-01 00:10:04  0  0  1  0
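The crosstab only has rows for timestamps that occur in the log, so seconds with no entries are missing. One possible way to get the zero-filled rows from the question is to resample the crosstab afterwards. The following is only a sketch under assumptions: the inline sample is a shortened version of the question's data with every timestamp placed in 2015, and the names raw, per_second and stacked are made up for illustration.

```python
import io

import pandas as pd

# Assumed sample: a few rows modelled on the question, all in 2015,
# with fields separated by two or more spaces.
raw = """timestamp  colA  colB
2015-01-01 00:10:01  a  x
2015-01-01 00:10:01  b  y
2015-01-01 00:10:03  a  x
2015-01-01 00:10:03  a  x
2015-01-01 00:10:03  a  y
2015-01-01 00:10:04  b  x
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python",
                 parse_dates=[0])

# Count each colA / colB combination per distinct timestamp.
table = pd.crosstab(index=df["timestamp"],
                    columns=[df["colA"], df["colB"]])

# Resample to one-second bins and sum the counts; bins without any
# log entries sum to 0, which fills the gap at 00:10:02.
per_second = table.resample("1s").sum()

# Alternative layout from the question: one row per (timestamp, colA),
# one column per colB value.
stacked = per_second.stack(level="colA")

print(per_second)
print(stacked)
```

Changing the rule passed to resample (for example to a one-minute or one-hour frequency string) should give coarser bins. Since the end use is plotting, calling per_second.plot() is probably the shortest route, assuming matplotlib is installed. Note that the crosstab only contains colA / colB combinations that actually occur in the data.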