I have a list of hashes that looks like this:
import pandas as pd
import datetime

rows = [
    {
        "version": "v1",
        "timestamp": "2013-06-04T06:00:00.000Z",
        "event": {
            "campaign_id": "cid2504649263",
            "country": "AU",
            "region": "Cairns",
            "impressions": 3000
        }
    },
    {
        "version": "v1",
        "timestamp": "2013-06-04T06:00:00.000Z",
        "event": {
            "campaign_id": "cid2504649263",
            "country": "AU",
            "region": "Cairns",
            "impressions": 3000
        }
    },
    {
        "version": "v1",
        "timestamp": "2013-06-04T07:00:00.000Z",
        "event": {
            "campaign_id": "cid2504649263",
            "country": "AU",
            "region": "Cairns",
            "impressions": 3000
        }
    }
]

hash_data = []
for row in rows:
    ts = row['timestamp']
    meta = row['event']
    # parse the ISO-8601 timestamp and attach it to the event dict
    meta['utcdt'] = datetime.datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.000Z')
    hash_data.append(meta)

data = pd.DataFrame(hash_data)
print(data.values)
grouped = data.groupby(['utcdt', 'campaign_id', 'region', 'country']).sum()
print(grouped.values)
[['cid2504649263' 'AU' 3000 'Cairns' datetime.datetime(2013, 6, 4, 6, 0)]
['cid2504649263' 'AU' 3000 'Cairns' datetime.datetime(2013, 6, 4, 6, 0)]
['cid2504649263' 'AU' 3000 'Cairns' datetime.datetime(2013, 6, 4, 7, 0)]]
My question is this: I need to aggregate the data by time, so that it looks like the following. How can I do that in pandas?
[
['cid2504649263' 'AU' 6000 'Cairns' datetime.datetime(2013, 6, 4, 6, 0)]
['cid2504649263' 'AU' 3000 'Cairns' datetime.datetime(2013, 6, 4, 7, 0)]]
If I use the following:
grouped = data.groupby(['utcdt', 'campaign_id', 'region', 'country']).sum()
print(grouped.values)
[[ 6000.]
[ 3000.]]
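For reference, the grouped sum above does produce the desired totals; the grouping keys have simply moved into the index, which is why `.values` shows only the impressions column. A minimal sketch (rebuilding the frame from the rows in the question) that brings them back as ordinary columns with `reset_index()`:

```python
import datetime
import pandas as pd

# the three event dicts from the question, after timestamp parsing
rows = [
    {"campaign_id": "cid2504649263", "country": "AU", "region": "Cairns",
     "impressions": 3000, "utcdt": datetime.datetime(2013, 6, 4, 6, 0)},
    {"campaign_id": "cid2504649263", "country": "AU", "region": "Cairns",
     "impressions": 3000, "utcdt": datetime.datetime(2013, 6, 4, 6, 0)},
    {"campaign_id": "cid2504649263", "country": "AU", "region": "Cairns",
     "impressions": 3000, "utcdt": datetime.datetime(2013, 6, 4, 7, 0)},
]
data = pd.DataFrame(rows)

# group and sum as before, then turn the index levels back into columns
grouped = (data.groupby(['utcdt', 'campaign_id', 'region', 'country'])
               .sum()
               .reset_index())
print(grouped)
# two rows: impressions 6000 at 06:00 and 3000 at 07:00
```

Passing `as_index=False` to `groupby` is an equivalent way to keep the keys as columns.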
Answer 0 (score: 0)
You are looking for drop_duplicates:
In [11]: data.drop_duplicates()
Out[11]:
campaign_id country impressions region utcdt
0 cid2504649263 AU 3000 Cairns 2013-06-04 06:00:00
2 cid2504649263 AU 3000 Cairns 2013-06-04 07:00:00
By the way, 0.11.1 will ship with an experimental read_json function, which can create a DataFrame directly from JSON (a file, URL, or string)...
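(Editor's note: read_json has long since graduated from experimental status. A minimal sketch with current pandas; the JSON string here is a simplified flat version of the question's data, and it is wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string directly.)

```python
import pandas as pd
from io import StringIO

json_str = '''[
  {"campaign_id": "cid2504649263", "country": "AU", "impressions": 3000},
  {"campaign_id": "cid2504649263", "country": "AU", "impressions": 3000}
]'''

# read_json builds a DataFrame straight from JSON (file path, URL, or buffer)
df = pd.read_json(StringIO(json_str))
print(df)
# 2 rows x 3 columns: campaign_id, country, impressions
```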
Answer 1 (score: 0)