My batch CSV files contain lines like the following:
date,id,site,linkup,linkdown,count,connection
20190102,100000000204197,google.com,1,2,1,5
20190102,100000000204197,yahoo.com,2,2,1,5
20190102,100000000204197,yahoo.com,1,2,2,3
20190102,41602323232,google.com,4,1,1,3
20190102,41602323232,google.com,1,3,1,7
I want to sum these values grouped by id and site, so the result would be:
100000000204197,google.com,1,2,1,5
100000000204197,yahoo.com,3,4,3,8
41602323232,google.com,5,4,2,10
This is the code I have tried so far:
import csv

with open('/home/mahmoudod/Desktop/Tareq-Qassrawi/report.txt', 'r') as rf:
    reader = csv.reader(rf)
    next(reader, None)  # skip the header row, if the file has one
    with open('/home/mahmoudod/Desktop/Tareq-Qassrawi/writer.txt', 'w') as wf:
        hashing_table = {}
        for line in reader:
            # aggregate on the combination of id (IMSI) and site
            key = (line[1], line[2])
            if key not in hashing_table:
                hashing_table[key] = {'IMSI': line[1],
                                      'SITE': line[2],
                                      'DATE': line[0],
                                      'linkup': int(line[3]),
                                      'linkdown': int(line[4]),
                                      'count': int(line[5]),
                                      'connection': int(line[6])}
            else:
                entry = hashing_table[key]
                entry['linkup'] += int(line[3])
                entry['linkdown'] += int(line[4])
                entry['count'] += int(line[5])
                entry['connection'] += int(line[6])
        for entry in hashing_table.values():
            wf.write('{IMSI},{SITE},{linkup},{linkdown},{count},{connection}\n'
                     .format(**entry))
Answer 1 (score: 1)
Use the amazing pandas module from http://wesmckinney.com/ (and, these days, a large crew of open-source contributors). See the documentation here: http://pandas.pydata.org/pandas-docs/stable/
import pandas as pd
df = pd.read_csv('a.csv') # read in your data from the csv file.
df.groupby(['id', 'site']).sum() # groupby here groups your data by both id and site, and sums the remaining numeric columns.
To have the id shown on every row (instead of being folded into the group index), we use reset_index:
df.groupby(['id', 'site']).sum().reset_index()
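Putting the pieces together for the data in the question, here is a minimal sketch. It assumes the report file has no header row, so explicit (illustrative) column names are passed to read_csv; if your file does start with a header line, drop the header/names arguments and use your real column names instead. Only the four metric columns are summed, so the date column is not added up as well:

import pandas as pd

# Illustrative column names matching the layout shown in the question
cols = ['date', 'id', 'site', 'linkup', 'linkdown', 'count', 'connection']

df = pd.read_csv('/home/mahmoudod/Desktop/Tareq-Qassrawi/report.txt',
                 header=None, names=cols)

# Group by id and site, keep them as ordinary columns (as_index=False),
# and sum only the metric columns so the date is left out of the totals.
result = (df.groupby(['id', 'site'], as_index=False)
            [['linkup', 'linkdown', 'count', 'connection']]
            .sum())

result.to_csv('/home/mahmoudod/Desktop/Tareq-Qassrawi/writer.txt', index=False)
print(result)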
If you work with data a lot in your life/career, also have a look at Jupyter Notebook or JupyterLab: https://jupyter.org/
Good luck, and welcome to SO and the Python open-source data world.