My batch CSV files contain lines like the following:
date,id,site,linkup,linkdown,count,connection
20190102,100000000204197,google.com,1,2,1,5
20190102,100000000204197,yahoo.com,2,2,1,5
20190102,100000000204197,yahoo.com,1,2,2,3
20190102,41602323232,google.com,4,1,1,3
20190102,41602323232,google.com,1,3,1,7
I want to sum these values grouped by id and site, so the result would be:
100000000204197,google.com,1,2,1,5
100000000204197,yahoo.com,3,4,3,8
41602323232,google.com,5,4,2,10
This is the code I have tried so far:
import csv

with open('/home/mahmoudod/Desktop/Tareq-Qassrawi/report.txt', 'r') as rf:
    reader = csv.reader(rf)
    next(reader, None)  # skip the header row, if the file has one
    with open('/home/mahmoudod/Desktop/Tareq-Qassrawi/writer.txt', 'w') as wf:
        hashing_table = {}
        for line in reader:
            # aggregate on the combination of id (IMSI) and site
            key = (line[1], line[2])
            if key not in hashing_table:
                hashing_table[key] = {'IMSI': line[1],
                                      'SITE': line[2],
                                      'DATE': line[0],
                                      'linkup': int(line[3]),
                                      'linkdown': int(line[4]),
                                      'count': int(line[5]),
                                      'connection': int(line[6])}
            else:
                entry = hashing_table[key]
                entry['linkup'] += int(line[3])
                entry['linkdown'] += int(line[4])
                entry['count'] += int(line[5])
                entry['connection'] += int(line[6])
        for entry in hashing_table.values():
            wf.write('{IMSI},{SITE},{linkup},{linkdown},{count},{connection}\n'
                     .format(**entry))
Answer 1 (score: 1)
Use the amazing pandas module from http://wesmckinney.com/ (and, these days, a large crew of open-source contributors). See the documentation here: http://pandas.pydata.org/pandas-docs/stable/
import pandas as pd
df = pd.read_csv('a.csv') # read in your data from the csv file.
df.groupby(['id', 'site']).sum() # groupby here groups your data by both id and site, and sums the remaining numeric columns.
To have the id shown on every row (instead of being folded into the group index), we use reset_index:
df.groupby(['id', 'site']).sum().reset_index()
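Putting the pieces together for the data in the question, here is a minimal sketch. It assumes the report file has no header row, so explicit (illustrative) column names are passed to read_csv; if your file does start with a header line, drop the header/names arguments and use your real column names instead. Only the four metric columns are summed, so the date column is not added up as well:

import pandas as pd

# Illustrative column names matching the layout shown in the question
cols = ['date', 'id', 'site', 'linkup', 'linkdown', 'count', 'connection']

df = pd.read_csv('/home/mahmoudod/Desktop/Tareq-Qassrawi/report.txt',
                 header=None, names=cols)

# Group by id and site, keep them as ordinary columns (as_index=False),
# and sum only the metric columns so the date is left out of the totals.
result = (df.groupby(['id', 'site'], as_index=False)
            [['linkup', 'linkdown', 'count', 'connection']]
            .sum())

result.to_csv('/home/mahmoudod/Desktop/Tareq-Qassrawi/writer.txt', index=False)
print(result)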
If you work with data a lot in your life/career, also have a look at Jupyter Notebook or JupyterLab: https://jupyter.org/
Good luck, and welcome to SO and the Python open-source data world.