Python: adding multiple data points to a dict from a CSV file

Time: 2016-03-04 16:32:21

Tags: python csv dictionary

My CSV file looks like this:

CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0

I want to parse the file and compute some statistics from the data:

CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 1.555, 54

So far, I have set up this:

CalledStatistics = {}

As I read each row from the CSV, what is the best way to put the data into the dict?

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

Will adding a second US row overwrite the first one, or will the data be added under the key 'CountryCode'?

3 answers:

Answer 0 (score: 2)

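Each of these assignments:

CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

would overwrite the previous one.

To calculate the sums you need, you can use a dict of dicts. For example, in a for loop where you have your data in the variables country_code, call_duration, and call_price, and where you store the data in collected_statistics: (Edit: added the first line to turn call_price into 0 if it has been recorded as None in the data. This code assumes consistent data, such as only integers; if other types of data may appear, you need to convert them to integers [or to any single numeric type] before Python can sum them.)

call_price = call_price if call_price is not None else 0

if country_code not in collected_statistics:
    collected_statistics[country_code] = {'CallDuration' : [call_duration],
                                          'CallPrice' : [call_price]}
else:
    collected_statistics[country_code]['CallDuration'] += [call_duration]
    collected_statistics[country_code]['CallPrice'] += [call_price]

And after the loop, for each country_code:

number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])

total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])

OK, so finally, here is a complete working script that handles the example you gave. It reads a file named CalledData with exactly the content you posted and prints the number of calls, the total price, and the total call duration for each country code: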

#!/usr/bin/env python3

import csv
import decimal

with open('CalledData', newline='') as csvfile:
    csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')

    # btw this creates a dict, not a set
    collected_statistics = {}

    for row in csv_r:

        [country_code, number_called, call_price, call_duration] = row

        # Only to avoid the first line, but would be better to have a list of available
        # (and correct) codes, and check if the country_code belongs to this list:
        if country_code != 'CountryCode':

            call_price = call_price if call_price != 'None' else 0

            if country_code not in collected_statistics:
                collected_statistics[country_code] = {'CallDuration' : [int(call_duration)],
                                                      'CallPrice' : [decimal.Decimal(call_price)]}
            else:
                collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]


    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))
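As a variation on the script above, the totals can also be accumulated directly instead of storing every value in a list. This is only a sketch: the summarize function, the SAMPLE string, and the 'calls' / 'price' / 'duration' keys are names made up for illustration, and the data is read from an in-memory string so the example runs without a file.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Sample rows copied from the question; a real script would open the file instead.
SAMPLE = """CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""


def summarize(csvfile):
    """Return {country_code: {'calls', 'price', 'duration'}} running totals."""
    totals = defaultdict(
        lambda: {'calls': 0, 'price': Decimal('0'), 'duration': 0})
    reader = csv.reader(csvfile, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # ignore blank lines
            continue
        country_code, _number, call_price, call_duration = row
        entry = totals[country_code]
        entry['calls'] += 1
        # A price recorded as the string 'None' counts as zero.
        if call_price != 'None':
            entry['price'] += Decimal(call_price)
        entry['duration'] += int(call_duration)
    return dict(totals)


stats = summarize(io.StringIO(SAMPLE))
for code, entry in stats.items():
    print(code, entry['calls'], entry['price'], entry['duration'])
```

Because defaultdict creates the entry the first time a country code is seen, no "if country_code not in ..." branch is needed, and a second US row adds to the existing totals rather than overwriting them.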

Answer 1 (score: 0)

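A dictionary can hold lists, including lists of dictionaries, so you can build the structure you want like this:

CalledStatistics['CountryCode'] = [{
    'NumberCalled': nc_val,
    'CallPrice': cp_val,
    'CallDuration': cd_val}]

Then you can add values like this:

for line in lines:
    parts = line.strip().split(',')
    # setdefault creates an empty list the first time a country code is seen
    CalledStatistics.setdefault(parts.pop(0), []).append({
        'NumberCalled': parts[0],
        'CallPrice': parts[1],
        'CallDuration': parts[2]})

By making each country code map to a list, you can add as many separate dicts as you like for each country code.

The pop(i) method returns the value and mutates the list, so what remains is exactly the data needed for the dict values. That is why we pop index 0 and then put indices 0 to 2 into the dict.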
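A minimal sketch of how the list-of-dicts structure could be filled and then summarized. The names calls_by_country and us_summary, the key names, and the shortened sample lines are assumptions made for illustration; the header row is assumed to have been removed already.

```python
from collections import defaultdict

# Hypothetical sample lines (header already removed), taken from the question.
lines = [
    "BS,+1234567,0.20250,29",
    "US,+121234,0.01250,4",
    "US,+1543215,0.01250,39",
]

calls_by_country = defaultdict(list)  # missing keys start as an empty list
for line in lines:
    country_code, number_called, call_price, call_duration = line.strip().split(',')
    calls_by_country[country_code].append({
        'NumberCalled': number_called,
        'CallPrice': float(call_price),
        'CallDuration': int(call_duration),
    })

# Totals are derived afterwards from the per-call records.
us_calls = calls_by_country['US']
us_summary = {
    'NumberOfTimesCalled': len(us_calls),
    'TotalCallDuration': sum(call['CallDuration'] for call in us_calls),
}
print(us_summary)
```

Keeping one dict per call preserves the individual records, at the cost of a second pass to compute the totals.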

Answer 2 (score: 0)

Your approach could be slightly different. Just read the file and turn it into a list of lists: call readlines(), then strip("\n") and split(",") on each line.

Skip the first row, and the last one too if it turns out to be empty (it most likely will be; test it). Then you can build the dict following the example @zezollo used and simply add the values under the matching key. Once you have the list of lists, make sure all the values you add together are of the same type.
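The steps described above might look roughly like this. It is a sketch under assumptions: fake_file stands in for the real file, to_number and the key names are hypothetical, and prices are converted to float purely for simplicity.

```python
import io

# Stand-in for the real file; io.StringIO lets the sketch run without one.
fake_file = io.StringIO(
    "CountryCode, NumberCalled, CallPrice, CallDuration\n"
    "BS,+1234567,0.20250,29\n"
    "US,+121234,0.01250,4\n"
    "US,+18765678,None,0\n"
    "\n"
)

# Read every line, strip the newline, and split on commas.
rows = [line.strip("\n").split(",") for line in fake_file.readlines()]
# Drop the header row and any empty lines (an empty line splits to [""]).
rows = [row for row in rows[1:] if row != [""]]


def to_number(text):
    """Convert a price field to float, treating the string 'None' as 0.0."""
    return 0.0 if text == "None" else float(text)


totals = {}
for country_code, _number, call_price, call_duration in rows:
    entry = totals.setdefault(
        country_code, {"calls": 0, "price": 0.0, "duration": 0})
    entry["calls"] += 1
    entry["price"] += to_number(call_price)
    entry["duration"] += int(call_duration)

print(totals)
```

Converting each field before adding it is what keeps all summed values of the same type.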

There is nothing like hard work; you'll remember this case for a long time ;)

Test, test, test on mock examples. And read the Python help and docs. They're brilliant.