My CSV file looks like this:
CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
I would like to parse the file and compute some statistics from the data, for example:
CountryCode, NumberOfTimesCalled, TotalPrice, TotalCallDuration
US, 4, 1.555, 54
So far, I have set up this:
CalledStatistics = {}
As I read each row from the CSV, what is the best way to put the data into the dict? Like this:
CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}
Would adding a second US row overwrite the first one, or would the data be added under the key 'CountryCode'?
Answer 0 (score: 2)
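Each of these assignments:

    CalledStatistics['CountryCode'] = {'CallDuration', 'CallPrice', 'NumberOfTimesCalled'}

will overwrite the previous one.

To compute the sums you need, you can use a dict of dicts. For example, inside a for loop where the data is held in the variables country_code, call_duration and call_price, and where you store the data in collected_statistics:

    call_price = call_price if call_price is not None else 0
    if country_code not in collected_statistics:
        collected_statistics[country_code] = {'CallDuration': [call_duration],
                                              'CallPrice': [call_price]}
    else:
        collected_statistics[country_code]['CallDuration'] += [call_duration]
        collected_statistics[country_code]['CallPrice'] += [call_price]

(Edit: added the first line so that call_price is converted to 0 if it was recorded as None in the data. This code assumes consistent data, e.g. only integers; if other types can appear, you need to convert them to integers [or any single numeric type] before Python can sum them.)

And after the loop, for each country_code:

    # number_of_times_called, total_call_duration and total_price are assumed to be dicts
    number_of_times_called[country_code] = len(collected_statistics[country_code]['CallDuration'])
    total_call_duration[country_code] = sum(collected_statistics[country_code]['CallDuration'])
    total_price[country_code] = sum(collected_statistics[country_code]['CallPrice'])

OK, so finally here is a complete working script that handles the example you gave. It reads a file named CalledData with exactly the content you provided and prints, for each country code, the number of calls, the total price and the total call duration: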
    #!/usr/bin/env python3
    import csv
    import decimal

    with open('CalledData', newline='') as csvfile:
        csv_r = csv.reader(csvfile, delimiter=',', quotechar='|')
        # btw this creates a dict, not a set
        collected_statistics = {}
        for row in csv_r:
            [country_code, number_called, call_price, call_duration] = row
            # Only to avoid the first line, but would be better to have a list of available
            # (and correct) codes, and check if the country_code belongs to this list:
            if country_code != 'CountryCode':
                call_price = call_price if call_price != 'None' else 0
                if country_code not in collected_statistics:
                    collected_statistics[country_code] = {'CallDuration': [int(call_duration)],
                                                          'CallPrice': [decimal.Decimal(call_price)]}
                else:
                    collected_statistics[country_code]['CallDuration'] += [int(call_duration)]
                    collected_statistics[country_code]['CallPrice'] += [decimal.Decimal(call_price)]

    for country_code in collected_statistics:
        print(str(country_code) + ":")
        print("number of times called: " + str(len(collected_statistics[country_code]['CallDuration'])))
        print("total price: " + str(sum(collected_statistics[country_code]['CallPrice'])))
        print("total call duration: " + str(sum(collected_statistics[country_code]['CallDuration'])))
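As a more compact variant of the script above, the running totals can be accumulated directly instead of storing every value in a list. This is only a sketch under a few assumptions that are not part of the original answer: the `summarize` helper and the `calls`/`price`/`duration` keys are hypothetical names, `csv.DictReader` is used with `skipinitialspace=True` to cope with the blanks after the commas in the header row, and an in-memory string stands in for the CalledData file. A price of `None` is treated as 0, as above.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def summarize(csvfile):
    """Return per-country totals as {code: {'calls', 'price', 'duration'}} (hypothetical helper)."""
    totals = defaultdict(lambda: {'calls': 0, 'price': Decimal(0), 'duration': 0})
    # skipinitialspace drops the blank that follows each comma in the header row
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row in reader:
        entry = totals[row['CountryCode']]
        price = row['CallPrice']
        entry['calls'] += 1
        # the literal string 'None' in the data counts as a price of 0
        entry['price'] += Decimal(0) if price == 'None' else Decimal(price)
        entry['duration'] += int(row['CallDuration'])
    return dict(totals)


# In-memory stand-in for the CalledData file from the question
sample = """CountryCode, NumberCalled, CallPrice, CallDuration
BS,+1234567,0.20250,29
BS,+19876544,0.20250,1
US,+121234,0.01250,4
US,+1543215,0.01250,39
US,+145678,0.01250,11
US,+18765678,None,0
"""

stats = summarize(io.StringIO(sample))
for code, entry in stats.items():
    print(code, entry['calls'], entry['price'], entry['duration'])
```

With a real file, `summarize(open('CalledData', newline=''))` inside a `with` block would take the place of the `io.StringIO` call. Keeping only the totals uses less memory than the lists, at the cost of no longer having the individual values available afterwards.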
Answer 1 (score: 0)
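A dict can hold lists, and lists of dicts, so you can build the structure you want like this:

    CalledStatistics['CountryCode'] = [{
        'NumberCalled': nc_val,
        'CallPrice': cp_val,
        'CallDuration': cd_val}]

Then you can add values like this:

    for line in lines:
        parts = line.strip().split(',')
        # after pop(0), parts holds NumberCalled, CallPrice, CallDuration;
        # setdefault avoids a KeyError the first time a country code is seen
        CalledStatistics.setdefault(parts.pop(0), []).append({
            'NumberCalled': parts[0],
            'CallPrice': parts[1],
            'CallDuration': parts[2]})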
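By making the value for each country code a list, you can add any number of distinct dicts per country code. The pop(i) method returns the value and mutates the list, so what remains is exactly the data needed for the dict values. That is why we pop index 0 and then add indexes 0 to 2 to the dict.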
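To get the totals the question asks for out of this list-of-dicts structure, the per-call records can be summed afterwards. A minimal sketch, not the answerer's code: the `records` dict, the `totals_for` helper and the shortened sample `lines` are hypothetical, and prices are converted to `float` with `None` treated as 0.

```python
# A few rows from the question, without the header line
lines = [
    "BS,+1234567,0.20250,29",
    "BS,+19876544,0.20250,1",
    "US,+121234,0.01250,4",
    "US,+18765678,None,0",
]

records = {}
for line in lines:
    parts = line.strip().split(',')
    country = parts.pop(0)  # pop(0) returns the code and removes it from parts
    # setdefault creates the empty list the first time a country is seen
    records.setdefault(country, []).append({
        'NumberCalled': parts[0],
        'CallPrice': 0.0 if parts[1] == 'None' else float(parts[1]),
        'CallDuration': int(parts[2]),
    })


def totals_for(calls):
    """Collapse a list of per-call dicts into (count, total price, total duration)."""
    return (len(calls),
            sum(c['CallPrice'] for c in calls),
            sum(c['CallDuration'] for c in calls))


for country, calls in records.items():
    print(country, totals_for(calls))
```

Keeping one dict per call preserves the individual records, so other statistics (longest call, most-called number) can be derived later without re-reading the file.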
Answer 2 (score: 0)
Your approach could be slightly different. Just read the file and turn it into a list of lists (readlines(), then strip("\n") and split(",") on each line).
Skip the first row and the last one (it will most likely be empty; test this). Then you can build the dict using the example @zezollo gave and simply add values under each key of the dict you create. Make sure all the values you add, once you have a list of lists, are of the same type.
Nothing beats hard work; you'll remember this case for a long time ;)
Test, test, test on mock examples. And read the Python help and docs. They're brilliant.
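The read, strip and split approach described above could look roughly like this. It is only a sketch: the `parse_lines` helper and the `calls`/`price`/`duration` keys are hypothetical names, and an in-memory `io.StringIO` with a few rows from the question stands in for the real file so the example runs on its own.

```python
import io


def parse_lines(lines):
    """Turn raw CSV lines into a list of lists, skipping the header and blank lines."""
    rows = []
    for line in lines[1:]:          # drop the header row
        line = line.strip("\n")
        if not line:                # a trailing empty line is common
            continue
        rows.append(line.split(","))
    return rows


fake_file = io.StringIO(
    "CountryCode, NumberCalled, CallPrice, CallDuration\n"
    "BS,+1234567,0.20250,29\n"
    "US,+121234,0.01250,4\n"
    "US,+18765678,None,0\n"
    "\n"
)
rows = parse_lines(fake_file.readlines())

totals = {}
for country, _number, price, duration in rows:
    entry = totals.setdefault(country, {'calls': 0, 'price': 0.0, 'duration': 0})
    entry['calls'] += 1
    # convert everything to one numeric type before summing
    entry['price'] += 0.0 if price == 'None' else float(price)
    entry['duration'] += int(duration)

print(totals)
```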