Consolidating data based on two conditions

Date: 2017-06-20 15:31:09

Tags: python csv consolidation

I have four columns of data that I am trying to consolidate based on two conditions. The data is formatted as follows:

CountyName  Year    Oil Gas
ANDERSON    2010    1358    0
ANDERSON    2010    621746  4996766
ANDERSON    2011    1587    0
ANDERSON    2011    633120  5020877
ANDERSON    2012    55992   387685
ANDERSON    2012    1342    0
ANDERSON    2013    635572  3036578
ANDERSON    2013    4873    0
ANDERSON    2014    656440  2690333
ANDERSON    2014    12332   0
ANDERSON    2015    608454  2836272
ANDERSON    2015    23339   0
ANDERSON    2016    551728  2682261
ANDERSON    2016    12716   0
ANDERSON    2017    132466  567874
ANDERSON    2017    1709    0
ANDREWS 2010    25701725    1860063
ANDREWS 2010    106351  0
ANDREWS 2011    97772   0
ANDREWS 2011    28818329    1377865
ANDREWS 2012    105062  0
...

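I want to combine the Oil and Gas values of duplicate entries. For example, I would like to add up all Oil entries for ANDERSON county in 2010 and replace the existing entries with a single row holding that sum. The code I am using now sums all values for each county regardless of year, which gives me this condensed output: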

CountyName  Year    Oil Gas
ANDERSON        3954774 
ANDREWS      206472698  
...

This is the code I am using:

import csv
with open('Texas.csv', 'r') as Texas: #opening Texas csv file
    TexasReader = csv.reader(Texas)
    counties = {}
    years = {}

    index = 0 and 1
    for row in TexasReader:
        if index == 0 and 1:
            header = row
        else:
            county = row[0]
            year = row[1]
            oil = row[2]
            gas = row[3]

            if county in counties:
                counties[county] += int(oil)
            else:
                counties[county] = int(oil)
        index += 1

    with open('TexasConsolidated.csv', 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',', lineterminator='\n')
        writer.writeheader()
        for k, v in counties.items():
            writer.writerow({header[0]: k, header[2]: v})

1 answer:

Answer 0 (score: 0)

These are the lines causing the behavior you describe, because the dict is keyed on the county alone:

if county in counties: 
    counties[county] += int(oil) 

If you want the dict to store a separate sum for each combination of two values, both values need to be part of the dict key.

Add the line

counties_years = {}

Then sum as follows, using a tuple ( , ) as the key:

if (county,year) in counties_years: 
    counties_years[(county,year)] += int(oil) 
else:
    counties_years[(county,year)] = int(oil)
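
To show how the tuple key could carry through to the output file, here is a minimal sketch that sums both Oil and Gas per (county, year) pair and writes one row per pair. It assumes the input is comma-separated with the header names shown in the question. The helper names consolidate and write_consolidated are made up for illustration, and the in-memory sample stands in for Texas.csv.

```python
import csv
import io
from collections import defaultdict


def consolidate(rows):
    """Sum Oil and Gas for every (CountyName, Year) pair.

    `rows` is an iterable of dicts with the keys CountyName, Year, Oil, Gas
    (assumed header names, taken from the question).
    Returns a dict mapping (county, year) -> [oil_total, gas_total].
    """
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = (row["CountyName"], row["Year"])  # tuple key: both conditions
        totals[key][0] += int(row["Oil"])
        totals[key][1] += int(row["Gas"])
    return dict(totals)


def write_consolidated(totals, out_file):
    """Write one consolidated row per (county, year), sorted by key."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["CountyName", "Year", "Oil", "Gas"])
    for (county, year), (oil, gas) in sorted(totals.items()):
        writer.writerow([county, year, oil, gas])


# Small in-memory sample built from the first rows of the question's data.
sample = io.StringIO(
    "CountyName,Year,Oil,Gas\n"
    "ANDERSON,2010,1358,0\n"
    "ANDERSON,2010,621746,4996766\n"
    "ANDERSON,2011,1587,0\n"
    "ANDERSON,2011,633120,5020877\n"
)
totals = consolidate(csv.DictReader(sample))
out = io.StringIO()
write_consolidated(totals, out)
print(out.getvalue())
```

For the real files, the same two helpers could be called with open('Texas.csv', newline='') wrapped in csv.DictReader as input and open('TexasConsolidated.csv', 'w', newline='') as output.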