Question

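I have two CSV files. The first has city name, population and humidity. The second maps each city to a state. I want the total population and the average humidity per state. Can anyone help? Here is an example: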

CSV 1:

CityName,population,humidity
Austin,1000,20
Sanjose,2200,10
Sacramento,500,5

CSV 2:

State,city name 
Ca,Sanjose
Ca,Sacramento
Texas,Austin

Desired output (total population and average humidity per state):

Ca,2700,7.5
Texas,1000,20
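For reference, a minimal sketch of the same aggregation with a pandas merge followed by a groupby. It assumes pandas 0.25 or later (for named aggregation in `agg`) and uses the question's sample data inline via `io.StringIO`; with real files you would pass the file paths to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Sample data from the question; replace the StringIO objects with file paths.
csv1 = io.StringIO("""CityName,population,humidity
Austin,1000,20
Sanjose,2200,10
Sacramento,500,5
""")
csv2 = io.StringIO("""State,city name
Ca,Sanjose
Ca,Sacramento
Texas,Austin
""")

cities = pd.read_csv(csv1)
states = pd.read_csv(csv2)
# Strip stray whitespace from header names (the question's "city name " has a trailing space).
states.columns = states.columns.str.strip()

# Attach each city's state, then aggregate per state.
merged = cities.merge(states, left_on="CityName", right_on="city name")
result = merged.groupby("State", as_index=False).agg(
    population=("population", "sum"),
    humidity=("humidity", "mean"),
)
print(result.to_csv(index=False, header=False))
```

The default inner merge silently drops cities that have no state in CSV 2; passing `how="left"` keeps them with a missing state instead.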

Answer 1

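The solution above doesn't work, because a dictionary keeps only one value per key. I gave up and ended up using a loop. The code below works with this input: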

csv1
state_name,city_name
CA,sacramento
utah,saltlake
CA,san jose
Utah,provo
CA,sanfrancisco
TX,austin
TX,dallas
OR,portland

csv2
city_name,population,humidity
sacramento,1000,1
saltlake,300,5
san jose,500,2
provo,100,7
sanfrancisco,700,3
austin,2000,4
dallas,2500,5
portland,300,6

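import csv

from pandas import read_csv


def mapping_within_dataframe(file1, file2, file3):
    # file1: state_name,city_name   file2: city_name,population,humidity   file3: output CSV
    state_city = read_csv(file1)
    city_data = read_csv(file2)

    with open(file3, 'w', newline='') as out:
        outfile = csv.writer(out, delimiter=',')

        # Unique state names (case-sensitive: "utah" and "Utah" are different keys)
        all_state = list(set(state_city.state_name))

        for one_state in all_state:
            # All cities mapped to this state
            one_state_cities = list(state_city.loc[state_city.state_name == one_state, "city_name"])
            one_state_data = 0
            humidities = []

            for one_city in one_state_cities:
                rows = city_data.loc[city_data.city_name == one_city]
                one_state_data = one_state_data + rows["population"].sum()
                humidities.extend(rows["humidity"])

            # Mean of the cities' humidity values (NaN if no city matched)
            avg_humidity = sum(humidities) / len(humidities) if humidities else float('nan')
            print(one_state, one_state_data, avg_humidity)
            outfile.writerow([one_state, one_state_data, avg_humidity])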
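The loop compares state names exactly, so in this sample data `utah,saltlake` and `Utah,provo` end up as two separate states. A minimal loop-free sketch, assuming pandas 0.25 or later, that upper-cases state names before grouping (upper-casing is an arbitrary normalisation choice) and reuses the sample input shown here:

```python
import io

import pandas as pd

state_city = pd.read_csv(io.StringIO("""state_name,city_name
CA,sacramento
utah,saltlake
CA,san jose
Utah,provo
CA,sanfrancisco
TX,austin
TX,dallas
OR,portland
"""))
city_data = pd.read_csv(io.StringIO("""city_name,population,humidity
sacramento,1000,1
saltlake,300,5
san jose,500,2
provo,100,7
sanfrancisco,700,3
austin,2000,4
dallas,2500,5
portland,300,6
"""))

# Normalise state names so "utah" and "Utah" fall into one group.
state_city["state_name"] = state_city["state_name"].str.strip().str.upper()

# One merge replaces the per-state / per-city loops.
merged = state_city.merge(city_data, on="city_name", how="inner")
totals = merged.groupby("state_name").agg(
    population=("population", "sum"),
    humidity=("humidity", "mean"),
)
print(totals)
```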

Answer 2

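def output(file1, file2):

    f = lambda x: x.strip()     # strips newline and whitespace characters

    with open(file1) as cities:
        with open(file2) as states:
            next(cities)        # skip header "CityName,population,humidity"
            next(states)        # skip header "State,city name"
            city_to_state = {}  # keyed by city, so no entry is overwritten
            cities_dict = {}

            for line in states:
                if not line.strip():
                    continue    # ignore blank lines
                line = line.split(',')
                city_to_state[f(line[1])] = f(line[0])
            for line in cities:
                if not line.strip():
                    continue
                line = line.split(',')
                cities_dict[f(line[0])] = (int(f(line[1])), int(f(line[2])))

    # state -> [total population, list of humidities]
    totals = {}
    for city, (population, humidity) in cities_dict.items():
        state = city_to_state.get(city)
        if state is None:
            continue            # city has no state mapping
        entry = totals.setdefault(state, [0, []])
        entry[0] += population
        entry[1].append(humidity)

    for state, (population, humidities) in totals.items():
        # Prints lines such as "Ca,2700,7.5"
        print(state, population, format(sum(humidities) / len(humidities), 'g'), sep=',')

output(CSV1, CSV2)   # CSV1 and CSV2 hold the names of the two files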

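This gives the output you want. Just make sure the city names are capitalized the same way in both files, because the lookup is case-sensitive.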
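Keying a plain dict by state keeps only the last city seen for that state, which is the problem Answer 1 points out. A minimal stdlib-only sketch that avoids it: look cities up by city name, and collect per-state rows in a `collections.defaultdict(list)`. It also lower-cases city names before matching, which removes the capitalization requirement; the function name `state_totals` and the lower-casing rule are illustrative choices, not part of either answer.

```python
import csv
import io
from collections import defaultdict


def state_totals(cities_file, states_file):
    """Return {state: (total population, mean humidity)} from two open CSV files."""
    # city (lower-cased) -> state; lower-casing makes "Sanjose" match "sanjose"
    city_to_state = {}
    reader = csv.reader(states_file)
    next(reader)  # skip header
    for row in reader:
        if row:
            state, city = (field.strip() for field in row)
            city_to_state[city.lower()] = state

    # state -> list of (population, humidity); a list keeps every city, not just the last
    grouped = defaultdict(list)
    reader = csv.reader(cities_file)
    next(reader)  # skip header
    for row in reader:
        if not row:
            continue
        city, population, humidity = (field.strip() for field in row)
        state = city_to_state.get(city.lower())
        if state is not None:
            grouped[state].append((int(population), float(humidity)))

    return {
        state: (sum(p for p, _ in rows), sum(h for _, h in rows) / len(rows))
        for state, rows in grouped.items()
    }


# "sanjose" is deliberately lower-case here to show the case-insensitive match.
cities = io.StringIO("CityName,population,humidity\nAustin,1000,20\nsanjose,2200,10\nSacramento,500,5\n")
states = io.StringIO("State,city name\nCa,Sanjose\nCa,Sacramento\nTexas,Austin\n")
print(state_totals(cities, states))
```

With real files, open them with `newline=''` as the `csv` module documentation recommends.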
