Question

I need to compare two Excel sheets and sum the actual values of all rows that share the same key value.

Example sheets:

sheet 1                  | sheet 2
index  id  count         | index  id  name
  1    a     12          |   1    a   qg1
  2    b     15          |   2    c   ff2
  3    c     21          |   3    f   dv1
  4    b      5          |   4    b   bm5
  .                      |   .
  .                      |   .

In the case above, I look up each id from sheet 2 and sum the actual values (count) of all rows in sheet 1 that have the same id (id a | 100, id b | 20, ...).
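To make the requested result concrete, here is a minimal sketch on the four sample rows shown above only (the rows hidden behind the dots are not included, so the total for a comes out as 12 here, not the 100 quoted in the question). It accumulates the counts in a plain dictionary during one pass over sheet 1, then does one lookup per id from sheet 2. The helper name `sum_counts_by_id` is hypothetical, introduced just for this illustration.

```python
from collections import defaultdict


def sum_counts_by_id(sheet1_rows, sheet2_ids):
    """Sum sheet-1 counts per id, reported for every id listed in sheet 2.

    sheet1_rows: iterable of (id, count) pairs taken from sheet 1.
    sheet2_ids: iterable of ids taken from sheet 2; ids absent from sheet 1 get 0.
    """
    totals = defaultdict(int)
    # Single pass over sheet 1 instead of rescanning it for every sheet-2 id.
    for key, count in sheet1_rows:
        totals[key] += count
    # .get() does not insert missing keys, so unmatched ids simply report 0.
    return {key: totals.get(key, 0) for key in sheet2_ids}


# Sample rows from the question (hypothetical in-memory stand-ins for the files).
sheet1 = [("a", 12), ("b", 15), ("c", 21), ("b", 5)]
sheet2 = ["a", "c", "f", "b"]
result = sum_counts_by_id(sheet1, sheet2)
print(result)  # {'a': 12, 'c': 21, 'f': 0, 'b': 20}
```

Because each sheet is walked once, the work grows with len(sheet 1) + len(sheet 2) rather than with their product.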

The code below takes too long, because for every ID in sheet 2 it loops over all the rows of sheet 1 again.

import pandas as pd

# Sheet 1 (id, count) and sheet 2 (id, name), exported as CSV
pcode_quantity = pd.read_csv('/1.csv', delimiter=',')
product_info = pd.read_csv('/2.csv', delimiter=',')

product_list = product_info.id.tolist()
purchase_id = pcode_quantity.id.tolist()
purchase_count = pcode_quantity['count'].tolist()

product_length = len(product_list)
purchase_length = len(purchase_id)

dict_pcode = {}
i = 0
# For every product id in sheet 2 ...
while i < product_length:
    product_sum = 0
    i2 = 0
    # ... scan all of sheet 1 and add up the counts with a matching id.
    # This nested loop costs len(sheet 2) * len(sheet 1) comparisons.
    while i2 < purchase_length:
        if product_list[i] == purchase_id[i2]:
            product_sum = product_sum + purchase_count[i2]
        i2 = i2 + 1
    dict_pcode[product_list[i]] = product_sum
    i = i + 1

sum_pcode = pd.DataFrame(list(dict_pcode.items()))
sum_pcode.to_csv('/output.csv')

Is there any way to speed up the operation above?

Answer 1

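You can first aggregate the counts with groupby and sum, then join the result to product_info, replace possible missing values with DataFrame.fillna, and finally build the dictionary with set_index, convert the counts to integers with astype, and call to_dict: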

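# Total count per id in sheet 1 (a Series named 'count', indexed by id)
pcode_quantity = pcode_quantity.groupby('id')['count'].sum()
# Attach the totals to sheet 2 by id; ids that never appear in sheet 1 get 0
df = product_info.join(pcode_quantity, on='id').fillna({'count': 0})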
print (df)
      id name  count
index               
1      a  qg1   12.0
2      c  ff2   21.0
3      f  dv1    0.0
4      b  bm5   20.0

# The NaN values made 'count' float, so cast back to int before building the dict
dict_pcode = df.set_index('id')['count'].astype(int).to_dict()
print (dict_pcode)
{'a': 12, 'c': 21, 'f': 0, 'b': 20}
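To try the answer end to end, here is a self-contained sketch that builds the two sample sheets inline instead of reading '/1.csv' and '/2.csv' (an assumption made only so it runs without the files). It follows the same groupby / join / fillna / astype / to_dict steps; the name `totals` is introduced here so the original pcode_quantity frame is not overwritten.

```python
import pandas as pd

# Sample sheets from the question, built in memory instead of read from CSV.
pcode_quantity = pd.DataFrame(
    {"id": ["a", "b", "c", "b"], "count": [12, 15, 21, 5]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)
product_info = pd.DataFrame(
    {"id": ["a", "c", "f", "b"], "name": ["qg1", "ff2", "dv1", "bm5"]},
    index=pd.Index([1, 2, 3, 4], name="index"),
)

# 1. Total count per id in sheet 1 (Series named 'count', indexed by id).
totals = pcode_quantity.groupby("id")["count"].sum()

# 2. Match sheet 2's 'id' column against that index; ids with no rows in
#    sheet 1 (here 'f') get NaN, which fillna replaces with 0.
df = product_info.join(totals, on="id").fillna({"count": 0})

# 3. NaN forced 'count' to float, so cast back to int before building the dict.
dict_pcode = df.set_index("id")["count"].astype(int).to_dict()
print(dict_pcode)
```

On the sample rows this yields the same totals as the answer (a 12, c 21, f 0, b 20). The groupby does the summing in one vectorized pass, and the join is a hash-based lookup, which is why it avoids the nested loop from the question.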

Is there a way to use pandas in Python to sum the values of all Excel rows that share the same key value?
