Question

I have the following DataFrame:

print(inventory_df)

dt_op        Prod_1  Prod_2 ... Prod_n
10/09/18       0        8         0
10/09/18       5        0         2

11/09/18       4        0         0
11/09/18       0       10         0

...

And I would like to get:

print(final_df)

dt_op        Prod_1  Prod_2 ... Prod_n
10/09/18       5        8         2     
11/09/18       4       10         0 

...

My attempts so far do not produce the desired output. How can I create final_df?

Answer 1

You can use the pandas groupby function together with sum():

In [412]: inventory_df
Out[412]: 
      dt_op  Prod_1  Prod_2
0  10/09/18       0       8
1  10/09/18       5       0
2  11/09/18       4       0
3  11/09/18       0      10

In [413]: inventory_df.groupby('dt_op').sum()
Out[413]: 
          Prod_1  Prod_2
dt_op                   
10/09/18       5       8
11/09/18       4      10
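
For a version that can be run as a script, here is a minimal sketch. It assumes the frame from the question with only two product columns and dt_op stored as plain strings. Passing as_index=False is a choice made here so that dt_op stays a regular column, as in the desired final_df.

```python
import pandas as pd

# Sample data mirroring the question (two product columns only, as an assumption)
inventory_df = pd.DataFrame({
    "dt_op": ["10/09/18", "10/09/18", "11/09/18", "11/09/18"],
    "Prod_1": [0, 5, 4, 0],
    "Prod_2": [8, 0, 0, 10],
})

# Sum every product column within each dt_op group;
# as_index=False keeps dt_op as an ordinary column instead of the index
final_df = inventory_df.groupby("dt_op", as_index=False).sum()
print(final_df)
```

Without as_index=False the result is the same, except that dt_op becomes the index, as in the session above.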

Answer 2

Simulating only the stated DataFrame: you asked for a groupby + sum() across rows.

Reproduced DataFrame:

>>> df
      dt_op  Prod_1  Prod_2  Prod_n
0  10/09/18       0       8       0
1  10/09/18       5       0       2
2  11/09/18       4       0       0

Group by dt_op and call sum(). The aggregation runs down the rows of each group, so no axis argument is needed (GroupBy.sum() does not accept one in current pandas):

>>> df.groupby('dt_op').sum()
          Prod_1  Prod_2  Prod_n
dt_op
10/09/18       5       8       2
11/09/18       4       0       0

If, however, you are looking for a literal sum() of each row across its columns:

>>> df['new_sum'] = df.sum(axis=1, numeric_only=True)  # skip the string column dt_op
>>> df
      dt_op  Prod_1  Prod_2  Prod_n  new_sum
0  10/09/18       0       8       0        8
1  10/09/18       5       0       2        7
2  11/09/18       4       0       0        4
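
To make the difference between the two operations concrete, the sketch below runs both on the same three-row frame. Selecting columns with filter(like="Prod_") is an addition of this sketch, not part of the answer. It assumes every product column name contains "Prod_", and it keeps the date column out of the row-wise sum.

```python
import pandas as pd

df = pd.DataFrame({
    "dt_op": ["10/09/18", "10/09/18", "11/09/18"],
    "Prod_1": [0, 5, 4],
    "Prod_2": [8, 0, 0],
    "Prod_n": [0, 2, 0],
})

# Assumption: every product column name contains "Prod_"
prod_cols = list(df.filter(like="Prod_").columns)

# Per-date totals: one row per dt_op, summed down the rows of each group
per_date = df.groupby("dt_op")[prod_cols].sum()

# Per-row totals: one value per original row, summed across the product columns
row_totals = df[prod_cols].sum(axis=1)

print(per_date)
print(row_totals)
```

The first result is what the question asks for. The second only adds up the products within each original row and does not merge rows that share a date.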

Groupby rows and sum