Question

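I have read an Excel spreadsheet with column names such as Gross, Fee, Net, etc. into a DataFrame. When I called the sum method on the resulting DataFrame, I saw that it was not summing the Fee column because several rows in that column contained string data. So I first loop through each row, test whether that column contains a string and, if it does, replace it with 0. The DataFrame sum method still does not sum the Fee column. However, when I write the resulting DataFrame out to a new Excel spreadsheet, read it back in, and apply the sum method to the new DataFrame, it does sum the Fee column. Can anyone explain this? Here is the code and the printed output: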

import pandas as pd

pp = pd.read_excel('pp.xlsx')
# get rid of any strings in column 'Fee':
for i in range(pp.shape[0]):
    if isinstance(pp.loc[i, 'Fee'], str):
        pp.loc[i, 'Fee'] = 0
pd.to_numeric(pp['Fee']) #added this but it makes no difference
# the Fee column is still not summed:
print(pp.sum(numeric_only=True))

print('\nSecond Spreadsheet\n')

# write out the DataFrame to an Excel spreadsheet:
with pd.ExcelWriter('pp2.xlsx') as writer:
    pp.to_excel(writer, sheet_name='PP')
# now read the spreadsheet back into another DataFrame:
pp2 = pd.read_excel('pp2.xlsx')
# the Fee column is summed:
print(pp2.sum(numeric_only=True))

Prints:

Gross                                                          8677.90
Net                                                            8572.43
Address Status                                                    0.00
Shipping and Handling Amount                                      0.00
Insurance Amount                                                  0.00
Sales Tax                                                         0.00
etc.

Second Spreadsheet

Unnamed: 0                                                     277885.00
Gross                                                            8677.90
Fee                                                              -105.47
Net                                                              8572.43
Address Status                                                      0.00
Shipping and Handling Amount                                        0.00
Insurance Amount                                                    0.00
Sales Tax                                                           0.00
etc.

Answer 1

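From a quick analysis, I see that you are replacing the strings with an integer, and the values in the 'Fee' column are probably a mix of float and integer, which means the dtype of that column is object. When you call pp.sum(numeric_only=True), it skips object columns because of the numeric_only condition. Convert the column to float64, as in pp['Fee'] = pd.to_numeric(pp['Fee']), and it should work for you.

The reason it works the second time is that the round trip through Excel does the data conversion for you: when you read the data back in, the column has a numeric data type.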
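
To make the dtype point concrete, here is a minimal sketch that reproduces the behaviour without an Excel file. The small in-memory DataFrame and its values are hypothetical stand-ins for the spreadsheet, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: a single string in 'Fee'
# makes pandas store the whole column with dtype object.
pp = pd.DataFrame({
    "Gross": [10.0, 20.0, 30.0],
    "Fee": [-1.5, "n/a", -2.5],
})

# Replace strings element by element, as in the question.
for i in range(pp.shape[0]):
    if isinstance(pp.loc[i, "Fee"], str):
        pp.loc[i, "Fee"] = 0

# Every value is now a number, but the column dtype has not changed.
dtype_after_loop = pp["Fee"].dtype
summed_before = pp.sum(numeric_only=True)   # 'Fee' is skipped here

# Assigning the converted column back changes the dtype.
pp["Fee"] = pd.to_numeric(pp["Fee"])
summed_after = pp.sum(numeric_only=True)    # 'Fee' is now included

print(dtype_after_loop, pp["Fee"].dtype)
print(summed_after)
```

The loop changes the values but not the column's storage type, which is why the sum only picks up 'Fee' after the explicit conversion is assigned back.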

Answer 2

Try using pd.to_numeric.

For example:

pp = pd.read_excel('pp.xlsx')
print(pd.to_numeric(pp['Fee'], errors='coerce').dropna().sum())
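
As a rough illustration of what errors='coerce' does, the sketch below uses a hypothetical in-memory column instead of pp.xlsx. It suggests that the manual loop from the question may not be needed, since unparseable entries become NaN and sum() skips NaN by default.

```python
import pandas as pd

# Hypothetical data standing in for the 'Fee' column of the spreadsheet.
df = pd.DataFrame({
    "Gross": [10.0, 20.0, 30.0],
    "Fee": [-1.5, "n/a", -2.5],
})

# errors='coerce' turns values that cannot be parsed into NaN.
fee = pd.to_numeric(df["Fee"], errors="coerce")
n_bad = int(fee.isna().sum())   # number of entries that were not numeric

# sum() skips NaN by default (skipna=True), so dropna() is optional.
total = fee.sum()

# Treat unparseable entries as 0, as the question intended,
# and store the numeric column back in the DataFrame.
df["Fee"] = fee.fillna(0)

print(n_bad, total, df["Fee"].dtype)
```

Counting the NaN values before filling them is a cheap way to check how many cells were silently coerced.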

Answer 3

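The problem here is that the 'Fee' column is not numeric. So you need to convert it to a numeric field, save the updated field in the existing DataFrame, and then compute the sum.

That would be: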

df = df.assign(Fee=pd.to_numeric(df['Fee'], errors='coerce'))
print(df.sum())

Answer 4

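Everyone who responded should get partial credit for introducing me to pd.to_numeric. But they were all missing one piece. It is not enough to just say pd.to_numeric(pp['Fee']). That returns the column converted to numeric values but does not update the original DataFrame, so when I call pp.sum(), nothing in pp has been modified. You need: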

pp['Fee'] = pd.to_numeric(pp['Fee'])
pp.sum()
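
A small check of that claim, using a hypothetical object-dtype column rather than the real spreadsheet: the call on its own leaves the DataFrame untouched, and only the assignment changes the column.

```python
import pandas as pd

# Hypothetical object-dtype column that already holds only numbers.
pp = pd.DataFrame({"Fee": pd.Series([-1.5, 0, -2.5], dtype=object)})

# pd.to_numeric returns a new Series; pp itself is not modified.
converted = pd.to_numeric(pp["Fee"])
dtype_unassigned = pp["Fee"].dtype

# Assigning the result back is what updates the DataFrame.
pp["Fee"] = converted
dtype_assigned = pp["Fee"].dtype

print(dtype_unassigned, dtype_assigned)
```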

Pandas does not sum a numeric column
