Question

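I am trying to count the products in a DataFrame whose names contain a word from wordlist, and then find the average price of those products. My attempt is below: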

for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe[dframe['Product'].str.contains(word)]['Price']
    print(dframe[dframe['Product'].str.contains(word)]['Price'])
average_price = total_price / total_count

This returns average_price as Series([], Name: Price, dtype: float64) instead of the expected float value.

What am I doing wrong?

Thanks!

Answer 1

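You need to sum the Price column for each condition to get a scalar value. Without .sum(), the filtered selection is a Series, so total_price becomes a Series as well: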

total_count, total_price = 0, 0
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe.loc[dframe['Product'].str.contains(word, case=False), 'Price'].sum()
average_price = total_price / total_count
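
To make the difference concrete, here is a minimal sketch on hypothetical data (the product names and prices are made up for illustration). It shows that adding a filtered selection to a number yields a Series, while adding its sum yields a single number:

```python
import pandas as pd

# Hypothetical data, for illustration only.
dframe = pd.DataFrame({
    'Product': ['apple pie', 'banana', 'apple juice'],
    'Price': [4.0, 1.0, 3.0],
})

mask = dframe['Product'].str.contains('apple', case=False)
selected = dframe.loc[mask, 'Price']   # a Series holding the matching prices

as_series = 0 + selected               # number + Series is still a Series
as_scalar = 0 + selected.sum()         # .sum() collapses it to one number

print(type(as_series).__name__)
print(as_scalar)
```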

Or cache the mask in a variable for better readability and performance:

total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe.Product.str.contains(word, case=False)
    total_count += mask.sum()
    total_price += dframe.loc[mask, 'Price'].sum()

average_price = total_price / total_count

The solution can be simplified with a regular expression of the form word1|word2|word3, where | means or:

mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
total_count = mask.sum()
total_price = dframe.loc[mask, 'Price'].sum()

average_price = total_price / total_count

Or compute the mean of the matching prices directly:

mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
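
One caveat with joining the words into a pattern: if a word may contain regex metacharacters such as + or ., the pattern can take on a different meaning or fail to compile. A minimal sketch, assuming hypothetical data and a hypothetical wordlist, that escapes each word with re.escape before joining:

```python
import re
import pandas as pd

# Hypothetical data and wordlist, for illustration only.
dframe = pd.DataFrame({
    'Product': ['c++ guide', 'c guide', 'java guide'],
    'Price': [30, 20, 10],
})
wordlist = ['c++', 'java']

# Escape each word so characters like '+' are matched literally.
pattern = '|'.join(re.escape(word) for word in wordlist)

mask = dframe['Product'].str.contains(pattern, case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print(average_price)
```

If the words are known to be plain alphanumeric strings, the escaping step is unnecessary but harmless.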

Sample:

import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1','a2','a3','c1','c1','b','b2','c3','d2'],
    'Price': [1,3,5,6,3,2,3,5,2]
})
print (dframe)
   Price Product
0      1      a1
1      3      a2
2      5      a3
3      6      c1
4      3      c1
5      2       b
6      3      b2
7      5      c3
8      2      d2

wordlist = ['b','c']
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print (average_price)
3.8
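
Note that the loop and the combined pattern can give different results: in the loop, a product that matches several words is counted once per word, while the combined mask counts it once. A small sketch on hypothetical data where one product contains both words:

```python
import pandas as pd

# Hypothetical data: 'bc1' contains both 'b' and 'c'.
dframe = pd.DataFrame({
    'Product': ['bc1', 'b2', 'a3'],
    'Price': [10, 20, 30],
})
wordlist = ['b', 'c']

# Loop version: 'bc1' is added once for 'b' and once for 'c'.
loop_count, loop_price = 0, 0
for word in wordlist:
    mask = dframe['Product'].str.contains(word, case=False)
    loop_count += mask.sum()
    loop_price += dframe.loc[mask, 'Price'].sum()
loop_average = loop_price / loop_count

# Combined pattern: each matching row is counted exactly once.
combined = dframe['Product'].str.contains('|'.join(wordlist), case=False)
combined_count = combined.sum()
combined_average = dframe.loc[combined, 'Price'].mean()

print(loop_count, loop_average)
print(combined_count, combined_average)
```

Which behavior is wanted depends on the question being asked; for "average price of products that match any word", the combined mask is probably the better fit.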

Answer 2

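You can use the values attribute to work with the underlying NumPy arrays instead of Series. The prices still need to be summed to get a scalar:

total_count += dframe.Product.str.contains(word, case=False).values.sum()

total_price += dframe[dframe['Product'].str.contains(word, case=False)]['Price'].values.sum()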
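
As a quick check of what values gives back, here is a minimal sketch on hypothetical data. It suggests that values returns a NumPy array rather than a single number, so a sum is still needed before dividing:

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
dframe = pd.DataFrame({'Product': ['a1', 'b2', 'b3'], 'Price': [1, 3, 5]})

mask = dframe['Product'].str.contains('b', case=False)

prices = dframe.loc[mask, 'Price'].values   # NumPy array of matching prices
match_count = mask.values.sum()             # True counts as 1 when summed
total = prices.sum()                        # scalar total of matching prices

print(type(prices).__name__, match_count, total)
```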

Python - Find the average of a column given matching strings in another column
