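I am trying to count the products in a DataFrame whose names contain a word from wordlist, and then find the average price of those products. Here is my attempt: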
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe[dframe['Product'].str.contains(word)]['Price']
    print(dframe[dframe['Product'].str.contains(word)]['Price'])
average_price = total_price / total_count
This returns average_price as Series([], Name: Price, dtype: float64) rather than the expected float value.
What am I doing wrong?
Thanks!
Answer 0: (score: 2)
To get a scalar value, you need to call sum on the Price column for each condition:
total_count, total_price = 0, 0
for word in wordlist:
    total_count += dframe.Product.str.contains(word, case=False).sum()
    total_price += dframe.loc[dframe['Product'].str.contains(word), 'Price'].sum()
average_price = total_price / total_count
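
To illustrate why the sum matters, here is a minimal sketch with a small made-up DataFrame (the data is an assumption for illustration, not the asker's). Selecting the Price column with a boolean mask still yields a Series; only the aggregation turns it into a single number.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
dframe = pd.DataFrame({
    'Product': ['a1', 'a2', 'b1'],
    'Price': [1, 3, 5],
})

mask = dframe['Product'].str.contains('a', case=False)

selected = dframe.loc[mask, 'Price']     # a Series: one price per matching row
total = dframe.loc[mask, 'Price'].sum()  # a scalar: the prices added together

print(type(selected).__name__)  # Series
print(total)                    # 4
```

Adding `selected` to a running total would make the total a Series as well, which is why the division in the question does not produce a float.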
Or cache the mask in a variable for better readability and performance:
total_count, total_price = 0, 0
for word in wordlist:
    mask = dframe.Product.str.contains(word, case=False)
    total_count += mask.sum()
    total_price += dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
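
One thing worth checking with the per-word loop: a product that matches several words is counted once per word. The sketch below uses made-up data and a made-up wordlist (both assumptions) to compare the loop with a single mask built by OR-ing the per-word masks. Which behavior is wanted depends on the use case.

```python
import pandas as pd

# Hypothetical sample data: 'a1' matches both words in the list.
dframe = pd.DataFrame({
    'Product': ['a1', 'a2', 'b1'],
    'Price': [1, 3, 5],
})
wordlist = ['a', '1']

# Per-word loop: rows matching more than one word are counted repeatedly.
loop_count, loop_price = 0, 0
for word in wordlist:
    mask = dframe['Product'].str.contains(word, case=False)
    loop_count += mask.sum()
    loop_price += dframe.loc[mask, 'Price'].sum()
loop_average = loop_price / loop_count

# Combined mask: each row is counted at most once.
combined = pd.Series(False, index=dframe.index)
for word in wordlist:
    combined |= dframe['Product'].str.contains(word, case=False)
combined_average = dframe.loc[combined, 'Price'].mean()

print(loop_count, loop_average)             # 4 2.5
print(combined.sum(), combined_average)     # 3 3.0
```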
The solution can be simplified with the regular expression word1|word2|word3, where | means or:
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
total_count = mask.sum()
total_price = dframe.loc[mask, 'Price'].sum()
average_price = total_price / total_count
Or, more simply, with mean:
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
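
Because str.contains treats the joined pattern as a regular expression by default, words containing characters such as + or . can change the pattern's meaning or raise an error. A hedged sketch, assuming the words should be matched literally: escape each word with re.escape before joining. The data and wordlist here are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data with a regex metacharacter in one of the words.
dframe = pd.DataFrame({
    'Product': ['c++ book', 'c book', 'java book'],
    'Price': [10, 20, 30],
})
wordlist = ['c++', 'java']

# Escape each word so '+' is matched literally instead of as a quantifier.
pattern = '|'.join(re.escape(word) for word in wordlist)

mask = dframe['Product'].str.contains(pattern, case=False)
average_price = dframe.loc[mask, 'Price'].mean()

print(mask.tolist())   # [True, False, True]
print(average_price)   # 20.0
```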
Sample:
import pandas as pd

dframe = pd.DataFrame({
    'Product': ['a1','a2','a3','c1','c1','b','b2','c3','d2'],
    'Price': [1,3,5,6,3,2,3,5,2]
})
print(dframe)
   Price Product
0      1      a1
1      3      a2
2      5      a3
3      6      c1
4      3      c1
5      2       b
6      3      b2
7      5      c3
8      2      d2
wordlist = ['b','c']
mask = dframe.Product.str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()
print (average_price)
3.8
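
One edge case to be aware of: if no product matches any word, the mask selects no rows and mean returns NaN rather than raising an error. A small sketch, using made-up data and a wordlist chosen so that nothing matches (both assumptions):

```python
import pandas as pd

# Hypothetical sample data; no product contains 'z'.
dframe = pd.DataFrame({
    'Product': ['a1', 'b2', 'c3'],
    'Price': [1, 2, 3],
})
wordlist = ['z']

mask = dframe['Product'].str.contains('|'.join(wordlist), case=False)
average_price = dframe.loc[mask, 'Price'].mean()

# With no matching rows the mean is NaN, so check before using the result.
if pd.isna(average_price):
    print('no matching products')
else:
    print(average_price)
```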
Answer 1: (score: 1)
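You can use the values attribute to work with the underlying NumPy array instead of a Series. The prices still need sum to produce a scalar:
total_count += dframe.Product.str.contains(word, case=False).values.sum()
total_price += dframe[dframe['Product'].str.contains(word)]['Price'].values.sum()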