I have a DataFrame with a "product_location" column that contains the data below. Individual row values are shown for reference:
df.product_location[0]=[{'product':'christmas-socks-2019','store':'Downtown-A,Montgomery'}, {'product':'easter-socks-2018','store':'Euston'},{'product':'easter-socks-2019','source':'Euston'}]
df.product_location[1]=[{'product':'christmas-mugs-2019','store':'Montgomery'}, {'product':'easter-mugs-2018','store':'Euston, Downtown-B'},{'product':'easter-mugs-2019','source':'High-Street'}]
df.product_location[2]=[{'product':'christmas-card-2019','store':'Downtown-A, Montgomery'}, {'product':'easter-card-2018','store':'Euston'},{'product':'easter-card-2019','source':'Euston'}]
df.product_location[3]=[{'product':'christmas-chocolate-2019','store':'Downtown-A'}, {'product':'easter-chocolate-2018','store':'Euston'},{'product':'easter-chocolate-2017','source':'Euston'}]
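I am trying to extract the year (e.g. 2019, 2018) from each product name with a regular expression, count the number of stores for each product, and return the year with the highest store count.
For example, for row [0] I want the output to be 2019, because that year has the most stores ('Downtown-A,Montgomery,Euston').
Expected output (blank if no single year has the highest count):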
[0] '2019'
[1] (blank)
[2] '2019'
[3] (blank)
What is the best way to do this for every row in the DataFrame?
Answer 0 (score: 0)
Treat a single row as a list, db:
from collections import Counter
db=[{'product':'christmas-socks-2019','store':'Downtown-A,Montgomery'}, {'product':'easter-socks-2018','store':'Euston'},{'product':'easter-socks-2019','source':'Euston'}]
Extract only the years from the list of dictionaries:
products = [int(d['product'].split('-')[-1]) for d in db]
counter = list(Counter(products).items())
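Since the question asks for a regular expression, the year extraction could also be sketched with the re module. This is only a minimal illustration: the helper name extract_year and the assumption that the year is always the last four digits of the product name are mine, not part of the original data description.

```python
import re

db = [
    {'product': 'christmas-socks-2019', 'store': 'Downtown-A,Montgomery'},
    {'product': 'easter-socks-2018', 'store': 'Euston'},
    {'product': 'easter-socks-2019', 'source': 'Euston'},
]

# Assumption: the year is the four digits at the very end of the product name.
YEAR_RE = re.compile(r'(\d{4})$')

def extract_year(product):
    """Return the trailing four-digit year as a string, or None if absent."""
    match = YEAR_RE.search(product)
    return match.group(1) if match else None

years = [extract_year(d['product']) for d in db]
print(years)  # ['2019', '2018', '2019']
```

Unlike int(...split('-')[-1]), this variant returns None instead of raising an error when a product name carries no year.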
Based on your condition:
if counter[0][1] == 1:
    print('blank')
else:
    print(counter[0][0])
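Note that counter[0] is simply the first year encountered, so the check never compares it against the other years. A tie-aware variant might look at the two largest counts instead. The sketch below assumes that "blank" should mean "no single year is strictly ahead"; the helper name unique_top is hypothetical.

```python
from collections import Counter

def unique_top(counts):
    """Return the key with the strictly highest count, or None on a tie or empty input."""
    ranked = counts.most_common(2)  # the two largest (key, count) pairs
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # the top two counts are equal, so there is no single winner
    return ranked[0][0]

print(unique_top(Counter([2019, 2018, 2019])))  # 2019
print(unique_top(Counter([2019, 2018, 2017])))  # None
```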
To iterate over the DataFrame and apply the logic to each row, you can try the following:
for i in range(len(df)):
    db = df.loc[i, 'product_location']
    products = [int(d['product'].split('-')[-1]) for d in db]
    counter = list(Counter(products).items())
    if counter[0][1] == 1:
        print('blank')
    else:
        print(counter[0][0])
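The loop above counts how often each year appears among the products, whereas the question asks for the number of stores per year. A rough sketch of that reading follows. It rests on several assumptions drawn from the sample data rather than stated in the question: the 'source' key is treated the same as 'store', store names are comma-separated, and each distinct store counts once per year. The function name top_year is hypothetical, and a plain list stands in for the product_location column.

```python
from collections import defaultdict

rows = [
    [{'product': 'christmas-socks-2019', 'store': 'Downtown-A,Montgomery'},
     {'product': 'easter-socks-2018', 'store': 'Euston'},
     {'product': 'easter-socks-2019', 'source': 'Euston'}],
    [{'product': 'christmas-mugs-2019', 'store': 'Montgomery'},
     {'product': 'easter-mugs-2018', 'store': 'Euston, Downtown-B'},
     {'product': 'easter-mugs-2019', 'source': 'High-Street'}],
    [{'product': 'christmas-card-2019', 'store': 'Downtown-A, Montgomery'},
     {'product': 'easter-card-2018', 'store': 'Euston'},
     {'product': 'easter-card-2019', 'source': 'Euston'}],
    [{'product': 'christmas-chocolate-2019', 'store': 'Downtown-A'},
     {'product': 'easter-chocolate-2018', 'store': 'Euston'},
     {'product': 'easter-chocolate-2017', 'source': 'Euston'}],
]

def top_year(entries):
    """Return the year with strictly the most distinct stores, or '' on a tie."""
    stores_by_year = defaultdict(set)
    for entry in entries:
        year = entry['product'].rsplit('-', 1)[-1]
        # Assumption: 'source' holds store names just like 'store' does.
        raw = entry.get('store') or entry.get('source') or ''
        for name in raw.split(','):
            name = name.strip()
            if name:
                stores_by_year[year].add(name)
    if not stores_by_year:
        return ''
    best = max(len(stores) for stores in stores_by_year.values())
    winners = [y for y, stores in stores_by_year.items() if len(stores) == best]
    return winners[0] if len(winners) == 1 else ''

results = [top_year(row) for row in rows]
print(results)  # ['2019', '', '2019', '']
```

With pandas, the same function could be applied to the whole column in one step, for example df['product_location'].apply(top_year), which avoids the explicit index loop.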