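Find similarities in a series

Time: 2018-02-19 11:23:29

Tags: python pandas pandas-groupby

I am trying to find the list of customers that fall into the same group as a given customer.

Data: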

customer    rating  lang
A           R       eng
B           R       rus
C           PG      rus
D           PG      eng
E           V       eng
F           V       rus
G           R       rus
H           PG      eng
I           V       eng
J           PG      eng

If I pass a new customer x with rating 'PG' and lang 'eng', it should return the customers that are similar to x.

Input:

customer    rating  lang
x           PG      eng

Expected output:

[D, H, J]

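How can I achieve this?

3 answers:

Answer 0: (score: 1)

If I understand correctly, you want to pass two parameters, rating and lang, and get the records in the DataFrame that match both. You can do that as follows (thanks to jezrael).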

# df is the DataFrame holding the data from the question
def similar_customers(rating, lang):
    return df.loc[(df['rating'] == rating) & (df['lang'] == lang), 'customer'].tolist()

Using your example, with 'PG' for rating and 'eng' for lang:

similar_customers('PG', 'eng')

Out[3]: ['D', 'H', 'J']
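
The function above relies on a DataFrame named df that the answer does not define. A minimal self-contained sketch, assuming df is built from the table in the question; passing the frame in as an explicit argument is an addition made here for illustration, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "customer": list("ABCDEFGHIJ"),
    "rating": ["R", "R", "PG", "PG", "V", "V", "R", "PG", "V", "PG"],
    "lang": ["eng", "rus", "rus", "eng", "eng", "rus", "rus", "eng", "eng", "eng"],
})

def similar_customers(frame, rating, lang):
    # Boolean mask: True for rows where both columns equal the requested values.
    mask = (frame["rating"] == rating) & (frame["lang"] == lang)
    # Select the customer column for the matching rows and return a plain list.
    return frame.loc[mask, "customer"].tolist()

result = similar_customers(df, "PG", "eng")
print(result)
```

A combination that does not occur in the data simply yields an empty list, since the mask is False for every row.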

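Answer 1: (score: 1)

You can use a dictionary to store all the customer data, with the tuple (rating, lang) as the key and the list of matching customers as the value.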

from collections import defaultdict
data = [
    ("A",   "R",   "eng"),
    ("B",   "R",   "rus"),
    ("C",   "PG",  "rus"),
    ("D",   "PG",  "eng"),
    ("E",   "V",   "eng"),
    ("F",   "V",   "rus"),
    ("G",   "R",   "rus"),
    ("H",   "PG",  "eng"),
    ("I",   "V",   "eng"),
    ("J",   "PG",  "eng")
]
db = defaultdict(list)
for customer, rating, lang in data:
    db[rating,lang].append(customer)

Finally, you can look up the list of matching customers like this:

print(db["PG","eng"])

With the output:

['D', 'H', 'J']
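
One caveat with a defaultdict: indexing it with a (rating, lang) pair that was never stored inserts an empty list under that key as a side effect. A small sketch, assuming you would rather leave the index untouched on a miss, uses dict.get with a default instead. The helper name find_similar is hypothetical, and the data is a subset of the list above to keep the example short:

```python
from collections import defaultdict

# Subset of the question's data, enough to show a hit and a miss.
data = [
    ("C", "PG", "rus"),
    ("D", "PG", "eng"),
    ("H", "PG", "eng"),
    ("J", "PG", "eng"),
]

db = defaultdict(list)
for customer, rating, lang in data:
    db[rating, lang].append(customer)

def find_similar(index, rating, lang):
    # .get() bypasses the default factory, so a missing key is not added.
    return index.get((rating, lang), [])

matches = find_similar(db, "PG", "eng")
missing = find_similar(db, "G", "fra")
print(matches, missing)
```

After the second call, ("G", "fra") is still absent from db, whereas db["G", "fra"] would have created it.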

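Answer 2: (score: 0)

You can iterate over the customers and compare their values. Assuming you have a list of dictionaries (such as customers = [{"customer": "A", "rating": "R", "lang": "eng"}]), a possible solution is: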

similar = []
for customer in customers:
    if customer["rating"] == rating and customer["lang"] == lang:
        similar.append(customer["customer"])
print(similar)

Here rating and lang are the parameters to match against; they must be defined before the loop runs.
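
The loop can also be wrapped in a function, so that rating and lang are real parameters instead of names that must already exist. A minimal sketch, assuming the customers list holds the question's data as dictionaries; the function name and the rows helper list are illustrative:

```python
# The question's data as (customer, rating, lang) tuples.
rows = [
    ("A", "R", "eng"), ("B", "R", "rus"), ("C", "PG", "rus"),
    ("D", "PG", "eng"), ("E", "V", "eng"), ("F", "V", "rus"),
    ("G", "R", "rus"), ("H", "PG", "eng"), ("I", "V", "eng"),
    ("J", "PG", "eng"),
]

# Convert to the list-of-dictionaries shape used in this answer.
customers = [{"customer": c, "rating": r, "lang": l} for c, r, l in rows]

def similar_customers(customers, rating, lang):
    # Keep the name of every customer whose rating and lang both match.
    return [
        entry["customer"]
        for entry in customers
        if entry["rating"] == rating and entry["lang"] == lang
    ]

similar = similar_customers(customers, "PG", "eng")
print(similar)
```

This scans the whole list on every call, which is fine for small data; for repeated lookups a prebuilt dictionary keyed by (rating, lang) avoids the rescan.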