Counting words in the rows of a pandas DataFrame in Python

Time: 2019-05-30 22:24:27

Tags: python regex pandas dataframe join

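I want to count, for each row of the given DataFrame columns, the total number of occurrences of every keyword in a list (please correct my code).

The problem is caused by strings like Emmangodb: it contains mango, but I don't want it to be counted.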
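To make the symptom concrete, here is a minimal sketch using only the standard `re` module. The variable names `text`, `loose`, and `strict` are illustrative and not part of the original code. It assumes that "not counted" means a keyword should match only as a whole word, which is what the `\b` word-boundary anchor expresses.

```python
import re

text = 'mango pret Orange No manner Emmangodb snow'

# Plain pattern: also matches the "mango" embedded in "Emmangodb".
loose = re.findall(r'mango', text, flags=re.IGNORECASE)

# With \b on both sides, only the standalone word "mango" matches.
strict = re.findall(r'\bmango\b', text, flags=re.IGNORECASE)

print(len(loose), len(strict))
```
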

    import pandas as pd

    d = {
        'Column_1': ['mango pret Orange No manner Emmangodb snow', ' préts No  scan eblanc'],
        'Column_2': ['red priority No Apple juice', 'This is a priority monnoir '],
        'Column_3': ['No add', 'orange'],
    }

    df = pd.DataFrame(data=d)

    # Each list holds the regex patterns for one group of keywords.
    list_1 = ['Apple juice', 'Mango', 'Orange', 'pr[éeêè]t[s]?']
    list_2 = ['weather', 'r[ea]d', 'p[wr]iority', 'noir?']
    list_3 = ['n[eéè]d', 'snow[s]?', 'blanc?']

    # Maps each output column to the input column(s) to scan and the pattern list to use.
    # Note that the name `dict` shadows the built-in type.
    dict = {
        "s1": ['Column_1', list_1],
        "s2": ['Column_1', list_3],
        "s3": ['Column_2', list_2],
        "s4": ['Column_2', 'Column_3', list_1],
    }

Here is what I tried:

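    import functools
    import re

    d2 = {}
    for key, lst in dict.items():
        # Split each entry into column names (strings) and pattern lists (lists).
        col_names = [element for element in lst if isinstance(element, str)]
        regex_lists = [element for element in lst if isinstance(element, list)]
        regex_list = functools.reduce(lambda x, y: x + y, regex_lists)
        # Count every match of any pattern, ignoring case.
        # Matches inside longer words (e.g. "mango" in "Emmangodb") are counted too.
        map_function = lambda s: len(re.findall(r'|'.join(regex_list).lower(), str(s).lower()))
        df_regex_count = df[col_names].applymap(map_function)
        # Sum the per-column counts for each row.
        df[key] = [sum(lst_tmp) for lst_tmp in df_regex_count.values.tolist()]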

Expected output:

    d2 = {'s1': [2, 1], 's2': [1, 0], 's3': [2, 1], 's4': [1, 1]}
    df2 = pd.DataFrame(data=d2)
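One possible way to avoid the substring matches is to wrap the joined patterns in `\b(?:...)\b` and match case-insensitively. The following is a sketch under that assumption, not a confirmed answer. The names `rules`, `count_matches`, and `counts` are hypothetical and introduced only for illustration; `rules` holds the same mapping as `dict` in the question, restructured as (columns, patterns) pairs. With whole-word matching, the first row of `s1` comes out as 3 (mango, pret, Orange) rather than the 2 listed in the expected output, while the other values agree.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Column_1': ['mango pret Orange No manner Emmangodb snow', ' préts No  scan eblanc'],
    'Column_2': ['red priority No Apple juice', 'This is a priority monnoir '],
    'Column_3': ['No add', 'orange'],
})

list_1 = ['Apple juice', 'Mango', 'Orange', 'pr[éeêè]t[s]?']
list_2 = ['weather', 'r[ea]d', 'p[wr]iority', 'noir?']
list_3 = ['n[eéè]d', 'snow[s]?', 'blanc?']

# Hypothetical restructuring of the question's mapping: output column -> (input columns, patterns).
rules = {
    's1': (['Column_1'], list_1),
    's2': (['Column_1'], list_3),
    's3': (['Column_2'], list_2),
    's4': (['Column_2', 'Column_3'], list_1),
}

def count_matches(frame, columns, patterns):
    """Return, per row, the number of whole-word matches summed over `columns`."""
    # \b(?:a|b|c)\b requires a word boundary on both sides of any alternative.
    regex = re.compile(r'\b(?:' + '|'.join(patterns) + r')\b', flags=re.IGNORECASE)
    per_column = frame[columns].apply(
        lambda col: col.astype(str).map(lambda s: len(regex.findall(s)))
    )
    return per_column.sum(axis=1)

counts = pd.DataFrame({
    key: count_matches(df, columns, patterns)
    for key, (columns, patterns) in rules.items()
})
print(counts)
```

Compiling one regex per rule keeps the per-cell work to a single `findall` call, and the non-capturing group ensures the boundaries apply to every alternative rather than only the first and last.
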

0 answers:

No answers