For each row of a DataFrame column, I want to count how many words from a list of strings (regex patterns) are present.
Code:
import re
from collections import Counter

import pandas as pd

list_1 = ['Apple', 'Mango', 'Orange', 'pr[éeêè]t[s]?']
list_2 = ['weather', 'r[ea]d', 'p[wr]iority', 'pr[éeêè]t[s]?']
list_3 = ['n[eéè]d', 'snow[s]?', 'pr[éeêè]t[s]?']
dict = {"s1": ['Column_1', list_1],
        "s2": ['Column_1', list_3],
        "s3": ['Column_2', list_2],
        "s4": ['Column_3', list_3],
        "s5": ['Column_2', 'Column_3', list_1]}
for elt in list(dict.keys()):
    if len(dict[elt]) <= 2:
        # one source column: [column, patterns]
        d = Counter(re.findall(r'|'.join(dict[elt][1]).lower(), df[dict[elt][0]].str.lower()))
        df[elt] = sum(d.values())
    elif len(dict[elt]) > 2:
        # two source columns: [column, column, patterns]
        aa = Counter(re.findall(r'|'.join(dict[elt][2]).lower(), df[dict[elt][0]].str.lower()))
        bb = Counter(re.findall(r'|'.join(dict[elt][2]).lower(), df[dict[elt][1]].str.lower()))
        b = sum(bb.values())
        a = sum(aa.values())
        d = a + b
        df[elt] = d
Sample data:
d = {'Column_1': ['mango pret Orange No manner', ' préts No scan'], 'Column_2': ['read priority No', 'This is a priority'],'Column_3': ['No add', 'yep']}
df = pd.DataFrame(data=d)
d2 = {'s1': [3, 1], 's3':[2,1]}
df2 = pd.DataFrame(data=d2)
But I get this error... TypeError: expected string or bytes-like object
Answer 0 (score: 0)
This works for me (Python 3.6.8):
import re
from collections import Counter

import pandas as pd

d = {'Column_1': ['mango pret Orange No manner', ' préts No scan'], 'Column_2': ['read priority No', 'This is a priority'], 'Column_3': ['No add', 'yep']}
df = pd.DataFrame(data=d)
d2 = {'s1': [3, 1], 's3': [2, 1]}
df2 = pd.DataFrame(data=d2)
list_1 = ['Apple', 'Mango', 'Orange', 'pr[éeêè]t[s]?']
list_2 = ['weather', 'r[ea]d', 'p[wr]iority', 'pr[éeêè]t[s]?']
list_3 = ['n[eéè]d', 'snow[s]?', 'pr[éeêè]t[s]?']
dic = {"s1": ['Column_1', list_1],
       "s2": ['Column_1', list_3],
       "s3": ['Column_2', list_2],
       "s4": ['Column_3', list_3],
       "s5": ['Column_2', 'Column_3', list_1]}
for elt in list(dic.keys()):
    if len(dic[elt]) <= 2:
        # str(...) turns the whole Series into one string, so re.findall receives
        # a str instead of a Series; the total is then assigned to every row.
        d = Counter(re.findall(r'|'.join(dic[elt][1]).lower(), str(df[dic[elt][0]].str.lower())))
        df[elt] = sum(d.values())
    elif len(dic[elt]) > 2:
        # two source columns: count matches in each and add the totals
        aa = Counter(re.findall(r'|'.join(dic[elt][2]).lower(), str(df[dic[elt][0]].str.lower())))
        bb = Counter(re.findall(r'|'.join(dic[elt][2]).lower(), str(df[dic[elt][1]].str.lower())))
        b = sum(bb.values())
        a = sum(aa.values())
        d = a + b
        df[elt] = d
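
The loop above hands re.findall the string form of an entire column, so each new column holds one total repeated on every row. If a separate count per row is wanted, as df2 in the question seems to suggest, one possible sketch uses pandas' Series.str.count with re.IGNORECASE instead. The helper name count_matches and the layout of the specs dictionary (a list of source columns plus a pattern list) are assumptions made for illustration and are not part of the original post.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'Column_1': ['mango pret Orange No manner', ' préts No scan'],
    'Column_2': ['read priority No', 'This is a priority'],
    'Column_3': ['No add', 'yep'],
})

list_1 = ['Apple', 'Mango', 'Orange', 'pr[éeêè]t[s]?']
list_2 = ['weather', 'r[ea]d', 'p[wr]iority', 'pr[éeêè]t[s]?']
list_3 = ['n[eéè]d', 'snow[s]?', 'pr[éeêè]t[s]?']

# Hypothetical layout: output column -> (source columns, patterns).
specs = {
    's1': (['Column_1'], list_1),
    's2': (['Column_1'], list_3),
    's3': (['Column_2'], list_2),
    's4': (['Column_3'], list_3),
    's5': (['Column_2', 'Column_3'], list_1),
}


def count_matches(frame, columns, patterns):
    """Return a Series of per-row regex match counts, summed over `columns`."""
    pattern = '|'.join(patterns)
    total = pd.Series(0, index=frame.index)
    for col in columns:
        # Series.str.count applies the regex to each row separately.
        total = total + frame[col].str.count(pattern, flags=re.IGNORECASE)
    return total


for name, (columns, patterns) in specs.items():
    df[name] = count_matches(df, columns, patterns)

print(df[['s1', 's2', 's3', 's4', 's5']])
```

With the sample data this gives s1 = [3, 1], matching df2. s3 comes out as [1, 1] rather than [2, 1], because the pattern r[ea]d matches "red" or "rad" but not "read"; something like r[ea]+d would be needed if "read" should be counted.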