I am trying to update elements in a large data frame,
e.g.
Keyword                          |cat1|cat2|cat3|
---------------------------------|----|----|----|
beach holiday                    |    |    |    |
package beach holiday            |    |    |    |
inclusive package beach holiday  |    |    |    |
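I run a method, find_keywords(Keyword), which takes a keyword such as "inclusive package beach holiday", compares its words against a list of text categories, and returns the first three matching categories.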
def find_keywords(keywords):
    '''
    Input a Keyword, breaks it down and finds which category it matches
    '''
    words = keywords.split()
    wordlist = []
    for word in words:
        # categories is the list of known category terms, defined elsewhere
        if word in categories:
            wordlist.append(word)
    # keep only the first three matches
    wordlist = wordlist[:3]
    return wordlist
In this case it returns:
['inclusive','package','beach']
This all works fine. When I run my main method on the data:
if __name__ == '__main__':
    df = get_csv(csv)
    for index, row in df.iterrows():
        # str.lower() returns a new string, so the result has to be kept
        keyword = row['Keyword'].lower()
        print(keyword)
        tokens = find_keywords(keyword)
        print(tokens)
it returns:
beach holiday
['beach','holiday']
package beach holiday
['package','beach','holiday']
inclusive package beach holiday
['inclusive','package','beach']
How can I take each list and add it to the cat1 / cat2 / cat3 columns,
to produce this data frame:
Keyword                          |cat1     |cat2   |cat3   |
---------------------------------|---------|-------|-------|
beach holiday                    |beach    |holiday|       |
package beach holiday            |package  |beach  |holiday|
inclusive package beach holiday  |inclusive|package|beach  |
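Using @DaFanat's solution I was able to get what I asked for, but I have a slight variation on this: is it possible to check against a dictionary instead of a list?
e.g.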
{'beach': ['beach', 'sand', 'coast'],
'hotel': ['hotel', 'resort']}
and then apply the head term as the category. For example, if it finds "sand" it would tag the keyword as "beach".
My attempt:
import ast

if __name__ == '__main__':
    df = get_csv(csv)
    h = open('head_categories.txt','r')
    mydict = h.read()
    mydict = ast.literal_eval(mydict)
    for key in mydict.keys():
        item = key
        if item in mydict[key]:
            target_cats = item
    find_keywords = lambda kw: [s for s in kw.split() if s in target_cats]
    df.loc[:, 'cat_list'] = df['Keyword'].apply(lambda x: find_keywords(x))
    for i in range(1, 4):
        df.loc[:, 'cat{0}'.format(i)] = df['cat_list'].apply(lambda x: x[i-1] if len(x) >= i else '')
    print(df)
    df.to_csv('kuoniTesting.csv')
Answer 0 (score: 0)
I think this does the job:
import pandas as pd

target_cats = ['cat', 'dog', 'cow']
df = pd.DataFrame({'Keyword': ['cat dog cow', 'cat dog', 'dog sheep']})
# keep only the words of each keyword that are known categories
find_keywords = lambda kw: [s for s in kw.split() if s in target_cats]
df.loc[:, 'cat_list'] = df['Keyword'].apply(lambda x: find_keywords(x))
# spread the first three matches over cat1..cat3, padding with ''
for i in range(1, 4):
    df.loc[:, 'cat{0}'.format(i)] = df['cat_list'].apply(lambda x: x[i-1] if len(x) >= i else '')
Output:
       Keyword         cat_list cat1 cat2 cat3
0  cat dog cow  [cat, dog, cow]  cat  dog  cow
1      cat dog       [cat, dog]  cat  dog
2    dog sheep            [dog]  dog
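
For the dictionary follow-up in the question, one possible approach is to invert the dictionary once, so that every term points back to its head term, and then look each word up in that inverted mapping. The following is only a sketch under assumptions: `head_categories` is hard-coded here in the shape shown in the question (rather than read from head_categories.txt), the sample keywords are made up, and `term_to_head` and `find_head_categories` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Assumed head-term dictionary, in the shape shown in the question.
head_categories = {
    'beach': ['beach', 'sand', 'coast'],
    'hotel': ['hotel', 'resort'],
}

# Invert it once: every term points back to its head term.
term_to_head = {
    term: head
    for head, terms in head_categories.items()
    for term in terms
}

def find_head_categories(keyword, limit=3):
    """Return up to `limit` distinct head terms matched by the words in `keyword`."""
    heads = []
    for word in keyword.lower().split():
        head = term_to_head.get(word)
        # skip unknown words and head terms that were already found
        if head is not None and head not in heads:
            heads.append(head)
    return heads[:limit]

# Made-up sample data, only to show the shape of the result.
df = pd.DataFrame({'Keyword': ['sand resort holiday', 'coast beach', 'city break']})
df['cat_list'] = df['Keyword'].apply(find_head_categories)

# Spread the matches over cat1..cat3, padding with '' when there are fewer than three.
for i in range(3):
    df['cat{0}'.format(i + 1)] = df['cat_list'].apply(
        lambda heads, i=i: heads[i] if len(heads) > i else ''
    )
print(df)
```

Whether repeated head terms should be collapsed (here "coast beach" yields a single "beach") is a design choice; drop the `head not in heads` check if every matching word should produce its own entry.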