I want to iterate over a column of my DataFrame so that, if a word from a list is present, that word is written to a new column.
import pandas as pd
d = {'title': pd.Series(['123', 'xyz']),
     'question': pd.Series(["Hi i want to buy orange and pear", "How much is the banana?"])
     }
df = pd.DataFrame(d)
                           question title
0  Hi i want to buy orange and pear   123
1           How much is the banana?   xyz
# write to column if word exists:
fruit_list = ['orange', 'pear', 'banana']
for i in fruit_list:
    df['fruit'] = [i if i in qn for qn in df['question']]
Expected output:
                           question title   fruit
0  Hi i want to buy orange and pear   123  orange
1  Hi i want to buy orange and pear   123    pear
2           How much is the banana?   xyz  banana
Instead I get SyntaxError: invalid syntax, pointing at the 'for' keyword.
Answer 0 (score: 2)
I believe what you want is:
fruit_list=['orange','pear','banana']
df['fruit'] = [[f for f in fruit_list if f in qn] for qn in df['question']]
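The SyntaxError in the question comes from `i if i in qn` lacking an `else`; a filtering `if` belongs after the `for` clause, as in the comprehension above. That comprehension leaves a list of fruits in each cell. To get one row per fruit, as in the question's expected output, one option is `DataFrame.explode`. The following is only a sketch and assumes pandas 0.25 or newer, where `explode` is available; the name `expanded` is chosen here for illustration. A question with no matching fruit would end up as a single row with NaN in the fruit column.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['123', 'xyz'],
    'question': ["Hi i want to buy orange and pear", "How much is the banana?"],
})
fruit_list = ['orange', 'pear', 'banana']

# One list of matching fruits per question (plain substring test).
df['fruit'] = [[f for f in fruit_list if f in qn] for qn in df['question']]

# explode() repeats the other columns once per list element;
# reset_index(drop=True) renumbers the rows from 0.
expanded = df.explode('fruit').reset_index(drop=True)
print(expanded)
```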
Answer 1 (score: 2)
How about this?
records = [{"question": "Hi i want to buy orange and pear", "title": 123},
           {"question": "How much is the banana?", "title": 456}]
list_size = len(records)  # taken before the loop, so appended copies are not revisited
fruit_list = ['orange', 'pear', 'banana']
for i in range(list_size):
    fruits = [f for f in fruit_list if f in records[i].get("question")]
    for f in fruits:
        if not records[i].get("fruit"):
            records[i]['fruit'] = f
        else:
            # Append a copy; otherwise the same dictionary would be referenced over and over.
            # A separate name is used so the loop index i is not overwritten.
            extra = records[i].copy()
            extra['fruit'] = f
            records.append(extra)
print(records)
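The code above works if you do not want to create a new object for the result. If creating a separate output object is acceptable, the code becomes simpler.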
records = [{"question": "Hi i want to buy orange and pear", "title": 123},
           {"question": "How much is the banana?", "title": 456}]
output = []
fruit_list = ['orange', 'pear', 'banana']
for record in records:
    fruits = [f for f in fruit_list if f in record.get("question")]
    for f in fruits:
        record['fruit'] = f
        # Append a copy; otherwise the same dictionary would be referenced over and over.
        output.append(record.copy())
print(output)
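The same idea can also be wrapped in a function that leaves the input records untouched. This is only a sketch; the function name `expand_by_fruit` and its parameters are made up for illustration and are not part of any library.

```python
def expand_by_fruit(records, fruit_list):
    """Return a new list with one copy of each record per matching fruit."""
    expanded = []
    for record in records:
        for fruit in fruit_list:
            if fruit in record["question"]:  # plain substring test
                # {**record, ...} builds a new dict, so the input is not mutated
                expanded.append({**record, "fruit": fruit})
    return expanded


records = [
    {"question": "Hi i want to buy orange and pear", "title": 123},
    {"question": "How much is the banana?", "title": 456},
]
result = expand_by_fruit(records, ["orange", "pear", "banana"])
print(result)
```

Records without any matching fruit are dropped from the result in this version.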
Hope that helps.
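Answer 2 (score: 0)
How about this? For each row, it produces a list of the matching words, then expands the DataFrame so that each row holds only one matching word.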
fruit_list = ['orange', 'pear', 'banana']
df['word_match'] = df.question.str.findall(
    r'[\w]+').apply(set).apply(lambda my_set: list(my_set.intersection(fruit_list)))
>>> df
                           question title      word_match
0  Hi i want to buy orange and pear   123  [orange, pear]
1           How much is the banana?   xyz        [banana]
rows = []
for _, row in df.iterrows():
    # one output row per matched word
    for word in row['word_match']:
        rows.append([row['question'], row['title'], word])
>>> pd.DataFrame(rows, columns=['question', 'title', 'word_match'])
                           question title word_match
0  Hi i want to buy orange and pear   123     orange
1  Hi i want to buy orange and pear   123       pear
2           How much is the banana?   xyz     banana
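One caveat: a `set` intersection does not keep the order in which the words appear in the question, and the plain substring tests in the other answers would also match 'pear' inside 'appear'. A possible alternative, sketched here with a hypothetical helper named `find_whole_words`, builds a single regular expression with word boundaries and returns the matches in order of first appearance. It assumes a non-empty word list and case-sensitive matching.

```python
import re


def find_whole_words(text, words):
    """Return the entries of `words` that occur in `text` as whole words,
    in order of first appearance and without duplicates."""
    # \b anchors each alternative at word boundaries;
    # re.escape guards against special characters in the words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    matches = re.findall(pattern, text)
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(matches))


fruit_list = ["orange", "pear", "banana"]
print(find_whole_words("Hi i want to buy pear and orange", fruit_list))
print(find_whole_words("They appear ripe", fruit_list))
```

It could be applied per row with something like `df['question'].apply(lambda q: find_whole_words(q, fruit_list))`.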