pandas list comprehension if statement

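Time: 2016-04-20 23:48:44

Tags: python pandas list-comprehension

I want to iterate over one column of my DataFrame so that, if a word from a list occurs in it, that word is added to a new column.

Here is my data: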

import pandas as pd

d = {'title': pd.Series(['123', 'xyz']),
     'question': pd.Series(["Hi i want to buy orange and pear", "How much is the banana?"])}
df = pd.DataFrame(d)

df

                         question     title
0  Hi i want to buy orange and pear   123
1           How much is the banana?   xyz

My code:

#write to column if word exist:

fruit_list=['orange','pear','banana']
for i in fruit_list:
    df['fruit']=[i if i in qn for qn in df['question']]

Expected output:

                         question     title   fruit
0  Hi i want to buy orange and pear   123     orange
1  Hi i want to buy orange and pear   123     pear
2  How much is the banana?            xyz     banana

Error:

SyntaxError: invalid syntax at the 'for' word. 

3 answers:

Answer 0 (score: 2)

I believe what you want is:

fruit_list=['orange','pear','banana']

df['fruit'] = [[f for f in fruit_list if f in qn] for qn in df['question']]
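
For context, the original line fails because a filtering `if` in a comprehension has to come after the `for` clause; written before it, `i if i in qn` is parsed as a conditional expression, which requires an `else`. The nested comprehension above stores a list of matches per row. To get one row per fruit, as in the expected output, one option is `DataFrame.explode`. This is a sketch that assumes pandas 0.25 or later; the name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['123', 'xyz'],
    'question': ["Hi i want to buy orange and pear", "How much is the banana?"],
})
fruit_list = ['orange', 'pear', 'banana']

# The filtering `if` goes after the `for`; each row gets a list of matching fruits.
df['fruit'] = [[f for f in fruit_list if f in qn] for qn in df['question']]

# explode() gives every list element its own row (assumes pandas >= 0.25).
result = df.explode('fruit').reset_index(drop=True)
print(result)
```

Note that a row whose list is empty is kept by `explode` with NaN in the `fruit` column, so a `dropna(subset=['fruit'])` may be wanted if some questions mention no fruit.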

Answer 1 (score: 2)

How about this?

# named records rather than input, to avoid shadowing the built-in input()
records = [{"question": "Hi i want to buy orange and pear", "title": 123},
           {"question": "How much is the banana?", "title": 456}]
list_size = len(records)  # fixed up front, so appended copies are not revisited

fruit_list = ['orange', 'pear', 'banana']

for i in range(list_size):
    fruits = [f for f in fruit_list if f in records[i].get("question")]
    for f in fruits:
        if not records[i].get("fruit"):
            records[i]['fruit'] = f
        else:
            # need to append a copy, otherwise it will just add references to the same dictionary over and over again
            # (stored in row_copy so the loop index i is not overwritten)
            row_copy = records[i].copy()
            row_copy['fruit'] = f
            records.append(row_copy)
print(records)

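The code above works if you would rather modify the list in place than create a new object; if creating a separate output object is acceptable, the code becomes simpler.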

# named records rather than input, to avoid shadowing the built-in input()
records = [{"question": "Hi i want to buy orange and pear", "title": 123},
           {"question": "How much is the banana?", "title": 456}]
output = []
fruit_list = ['orange', 'pear', 'banana']

for rec in records:
    fruits = [f for f in fruit_list if f in rec.get("question")]
    for f in fruits:
        # need to append a copy, otherwise it will just add references to the same dictionary over and over again
        row_copy = rec.copy()
        row_copy['fruit'] = f
        output.append(row_copy)
print(output)
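
Since the question is about list comprehensions, the two nested loops above can also be collapsed into a single comprehension. This is only a sketch: `records` is a hypothetical name standing in for the list of dictionaries (chosen so the built-in `input` is not shadowed), and `{**rec, ...}` assumes Python 3.5 or later.

```python
records = [
    {"question": "Hi i want to buy orange and pear", "title": 123},
    {"question": "How much is the banana?", "title": 456},
]
fruit_list = ['orange', 'pear', 'banana']

# One new dict per (record, matching fruit) pair. {**rec, "fruit": f} builds
# a fresh dict each time, so the original records are left unmodified.
output = [
    {**rec, "fruit": f}
    for rec in records
    for f in fruit_list
    if f in rec["question"]
]
print(output)
```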

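Hope that helps.

Answer 2 (score: 0)

How about this? For each row it builds a list of the matching words, then expands the DataFrame so that each row contains only one matching word.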

fruit_list = ['orange', 'pear', 'banana']
df['word_match'] = df.question.str.findall(
    r'[\w]+').apply(set).apply(lambda my_set: list(my_set.intersection(fruit_list)))
>>> df
                           question title      word_match
0  Hi i want to buy orange and pear   123  [orange, pear]
1           How much is the banana?   xyz        [banana]

rows = []
for _, row in df.iterrows():
    # one output row per matched word
    for word in row['word_match']:
        rows.append([row['question'], row['title'], word])
>>> pd.DataFrame(rows, columns=['question', 'title', 'word_match'])
                           question title word_match
0  Hi i want to buy orange and pear   123     orange
1  Hi i want to buy orange and pear   123       pear
2           How much is the banana?   xyz     banana
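
Two details of the approach above are worth noting: it matches whole words only (a plain substring test would also find 'pear' inside 'pearl'), and the order of words coming out of a set intersection is not guaranteed. A possible variant, offered as a sketch rather than a definitive solution, builds one regular expression with word boundaries and lets `explode` do the row expansion. The names `pattern` and `expanded` are hypothetical, and `DataFrame.explode` assumes pandas 0.25 or later.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'question': ["Hi i want to buy orange and pear", "How much is the banana?"],
    'title': ['123', 'xyz'],
})
fruit_list = ['orange', 'pear', 'banana']

# \b restricts matches to whole words, so 'pear' would not match inside 'pearl'.
pattern = r'\b(?:' + '|'.join(map(re.escape, fruit_list)) + r')\b'

# str.findall returns the matches in the order they occur in each question.
df['word_match'] = df['question'].str.findall(pattern)

# One row per matched word (assumes pandas >= 0.25).
expanded = df.explode('word_match').reset_index(drop=True)
print(expanded)
```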