Question

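I have the following list:

search_list = ['STEEL','IRON','GOLD','SILVER']

I need to search for these words in a DataFrame (df):

      a    b             
0    123   'Blah Blah Steel'
1    456   'Blah Blah Blah'
2    789   'Blah Blah Gold'

and insert the matching rows into a new DataFrame (newdf), adding a new column that holds the matching word from the list:

      a    b                   c
0    123   'Blah Blah Steel'   'STEEL'
1    789   'Blah Blah Gold'    'GOLD'

I can already extract the matching rows, but I can't figure out how to add the matching word from the list to column c.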

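I think the match somehow has to capture the index of the matching word in the list and then use that index to pull the value, but I don't know how to do that.

Any help or pointers would be greatly appreciated.

Thanks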

Answer 1

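You can use extract and filter out the rows that are NaN (i.e. those with no match):

import re
import pandas as pd

search_list = ['STEEL','IRON','GOLD','SILVER']

# expand=False returns a Series, so it can be assigned to a single column
df['c'] = df.b.str.extract('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE, expand=False)
result = df[~pd.isna(df.c)]

print(result)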

Output

     a                  b      c
0  123  'Blah Blah Steel'  Steel
2  789   'Blah Blah Gold'   Gold

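Note that you must import the re module to use the re.IGNORECASE flag. Alternatively, you can pass 2 directly, which is the value of re.IGNORECASE.

Update

As @user3483203 pointed out, you can save the import by using an inline flag: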

df['c'] = df.b.str.extract('(?i)({0})'.format('|'.join(search_list)))
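To make it clearer what both variants hand to str.extract, the sketch below builds the same alternation pattern with the standard re module alone. The find_first helper and the use of re.escape are illustrative assumptions that do not appear in the answer; escaping only matters if a search term contains regex metacharacters.

```python
import re

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# Same construction as in the answer, with each term escaped (an added precaution).
pattern = '(?i)({0})'.format('|'.join(re.escape(w) for w in search_list))

def find_first(text):
    """Hypothetical helper: return the first matching word as written in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

print(pattern)                         # (?i)(STEEL|IRON|GOLD|SILVER)
print(find_first('Blah Blah Steel'))   # Steel
print(find_first('Blah Blah Blah'))    # None
```

Note that the match keeps the casing found in the text (Steel rather than STEEL); upper-casing the result afterwards would give the spelling used in the list.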

Answer 2

You can use set.intersection to find the words that appear in column b:

search_list = set(['STEEL','IRON','GOLD','SILVER'])
df['c'] = df['b'].apply(lambda x: set.intersection(set(x.upper().split(' ')), search_list))

Output:

     a                b        c
0  123  Blah Blah Steel  {STEEL}
1  456   Blah Blah Blah       {}
2  789   Blah Blah Gold   {GOLD}

If you want to get rid of the rows with no match, use df[df['c'].astype(bool)]:

     a                b        c
0  123  Blah Blah Steel  {STEEL}
2  789   Blah Blah Gold   {GOLD}
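Because this approach splits on spaces and compares whole tokens, it behaves differently from a substring search. A minimal sketch without pandas, where matched_words is a hypothetical helper name:

```python
search_set = {'STEEL', 'IRON', 'GOLD', 'SILVER'}

def matched_words(text):
    """Return the set of search words that occur as whole space-separated tokens."""
    return set(text.upper().split(' ')) & search_set

print(matched_words('Blah Blah Steel'))    # {'STEEL'}
print(matched_words('Blah Steelworks'))    # set()
print(bool(matched_words('Blah Blah')))    # False
```

An empty set is falsy, which is what astype(bool) relies on to drop the non-matching rows.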

Answer 3

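You can use:

search_list = ['STEEL','IRON','GOLD','SILVER']
# wrap the whole alternation in \b so every word is matched as a whole word
pat = r'\b(?:{})\b'.format('|'.join(search_list))
pat2 = r'({})'.format('|'.join(search_list))

# keep only the rows that contain one of the words (case-insensitive)
df_new = df.loc[df.b.str.contains(pat, case=False, na=False)].reset_index(drop=True)
# upper-case column b so the extracted word is spelled as in search_list
df_new['new_col'] = df_new.b.str.upper().str.extract(pat2, expand=False)
print(df_new)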

     a                  b new_col
0  123  'Blah Blah Steel'   STEEL
1  789   'Blah Blah Gold'    GOLD
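The \b anchors restrict matches to whole words. A small sketch with the standard re module, assuming the boundaries are meant to wrap the whole alternation (the names bounded and unbounded are illustrative only):

```python
import re

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# Pattern with word boundaries around the alternation, and one without.
bounded = re.compile(r'\b(?:{})\b'.format('|'.join(search_list)), re.IGNORECASE)
unbounded = re.compile('|'.join(search_list), re.IGNORECASE)

text = 'Blah ironing board'
print(bool(unbounded.search(text)))             # True: "iron" inside "ironing"
print(bool(bounded.search(text)))               # False: no whole-word match
print(bool(bounded.search('Blah Blah Gold')))   # True
```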

Answer 4

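One way is:

def get_word(my_string):
    # return the first word from search_list found in my_string, ignoring case
    for word in search_list:
        if word.lower() in my_string.lower():
            return word
    return None

new_df["c"] = new_df["b"].apply(get_word)

You can also do something along these lines:

new_df["c"] = new_df["b"].apply(lambda my_string: [word for word in search_list if word.lower() in my_string.lower()][0])

With the first version you have the option of adding column c to df first and then filtering out the None values, whereas the second raises an IndexError if the string contains none of the words.

You can also look at this question: Get the first item from an iterable that matches a condition

Applying the approach from its top-rated answer also works here.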
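As a sketch of that idea, assuming the method meant is the built-in next() over a generator expression with a default value (first_match is a hypothetical name):

```python
search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

def first_match(my_string):
    """Return the first word from search_list found in my_string, or None."""
    lowered = my_string.lower()
    # next() stops at the first hit; the second argument is returned when nothing matches.
    return next((word for word in search_list if word.lower() in lowered), None)

print(first_match('Blah Blah Steel'))  # STEEL
print(first_match('Blah Blah Blah'))   # None
```

Passing first_match to apply would then behave like get_word, without the IndexError risk of the list-indexing variant.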

Answer 6

Here is a solution whose final result looks like the one you showed:

search_list = ['STEEL','IRON','GOLD','SILVER']

def process(x):
    # return the first search word found in column b, wrapped in quotes
    for s in search_list:
        if s in x['b'].upper():
            return "'" + s + "'"
    return ''

df['c'] = df.apply(process, axis=1)
# drop the rows that had no match
df = df.drop(df[df['c'] == ''].index).reset_index(drop=True)

print(df)

Output:

     a                  b        c
0  123  'Blah Blah Steel'  'STEEL'
1  789   'Blah Blah Gold'   'GOLD'

Answer 7

You can also do it like this:

import pandas as pd

search_list = ('STEEL','IRON','GOLD','SILVER')

df = pd.DataFrame({'a':[123,456,789],'b':['blah blah Steel','blah blah blah','blah blah Gold']})

df.assign(c = df['b'].apply(lambda x: [j for j in x.split() if j.upper() in search_list]))

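Result: column c holds, for each row, the list of words from b that appear in search_list (an empty list when nothing matches).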
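For a single string, the list comprehension behaves as follows. This is only a sketch, and words_in is a hypothetical name:

```python
search_list = ('STEEL', 'IRON', 'GOLD', 'SILVER')

def words_in(text):
    """Return every whitespace-separated token whose upper-case form is in search_list."""
    return [token for token in text.split() if token.upper() in search_list]

print(words_in('blah blah Steel'))       # ['Steel']
print(words_in('gold and SILVER ring'))  # ['gold', 'SILVER']
print(words_in('blah blah blah'))        # []
```

The tokens keep the casing they have in the text, and a row can yield more than one word.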
