Pattern matching in a Python column

Time: 2018-07-25 02:16:34

Tags: python regex string dataframe search

I have two DataFrames, df and df1. I want to search df for patterns based on the values given in df1. The DataFrames are as follows:

    import pandas as pd

    data = {
        "id": ["I983", "I873", "I526", "I721", "I536", "I327", "I626", "I213", "I625", "I524"],
        "coltext": [
            "I could take my comment back, I would do so in a second. I have addressed my teammates and coaches and while many understand my actions were totall",
            "We’re just trying to see if he can get on the field as a football player, and then we’ll make decision",
            "TextNow offers low-cost, international calling to over 230 countries. Stay connected longer with rates starting at less than",
            "Wi-Fi can provide you with added coverage in places where cell networks don't always work - like basements and apartments. No roaming fees for Wi-Fi connection",
            "Send messages and make calls on your compute",
            "even have a free, Wi-Fi only version of TextNow, available for download on you",
            "the rest of the players accepted apologies this spring and are welcoming him back",
            "was really looking at him and watching how much this really means to him and how much he really missed us",
            "I’ll deal with the problem and I’ll remedy the problem",
            "The first step was for him to be able to complete what we call our bottom line program which has been completed",
        ],
    }
    df = pd.DataFrame(data=data)

    data1 = {
        "col1": ["addressed teammates coaches", "football player decision", "watching really missed", "bottom line program", "meassges make calls"],
        "col2": ["international calling over", "download on you", "rest players accepted", "deal problem remedy", "understand actions totall"],
        "col3": ["first step him", "Wi-Fi only version", "cell network works", "accepted apologies", "stay connected longer"],
    }
    df1 = pd.DataFrame(data=data1)

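For example, the first element of df1['col1'], "addressed teammates coaches", occurs in the first element of df['coltext']. In the same way, I want to search df['coltext'] for every element of every column of df1. If a pattern is found, a third column should be created in df that records which df1 columns matched.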

Desired output:

id  coltext                                 patternMatch
I983  I could take my comment back,               col1, col2
I873  We’re just trying to see if he can              col1
I526  TextNow offers low-cost,                    col3, col2
I721  Wi-Fi can provide you with                      col3
I536  Send messages and make calls                    col1

1 answer:

Answer 0: (score: 1)

There may be other, more efficient ways, but one approach is the following:

    # Invert data1 so that each phrase maps to the name of the column it came from
    my_dict = {item: k for k, v in data1.items() for item in v}
    # For each text in df['coltext'], collect the column names of every phrase
    # whose words all occur in that text (case-sensitive substring check)
    df['patternMatch'] = df['coltext'].apply(
        lambda row: {v for k, v in my_dict.items()
                     if all(word in row for word in k.split())})
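The approach above stores a set of column names in each cell, whereas the desired output shows a comma-separated string such as "col1, col2". A minimal sketch of one way to get that form follows. It assumes case-insensitive matching is acceptable, which the question does not state, and it keeps the same "every word occurs somewhere in the text" substring check. The helper name `matching_columns` is hypothetical, and the two small frames are a trimmed stand-in for the question's data so that the block runs on its own.

```python
import pandas as pd

# Trimmed stand-in for the question's df and df1 (assumption: illustration only).
df = pd.DataFrame({
    "id": ["I983", "I526", "I626"],
    "coltext": [
        "I have addressed my teammates and coaches and while many understand my actions were totall",
        "TextNow offers low-cost, international calling to over 230 countries. Stay connected longer with rates",
        "the rest of the players accepted apologies this spring",
    ],
})
df1 = pd.DataFrame({
    "col1": ["addressed teammates coaches", "football player decision"],
    "col2": ["international calling over", "understand actions totall"],
    "col3": ["stay connected longer", "accepted apologies"],
})


def matching_columns(text, patterns):
    """Return the names of the columns in `patterns` that contain at least one
    phrase whose words all occur in `text`, joined as 'colA, colB'."""
    lowered = text.lower()  # assumption: compare case-insensitively
    hits = [
        col
        for col in patterns.columns  # keeps df1's column order
        if any(
            all(word in lowered for word in phrase.lower().split())
            for phrase in patterns[col]
        )
    ]
    return ", ".join(hits)


df["patternMatch"] = df["coltext"].apply(lambda text: matching_columns(text, df1))
print(df[["id", "patternMatch"]])
```

Because the check is a plain substring test, "player" would also count as found inside "players". If whole-word matching is needed, comparing against a set of tokens from the text would be one option.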