Unnesting a grab keyword/next words/before words function

Time: 2019-06-30 16:31:02

Tags: python-3.x pandas function loops nlp

Background

I have the following code to create a df:

import pandas as pd

word_list = ['crayons', 'cars', 'camels']

# note: there is no comma after the second string, so Python concatenates it with the third
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'
     'the middle east has many camels to ride and have fun',
     'all camels are fun']

df = pd.DataFrame(l, columns=['Text'])

which looks like this:

                                                Text
0  there are many different crayons in the bright blue box and crayons of all different colors
1  i like a lot of sports cars because they go really fastthe middle east has many camels to ride and have fun
2  all camels are fun

The following code works and creates a function that captures the trigger word, plus the words that come before (beforewords) and after (nextwords) the trigger:

def find_words(row, word_list):

    sentence = row[0]

    #make empty lists
    trigger = []
    next_words = []
    before_words = []

    for keyword in word_list:
        #split words
        words = str(sentence).split()

        for index in range(0, len(words) - 1):

            # get keyword we want
            if words[index] == keyword:

                # get words after keyword and add to empty list
                next_words.append(words[index + 1:index + 3])

                # get the two words before the keyword and add to empty list
                # (the slice ends at index - 1, so the word directly in front is skipped)
                before_words.append(words[max(index - 3, 0):max(index - 1, 0)])

                # append
                trigger.append(keyword)

    return pd.Series([trigger,  before_words, next_words], index = ['Trigger', 'BeforeWords','NextWords'])

# join the new columns onto the original frame
df = df.join(df.apply(lambda x: find_words(x, word_list), axis=1))
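
To make the two window slices concrete, here is a small sketch that applies the same slices to two of the sample sentences. The helper name `windows` is hypothetical and only used for illustration; it is not part of the code above.

```python
def windows(words, index):
    """Return (before, after) using the same slices as find_words."""
    before = words[max(index - 3, 0):max(index - 1, 0)]
    after = words[index + 1:index + 3]
    return before, after

words = 'i like a lot of sports cars because they go really fast'.split()
before, after = windows(words, words.index('cars'))
print(before)  # ['lot', 'of']  ('sports' is skipped by the slice)
print(after)   # ['because', 'they']

# a keyword near the start of the sentence yields an empty "before" list
short = 'all camels are fun'.split()
short_before, short_after = windows(short, short.index('camels'))
print(short_before, short_after)  # [] ['are', 'fun']
```

The empty list in the last case is where the `[[]]` entry in the BeforeWords column comes from.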

Output

    Text          Trigger             BeforeWords                 NextWords
0   there ...     [crayons, crayons]  [[are, many], [blue, box]]  [[in, the], [of, all]]
1   i like ...    [cars, camels]      [[lot, of], [east, has]]    [[because, they], [to, ride]]
2   all camels... [camels]            [[]]                        [[are, fun]]

Problem

However, I would like to either 1) unstack, 2) unlist, OR use another/better way to get the following:

Desired output

    Text          Trigger  BeforeWords  NextWords
0   there ...     crayons  are many     in the
1   there ...     crayons  blue box     of all
2   i like ...    cars     lot of       because they
3   i like ...    camels   east has     to ride
4   all camels... camels                are fun

Question

How do I tweak my find_words function to achieve the desired output?
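
One possible way to get one row per match directly, rather than unnesting afterwards, is to emit a record for every keyword hit and build the frame from those records. The sketch below assumes the same sample data and the same window slices as the function in the question; the names `iter_matches`, `records`, and `long_df` are hypothetical.

```python
import pandas as pd

word_list = ['crayons', 'cars', 'camels']
# same data as the question, including the missing comma after the second string
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'
     'the middle east has many camels to ride and have fun',
     'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])

def iter_matches(text, keywords):
    """Yield one dict per keyword hit, using the same windows as find_words."""
    words = str(text).split()
    for keyword in keywords:
        # len(words) - 1 mirrors the original loop, which never tests the last word
        for index in range(len(words) - 1):
            if words[index] == keyword:
                yield {
                    'Text': text,
                    'Trigger': keyword,
                    'BeforeWords': ' '.join(words[max(index - 3, 0):max(index - 1, 0)]),
                    'NextWords': ' '.join(words[index + 1:index + 3]),
                }

# flatten the per-sentence generators into a single list of records
records = [rec for text in df['Text'] for rec in iter_matches(text, word_list)]
long_df = pd.DataFrame(records, columns=['Text', 'Trigger', 'BeforeWords', 'NextWords'])
print(long_df[['Trigger', 'BeforeWords', 'NextWords']])
```

Because each match becomes its own record, no nested lists are ever created, so nothing needs to be unstacked later.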

1 Answer:

Answer 0 (score: 1)

This looks like an unnesting problem, so we can use

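# move Text into the index and stack the three list columns into one Series
s = df.set_index(['Text']).stack()
# spread each list across columns, then stack again so every element gets its own row
s = pd.DataFrame(s.tolist(), index=s.index).stack()
# join the inner word lists into strings, then pivot the column names back out
s.apply(lambda x: ' '.join(x) if isinstance(x, list) else x).unstack(1).reset_index(level=0)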
                                                Text      ...          NextWords
0  there are many different crayons in the bright...      ...             in the
1  there are many different crayons in the bright...      ...             of all
0  i like a lot of sports cars because they go re...      ...       because they
1  i like a lot of sports cars because they go re...      ...            to ride
0                                 all camels are fun      ...            are fun
[5 rows x 4 columns]
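
As an alternative sketch (not part of the answer above), newer pandas versions can unnest several list columns at once with `DataFrame.explode`. This assumes pandas 1.3 or later, where `explode` accepts a list of columns, and it assumes the lists in each row have equal lengths, which holds for the output of find_words. The frame `nested` below is a hand-written stand-in for the question's output, and the names `nested`, `list_cols`, and `flat` are hypothetical.

```python
import pandas as pd

# small stand-in for the nested output of find_words
nested = pd.DataFrame({
    'Text': ['there ...', 'all camels...'],
    'Trigger': [['crayons', 'crayons'], ['camels']],
    'BeforeWords': [[['are', 'many'], ['blue', 'box']], [[]]],
    'NextWords': [[['in', 'the'], ['of', 'all']], [['are', 'fun']]],
})

list_cols = ['Trigger', 'BeforeWords', 'NextWords']

# explode the three columns together so matching positions stay on the same row
flat = nested.explode(list_cols, ignore_index=True)

# the inner word lists are still lists after one explode; join them into strings
for col in ['BeforeWords', 'NextWords']:
    flat[col] = flat[col].apply(lambda ws: ' '.join(ws) if isinstance(ws, list) else ws)

print(flat)
```

With `ignore_index=True` the result gets a fresh 0..n-1 index instead of the repeated 0, 1, 0, 1, 0 labels shown above. A row with no matches at all would explode to missing values, which is why the join is guarded with `isinstance`.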