Question

I have a column containing text data. A sample is shown below.

                 column1
                  Apple
                  Mango
                  Grape
                  banana
                  Apple
                  Mango
                  Fruit

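If you look at the data, Apple is followed by Mango. In other words, whenever Apple occurs, the next value is Mango. There may be several such patterns. How can I find them? I know about the text-similarity techniques used in NLP, but how should I handle this case? Any suggestions, please.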

Answer 1

Without using ML:

col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
for wrd in set(col):
    # Positions at which this word occurs
    indices = [i for i, x in enumerate(col) if x == wrd]
    if len(col) - 1 in indices:
        continue  # Last element cannot be followed by anything
    elif len(indices) == 1:
        continue  # Do we want single elements? I suppose not
    elif len({col[i + 1] for i in indices}) == 1:
        print(wrd + " is always followed by " + col[indices[0] + 1])

> Apple is always followed by Mango
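
The same idea can be sketched in a single pass by collecting, for each word, the list of words that follow it. This is only an illustration, not part of the original answer: the function name `always_followed_by` and the `min_count` parameter are made up here. It also differs from the loop above in one respect: a word that happens to appear in the last position is not skipped entirely, only that final occurrence is ignored.

```python
from collections import defaultdict


def always_followed_by(items, min_count=2):
    """Map each item to its successor when that successor is always the same.

    `always_followed_by` and `min_count` are hypothetical names used for
    illustration; `min_count` is the minimum number of times an item must
    be followed by something before it is reported.
    """
    followers = defaultdict(list)
    # Pair every element with the one right after it.
    # The final element has no successor, so zip drops it automatically.
    for current, nxt in zip(items, items[1:]):
        followers[current].append(nxt)
    return {
        word: nxts[0]
        for word, nxts in followers.items()
        if len(nxts) >= min_count and len(set(nxts)) == 1
    }


col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
print(always_followed_by(col))
```

If the data lives in a DataFrame column, passing the column converted to a plain list should work the same way.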

Finding similar patterns in text