Finding similar patterns in text

Asked: 2017-08-01 13:16:45

Tags: machine-learning nlp

I have a column containing text data. A sample is shown below.

                 column1
                  Apple
                  Mango
                  Grape
                  banana
                  Apple
                  Mango
                  Fruit

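If you look at the data, Apple is followed by Mango. In other words, whenever Apple occurs, Mango comes next. There may be several such patterns. How can I find them? I know about the text-similarity techniques used in NLP, but how should I handle this case? Any suggestions, please.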

1 Answer:

Answer 0: (score: 1)

Without using ML:

col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
for wrd in set(col):
    # Positions at which this word occurs
    indices = [i for i, x in enumerate(col) if x == wrd]
    if len(col) - 1 in indices:
        continue  # The last element is not followed by anything
    elif len(indices) == 1:
        continue  # Skip words that occur only once
    elif len({col[i + 1] for i in indices}) == 1:
        # Every occurrence is followed by the same word
        print(wrd + " is always followed by " + col[indices[0] + 1])

> Apple is always followed by Mango
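
Since the question mentions that there may be several such patterns, the same check can also be done in a single pass by counting, for each word, which words directly follow it. The following is only a sketch of that idea and not part of the original answer: the helper names successor_counts and consistent_followers and the min_count parameter are made up for illustration. Only the standard-library collections module is used.

```python
from collections import Counter, defaultdict

def successor_counts(items):
    """Map each item to a Counter of the items that directly follow it."""
    followers = defaultdict(Counter)
    for current, nxt in zip(items, items[1:]):
        followers[current][nxt] += 1
    return followers

def consistent_followers(items, min_count=2):
    """Return {item: follower} for items that occur at least min_count
    times and are followed by the same item at every occurrence.
    (Hypothetical helper; min_count is an assumed threshold.)"""
    totals = Counter(items)
    result = {}
    for item, nxt_counts in successor_counts(items).items():
        if totals[item] >= min_count and len(nxt_counts) == 1:
            follower, count = next(iter(nxt_counts.items()))
            # An item that also appears as the last element has one
            # occurrence without a follower, so it is excluded here.
            if count == totals[item]:
                result[item] = follower
    return result

col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
for word, follower in consistent_followers(col).items():
    print(word + " is always followed by " + follower)
```

On the sample column this prints the same line as above. The per-word counters returned by successor_counts could also be used to report patterns that hold most of the time rather than always, if that is closer to what is needed.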