I have a column containing text data. A sample is shown below.
column1
Apple
Mango
Grape
banana
Apple
Mango
Fruit
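If you look at the data, Apple is followed by Mango. In other words, whenever Apple occurs, Mango comes next. There can be multiple such patterns. How can I find them? I know about the text-similarity techniques used in NLP, but how would I handle this case? Any suggestions, please.
Answer 0 (score: 1)
Without using ML: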
col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
for wrd in set(col):
    indices = [i for i, x in enumerate(col) if x == wrd]
    if len(col) - 1 in indices:
        continue  # Last element cannot be followed by anything
    elif len(indices) == 1:
        continue  # Do we want single elements? I suppose not
    elif len(set([col[i + 1] for i in indices])) == 1:
        print(wrd + " is always followed by " + col[indices[0] + 1])
> Apple is always followed by Mango
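The loop above rescans the whole list once per distinct word. If the column is long, the same check can probably be done in a single pass by recording which values follow each word. Below is a minimal sketch of that idea using only the standard library. The function name `always_followed_by` and the `min_count` parameter are made up for illustration; the rules (skip the word in the last row, skip words that occur only once) are copied from the code above.

```python
from collections import Counter, defaultdict


def always_followed_by(items, min_count=2):
    """Return {word: successor} for words that are always followed by the same value.

    Hypothetical helper, not from the original answer. It mirrors that answer's
    rules: a word that also appears in the last position is skipped, and so is
    any word that occurs fewer than `min_count` times.
    """
    if not items:
        return {}
    counts = Counter(items)        # word -> number of occurrences
    successors = defaultdict(set)  # word -> set of values seen right after it
    for current, nxt in zip(items, items[1:]):
        successors[current].add(nxt)

    result = {}
    for word, following in successors.items():
        if word == items[-1]:
            continue  # it also occurs last, where nothing follows it
        if counts[word] < min_count:
            continue  # single occurrences are not treated as a pattern
        if len(following) == 1:
            result[word] = next(iter(following))
    return result


col = ['Apple', 'Mango', 'Grape', 'banana', 'Apple', 'Mango', 'Fruit']
print(always_followed_by(col))  # {'Apple': 'Mango'}
```

Pass `min_count=1` if words that occur only once should also be reported.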