First of all, thanks for your help; I have been struggling with this problem for several days.
File myStopWords.txt:
è
ad
più
a
b
c
17
My code:
stopWord = set(open("<...>/myStopwords.txt").read().split("\n"))
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )
Result:
{'horse', 'ad', 'più', 'è'}
Why are "ad", "è" and "più" not subtracted from the set? The result should be {'horse'}.
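To see how a set difference can leave those words behind, here is a minimal self-contained reproduction. It rests on assumptions the question does not confirm: that the file bytes are UTF-8 but get decoded with a Windows-1252 (cp1252) default codec, and that the "ad" line carries a trailing space. The helper name load_words is hypothetical. Whatever the exact encodings involved, the mechanism is the same: a line read from the file only matches if it is the identical string, character for character.

```python
def load_words(raw: bytes, encoding: str) -> set[str]:
    """Decode the raw file bytes and split on newlines, without stripping (like the question's code)."""
    return set(raw.decode(encoding).split("\n"))


# Hypothetical file content: UTF-8 bytes, with a stray trailing space after "ad"
raw = "è\nad \npiù\na\nb\nc\n17\n".encode("utf-8")
old_words = {"a", "b", "ad", "è", "più", "17", "horse"}

# Wrong codec: "è" decodes to "Ã¨" and "più" to "piÃ¹", so neither matches
print(sorted(old_words - load_words(raw, "cp1252")))  # ['ad', 'horse', 'più', 'è']

# Right codec, but "ad " (with the space) is still a different string from "ad"
print(sorted(old_words - load_words(raw, "utf-8")))   # ['ad', 'horse']
```

Under these assumptions the first call reproduces the result from the question, and the second shows that fixing the encoding alone is not enough: each line also has to be stripped of surrounding whitespace.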
Answer 0 (score: 0)
Thanks. As suggested in the earlier comments, here is the solution:
1) Convert the text file to UTF-8.
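2) Read the file with an explicit UTF-8 encoding and strip each line:
fname = '<...>/myStopwords.txt'
# Open with an explicit encoding so "è" and "più" are decoded correctly
with open(fname, encoding='utf-8') as f:
    content = f.readlines()
# strip() removes the trailing newline (and any "\r" or stray spaces) from each line
stopWord = set(x.strip() for x in content)
oldWords = set(["a","b","ad", "è", "più","17","horse"])
print( oldWords.difference(stopWord) )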
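The two steps can also be wrapped in small reusable functions. This is a sketch, not the answer author's code: the function names convert_to_utf8 and load_stop_words are hypothetical, and the source encoding cp1252 is only an assumption about how the original file was saved (use whatever encoding it really has). Reading with "utf-8-sig" instead of "utf-8" additionally tolerates the byte-order mark that some Windows editors put at the start of UTF-8 files.

```python
import os
import tempfile
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    # Step 1: re-encode the file in place as UTF-8.
    # source_encoding is an assumption; it must match how the file was saved.
    p = Path(path)
    text = p.read_text(encoding=source_encoding)
    p.write_text(text, encoding="utf-8")


def load_stop_words(path):
    # Step 2: read one stop word per line into a set.
    # "utf-8-sig" decodes plain UTF-8 and drops a leading BOM if present;
    # strip() removes the newline, any carriage return and stray spaces;
    # blank lines are skipped.
    with open(path, encoding="utf-8-sig") as f:
        return {line.strip() for line in f if line.strip()}


# Usage with a hypothetical cp1252 file using Windows line endings
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "myStopwords.txt")
    Path(path).write_bytes("è\r\nad \r\npiù\r\na\r\nb\r\nc\r\n17\r\n".encode("cp1252"))
    convert_to_utf8(path)
    stop_words = load_stop_words(path)

old_words = {"a", "b", "ad", "è", "più", "17", "horse"}
print(old_words - stop_words)  # {'horse'}
```

Returning a set keeps both operands of the difference the same type; set.difference() accepts any iterable, so the list built in the answer works as well.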