I asked a question earlier and got the answer I wanted, but now I have a follow-up question.
I have a list like this:
name = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball',
'football', 'basketball', 'work', 'box', 'foot', 'basket']
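The code below separates the base words from the compound nouns:
for candidate in name:
    for word in name:
        # Drop the candidate as soon as any other list word occurs inside it.
        if word != candidate and word in candidate:
            break
    else:
        # No other word was found inside the candidate, so it is a base word.
        print(candidate)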
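However, I realized the code is too restrictive: it also removes 'pillow' from the list, because 'pill' is a substring of 'pillow'.
Is there code that would produce the following result instead: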
name = ['road', 'pill', 'pillow', 'ball', 'work', 'box', 'foot', 'basket']
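Answer 0: (score: 1)
For ordinary words, the simplest way to decide whether a word is a compound is to chop it in two and check whether both halves are words. You have to repeat the test at every possible chopping point, so the number of tests grows with the length of the word. For any English word other than 189,000 character long chemical names, it should be reasonably fast.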
words = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball', 'football', 'basketball', 'work', 'box', 'foot', 'basket']
wordSet = set(words)
def isWord(w):
    return w in wordSet

def isCompoundWord(word):
    # Try every split point; both halves must be known words.
    for idx in range(1, len(word)):
        left = word[:idx]
        right = word[idx:]
        if isWord(left) and isWord(right):
            return True
    return False

nonCompoundWords = [word for word in words if not isCompoundWord(word)]
print(nonCompoundWords)
Output:
['road', 'pill', 'pillow', 'ball', 'work', 'box', 'foot', 'basket']
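To see why this keeps 'pillow' while the substring test in the question drops it, here is a minimal sketch that runs both tests side by side on the question's list. The helper names contains_other_word and splits_into_two_words are made up for illustration and are not part of either snippet above.

```python
# Word list taken from the question; helper names below are hypothetical.
words = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball',
         'football', 'basketball', 'work', 'box', 'foot', 'basket']
word_set = set(words)

def contains_other_word(candidate):
    # Substring test from the question: any other list word inside candidate.
    return any(w != candidate and w in candidate for w in words)

def splits_into_two_words(candidate):
    # Split test from this answer: both halves must be list words.
    return any(candidate[:i] in word_set and candidate[i:] in word_set
               for i in range(1, len(candidate)))

for w in ('pillow', 'pillbox'):
    print('%s: substring=%s split=%s'
          % (w, contains_other_word(w), splits_into_two_words(w)))
```

'pillow' contains 'pill', so the substring test rejects it, but the remainder 'ow' is not in the list, so the split test keeps it. 'pillbox' is rejected by both.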
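Answer 1: (score: 0)
You need to find out whether what is left of the word after removing the match is itself another word. I imagine there will be cases where the etymology doesn't match, though. I'm thinking of words that consist of another word plus 'is', where the 'is' isn't used with its own meaning.
Edit: an example: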
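words = ['book', 'store', 'bookstore', 'booking']
li = []
for word in words:
    for test in words:
        # The slice below assumes test is a prefix of word, so check that.
        if word.startswith(test):
            temp = word[len(test):]
            # If the remainder is also a word, treat word as a compound.
            if temp in words and word not in li:
                li.append(word)
# Remove the compounds after the scan, not while iterating over words.
for x in li:
    words.remove(x)
print(words)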
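The caveat above about words that are just another word plus 'is' can be sketched as follows. The sample list is an assumption chosen for illustration ('axis' is not a compound of 'ax' and 'is'), and is_two_word_split is a hypothetical helper that applies the same prefix-plus-remainder idea.

```python
# Assumed sample list: 'axis' is not a compound, yet 'ax' and 'is' are words.
sample = ['ax', 'is', 'axis', 'book', 'store', 'bookstore']
sample_set = set(sample)

def is_two_word_split(word):
    # True if some prefix and the matching remainder are both in the list.
    return any(word[:i] in sample_set and word[i:] in sample_set
               for i in range(1, len(word)))

flagged = [w for w in sample if is_two_word_split(w)]
print(flagged)  # ['axis', 'bookstore']
```

A purely string-based test cannot tell the real compound 'bookstore' from the accidental match 'axis'. If that matters for your data, you would probably need an exception list or a dictionary with etymology information.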