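I'm trying to compute the difference between two containers, but one of them has an awkward structure, so I don't know the best way to take the difference. I can't change the type or structure of one container, but I can change the other (the variable delims).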
import collections
# s is the input text (a string)
delims = ['on','with','to','and','in','the','from','or']
words = collections.Counter(s.split()).most_common()
# words results in e.g. [("a", 9), ("the", 2), ("diplomacy", 1)]
# I want to perform a 'difference' operation on words to remove all the delims words
descriptive_words = set(words) - set(delims)
# Because of the unique structure of words (a list of tuples), it's hard to perform a
# difference on it. What would be the best way to perform a difference? Maybe...
delims = [('on',0),('with',0),('to',0),('and',0),('in',0),('the',0),('from',0),('or',0)]
words = collections.Counter(s.split()).most_common()
descriptive_words = set(words) - set(delims)
# Or maybe
words = collections.Counter(s.split()).most_common()
n_words = []
for w in words:
    n_words.append(w[0])
delims = ['on','with','to','and','in','the','from','or']
descriptive_words = set(n_words) - set(delims)
Answer 0 (score: 3)
How about just modifying words by deleting all the delimiters from it?
words = collections.Counter(s.split())
for delim in delims:
    del words[delim]
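A minimal runnable sketch of this approach, assuming a sample string s that is not part of the question. Deleting from a Counter does not raise KeyError for a missing key, so delimiters that never occur in the text are skipped silently.

```python
import collections

# Sample input, assumed for illustration.
s = "the a a a a the a a a a a diplomacy"
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

words = collections.Counter(s.split())
for delim in delims:
    # Counter.__delitem__ ignores keys that are not present.
    del words[delim]

# Convert back to the list-of-tuples form used in the question.
descriptive_words = words.most_common()
print(descriptive_words)  # [('a', 9), ('diplomacy', 1)]
```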
Answer 1 (score: 1)
Here's how I would do it:
delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = filter(lambda x: x[0] not in delims, words)
This uses filter. A viable alternative is:
delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = [(word, count) for word, count in words if word not in delims]
Make sure delims is a set, to allow O(1) lookup.
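As a sketch of how the two variants compare, assuming a sample string s and Python 3, where filter() returns a lazy iterator rather than a list:

```python
import collections

# Sample input, assumed for illustration.
s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}
words = collections.Counter(s.split()).most_common()

# On Python 3, filter() is lazy, so wrap it in list() to materialize it.
filtered = list(filter(lambda pair: pair[0] not in delims, words))

# The list comprehension builds the same list directly.
comprehended = [(word, count) for word, count in words if word not in delims]

print(filtered == comprehended)  # True
```

Both keep the (word, count) tuples, so the result has the same shape as the output of most_common().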
Answer 2 (score: 1)
The simplest answer is:
import collections
s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
# For older versions of Python without set literals:
# delims = set(['on','with','to','and','in','the','from','or'])
words = collections.Counter(s.split())
not_delims = {key: value for (key, value) in words.items() if key not in delims}
# For older versions of Python without dict comprehensions:
# not_delims = dict((key, value) for (key, value) in words.items() if key not in delims)
This gives us:
{'a': 9, 'diplomacy': 1}
Another option is to filter preemptively:
import collections
s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
counted_words = collections.Counter((word for word in s.split() if word not in delims))
Here you filter the list of words before handing it to the Counter, which gives the same result.
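To check that filtering before and after counting really agree, here is a small sketch. The helper name count_descriptive_words is hypothetical and only wraps the one-liner above.

```python
import collections

def count_descriptive_words(text, delims):
    """Count whitespace-separated words in text, skipping any in delims."""
    # Hypothetical helper: filters first, then counts.
    return collections.Counter(w for w in text.split() if w not in delims)

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

# Count first, then drop the delimiters.
filter_after = {k: v for k, v in collections.Counter(s.split()).items()
                if k not in delims}
# Drop the delimiters first, then count.
filter_before = count_descriptive_words(s, delims)

print(dict(filter_before) == filter_after)  # True
print(filter_before.most_common())  # [('a', 9), ('diplomacy', 1)]
```

Filtering first has the side benefit of returning a Counter, so most_common() is still available if the list-of-tuples form is needed.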
Answer 3 (score: 0)
If you're iterating over it anyway, why bother converting them to sets?
dwords = [delim[0] for delim in delims]
words = [word for word in words if word[0] not in dwords]
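A rough sketch of the trade-off, assuming delims is in the list-of-tuples form from the question and a sample string s. A list works, but each `not in` test scans the list, while a set does a hash lookup. With only eight delimiters the difference is unlikely to matter, and both give the same result.

```python
import collections

# Sample input, assumed for illustration.
s = "the a a a a the a a a a a diplomacy"
# Tuple form of delims, as proposed in the question.
delims = [('on', 0), ('with', 0), ('to', 0), ('and', 0),
          ('in', 0), ('the', 0), ('from', 0), ('or', 0)]
words = collections.Counter(s.split()).most_common()

# List membership: linear scan per lookup.
dwords = [delim[0] for delim in delims]
with_list = [word for word in words if word[0] not in dwords]

# Set membership: hash lookup per test.
dword_set = set(dwords)
with_set = [word for word in words if word[0] not in dword_set]

print(with_list == with_set)  # True
```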
Answer 4 (score: 0)
To improve performance, you can use filter with a lambda function:
filter(lambda word: word[0] not in delims, words)