Appending a single character to all keys in a Counter

Time: 2017-04-03 06:30:43

Tags: python string dictionary counter key-value

If the keys of a Counter object are of type str, i.e.:

>>> from collections import Counter
>>> vocab_counter = Counter("the lazy fox jumps over the brown dog".split())

I can do this:

>>> vocab_counter = Counter({k+u"\uE000":v for k,v in vocab_counter.items()})
>>> vocab_counter
Counter({'brown\ue000': 1,
         'dog\ue000': 1,
         'fox\ue000': 1,
         'jumps\ue000': 1,
         'lazy\ue000': 1,
         'over\ue000': 1,
         'the\ue000': 2})

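What is a fast and/or pythonic way to append a character to all keys?

Is the method above the only way to get the final counter with the character appended to every key? Are there other ways to achieve the same goal?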

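4 answers:

Answer 0: (score: 1)

A better approach is to append the character before creating the Counter object. You can do this with a generator expression passed to Counter: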

In [15]: vocab_counter = Counter(w + u"\uE000" for w in "the lazy fox jumps over the brown dog".split())

In [16]: vocab_counter
Out[16]: Counter({'the\ue000': 2, 'fox\ue000': 1, 'dog\ue000': 1, 'jumps\ue000': 1, 'lazy\ue000': 1, 'over\ue000': 1, 'brown\ue000': 1})
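As a quick sanity check, the following minimal sketch compares the two orders of operation: counting first and rebuilding the keys afterwards (as in the question) versus suffixing each word before counting (as in this answer). The names SUFFIX and words are illustrative and not part of the original post.

```python
from collections import Counter

SUFFIX = u"\uE000"  # same private-use character as in the question
words = "the lazy fox jumps over the brown dog".split()

# Approach from the question: count first, then rebuild with suffixed keys.
counted_then_suffixed = Counter(
    {k + SUFFIX: v for k, v in Counter(words).items()}
)

# Approach from this answer: suffix each word, then count once.
suffixed_then_counted = Counter(w + SUFFIX for w in words)

print(counted_then_suffixed == suffixed_then_counted)  # True
```

Both routes give the same counter; the second avoids building an intermediate Counter and dict.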

If you cannot modify the words before creating the counter, you can subclass Counter so that the special character is added when the values for keys are set.

Answer 1: (score: 1)

The only other optimized approach I can think of is to subclass Counter and append the character when a key is inserted:

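from collections import Counter


class CustomCounter(Counter):
    def __setitem__(self, key, value):
        # Keys of length 1 and keys that already carry the suffix are stored as given.
        if len(key) > 1 and not key.endswith(u"\uE000"):
            key += u"\uE000"
            # update() computed `value` from the unsuffixed key (count 0),
            # so add the count already stored under the suffixed key.
            value += self.get(key, 0)
        super(CustomCounter, self).__setitem__(key, value)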

Demo:

>>> CustomCounter("the lazy fox jumps over the brown dog".split())
CustomCounter({u'the\ue000': 2, u'fox\ue000': 1, u'brown\ue000': 1, u'jumps\ue000': 1, u'dog\ue000': 1, u'over\ue000': 1, u'lazy\ue000': 1})
# With both args and kwargs 
>>> CustomCounter("the lazy fox jumps over the brown dog".split(), **{'the': 1, 'fox': 3})
CustomCounter({u'fox\ue000': 4, u'the\ue000': 3, u'brown\ue000': 1, u'jumps\ue000': 1, u'dog\ue000': 1, u'over\ue000': 1, u'lazy\ue000': 1})
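Because this technique depends on how Counter.update() calls `__setitem__`, it is worth checking how such a subclass behaves under direct assignment and lookup. The sketch below is a Python 3 variant of the same idea, not the answer's exact class: the names SuffixCounter and SUFFIX are hypothetical, it suffixes every key (including single-character ones), and it folds in the stored count only when the key is actually rewritten.

```python
from collections import Counter

SUFFIX = "\uE000"  # illustrative constant; same character as in the question


class SuffixCounter(Counter):
    """Hypothetical variant of the subclass idea, for Python 3."""

    def __setitem__(self, key, value):
        if not key.endswith(SUFFIX):
            key += SUFFIX
            # The caller computed `value` from the unsuffixed key (count 0),
            # so fold in whatever is already stored under the suffixed key.
            value += self.get(key, 0)
        super().__setitem__(key, value)


c = SuffixCounter("the lazy fox the".split())
print(c["the" + SUFFIX])  # 2

c["the"] = 5              # unsuffixed assignment accumulates: 2 + 5
print(c["the" + SUFFIX])  # 7

print(c["the"])           # 0, because lookups are not rewritten
```

Assignment through an unsuffixed key therefore behaves like an increment, and reads must use the suffixed key. If that is surprising for your use case, suffixing the words before counting may be the simpler option.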

Answer 2: (score: 1)

The shortest way I use is:

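from collections import Counter

vocab_counter = Counter("the lazy fox jumps over the brown dog".split())
# Iterate over a snapshot of the keys: in Python 3, keys() is a live view,
# and mutating the counter while iterating over it is unsafe.
for key in list(vocab_counter):
    vocab_counter[key + u"\uE000"] = vocab_counter.pop(key)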
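Renaming the keys in place keeps the same Counter object, which can matter when other references point to it. A minimal sketch of that property, assuming a helper name append_to_keys that does not appear in the original answer:

```python
from collections import Counter


def append_to_keys(counter, suffix):
    """Rename every key of `counter` in place by appending `suffix`."""
    # Snapshot the keys first, because the loop below mutates the mapping.
    for key in list(counter):
        counter[key + suffix] = counter.pop(key)


vocab_counter = Counter("the lazy fox jumps over the brown dog".split())
alias = vocab_counter            # a second reference to the same object
append_to_keys(vocab_counter, u"\uE000")

print(alias is vocab_counter)    # True
print(alias[u"the\uE000"])       # 2
```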

Answer 3: (score: 0)

You can do this with string operations:

from collections import Counter

text = 'the lazy fox jumps over the brown dog'
Counter((text + ' ').replace(' ', '_abc ').strip().split())
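The replace-based approach assumes that words are separated by exactly one space. The sketch below compares it with splitting first and suffixing each word afterwards; the function names count_via_replace and count_via_split are illustrative only.

```python
from collections import Counter

SUFFIX = "_abc"  # suffix used in the answer above


def count_via_replace(text):
    # The answer's approach: add the suffix before every space, then split.
    return Counter((text + " ").replace(" ", SUFFIX + " ").strip().split())


def count_via_split(text):
    # Alternative: split on any whitespace first, then suffix each word.
    return Counter(word + SUFFIX for word in text.split())


clean = "the lazy fox jumps over the brown dog"
print(count_via_replace(clean) == count_via_split(clean))  # True

messy = "the  lazy fox"  # note the double space
print(count_via_replace(messy))  # includes a stray '_abc' key
print(count_via_split(messy))
```

With single spaces the two agree; with repeated spaces the replace-based version counts a bare '_abc' token, so splitting first is the safer choice for untidy input.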