Question

I need to check the number of words in some strings, so I use

len(re.split('[А-Яа-яЁё]{5,}', s))

but it does not work correctly: for the string 'Москва, Вавилова', re.split returns

['', ', ', '']

What should I change to get

['Москва', 'Вавилова']

Answer 1

Why do the counting yourself? Let Counter() do it:

from collections import Counter
text = "tata, ohhhhh, tata, oh, tata, ohhhh"
c = Counter(len(w.strip()) for w in text.split(","))

print(c.most_common())

Output:

[(4, 3), (2, 1), (5, 1), (6, 1)] # (word-length, count)

Using a defaultdict also gives you the words themselves:

from collections import defaultdict

d = defaultdict(list)
for w in (w.strip() for w in text.split(",")):
    d[len(w)].append(w)

print(d)

Output:

defaultdict(<type 'list'>, 
            {2: ['oh'], 4: ['tata', 'tata', 'tata'], 5: ['ohhhh'], 6: ['ohhhhh']})

but then you have to call len() on each list yourself to get the counts.
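
A minimal sketch of that last step, assuming the same text as above. The names groups and counts are only illustrative and do not come from the answer:

```python
from collections import defaultdict

text = "tata, ohhhhh, tata, oh, tata, ohhhh"

# Group the words by their length, as in the defaultdict example.
groups = defaultdict(list)
for w in (w.strip() for w in text.split(",")):
    groups[len(w)].append(w)

# Take len() of each list to turn the groups into per-length counts.
counts = {length: len(words) for length, words in groups.items()}
print(counts)
```

This should give the same pairs as the Counter version, only stored as a plain dict.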

Answer 2

Try

re.findall('[А-Яа-яЁё]{5,}', 'Москва,Вавилова')

From the documentation:

re.findall

Return all non-overlapping matches of pattern in string, as a list of strings.

re.split

Split string by the occurrences of pattern.
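
To see the difference on the string from the question, here is a short sketch. The variable names are illustrative, and taking len() of the findall result is just one assumed way of getting the number of words:

```python
import re

pattern = '[А-Яа-яЁё]{5,}'
s = 'Москва, Вавилова'

# re.split treats every match as a separator, so only the text
# between the matched words is left over.
pieces = re.split(pattern, s)

# re.findall returns the matched words themselves.
words = re.findall(pattern, s)

print(pieces)
print(words)
print(len(words))
```

Because the words are what the pattern matches, re.split throws them away and keeps the comma and the empty edges, while re.findall keeps exactly the words.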

Regex: how to split a string on words of a certain length
