How to collect specific items from a list in Python

Asked: 2015-01-14 08:40:38

Tags: python list unicode collect

I have to find the signs "a,..,z", "A,..,Z", "space", "." and "," in some data and count how often each one occurs.

I tried this code:

import codecs
from collections import Counter

fh = codecs.open("mydata.txt", encoding="utf-8")
text = fh.read()
fh1 = unicode(text)
dic_freq_signs = dict(Counter(fh1.split()))
All_freq_signs = dic_freq_signs.items()
List_signs = dic_freq_signs.keys()
List_freq_signs = dic_freq_signs.values()

But it gives me all the signs, not just the ones I am looking for. Can anybody help?

(It has to be unicode.)

2 Answers:

Answer 0: (score: 0)

Look into dictionary iteration: you can filter the items with a condition while looping over them.

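# Filter the (sign, count) pairs with a condition on the key (item[0]).
All_freq_signs = [item for item in dic_freq_signs.items() if item[0] == u"somevalue"]

# Or move the test into a helper; here it keeps pairs whose count is even.
def criteria(item):
    return item[1] % 2 == 0
All_freq_signs = [item for item in dic_freq_signs.items() if criteria(item)]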
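
Applied to the question, the condition would test the sign itself rather than its count. A minimal sketch, assuming a small hand-made `dic_freq_signs` and a hypothetical `wanted` set (neither comes from the original post):

```python
# Hypothetical counts standing in for the question's dic_freq_signs.
dic_freq_signs = {u"a": 3, u"!": 1, u" ": 5, u"7": 2}

# Hypothetical set of signs to keep: a-z, A-Z, space, full stop, comma.
wanted = set(u"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,")

def criteria(item):
    # item is a (sign, count) pair; keep it if the sign is wanted.
    sign, count = item
    return sign in wanted

All_freq_signs = [item for item in dic_freq_signs.items() if criteria(item)]
print(sorted(All_freq_signs))
```

Only the pairs for "a" and the space survive the filter; "!" and "7" are dropped.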

Answer 1: (score: 0)

Make sure you import the string module; with it you can easily get the character ranges a to z and A to Z.

import string
from collections import Counter

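Counter(any_string) gives the count of each character in the string. With split(), Counter instead returns the count of each word in the string, which contradicts your requirement. So I assume you need character counts.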
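
A small sketch of that difference, using a made-up string (the name `sample` is only for illustration):

```python
from collections import Counter

sample = u"a b, a."  # hypothetical sample text

# Counting the result of split(): the keys are whole tokens such as "b,".
word_counts = Counter(sample.split())

# Counting the string itself: the keys are single characters.
char_counts = Counter(sample)

print(word_counts)
print(char_counts)
```

Note that split() also throws the spaces away, so they could never be counted that way.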

dic_all_chars = dict(Counter(fh1))    # counts of every character in the string
# ascii_lowercase / ascii_uppercase are locale-independent, unlike string.lowercase
signs = string.ascii_lowercase + string.ascii_uppercase + ' .,'    # the characters you want to check

# dict comprehension: keep only the keys that are among the wanted characters
dic_freq_signs = {key: value for key, value in dic_all_chars.items()
                  if key in signs}

dic_freq_signs will then contain only the signs you want to count as keys, with their counts as values.
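
Put together, the approach might look like the sketch below. The function name `count_signs` and the constant `SIGNS` are hypothetical, and the input is a literal string rather than the contents of mydata.txt:

```python
import string
from collections import Counter

# The signs to keep: a-z, A-Z, space, full stop and comma.
SIGNS = set(string.ascii_letters + u" .,")

def count_signs(text):
    """Return {sign: count} for the characters of text that are in SIGNS."""
    return dict(Counter(ch for ch in text if ch in SIGNS))

freq = count_signs(u"Hi, hi. 42!")
print(sorted(freq.items()))
```

To use it on the file from the question, pass in the decoded text (the `text` variable read through codecs.open). Filtering before counting avoids building counts for characters that are discarded anyway.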