Suppose I have a pandas Series like this:
import pandas as pd
s = pd.Series(["hello go home bye bye", "you can't always get", "what you waaaaaaant", "apple banana carrot munch 123"])
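I want to create a dictionary with individual characters as keys and their frequencies as values. Counting whole words works with collections.Counter: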
from collections import Counter
c = Counter(word for row in s for word in row.lower().split())
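However, I am now trying to count individual characters and am running into trouble with a triple-nested comprehension. This is what I have: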
c = Counter((letter for letter in word) for word for row in s for word in row.lower().split())
This gives me a syntax error. How can I write the equivalent of the following for loop in one line?
d = {}
for row in s:
    for word in row.lower().split():
        for letter in word:
            # a plain dict has no default value, so start missing keys at 0
            d[letter] = d.get(letter, 0) + 1
Answer 0 (score: 2)
I think you can use
Counter([j for i in s for j in i])
Counter({'a': 16, ' ': 13, 'e': 6, 'o': 6, 'n': 5, 't': 5, 'y': 5, 'h': 4, 'l': 4, 'c': 3, 'b': 3, 'u': 3, 'w': 3, 'g': 2, 'm': 2, 'p': 2, 'r': 2, "'": 1, '1': 1, '3': 1, '2': 1, 's': 1})
to get the count of each individual character. Note that this also counts spaces.
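To mirror the loop in the question exactly (lowercased and split on whitespace, so spaces are not counted), the for clauses of a generator expression are written in the same order as the nested for statements, with the innermost variable as the leading expression. Here is a minimal sketch, assuming a plain list named rows as a hypothetical stand-in for the Series s (iterating over s yields the same strings):

```python
from collections import Counter

# Hypothetical stand-in for the Series `s`; iterating over a Series
# of strings yields the same values as iterating over this list.
rows = [
    "hello go home bye bye",
    "you can't always get",
    "what you waaaaaaant",
    "apple banana carrot munch 123",
]

# The for clauses appear left to right in the same order as the
# nested for statements, outermost first.
c = Counter(
    letter
    for row in rows
    for word in row.lower().split()
    for letter in word
)

# The explicit loop from the question, for comparison.
d = {}
for row in rows:
    for word in row.lower().split():
        for letter in word:
            d[letter] = d.get(letter, 0) + 1

print(dict(c) == d)
```

The attempt in the question fails because it nests a second generator in the leading position and leaves `for word` without an `in` clause. Flattening into a single chain of for clauses avoids both problems.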
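Answer 1 (score: 2)
Just iterate over every character of each row and call .lower() on it, which flattens the strings into a single stream of characters: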
import pandas as pd
s = pd.Series(["hello go home bye bye", "you can't always get", "what you waaaaaaant", "apple banana carrot munch 123"])
from collections import Counter
print(Counter(char.lower() for row in s for char in row))
Or use chain with map:
from collections import Counter
from itertools import chain
print(Counter(chain.from_iterable(map(str.lower, s))))
Both will give you:
Counter({'a': 16, ' ': 13, 'e': 6, 'o': 6, 'n': 5, 't': 5, 'y': 5, 'h': 4, 'l': 4, 'c': 3, 'b': 3, 'u': 3, 'w': 3, 'g': 2, 'm': 2, 'p': 2, 'r': 2, "'": 1, '1': 1, '3': 1, '2': 1, 's': 1})
You can also use apply or s.str.lower():
print(Counter(chain.from_iterable(s.apply(str.lower))))
print(Counter(chain.from_iterable(s.str.lower())))
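As a quick check that the comprehension and the chain/map form agree, here is a small sketch. It assumes a plain list named rows as a hypothetical stand-in for the Series s, so it runs without pandas:

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for the Series `s`.
rows = [
    "hello go home bye bye",
    "you can't always get",
    "what you waaaaaaant",
    "apple banana carrot munch 123",
]

# Nested generator expression: lowercase each character as it is visited.
by_comprehension = Counter(char.lower() for row in rows for char in row)

# map lowercases each whole string lazily; chain.from_iterable then
# yields the characters of every string one after another.
by_chain = Counter(chain.from_iterable(map(str.lower, rows)))

print(by_comprehension == by_chain)
print(by_chain[" "])  # spaces are counted because the rows are never split
```

Both forms walk every character of every string, which is why the space character shows up as a key. Splitting each row first, as in the question's loop, would leave it out.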
Answer 2 (score: 2)
Using pandas:
In [6]: pd.Series(list(''.join(s))).value_counts()
Out[6]:
a 16
13
e 6
o 6
n 5
t 5
y 5
h 4
l 4
u 3
b 3
c 3
w 3
p 2
m 2
r 2
g 2
1 1
s 1
' 1
2 1
3 1
dtype: int64
In [7]: dict(pd.Series(list(''.join(s))).value_counts())
Out[7]:
{' ': 13,
"'": 1,
'1': 1,
'2': 1,
'3': 1,
'a': 16,
'b': 3,
'c': 3,
'e': 6,
'g': 2,
'h': 4,
'l': 4,
'm': 2,
'n': 5,
'o': 6,
'p': 2,
'r': 2,
's': 1,
't': 5,
'u': 3,
'w': 3,
'y': 5}
Answer 3 (score: 1)
You want this:
dict(zip([letter for row in s for word in row.lower().split() for letter in word], range(len([letter for row in s for word in row.lower().split() for letter in word]))))
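Note that zipping the letters with range(len(...)) pairs each letter with a position, and dict keeps only the last pair for each key. The result is therefore the index of each letter's last occurrence, not its frequency. The sketch below shows the difference and one way to get counts from the same triple-nested comprehension. It assumes a plain list named rows as a hypothetical stand-in for the Series s:

```python
from collections import Counter

# Hypothetical stand-in for the Series `s`.
rows = [
    "hello go home bye bye",
    "you can't always get",
    "what you waaaaaaant",
    "apple banana carrot munch 123",
]

# The same flattened list of letters the answer builds (twice) inline.
letters = [
    letter
    for row in rows
    for word in row.lower().split()
    for letter in word
]

# What the zip/range expression produces: later pairs overwrite earlier
# ones, so each letter maps to the index of its last occurrence.
last_index = dict(zip(letters, range(len(letters))))

# A frequency dictionary built from the same list. list.count rescans
# the list for every distinct letter, so this is quadratic in the worst case.
freq = {letter: letters.count(letter) for letter in set(letters)}

print(last_index["a"] == freq["a"])
print(freq == dict(Counter(letters)))
```

For larger inputs, Counter(letters) makes a single pass over the list and is the simpler choice.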