OK, so I have two lists. One is a list of words, like this:
["happy", "sad", "angry", "jumpy"]
and so on.
The other is a list of phrases, like this:
["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
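I want to use the first list to find occurrences of its words in the list of phrases. I don't mind whether the result is the matched words themselves (separated by spaces) or the number of times each one occurs.
From my research, it looks like the re module and filter are the way to go?
Also, if I haven't explained what I need clearly, please let me know.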
Answer 0 (score: 4)
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"]
>>>
>>> for phrase in phrases:
...     print(phrase)
...     print({word: phrase.count(word) for word in words})
...
I'm so happy with myself lately!
{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}
Johnny, im so sad, so very sad, call me
{'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}
i feel like crap. SO ANGRY!!!!
{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 0}
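Note that str.count is case-sensitive, which is why the last phrase ("SO ANGRY!!!!") reports 0 for "angry". Below is a minimal sketch that lowercases each phrase before counting, assuming case-insensitive matching is what is wanted. The helper name count_words is made up for illustration and is not part of the answer above.

```python
def count_words(phrase, words):
    """Count substring occurrences of each word in phrase, ignoring case."""
    lowered = phrase.lower()
    # str.count counts non-overlapping substring matches, so "sad" is also
    # counted inside a longer word such as "sadly".
    return {word: lowered.count(word) for word in words}


phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

counts = [count_words(phrase, words) for phrase in phrases]
for phrase, count in zip(phrases, counts):
    print(phrase)
    print(count)
```

Because this still matches substrings, it suits a quick check; a tokenizing approach is the better fit if only whole words should count.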
Answer 1 (score: 2)
A very simple, straightforward solution:
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"]
>>> for phrase in phrases:
...     for word in words:
...         if word in phrase:
...             print('"{0}" is in the phrase "{1}".'.format(word, phrase))
...
"happy" is in the phrase "I'm so happy with myself lately!".
"sad" is in the phrase "Johnny, im so sad, so very sad, call me".
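The in test is case-sensitive too, so nothing is printed for "angry" in the third phrase. If the goal is to collect the matched words rather than print them, something like the following sketch may help. It assumes case-insensitive substring matching is acceptable, and words_found is a hypothetical helper name.

```python
def words_found(phrase, words):
    """Return the words that occur as substrings of phrase, ignoring case."""
    lowered = phrase.lower()
    return [word for word in words if word in lowered]


phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

for phrase in phrases:
    # Join the matches with spaces, as the question suggests.
    print('{0!r}: {1}'.format(phrase, " ".join(words_found(phrase, words))))
```

The same selection could be written with filter(lambda word: word in lowered, words); the list comprehension is just the more common spelling.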
Answer 2 (score: 1)
>>> phrases = ["I'm so happy with myself lately!", "Johnny, im so sad, so very sad, call me", "i feel like crap. SO ANGRY!!!!"]
>>> words = ["happy", "sad", "angry", "jumpy"]
>>> import re
>>> words_in_phrases = [re.findall(r"\b[\w']+\b", phrase.lower()) for phrase in phrases]
>>> words_in_phrases
[["i'm", 'so', 'happy', 'with', 'myself', 'lately'], ['johnny', 'im', 'so', 'sad', 'so', 'very', 'sad', 'call', 'me'], ['i', 'feel', 'like', 'crap', 'so', 'angry']]
>>> word_counts = [{word: phrase.count(word) for word in words} for phrase in words_in_phrases]
>>> word_counts
[{'jumpy': 0, 'angry': 0, 'sad': 0, 'happy': 1}, {'jumpy': 0, 'angry': 0, 'sad': 2, 'happy': 0}, {'jumpy': 0, 'angry': 1, 'sad': 0, 'happy': 0}]
>>>
The line word_counts = [{word: phrase.count(word) for word in words} for...
uses a dict comprehension, which requires Python 2.7+. If for some reason you are on a version older than Python 2.7, replace that line with:
>>> word_counts = [dict((word, phrase.count(word)) for word in words) for phrase in words_in_phrases]
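The tokenizing approach above can also be sketched with collections.Counter (available since Python 2.7), which tallies every token in one pass instead of calling list.count once per word. This is only an illustration under the same assumptions as the answer (lowercased phrases, the same regular expression); the names TOKEN_RE and whole_word_counts are invented here.

```python
import re
from collections import Counter

# Same pattern as in the answer: runs of word characters and apostrophes.
TOKEN_RE = re.compile(r"\b[\w']+\b")


def whole_word_counts(phrase, words):
    """Count whole-word, case-insensitive occurrences of each word."""
    tokens = Counter(TOKEN_RE.findall(phrase.lower()))
    # A Counter returns 0 for keys it has not seen, so missing words are fine.
    return {word: tokens[word] for word in words}


phrases = ["I'm so happy with myself lately!",
           "Johnny, im so sad, so very sad, call me",
           "i feel like crap. SO ANGRY!!!!"]
words = ["happy", "sad", "angry", "jumpy"]

word_counts = [whole_word_counts(phrase, words) for phrase in phrases]
print(word_counts)
```

Unlike substring counting, this does not count "sad" inside "sadly" or "happy" inside "unhappy", because each token is compared as a whole word.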