I have looked at many examples here, but I can't find one that fits my scenario.
I am trying to take a string such as:
string = "Hi my Name is Bill, Bill likes coding, coding is fun"
and return only one value for each word that is duplicated.
So the output would look like this (ignoring punctuation):
Bill
coding
How can I do this in Python 3?
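Answer 0 (score: 6)
Split the string into words. There are different ways to do this, depending on your requirements. Here is one way:
import re
words = re.findall(r'\w+', string)
Count how often each word occurs:
import collections
word_counts = collections.Counter(words)
Get all the words that occur more than once:
result = [word for word in word_counts if word_counts[word] > 1]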
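The three steps above can be combined into one small function. This is only a sketch: the name repeated_words is made up for illustration, and it assumes that a "word" is any run of letters, digits, or underscores (what \w+ matches).

```python
import re
from collections import Counter


def repeated_words(text):
    """Return the words that occur more than once, in first-seen order."""
    words = re.findall(r"\w+", text)   # runs of letters, digits, underscores
    counts = Counter(words)            # maps each word to its number of occurrences
    return [word for word, count in counts.items() if count > 1]


print(repeated_words("Hi my Name is Bill, Bill likes coding, coding is fun"))
```

Note that "is" also occurs twice in the sample sentence, so it is reported along with "Bill" and "coding".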
Answer 1 (score: 2)
After splitting the string into words, you can use Counter and then print only the words that occur more than once (count > 1):
>>> import collections
>>> import re
>>> string = "Hi my Name is Bill, Bill likes coding, coding is fun"
>>> words = re.sub(r"[^\w]", " ", string).split()
>>> word_counts = collections.Counter(words)
>>> for word, count in word_counts.items():
...     if count > 1:
...         print(word)
...
Output:
is
Bill
coding
Answer 2 (score: 0)
Use https://github.com/Alir3z4/python-stop-words and then:
from stop_words import get_stop_words
stop_words = get_stop_words('english')
s = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = s.split()
word_map = {}
for word in words:
    # drop surrounding whitespace and commas before counting
    word = word.strip().replace(',', '')
    if word not in stop_words:
        word_map[word] = word_map.get(word, 0) + 1
for word, count in word_map.items():
    if count > 1:
        print(word)
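If installing a third-party package is not an option, the same idea can be sketched with the standard library only. The STOP_WORDS set below is a hypothetical, deliberately tiny list chosen for this example. It is not the list shipped by python-stop-words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "is", "my", "the"}


def repeated_content_words(text, stop_words=STOP_WORDS):
    """Return repeated words, skipping anything listed in stop_words."""
    words = re.findall(r"\w+", text)
    # Compare in lower case so that "Is" and "is" are both filtered out.
    counts = Counter(w for w in words if w.lower() not in stop_words)
    return [w for w, n in counts.items() if n > 1]


print(repeated_content_words("Hi my Name is Bill, Bill likes coding, coding is fun"))
```

With "is" treated as a stop word, only "Bill" and "coding" remain, which matches the output asked for in the question.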
答案 3 :(得分:0)
使用import string
import re
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)
替换标点符号
Counter
使用from collections import Counter
out = out.split()
counter = Counter(out)
ans = [i[0] for i in counter.items() if i[1] >1]
print(ans)
计算:
<script src='@Url.Content("~/Scripts/jquery-1.8.2.js")' type='text/javascript'></script>
<script src='@Url.Content("~/Scripts/jquery.validate.js")' type='text/javascript'>
</script>
<script src='@Url.Content("~/Scripts/jquery.validate.unobtrusive.js")' type='text/javascript'></script>
Answer 4 (score: 0)
If I understand you correctly, you want to filter out the duplicates? If so, you can do it like this.
string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
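Keep in mind that set() keeps one copy of every word, including the words that occur only once, and it does not guarantee any order. The sketch below contrasts that with keeping only the words that are actually repeated. The variable names are chosen for illustration.

```python
from collections import Counter

text = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = text.replace(",", "").split()

# Every distinct word, whether it is repeated or not (order not guaranteed).
unique_words = set(words)

# Only the words that occur more than once, in first-seen order.
repeated = [word for word, count in Counter(words).items() if count > 1]

print(sorted(unique_words))
print(repeated)
```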
Answer 5 (score: 0)
You can try a regular expression to pick out the actual words while ignoring punctuation. Try this:
import re
import collections
sentence="Hi my Name is Bill, Bill likes coding, coding is fun"
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])
The re and collections modules should do the trick for finding the duplicates.
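The approach above is case-sensitive, so "Coding" and "coding" would be counted as two different words. If that is not what you want, one possible variation (an assumption about the requirements, since the question does not say) is to normalise the case before counting:

```python
import re
from collections import Counter


def repeated_words_ignore_case(text):
    """Return repeated words, treating upper and lower case as equal."""
    # casefold() is a more aggressive lower() intended for caseless matching
    words = [w.casefold() for w in re.findall(r"\w+", text)]
    return [w for w, n in Counter(words).items() if n > 1]


print(repeated_words_ignore_case("Coding is fun, coding is hard"))
```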
Answer 6 (score: 0)
def result(x):  # input should be the string
    repeated = []
    listed = x.split()
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)
    return set(repeated)  # there can't be repeated values in a set
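Two things to be aware of with the function above: x.split() leaves punctuation attached, so "Bill," and "Bill" are treated as different words, and listed.count() rescans the whole list for every word. A possible single-pass variant is sketched below. The name repeated_in_order is made up for this example, and it assumes \w+ is an acceptable definition of a word.

```python
import re


def repeated_in_order(text):
    """Return repeated words, ordered by where each one first repeats."""
    seen = set()       # words encountered so far
    reported = set()   # words already added to the result
    result = []
    for word in re.findall(r"\w+", text):   # punctuation is not part of a word
        if word in seen and word not in reported:
            result.append(word)
            reported.add(word)
        seen.add(word)
    return result


print(repeated_in_order("Hi my Name is Bill, Bill likes coding, coding is fun"))
```

Unlike a set, the returned list has a predictable order: each word appears at the point where its second occurrence was found.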