Question

I've looked through many examples here, but I couldn't find one that matches my scenario.

I'm trying to take a string like this:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"

and return only one value for each duplicate.

So the output would look like this (ignoring punctuation):

Bill
coding

How can I do this in Python 3?

Answer 1

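Split the string into words. There are different ways to do this, depending on your requirements. Here is one way:

import re
words = re.findall(r'\w+', string)

Count how often each word occurs:

import collections
word_counts = collections.Counter(words)

Collect all the words that occur more than once:

result = [word for word, count in word_counts.items() if count > 1]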
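Putting the three steps together, a minimal self-contained sketch might look like the following. The function name find_repeated_words is only illustrative and is not part of the original answer.

```python
import collections
import re


def find_repeated_words(text):
    """Return the words that occur more than once in text, in first-seen order."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is skipped.
    words = re.findall(r"\w+", text)
    word_counts = collections.Counter(words)
    # Counter keeps insertion order (Python 3.7+), so the result follows the text.
    return [word for word, count in word_counts.items() if count > 1]


string = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(find_repeated_words(string))  # ['is', 'Bill', 'coding']
```

Note that "is" also occurs twice in the sample string, so it is reported alongside Bill and coding.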

Answer 2

After splitting the string into words, you can use Counter and then print only the words that occur more than once (count > 1):

>>> import collections
>>> import re
>>> string = "Hi my Name is Bill, Bill likes coding, coding is fun"
>>> words = re.sub(r"[^\w]", " ", string).split()
>>> word_counts = collections.Counter(words)
>>> for word, count in word_counts.items():
...     if count > 1:
...         print(word)
...

Output:

is
Bill
coding

Answer 3

Use https://github.com/Alir3z4/python-stop-words to filter out common English stop words, and then count the remaining words:

from stop_words import get_stop_words

stop_words = get_stop_words('english')
s = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = s.split()
word_map = {}
for word in words:
    word = word.strip().replace(',', '')  # drop commas attached to words
    if word not in stop_words:
        word_map[word] = word_map.get(word, 0) + 1
for word, count in word_map.items():
    if count > 1:
        print(word)
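If installing a third-party package is not an option, the same idea can be sketched with the standard library alone. The STOP_WORDS set and the function name below are hypothetical and hand-picked for this example; a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "my", "the"}


def repeated_content_words(text, stop_words=STOP_WORDS):
    """Return repeated words, skipping stop words (compared case-insensitively)."""
    words = re.findall(r"\w+", text)
    counts = Counter(word for word in words if word.lower() not in stop_words)
    return [word for word, count in counts.items() if count > 1]


s = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(repeated_content_words(s))  # ['Bill', 'coding']
```

Filtering before counting means "is" never reaches the Counter, which matches the output the question asks for.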

Answer 4

Replace the punctuation with spaces:

import string
import re
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)

Then count the words with Counter:

from collections import Counter
out = out.split()
counter = Counter(out)
ans = [i[0] for i in counter.items() if i[1] > 1]
print(ans)

Answer 5

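If I understand correctly, you want to filter out the duplicates? If so, you can do this, which keeps one copy of every distinct word: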

string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
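Because a set is unordered, the words above can come out in any order. If the original word order matters, one possible variation is dict.fromkeys, assuming Python 3.7+ where dict preserves insertion order. The variable names here are illustrative.

```python
string = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = string.replace(",", "").split()
# dict.fromkeys keeps the first occurrence of each key and preserves order.
unique_words = list(dict.fromkeys(words))
print("\n".join(unique_words))
```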

Answer 6

You can use a regular expression to extract the words while ignoring punctuation. Try this:

import re
import collections
sentence="Hi my Name is Bill, Bill likes coding, coding is fun"
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])

The collections module then does the trick of finding the duplicates.

Answer 7

import re

def result(x):  # x is the input string
    repeated = []
    # Extract words without punctuation, so "Bill," and "Bill" count as the same word
    listed = re.findall(r'\w+', x)
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)

    return set(repeated)  # a set holds each repeated word only once
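Calling listed.count inside the loop rescans the whole list for every word, which is quadratic in the number of words. For longer texts, a single-pass sketch with two sets is one alternative; the function name repeated_words is illustrative only.

```python
import re


def repeated_words(text):
    """Return the set of words that appear at least twice in text."""
    seen = set()
    repeated = set()
    for word in re.findall(r"\w+", text):
        if word in seen:
            repeated.add(word)  # second or later occurrence
        else:
            seen.add(word)  # first occurrence
    return repeated


print(repeated_words("Hi my Name is Bill, Bill likes coding, coding is fun"))
```

The result is a set, so the order in which the words are printed is not guaranteed.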

Find duplicates in a string and return each duplicate only once

7 answers: