Find duplicates in a string and return each duplicate only once

Posted: 2016-02-12 13:04:59

Tags: python string

I've looked at many examples here, but I can't find one that fits my scenario.

I'm trying to take a string like this:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"

and return each duplicated word only once.

So the output would look like this (ignoring punctuation):

Bill
coding

How can I do this in Python 3?

7 answers:

Answer 0 (score: 6)

Split the string into words. There are different ways to do this, depending on your requirements. Here is one:

import re
words = re.findall(r'\w+', string)

Count the frequency of each word:

import collections
word_counts = collections.Counter(words)

Get all the words that occur more than once:

result = [word for word in word_counts if word_counts[word] > 1]
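Put together, the three steps above form a small reusable function. This is a sketch: the name find_duplicates is hypothetical, and it assumes Python 3.7+ so that Counter (a dict subclass) keeps words in the order they first appear.

```python
import collections
import re


def find_duplicates(text):
    """Return each word that occurs more than once in text, in first-seen order."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # such as the commas in the sample string is dropped automatically.
    words = re.findall(r"\w+", text)
    word_counts = collections.Counter(words)
    # Counter preserves insertion order (Python 3.7+), so the result
    # follows the order in which each word first appears.
    return [word for word, count in word_counts.items() if count > 1]


sample = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(find_duplicates(sample))  # ['is', 'Bill', 'coding']
```

Note that "is" is reported too, because it really does occur twice in the sample.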

Answer 1 (score: 2)

After splitting the string into words, you can use a Counter and then print only the words that occur more than once (count > 1):

>>> import collections
>>> import re
>>> string = "Hi my Name is Bill, Bill likes coding, coding is fun"
>>> words = re.sub(r"[^\w]", " ", string).split()
>>> word_counts = collections.Counter(words)
>>> for word, count in word_counts.items():
...     if count > 1:
...         print(word)
...

Output:

is
Bill
coding
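Counter compares words exactly, so "Bill" and "bill" would be counted as two different words. If case should be ignored, one option is to normalize the text before counting. A minimal sketch, assuming case-insensitive matching is wanted (the function name is hypothetical); str.casefold() is Python's built-in method for caseless comparison:

```python
import collections
import re


def find_duplicates_ignore_case(text):
    """Return duplicated words, treating 'Bill' and 'bill' as the same word."""
    # casefold() is a more aggressive lower(), intended for caseless matching;
    # the returned words are therefore in their case-folded (lowercase) form.
    words = re.findall(r"\w+", text.casefold())
    word_counts = collections.Counter(words)
    return [word for word, count in word_counts.items() if count > 1]


print(find_duplicates_ignore_case("Bill likes coding, bill likes fun"))
# ['bill', 'likes']
```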

Answer 2 (score: 0)

Use https://github.com/Alir3z4/python-stop-words to skip common words such as "is", and then:
from stop_words import get_stop_words

stop_words = get_stop_words('english')
s = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = s.split()
word_map = {}
for word in words:
    # Remove surrounding whitespace and commas before counting
    word = word.strip().replace(',', '')
    if word not in stop_words:
        word_map[word] = word_map.get(word, 0) + 1
for word, count in word_map.items():
    if count > 1:
        print(word)
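If you would rather not add a dependency, the same idea works with your own stop-word set. A minimal sketch, assuming a hand-written set: STOP_WORDS below is a tiny hypothetical list for illustration only, far smaller than what python-stop-words ships, and the function name is also hypothetical.

```python
import collections
import re

# Hypothetical, deliberately tiny stop-word set for illustration only.
STOP_WORDS = {"hi", "my", "is", "a", "the", "and"}


def find_duplicates_without_stop_words(text, stop_words=STOP_WORDS):
    """Return duplicated words, skipping any word in stop_words (case-insensitive)."""
    words = re.findall(r"\w+", text)
    # Compare in lowercase so that "Is" or "IS" is filtered out as well.
    content_words = [w for w in words if w.lower() not in stop_words]
    word_counts = collections.Counter(content_words)
    return [word for word, count in word_counts.items() if count > 1]


sample = "Hi my Name is Bill, Bill likes coding, coding is fun"
print(find_duplicates_without_stop_words(sample))  # ['Bill', 'coding']
```

With "is" treated as a stop word, this gives exactly the output the question asks for.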

Answer 3 (score: 0)

Replace the punctuation with spaces:

import string
import re
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
regex = re.compile('[%s]' % re.escape(string.punctuation))
out = regex.sub(' ', text)

Then count the words with Counter:

from collections import Counter
out = out.split()
counter = Counter(out)
ans = [i[0] for i in counter.items() if i[1] > 1]
print(ans)
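As an alternative to the regex, string.punctuation can also be used with str.maketrans and str.translate, which map characters without compiling a pattern. A sketch (note that string.punctuation covers ASCII punctuation only):

```python
from collections import Counter
import string

text = "Hi my Name is Bill, Bill likes coding, coding is fun"

# str.maketrans(x, y) needs x and y of equal length; here every ASCII
# punctuation character is mapped to a space, like the regex substitution.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))
words = text.translate(table).split()

counter = Counter(words)
ans = [word for word, count in counter.items() if count > 1]
print(ans)  # ['is', 'Bill', 'coding']
```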


Answer 4 (score: 0)

If I understand you correctly, you want to filter out the duplicates? If so, you can do it like this:

string = "Hi my Name is Bill, Bill likes coding, coding is fun"
string = string.replace(',' , '')
string = list(set(string.split()))
string = '\n'.join(string)
print(string)
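This prints every distinct word, not only the duplicated ones, and because it goes through a set the words come out in arbitrary order. If the original order matters, a sketch using dict.fromkeys keeps the first occurrence of each word, since dict keys are unique and preserve insertion order in Python 3.7+:

```python
text = "Hi my Name is Bill, Bill likes coding, coding is fun"
words = text.replace(",", "").split()

# dict keys are unique and keep insertion order (Python 3.7+),
# unlike a set, whose iteration order is arbitrary.
unique_words = list(dict.fromkeys(words))
print("\n".join(unique_words))
```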

Answer 5 (score: 0)

You can use a regular expression to pick out the words while ignoring punctuation. Try this:

import re
import collections
sentence = "Hi my Name is Bill, Bill likes coding, coding is fun"
wordList = re.sub(r"[^\w]", " ", sentence).split()
print([item for item, count in collections.Counter(wordList).items() if count > 1])

Counter from the collections module then does the work of finding the duplicates.

Answer 6 (score: 0)

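import string


def result(x):  # x is the input string
    repeated = []
    # Strip surrounding punctuation so that "Bill," and "Bill" count as the same word
    listed = [word.strip(string.punctuation) for word in x.split()]
    for each in listed:
        number = listed.count(each)
        if number > 1:
            repeated.append(each)

    return set(repeated)  # a set cannot contain repeated values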
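Calling listed.count(each) inside the loop rescans the whole list for every word, so the work grows quadratically with the number of words. A sketch of a single-pass alternative (the name repeated_words is just for illustration) remembers the words seen so far in a set and uses a dict as an insertion-ordered set for the results:

```python
import string


def repeated_words(text):
    """Return words occurring more than once, in the order they first repeat."""
    seen = set()
    repeated = {}  # dict used as an insertion-ordered set (Python 3.7+)
    for raw in text.split():
        # Strip surrounding punctuation so that "Bill," and "Bill" match.
        word = raw.strip(string.punctuation)
        if not word:
            continue  # token was punctuation only
        if word in seen:
            repeated[word] = None
        else:
            seen.add(word)
    return list(repeated)


print(repeated_words("Hi my Name is Bill, Bill likes coding, coding is fun"))
# ['Bill', 'coding', 'is']
```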