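A hapax is a word that occurs only once in a string. My code sort of works. At first it found only the first hapax; then I changed the input string and it found the first and the last hapax, but not the second one. Here is my current code: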
def hapax(stringz):
    w = ''
    l = stringz.split()
    for x in l:
        w = ''
        l.remove(x)
        for y in l:
            w += y
        if w.find(x) == -1:
            print(x)

hapax('yo i went jogging then yo i went joggin tuesday wednesday')
All I get is:
then
wednesday
Answer 0: (score: 1)
You can do this quickly with the Counter class.
>>> s='yo i went jogging then yo i went joggin tuesday wednesday'
>>> from collections import Counter
>>> Counter(s.split())
Counter({'yo': 2, 'i': 2, 'went': 2, 'joggin': 1, 'then': 1, 'tuesday': 1, 'wednesday': 1, 'jogging': 1})
Then simply iterate over the returned dictionary, looking for words with a count of 1:
>>> c=Counter(s.split())
>>> for w in c:
...     if c[w] == 1:
...         print(w)
...
joggin
then
tuesday
wednesday
jogging
>>>
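You'll notice that you actually have five hapaxes in that string: joggin, then, tuesday, wednesday, and jogging.
You may need additional logic to decide whether "Jogging" and "jogging" are different words. You also need to decide whether punctuation counts (and strip it if it shouldn't). That all depends on the fine requirements of the problem statement.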
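A minimal sketch of that "additional logic", assuming Python 3.7+ (so that Counter keeps insertion order). The function name `find_hapaxes` and its two flags are hypothetical, introduced only for illustration. Stripping punctuation only from the edges of each word is one possible design choice, not a requirement taken from the question:

```python
from collections import Counter
import string


def find_hapaxes(text, ignore_case=True, strip_punctuation=True):
    """Return the words that occur exactly once, in order of first appearance."""
    words = []
    for raw in text.split():
        word = raw
        if strip_punctuation:
            # Only leading/trailing punctuation is removed, so "don't" stays intact.
            word = word.strip(string.punctuation)
        if ignore_case:
            word = word.lower()
        if word:  # skip tokens that consisted of punctuation only
            words.append(word)
    counts = Counter(words)
    return [word for word in counts if counts[word] == 1]


print(find_hapaxes('yo i went jogging then yo i went joggin tuesday wednesday'))
print(find_hapaxes('Yo! i went Jogging, then yo i went jogging.'))
```

With both flags switched off, the function falls back to the plain whitespace split used in the session above, so 'Yo' and 'yo' would be counted as different words.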
As for your original code, I'm not sure what you were trying to accomplish with this loop:
for y in l:
    w += y
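It simply concatenates all the words into a single string with no spaces. So if l is ['the','cat','sat','on','the','mat'], then w will be 'thecatsatonthemat', which can cause problems for your matching. For example, if the original string contains the adjacent words "may be", they are concatenated into "maybe", and find will match the word "maybe" against them even if "maybe" occurs only once as a word of its own.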
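The pitfall can be reproduced in a few lines. The sketch below (Python 3) first shows the false match caused by the concatenation, then a hypothetical helper, `hapax_by_count`, that keeps the spirit of the original loop but compares whole words with list.count. It also avoids calling remove on the list being iterated, which makes a for loop skip elements. This is an illustration under those assumptions, not the asker's intended design:

```python
def hapax_by_count(text):
    """Return words that occur exactly once, comparing whole words only."""
    words = text.split()
    # list.count compares complete list elements, so 'joggin' does not
    # match inside 'jogging' and adjacent words cannot merge.
    return [word for word in words if words.count(word) == 1]


words = ['i', 'may', 'be', 'right']
joined = ''.join(words)            # all words glued together, no separators
print(joined.find('maybe') != -1)  # substring search on the glued string
print('maybe' in words)            # whole-word membership test

print(hapax_by_count('yo i went jogging then yo i went joggin tuesday wednesday'))
```

Calling count for every word is quadratic in the number of words, which is fine for short strings; for longer texts a Counter is the better fit.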
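Answer 1: (score: 1)
You can do this with a list comprehension over a collections.Counter. Also note the call to .lower(), which puts all words in lowercase so that Jogging and jogging are not treated as two different words.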
from collections import Counter
my_str = 'yo i went Jogging then yo i went jogging tuesday wednesday'
my_list = Counter(my_str.lower().split())
print([element for element in my_list if my_list[element] == 1])
Output:
['wednesday', 'then', 'tuesday']
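Additionally, if you need to remove all punctuation as well as ignore case, you can exclude the punctuation characters with set(string.punctuation) before counting the words, as follows: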
from collections import Counter
import string
my_str = 'yo! i went Jogging then yo i went jogging tuesday, wednesday.'
removed_punct_str = ''.join(chara for chara in my_str if chara not in set(string.punctuation))
my_list = Counter(removed_punct_str.lower().split())
print([element for element in my_list if my_list[element] == 1])
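Answer 2: (score: 0)
The string module:
Use the string module to get the list of punctuation characters, then replace them with an ordinary for loop. Demo: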
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>>
A more Pythonic way:
how to replace punctuation in a string python?
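Algorithm:
1. Remove punctuation from the input text.
2. Convert the text to lowercase so that case is ignored.
3. Create a default dictionary that maps each word to its count.
4. Split the text into words and update the counts.
5. Collect the words whose count is 1.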
Code (Python 2):
import string
import collections

def hapax(text):
    # Remove punctuation from the input text.
    text = text.translate(string.maketrans("",""), string.punctuation)
    print "Debug 1- After remove Punctuation:", text
    # Ignore lower/upper/mixed case.
    text = text.lower()
    print "Debug 2- After converted to Lower case:", text
    # Create a default dictionary: key is the word, value is its count.
    word_count = collections.defaultdict(int)
    print "Debug 3- Collection Default Dictionary:", word_count
    # Split the text and update the result dictionary.
    for word in text.split():
        if word:  # Ignore whitespace.
            word_count[word] += 1
    print "Debug 4- Word and its count:", word_count
    # List that collects the words whose count is 1.
    hapax_words = list()
    for word, value in word_count.items():
        if value==1:
            hapax_words.append(word)
    print "Debug 5- Final Hapax words:", hapax_words

hapax('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & ')
Output:
$ python 2.py
Debug 1- After remove Punctuation: yo i went jogging then yo i went jogging tuesday wednesday some punctuation I and
Debug 2- After converted to Lower case: yo i went jogging then yo i went jogging tuesday wednesday some punctuation i and
Debug 3- Collection Default Dictionary: defaultdict(<type 'int'>, {})
Debug 4- Word and its count: defaultdict(<type 'int'>, {'and': 1, 'then': 1, 'yo': 2, 'i': 3, 'tuesday': 1, 'punctuation': 1, 'some': 1, 'wednesday': 1, 'jogging': 2, 'went': 2})
Debug 5- Final Hapax words: ['and', 'then', 'tuesday', 'punctuation', 'some', 'wednesday']
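The code above relies on Python 2 features (print statements, string.maketrans, and the two-argument str.translate). A rough Python 3 sketch of the same steps might look like the following; it returns the list instead of printing debug output, which is a choice made here for brevity rather than something taken from the answer:

```python
import collections
import string


def hapax(text):
    """Python 3 sketch: return the words that occur exactly once in text."""
    # Remove punctuation; the three-argument str.maketrans maps every
    # character of its third argument to None, i.e. deletes it.
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Ignore differences in letter case.
    text = text.lower()
    # Count each word; missing keys start at 0.
    word_count = collections.defaultdict(int)
    for word in text.split():
        word_count[word] += 1
    # Keep the words seen exactly once.
    return [word for word, count in word_count.items() if count == 1]


print(hapax('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & '))
```

The returned words are the same six as in the "Debug 5" line, although their order can differ because Python 3.7+ dictionaries keep insertion order.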
Answer 3: (score: 0)
Python 3.X code:
import string

def edit_word(new_str):
    """Remove punctuation"""
    new_str = new_str.lower()
    st_table = new_str.maketrans(string.punctuation, '-'*32)
    new_str = new_str.translate(st_table)
    return new_str.replace('-', '')

st = "String to check for hapax!, try with any string"
w_dict = {}
for w in st.split():
    ew = edit_word(w)
    w_dict[ew] = w_dict.get(ew, 0) + 1

for w, c in w_dict.items():
    if c == 1: print(w)