Question

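A hapax is a word that occurs only once in a string. My code sort of works. At first it found only the first hapax; then I changed the input string, and now it finds the first and the last hapax but not the second one... Here is my current code: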

def hapax(stringz):
    w = ''
    l = stringz.split()
    for x in l:
        w = ''
        l.remove(x)
        for y in l:
            w += y
        if w.find(x) == -1:
            print(x)


hapax('yo i went jogging then yo i went joggin tuesday wednesday')

All I get is:

then
wednesday

Answer 1

You can do this quickly with the Counter class.

>>> s='yo i went jogging then yo i went joggin tuesday wednesday'
>>> from collections import Counter
>>> Counter(s.split())
Counter({'yo': 2, 'i': 2, 'went': 2, 'joggin': 1, 'then': 1, 'tuesday': 1, 'wednesday': 1, 'jogging': 1})

Then just iterate over the returned dictionary, looking for the words with a count of 1:

>>> c=Counter(s.split())
>>> for w in c:
...     if c[w] == 1:
...         print w
... 
joggin
then
tuesday
wednesday
jogging
>>>

You'll notice that you actually have five hapaxes in that string: joggin, then, tuesday, wednesday, and jogging.
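
For Python 3, the same idea can be wrapped in a small function. This is only a sketch: the name find_hapaxes is made up for illustration, and it assumes Python 3.7 or later, where dictionaries (and therefore Counter) keep insertion order, so the words come back in order of first appearance.

```python
from collections import Counter

def find_hapaxes(text):
    """Return the words that occur exactly once in text, in order of first appearance."""
    counts = Counter(text.split())
    return [word for word, count in counts.items() if count == 1]

s = 'yo i went jogging then yo i went joggin tuesday wednesday'
print(find_hapaxes(s))
# ['jogging', 'then', 'joggin', 'tuesday', 'wednesday']
```

Returning a list instead of printing makes the result easier to reuse or test.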

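You may need additional logic to decide whether "Jogging" and "jogging" are different words. You also need to decide whether punctuation counts (and strip it if it should not). That all depends on the fine requirements of the problem statement.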

As for your original code, I'm not sure what you were trying to accomplish with this loop:

for y in l:
    w += y

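It simply concatenates all the words into a single string with no spaces. So if l is ['the','cat','sat','on','the','mat'], then w will be 'thecatsatonthemat', which can cause problems with your matching. For example, if the original string contained the words "may be", they would be concatenated into "maybe", and find would match the word "maybe" against them.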
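
A related issue is that the original function calls l.remove(x) while iterating over l. Removing items from a list during iteration makes the loop skip elements, which is why words such as 'tuesday' are never tested. Below is a minimal sketch that keeps the shape of the original code but compares whole words and leaves the list untouched. The name hapax_fixed is hypothetical, and the approach is quadratic, so Counter remains the better choice for long inputs.

```python
def hapax_fixed(stringz):
    """Return the words that appear exactly once, comparing whole words only."""
    words = stringz.split()
    result = []
    for i, x in enumerate(words):
        # Every word except the one at position i; the list itself is not modified.
        others = words[:i] + words[i + 1:]
        # List membership compares whole words, so 'joggin' does not match 'jogging'.
        if x not in others:
            result.append(x)
    return result

print(hapax_fixed('yo i went jogging then yo i went joggin tuesday wednesday'))
# ['jogging', 'then', 'joggin', 'tuesday', 'wednesday']
```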

Answer 2

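You can do this with collections.Counter and a list comprehension. Also note that .lower() converts all the words to lowercase, so that Jogging and jogging are not treated as two different words.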

from collections import Counter
my_str = 'yo i went Jogging then yo i went jogging tuesday wednesday'
my_list = Counter(my_str.lower().split())
print([element for element in my_list if my_list[element] == 1])

Output:

['wednesday', 'then', 'tuesday']

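In addition, if you need to strip all punctuation as well as ignore case, you can exclude the punctuation characters with set(string.punctuation) before counting the words, like this: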

from collections import Counter
import string

my_str = 'yo! i went Jogging then yo i went jogging tuesday, wednesday.'
removed_punct_str = ''.join(chara for chara in my_str if chara not in set(string.punctuation))
my_list = Counter(removed_punct_str.lower().split())
print([element for element in my_list if my_list[element] == 1])

Answer 3

The string module:

Use the string module to get the list of punctuation characters, then use an ordinary for loop to replace them. Demo:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>>

A more Pythonic way: how to replace punctuation in a string python?

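Algorithm:

1. Remove punctuation from the input text using the string module.
2. Convert the text to lowercase.
3. Split the input text and update the dictionary of word counts.
4. Iterate over the dictionary items and collect the hapax words.

Code (Python 2):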

import string
import collections

def hapax(text):
    # Remove Punctuation from the Input text.
    text = text.translate(string.maketrans("",""), string.punctuation)
    print "Debug 1- After remove Punctuation:", text

    # ignore:- Lower/upper/mix cases
    text = text.lower()
    print "Debug 2- After converted to Lower case:", text

    #- Create a default dictionary: key is the word, value is its count.
    word_count = collections.defaultdict(int)
    print "Debug 3- Collection Default Dictionary:", word_count

    #- Split text and update result dictionary.
    for word in text.split():
        if word:#- Ignore whitespace.
            word_count[word] += 1

    print "Debug 4- Word and its count:", word_count

    #- List which save word which value is 1.
    hapax_words = list()
    for word, value in word_count.items():
        if value==1:
            hapax_words.append(word)

    print "Debug 5- Final Hapax words:", hapax_words


hapax('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & ')

Output:

$ python 2.py 
Debug 1- After remove Punctuation: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  I and  
Debug 2- After converted to Lower case: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  i and  
Debug 3- Collection Default Dictionary: defaultdict(<type 'int'>, {})
Debug 4- Word and its count: defaultdict(<type 'int'>, {'and': 1, 'then': 1, 'yo': 2, 'i': 3, 'tuesday': 1, 'punctuation': 1, 'some': 1, 'wednesday': 1, 'jogging': 2, 'went': 2})
Debug 5- Final Hapax words: ['and', 'then', 'tuesday', 'punctuation', 'some', 'wednesday']
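
The code above targets Python 2: print is a statement there, and string.maketrans and the two-argument form of translate are not available in Python 3. A rough Python 3 sketch of the same four steps could look like the following. The name hapax_py3 is hypothetical, and it returns the list instead of printing debug output.

```python
import string
from collections import defaultdict

def hapax_py3(text):
    # Step 1: remove punctuation. The third argument of str.maketrans
    # lists the characters that translate() should delete.
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Step 2: ignore upper/lower/mixed case.
    text = text.lower()
    # Step 3: count the words; split() with no arguments drops empty strings.
    word_count = defaultdict(int)
    for word in text.split():
        word_count[word] += 1
    # Step 4: keep the words whose count is exactly 1.
    return [word for word, count in word_count.items() if count == 1]

print(hapax_py3('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & '))
# ['then', 'tuesday', 'wednesday', 'some', 'punctuation', 'and']
```

The words are the same as in the Python 2 output; only the order differs, because Python 3.7+ dictionaries keep insertion order.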

Answer 4

Python 3.x code:

import string

def edit_word(new_str):
    """Lowercase the word and remove punctuation."""
    # The third argument of str.maketrans lists the characters to delete.
    st_table = str.maketrans('', '', string.punctuation)
    return new_str.lower().translate(st_table)

st = "String to check for hapax!, try with any string"
w_dict = {}
for w in st.split():
    ew = edit_word(w)
    w_dict[ew] = w_dict.get(ew, 0) + 1

for w, c in w_dict.items():
    if c == 1: print(w)

Write a program to print the hapaxes from a string
