Writing a program to print the hapaxes in a string

Asked: 2015-03-23 13:28:30

Tags: python python-3.x

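A hapax is a word that occurs only once in a string. My code sort of works. At first it found only the first hapax. Then I changed the input string, and it found the last hapax and the first one, but not the second one. Here is my current code: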

def hapax(stringz):
    w = ''
    l = stringz.split()
    for x in l:
        w = ''
        l.remove(x)
        for y in l:
            w += y
        if w.find(x) == -1:
            print(x)


hapax('yo i went jogging then yo i went joggin tuesday wednesday')

All I get is:

then
wednesday

4 Answers:

Answer 0 (score: 1)

You can do this quickly with the Counter class.

>>> s='yo i went jogging then yo i went joggin tuesday wednesday'
>>> from collections import Counter
>>> Counter(s.split())
Counter({'yo': 2, 'i': 2, 'went': 2, 'joggin': 1, 'then': 1, 'tuesday': 1, 'wednesday': 1, 'jogging': 1})

Then just iterate over the returned dictionary, looking for words with a count of 1:

>>> c=Counter(s.split())
>>> for w in c:
...     if c[w] == 1:
...         print(w)
... 
joggin
then
tuesday
wednesday
jogging
>>> 

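You'll notice that you actually have five hapaxes in that string: joggin, then, tuesday, wednesday, and jogging.

You may need additional logic to decide whether "Jogging" and "jogging" are different words. You also need to decide whether punctuation counts (and strip it if it shouldn't). It all depends on the fine print of the problem statement.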
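
As a rough sketch of how those two decisions could be made explicit: the function name `hapaxes` and its flags `ignore_case` and `strip_punctuation` are made up for illustration and are not part of any library. It assumes Python 3.7+, where a Counter iterates in insertion order, so the result follows the order of first appearance.

```python
from collections import Counter
import string


def hapaxes(text, ignore_case=True, strip_punctuation=True):
    """Return the words that occur exactly once (hypothetical helper)."""
    if ignore_case:
        text = text.lower()
    words = text.split()
    if strip_punctuation:
        # Strip punctuation from both ends of each word only,
        # so an apostrophe inside a word such as "don't" is kept.
        words = [w.strip(string.punctuation) for w in words]
        # Drop tokens that consisted of punctuation alone.
        words = [w for w in words if w]
    counts = Counter(words)
    return [w for w in counts if counts[w] == 1]


print(hapaxes('yo i went jogging then yo i went joggin tuesday wednesday'))
# ['jogging', 'then', 'joggin', 'tuesday', 'wednesday']
```

Whether stripping only leading and trailing punctuation is the right rule is itself one of those requirements questions; it is just one plausible choice.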

Regarding your original code, I'm not sure what you were trying to accomplish with this loop:

for y in l:
    w += y

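It simply concatenates all the words into a single string with no spaces. So if l is ['the','cat','sat','on','the','mat'], then w will be 'thecatsatonthemat', which can cause problems with your matching. If the original string contains the words "may be" next to each other, they are joined into "maybe", and find will match "maybe" there even though it is not a separate word.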
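
For what it's worth, here is a minimal sketch of one way to repair the original approach while keeping its shape. It addresses two things: the substring matching described above, and the fact that calling l.remove(x) while iterating over l makes the loop skip elements. The name `hapax_words` is only illustrative, and `list.count` makes this quadratic in the number of words, which is fine for short strings.

```python
def hapax_words(stringz):
    """Return the words of stringz that occur exactly once."""
    words = stringz.split()
    result = []
    for x in words:
        # list.count compares whole words, so 'joggin' is no longer
        # "found" inside 'jogging', and the list is never modified
        # during iteration.
        if words.count(x) == 1:
            result.append(x)
    return result


print(hapax_words('yo i went jogging then yo i went joggin tuesday wednesday'))
# ['jogging', 'then', 'joggin', 'tuesday', 'wednesday']
```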

Answer 1 (score: 1)

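You can do this with a list comprehension over a collections.Counter. Also note that .lower() puts all the words in lowercase, so that Jogging and jogging are not counted as two different words.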

from collections import Counter
my_str = 'yo i went Jogging then yo i went jogging tuesday wednesday'
my_list = Counter(my_str.lower().split())
print([element for element in my_list if my_list[element] == 1])

Output:

['wednesday', 'then', 'tuesday']

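Additionally, if you need to remove all punctuation as well as ignore case, you can exclude the punctuation characters with set(string.punctuation) before counting the words, like this: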

from collections import Counter
import string

my_str = 'yo! i went Jogging then yo i went jogging tuesday, wednesday.'
removed_punct_str = ''.join(chara for chara in my_str if chara not in set(string.punctuation))
my_list = Counter(removed_punct_str.lower().split())
print([element for element in my_list if my_list[element] == 1])

Answer 2 (score: 0)

The string module:

Use the string module to get the list of punctuation characters, then use an ordinary for loop to replace them. Demo:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> 
A more Pythonic way: how to replace punctuation in a string python?
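
The for-loop replacement mentioned above could look roughly like this. It is only a sketch, and the helper name `remove_punctuation` is made up for illustration:

```python
import string


def remove_punctuation(text):
    """Delete every character listed in string.punctuation from text."""
    for ch in string.punctuation:
        # str.replace returns a new string; rebind text on each pass.
        text = text.replace(ch, '')
    return text


print(remove_punctuation('yo! i went jogging, then...'))
# yo i went jogging then
```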


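Algorithm:

  1. Remove punctuation from the input text using the string module.
  2. Convert the text to lowercase.
  3. Split the input text and update the word-count dictionary.
  4. Iterate over the dictionary items and collect the hapax words.

Code (Python 2):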

    import string
    import collections
    
    def hapax(text):
        # Remove Punctuation from the Input text.
        text = text.translate(string.maketrans("",""), string.punctuation)
        print "Debug 1- After remove Punctuation:", text
    
        # ignore:- Lower/upper/mix cases
        text = text.lower()
        print "Debug 2- After converted to Lower case:", text
    
        #- Create Default dictionary. Key is word and value is its count.
        word_count = collections.defaultdict(int)
        print "Debug 3- Collection Default Dictionary:", word_count
    
        #- Split text and update result dictionary.
        for word in text.split():
            if word:#- Ignore whitespace.
                word_count[word] += 1
    
        print "Debug 4- Word and its count:", word_count
    
        #- List which save word which value is 1.
        hapax_words = list()
        for word, value in word_count.items():
            if value==1:
                hapax_words.append(word)
    
        print "Debug 5- Final Hapax words:", hapax_words
    
    
    hapax('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & ')
    

    Output:

    $ python 2.py 
    Debug 1- After remove Punctuation: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  I and  
    Debug 2- After converted to Lower case: yo i went jogging then yo i went jogging tuesday wednesday some punctuation  i and  
    Debug 3- Collection Default Dictionary: defaultdict(<type 'int'>, {})
    Debug 4- Word and its count: defaultdict(<type 'int'>, {'and': 1, 'then': 1, 'yo': 2, 'i': 3, 'tuesday': 1, 'punctuation': 1, 'some': 1, 'wednesday': 1, 'jogging': 2, 'went': 2})
    Debug 5- Final Hapax words: ['and', 'then', 'tuesday', 'punctuation', 'some', 'wednesday']
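
Since the question is tagged python-3.x and the code above relies on Python 2 features (print statements, string.maketrans, and the two-argument form of str.translate), here is a hedged sketch of the same four steps in Python 3. The name `hapax_py3` is illustrative, the debug prints are omitted, and the function returns the list instead of printing it. The output order shown assumes Python 3.7+, where dictionaries keep insertion order.

```python
import collections
import string


def hapax_py3(text):
    """Return the words of text that occur exactly once."""
    # Step 1: remove punctuation. In Python 3 the third argument of
    # str.maketrans lists the characters to delete.
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Step 2: ignore case differences.
    text = text.lower()

    # Step 3: count each word; str.split() already skips runs of whitespace.
    word_count = collections.defaultdict(int)
    for word in text.split():
        word_count[word] += 1

    # Step 4: keep the words whose count is exactly 1.
    return [word for word, count in word_count.items() if count == 1]


print(hapax_py3('yo i went jogging then yo i went jogging tuesday wednesday some punctuation ? I and & '))
# ['then', 'tuesday', 'wednesday', 'some', 'punctuation', 'and']
```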
    

Answer 3 (score: 0)

Python 3.X code:

import string

def edit_word(new_str):
    """Lowercase the word and remove punctuation"""
    # The third argument of str.maketrans lists the characters to delete,
    # so no placeholder character or follow-up replace() is needed.
    st_table = str.maketrans('', '', string.punctuation)
    return new_str.lower().translate(st_table)

st = "String to check for hapax!, try with any string"
w_dict = {}
for w in st.split():
    ew = edit_word(w)
    w_dict[ew] = w_dict.get(ew, 0) + 1

for w, c in w_dict.items():
    if c == 1: print(w)