Question

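I want to count duplicate words and print the counts in a table.

But the table is printed again on every iteration. How can I fix this?

Here is my code: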

dummyString = "kamu makan makan jika saya dan dia??"
lists = []

def message(userInput):
    punctuation = "!@#$%^&*()_+<>?:.,;/"
    words = userInput.lower().split()
    conjunction = file.read().split("\n")
    removePunc = [char.strip(punctuation) for char in words if char not in conjunction]
    global lists
    lists = removePunc
    return removePunc

def counting(words):
    already_checked = []
    for char in words:
    # Do not repeat the words
        if char not in already_checked:
        # Check all the indices of the word in the list
            indices = [key for key, value in enumerate(words) if value == char]
            countsDuplicate = len(indices)
            table(lists, countsDuplicate)
        already_checked.append(char)

    return indices

def table(allWords, counts):
    print("Distribusi Frekuensi Kata: ")
    print("-"*70)
    print("{:>0s} {:<15s} {:<15s}".format("No","Kata","Frekuensi"))
    print("-"*70)
    words = set(allWords)
    count = 1
    for word in words:
        print("{:>0s} {:<20s} {:<10s}".format(str(count), word, str(counts)))
        count += 1

I want output like this, but the table is repeated many times:

----------------------------------------------------------------------
No Kata            Frekuensi
----------------------------------------------------------------------
1 makan                2
2 dia                  1
3 kamu                 1
4 saya                 1

Answer 1

What I did was strip the punctuation from dummyString, count the words, and show the result in a DataFrame.

The following code should work for you:

import string
import pandas as pd
from collections import Counter

dummyString = "kamu makan makan jika saya dan dia??"
# Remove every ASCII punctuation character in one pass.
dummyString_new = dummyString.translate(str.maketrans('', '', string.punctuation))

words = dummyString_new.split()
wordCount = Counter(words)            # maps each word to its number of occurrences

df = pd.DataFrame.from_dict(wordCount, orient='index').reset_index()
df.columns = ['Kata', 'Frekuensi']
df.index += 1                         # start the index at 1 instead of 0
print(df)

Output:

    Kata  Frekuensi
1   kamu          1
2  makan          2
3   jika          1
4   saya          1
5    dan          1
6    dia          1
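
If pandas is not required, the same counts can be printed in the layout the question asks for. The following is only a minimal sketch under some assumptions: the question reads its excluded words from a file that is not shown, so the set `STOP_WORDS` is inferred from the desired output (which omits `jika` and `dan`), and the helper names `word_frequencies` and `format_table` are hypothetical. The table is built once, after all counting is finished, rather than from inside the counting loop, which is what makes the question's version print it repeatedly.

```python
import string
from collections import Counter

# Assumption: the question's conjunction file is not shown; these two words
# are inferred from the desired output, which leaves them out.
STOP_WORDS = {"jika", "dan"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    cleaned = (w.strip(string.punctuation) for w in text.lower().split())
    counts = Counter(w for w in cleaned if w and w not in stop_words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def format_table(rows):
    """Build the table as a list of lines so it is printed exactly once."""
    rule = "-" * 70
    lines = [rule, "{:<3s} {:<15s} {:<10s}".format("No", "Kata", "Frekuensi"), rule]
    for number, (word, count) in enumerate(rows, start=1):
        lines.append("{:<3d} {:<15s} {:<10d}".format(number, word, count))
    return lines

rows = word_frequencies("kamu makan makan jika saya dan dia??")
print("\n".join(format_table(rows)))
```

Sorting on `(-count, word)` puts the most frequent word first and breaks ties alphabetically, so the row order is the same on every run.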

Answer 2

Assuming your word list has already been cleaned, for example:

words = "kamu makan makan jika saya dan dia??"
punctuation = "!@#$%^&*()_+<>?:.,;/"
for p in punctuation:
    if p in words:
        words = words.replace(p, '', words.count(p))
words = words.split()

you can combine set with .count and sorted to print the words and their counts in descending order:

w_unq = sorted(((item, words.count(item)) for item in set(words)), key=lambda x: x[1], reverse=True)
print('No.\tWord\tAbundance')
for i, u in enumerate(w_unq):
    print('{}\t{}\t{}'.format(i+1, *u))

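This gives the following (words with equal counts may come out in a different order, because set iteration order is arbitrary):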

No.     Word    Abundance
1       makan   2
2       saya    1
3       dan     1
4       dia     1
5       jika    1
6       kamu    1
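
Because a set has no defined iteration order, the rows with equal counts above can change position between runs. One possible way to keep them stable, sketched here as a variation and not as part of the original answer, is to deduplicate with `dict.fromkeys`, which preserves first-appearance order, and to rely on `sorted` being stable. The names `unique_in_order` and `ranked` are illustrative only.

```python
words = "kamu makan makan jika saya dan dia".split()

# dict.fromkeys drops duplicates but keeps first-appearance order
# (dicts preserve insertion order in Python 3.7+), unlike set.
unique_in_order = list(dict.fromkeys(words))

# sorted() is stable, also with reverse=True, so words with equal
# counts stay in the order in which they first appeared.
ranked = sorted(
    ((w, words.count(w)) for w in unique_in_order),
    key=lambda pair: pair[1],
    reverse=True,
)

print('No.\tWord\tAbundance')
for i, (word, count) in enumerate(ranked, start=1):
    print('{}\t{}\t{}'.format(i, word, count))
```

Note that `words.count` rescans the whole list for every unique word, which is fine for a short sentence; for long texts a single pass with `collections.Counter` avoids the repeated scans.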

Printing duplicates in a count table
