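I want to count duplicate words and print the counts in a table,
but the table is printed again on every iteration. How can I fix this?
Here is my code: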
dummyString = "kamu makan makan jika saya dan dia??"
lists = []

def message(userInput):
    punctuation = "!@#$%^&*()_+<>?:.,;/"
    words = userInput.lower().split()
    conjunction = file.read().split("\n")
    removePunc = [char.strip(punctuation) for char in words if char not in conjunction]
    global lists
    lists = removePunc
    return removePunc

def counting(words):
    already_checked = []
    for char in words:
        # Do not repeat the words
        if char not in already_checked:
            # Check all the indices of the word in the list
            indices = [key for key, value in enumerate(words) if value == char]
            countsDuplicate = len(indices)
            table(lists, countsDuplicate)
            already_checked.append(char)
    return indices

def table(allWords, counts):
    print("Distribusi Frekuensi Kata: ")
    print("-"*70)
    print("{:>0s} {:<15s} {:<15s}".format("No","Kata","Frekuensi"))
    print("-"*70)
    words = set(allWords)
    count = 1
    for word in words:
        print("{:>0s} {:<20s} {:<10s}".format(str(count), word, str(counts)))
        count += 1
I want output like this, but the table is repeated many times:
----------------------------------------------------------------------
No  Kata            Frekuensi
----------------------------------------------------------------------
1   makan           2
2   dia             1
3   kamu            1
4   saya            1
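Answer 0 (score: 0)
What I did was remove the punctuation from dummyString, count the words, and display the result in a DataFrame.
The following code should work for you: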
import string
import pandas as pd
from collections import Counter
dummyString = "kamu makan makan jika saya dan dia??"
dummyString_new=dummyString.translate(str.maketrans('', '', string.punctuation))
words = dummyString_new.split()
wordCount = Counter(words)
df = pd.DataFrame.from_dict(wordCount, orient='index').reset_index()
df.columns=['No Kata','Frekuensi']
df.index += 1 # to start your index from 1 and not 0.
Output:
df:
  No Kata  Frekuensi
1    kamu          1
2   makan          2
3    jika          1
4    saya          1
5     dan          1
6     dia          1
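If pandas is not available, a standard-library sketch along the same lines may be enough. It assumes a small hypothetical STOP_WORDS set (the question reads its list from a file that is not shown), and it builds the table once, after all counting is done, which is what avoids the repeated header:

```python
from collections import Counter
import string

# Hypothetical stop-word set; the question loads its list from a file that is not shown.
STOP_WORDS = {"jika", "dan"}

def count_words(text, stop_words=STOP_WORDS):
    """Return (word, frequency) pairs, most frequent first."""
    # Lowercase, split on whitespace, and trim punctuation from both ends of each word.
    cleaned = (w.strip(string.punctuation) for w in text.lower().split())
    counts = Counter(w for w in cleaned if w and w not in stop_words)
    return counts.most_common()

def format_table(rows):
    """Build the whole table as one string so it is printed exactly once."""
    rule = "-" * 70
    lines = ["Distribusi Frekuensi Kata:", rule,
             "{:<5s}{:<20s}{:<10s}".format("No", "Kata", "Frekuensi"), rule]
    for number, (word, freq) in enumerate(rows, start=1):
        lines.append("{:<5d}{:<20s}{:<10d}".format(number, word, freq))
    return "\n".join(lines)

rows = count_words("kamu makan makan jika saya dan dia??")
print(format_table(rows))
```

Words with the same frequency keep the order in which they first appear in the input.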
Answer 1 (score: 0)
Assuming your word list has already been cleaned up, for example:
words = "kamu makan makan jika saya dan dia??"
punctuation = "!@#$%^&*()_+<>?:.,;/"
for p in punctuation:
    if p in words:
        words = words.replace(p, '', words.count(p))
words = words.split()
you can combine set with .count and sorted to print the words and their counts in descending order:
w_unq = sorted(((item, words.count(item)) for item in set(words)), key=lambda x: x[1], reverse=True)
print('No.\tWord\tAbundance')
for i, u in enumerate(w_unq):
    print('{}\t{}\t{}'.format(i+1, *u))
which gives:
No. Word Abundance
1 makan 2
2 saya 1
3 dan 1
4 dia 1
5 jika 1
6 kamu 1
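One caveat: the order of the tied words above depends on set iteration order and can differ between runs. A possible variant, assuming alphabetical tie-breaking is acceptable, counts with collections.Counter in a single pass and sorts by descending count and then by word (ranked_counts is an illustrative name, not part of the answer's code):

```python
from collections import Counter

def ranked_counts(words):
    """Return (word, count) pairs sorted by count descending, then word ascending."""
    counts = Counter(words)  # one pass over the list instead of list.count per word
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

words = "kamu makan makan jika saya dan dia".split()
ranked = ranked_counts(words)

print("No.\tWord\tCount")
for i, (word, n) in enumerate(ranked, start=1):
    print("{}\t{}\t{}".format(i, word, n))
```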