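I'm a beginner at coding. I've written code that counts how often each word occurs and then puts the counts into a table using the pandas package, but I need to remove the duplicates it produces.
I followed an online tutorial on removing duplicates, but my current code still doesn't work, as shown in the second input. Any feedback would be appreciated.
Input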
txt = "chilli mango chilli mango grape"
words = txt.split()
for word in words:
    print(word + " " + str(txt.count(word)))
import pandas as pd
mytable = pd.DataFrame()
for word in words:
    tempdf = pd.DataFrame({"word" : [word], "frequency" : [txt.count(word)]})
    mytable = mytable.append(tempdf)
    print(mytable)
Output
chilli 2
mango 2
chilli 2
mango 2
grape 1
     word  frequency
0  chilli          2
     word  frequency
0  chilli          2
0   mango          2
     word  frequency
0  chilli          2
0   mango          2
0  chilli          2
     word  frequency
0  chilli          2
0   mango          2
0  chilli          2
0   mango          2
     word  frequency
0  chilli          2
0   mango          2
0  chilli          2
0   mango          2
0   grape          1
Input
data = mytable
data.sort_values("First name", inplace = True)
data.drop_duplicates(subset = "First name",
                     keep = False, inplace = True)
print(data)
Answer 0 (score: 1)
You can do it with a dict:
dct = {}
for word in txt.split():
    if word not in dct:
        dct[word] = 1
    else:
        dct[word] += 1
frequency = pd.Series(dct)
Or the pandas way:
frequency = pd.Series(txt.split()).value_counts()
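The `value_counts` result above is a Series indexed by word. If a two-column table like the one in the question is wanted, one possible conversion is sketched below. The column names `word` and `frequency` come from the question; the variable name `table` is only illustrative.

```python
import pandas as pd

txt = "chilli mango chilli mango grape"

# Count each distinct token once; the result is a Series indexed by word.
frequency = pd.Series(txt.split()).value_counts()

# Name the index "word" and turn the counts into a "frequency" column.
table = frequency.rename_axis("word").reset_index(name="frequency")
print(table)
```

Each word appears in exactly one row, so no separate de-duplication step is needed afterwards.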
Answer 1 (score: 0)
collections.Counter is also designed for this kind of task, and it converts easily to a pandas DataFrame.
from collections import Counter
import pandas as pd
txt = "chilli mango chilli mango grape"
words = txt.split()
counts = Counter(words) # Counter({'chilli': 2, 'mango': 2, 'grape': 1})
df = pd.DataFrame(counts.items(), columns=["Word", "Frequency"]) # same data as a dataframe
You can also build the DataFrame like this, which avoids creating duplicates in the first place:
mytable = pd.DataFrame(columns=["word", "frequency"]).set_index("word")
for word in words:
    if word in mytable.index:
        mytable.loc[word] += 1
    else:
        mytable.loc[word] = 1
That said, your existing code should work if you remove keep = False (which tells it to drop all copies of a duplicated row, including the first one) and change "First name" to "word".
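As a minimal sketch of that fix, the snippet below applies the two changes to the question's de-duplication step. Two details are my own assumptions rather than part of the original code: the table is built in one step instead of with `DataFrame.append` (which is no longer available in pandas 2.0 and later), and `words.count` is used instead of `txt.count` so that only whole tokens are counted.

```python
import pandas as pd

txt = "chilli mango chilli mango grape"
words = txt.split()

# One row per token, duplicates included, mirroring the table in the question.
# words.count(w) counts whole tokens; txt.count(w) would also match substrings.
mytable = pd.DataFrame(
    {"word": words, "frequency": [words.count(w) for w in words]}
)

# Sort on the real column name and keep the first copy of each word
# (the default keep="first"), instead of keep=False which drops every copy.
data = (
    mytable.sort_values("word")
    .drop_duplicates(subset="word")
    .reset_index(drop=True)
)
print(data)
```

With `keep=False`, both chilli and mango would disappear entirely and only grape would remain, which is why the default is the better fit here.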