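I'm experimenting with some text analysis and writing code to show how often each word in a given dataset occurs per month. I have the following code, which prints the frequency of a given word for each month, but I'm struggling to turn that into a DataFrame (columns: month, word frequency).
Any help is appreciated!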
mytable = Table("field1", "field2", "field3")
mytable.insert("foo", "bar", "baz")
select = mytable.select("field1")
---------------
print(select)
>>> ["foo"]
Current code:
import collections
import pandas as pd

df = df.set_index(df['Date'])
for u, v in df.groupby(pd.Grouper(freq="M")):
    words = sum(v['Processed'].str.split(' ').values.tolist(), [])
    c = collections.Counter(words)
    print(c['word'])
Answer 0 (score: 0)
You can use pd.DataFrame.from_dict to convert each month's Counter into a DataFrame:
import collections
import pandas as pd

# index by date so pd.Grouper can bucket the rows by month
df = df.set_index(df['Date'])
results = []
for u, v in df.groupby(pd.Grouper(freq="M")):
    # flatten this month's token lists into a single list of words
    words = sum(v['Processed'].str.split(' ').values.tolist(), [])
    c = collections.Counter(words)
    # convert counter to dataframe
    cdf = pd.DataFrame.from_dict(c, orient='index', columns=['frequency']).reset_index()
    # add identifier to dataframe
    cdf['month'] = u
    # collect results
    results.append(cdf)
# concatenate results
results = pd.concat(results)
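
If only one word needs tracking, as in the question (columns: month, word frequency), a smaller variant may be enough. The following is a minimal sketch rather than a definitive solution: the sample data and the `target` and `monthly` names are made up for illustration, and only the `Date` and `Processed` column names come from the question. It groups by `dt.to_period("M")` instead of `pd.Grouper`, which is a swapped-in technique; note that, unlike `pd.Grouper`, it produces no rows for months that contain no data at all.

```python
import collections

import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-05", "2020-01-20", "2020-02-10", "2020-03-15"]),
    "Processed": ["foo bar foo", "bar baz", "foo baz baz", "bar"],
})

target = "foo"  # hypothetical word whose monthly frequency we want

rows = []
# Group rows by calendar month (a pandas Period such as 2020-01).
for month, group in df.groupby(df["Date"].dt.to_period("M")):
    # Join the month's texts and split on whitespace to get its words.
    counts = collections.Counter(" ".join(group["Processed"]).split())
    # A Counter returns 0 for words it has not seen.
    rows.append({"month": str(month), "frequency": counts[target]})

monthly = pd.DataFrame(rows, columns=["month", "frequency"])
print(monthly)
```

Building a list of plain dicts and constructing the DataFrame once at the end avoids concatenating many small frames inside the loop.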