Question

Right now, my code takes web-scraped data from a file (BigramCounter.txt) and then finds all the bigrams in that file, so that the data looks like this:

Counter({('the', 'first'): 45, ('on', 'purchases'): 42, ('cash', 'back'): 39})

After this, I try to feed it into a pandas DataFrame, which spits this df back out:

     the     on         cash
     first   purchases  back

 0    45        42       39

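This is very close to what I need, but not quite. First, the DataFrame ignores my attempt to name the columns. Second, I was hoping for something formatted more like this, with just two columns and the words not split across cells: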

 Words         Frequency
the first        45
on purchases     42
cash back        39

For reference, here is my code. I think I may need to reorder an axis somewhere, but I'm not sure how. Any ideas?

import re
from collections import Counter
import pandas as pd
main_c = Counter()
words = re.findall(r'\w+', open('BigramCounter.txt', encoding='utf-8').read())
bigrams = Counter(zip(words,words[1:])) 
main_c.update(bigrams) #at this point it looks like Counter({('the', 'first'): 45, etc...})
comm = [[k,v] for k,v in main_c]
frame = pd.DataFrame(comm)
frame.columns = ['Word', 'Frequency']
frame2 = frame.unstack()
frame2.to_csv('text.csv')

Answer 1

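I think I see what you're going for, and there are many ways to get there. You were really close. My first inclination would be to use a Series, especially since you'd (presumably) be dropping the df index when you write the CSV anyway, but it doesn't make a huge difference.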

frequencies = [[" ".join(k), v] for k,v in main_c.items()]
pd.DataFrame(frequencies, columns=['Word', 'Frequency'])

           Word  Frequency
0     the first         45
1     cash back         39
2  on purchases         42
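As a minimal sketch of why the comprehension in the question went wrong: iterating a Counter directly yields only its keys, so each bigram tuple gets unpacked into `k` and `v` and the counts are lost. The sample counts below are the three from the question; using `most_common()` to sort by frequency is my own addition, not something the original code did.

```python
from collections import Counter

# Sample data copied from the question.
main_c = Counter({('the', 'first'): 45, ('on', 'purchases'): 42, ('cash', 'back'): 39})

# Iterating the Counter itself yields keys only, so each 2-tuple key is
# unpacked into k and v: two words per row, and no counts at all.
keys_only = [[k, v] for k, v in main_c]

# .items() yields (key, count) pairs. .most_common() yields the same pairs
# sorted by descending count, which matches the layout asked for.
frequencies = [[" ".join(k), v] for k, v in main_c.most_common()]

print(keys_only)
print(frequencies)
```

Passing `frequencies` to `pd.DataFrame(..., columns=['Word', 'Frequency'])` then works exactly as in the snippet above.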

If, as I suspect, you want Word to be the index, add frame.set_index('Word')

              Frequency
Word
the first            45
cash back            39
on purchases         42
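For completeness, here is a rough sketch of the Series route mentioned at the top of this answer. It assumes the joined bigram should become the index and that you want the rows sorted by count. The names `freq` and `csv_text` are just illustrative, and the sample counts are the ones from the question.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
main_c = Counter({('the', 'first'): 45, ('on', 'purchases'): 42, ('cash', 'back'): 39})

# Join each bigram tuple into one string and use it as the Series index.
freq = pd.Series({" ".join(k): v for k, v in main_c.items()}, name="Frequency")
freq.index.name = "Words"
freq = freq.sort_values(ascending=False)

# With no path argument, to_csv returns the CSV text instead of writing a file;
# pass a filename such as 'text.csv' to write to disk.
csv_text = freq.to_csv(header=True)
print(csv_text)
```

Because the words are the index here, the CSV ends up with exactly two columns and no extra numeric index column. With the DataFrame version you would get the same effect by calling `to_csv` with `index=False`.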

Reordering rows into columns in Pandas (Python 3, Pandas)
