How to plot a histogram of a column from a CSV file

Date: 2020-07-03 11:00:03

Tags: python pandas numpy matplotlib histogram

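The sample file looks like this. The x-axis should cover the letters a-z and A-Z, and the y-axis should show the frequency of each letter, counted from the content column.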

import pandas as pd
import numpy as np
import string
from matplotlib import pyplot as plt
plt.style.use('fivethirtyeight')

col_list = ["tweet_id","sentiment","author","content"]
df = pd.read_csv("sample.csv",usecols=col_list)
freq = (df["content"])

frequencies = {}

for sentence in freq:
    for char in sentence:
        if char in frequencies:
            frequencies[char] += 1
        else:
            frequencies[char] = 1

frequency = str(frequencies)

bins = [chr(i + ord('a')) for i in range(26)].__add__([chr(j + ord('A')) for j in range(26)])


plt.title('data')
plt.xlabel('letters')
plt.ylabel('frequencies')
plt.hist(bins,frequency,edgecolor ='black')
plt.tight_layout()

plt.show()

1 answer:

Answer 0 (score: 2)

Your code is already well structured. I would still suggest using plt.bar with xticks set to the letters instead of plt.hist, since it seems easier to put chars on the x-axis that way. I commented out the else so that nothing except the desired letters (a-zA-Z) gets added. I also included a sorted command so that the bars are ordered either alphabetically or by frequency count.

Input used (sample.csv):

   tweet_id  sentiment  author  content
0  NaN       NaN        NaN     @tiffanylue i know i was listenin to bad habit...
1  NaN       NaN        NaN     Layin n bed with a headache ughhhh...waitin on...
2  NaN       NaN        NaN     Funeral ceremony...gloomy friday...
3  NaN       NaN        NaN     wants to hang out with friends SOON!
4  NaN       NaN        NaN     @dannycastillo We want to trade with someone w...
5  NaN       NaN        NaN     Re-pinging @ghostridahl4: why didn't you go to...
6  NaN       NaN        NaN     I should be sleep, but im not! thinking about ...
...

# populate dictionary a-zA-Z with zeros
frequencies = {}
for i in range(26):
    frequencies[chr(i + ord('a'))] = 0
    frequencies[chr(i + ord('A'))] = 0

# iterate over each row of "content"
for row in df.loc[:,"content"]:
    for char in row:
        if char in frequencies:
            frequencies[char] += 1
        # uncomment to include numbers and symbols (!@#$...)
        # else:
        #     frequencies[char] = 1

# sort items from highest count to lowest
char_freq = sorted(frequencies.items(), key=lambda x: x[1], reverse=True)
# char_freq = sorted(frequencies.items(), key=lambda x: x, reverse=False)

plt.title('data')
plt.xlabel('letters')
plt.ylabel('frequencies')

plt.bar(range(len(char_freq)), [i[1] for i in char_freq], align='center')
plt.xticks(range(len(char_freq)), [i[0] for i in char_freq])

plt.tight_layout()

plt.show()
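
For comparison, the same counting and sorting steps can be sketched with collections.Counter. This is only a sketch under a few assumptions: the helper names letter_frequencies and plot_letter_frequencies and the by_count flag are made up for illustration and do not appear in the code above, and rows that are not strings (pandas reads empty cells as float NaN) are simply skipped.

```python
from collections import Counter
import string


def letter_frequencies(rows, by_count=True):
    """Count ASCII letters (a-z, A-Z) across an iterable of text rows.

    Hypothetical helper, not part of the original answer.
    Returns a list of (letter, count) pairs, zero counts included.
    """
    letters = set(string.ascii_letters)
    # Start every letter at zero so letters that never occur still get a bar.
    counts = Counter({letter: 0 for letter in string.ascii_letters})
    for row in rows:
        # Skip non-string rows, e.g. the float NaN pandas uses for empty cells.
        if not isinstance(row, str):
            continue
        counts.update(ch for ch in row if ch in letters)
    if by_count:
        # Highest count first.
        return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    # Alphabetical by character code: A-Z first, then a-z.
    return sorted(counts.items())


def plot_letter_frequencies(pairs, title="data"):
    """Draw one bar per (letter, count) pair, labelled with the letter."""
    # Imported here so the counting helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    positions = range(len(pairs))
    plt.bar(positions, [count for _, count in pairs], align="center")
    plt.xticks(positions, [letter for letter, _ in pairs])
    plt.title(title)
    plt.xlabel("letters")
    plt.ylabel("frequencies")
    plt.tight_layout()
    plt.show()
```

With the DataFrame from the question, plot_letter_frequencies(letter_frequencies(df["content"])) should give bars sorted by count, and passing by_count=False switches to alphabetical order.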

[Figures: sorted_alphabetically, sorted_by_count]