Plotting a graph from dataframes loaded from multiple csv files

Time: 2021-07-06 07:27:50

Tags: python python-3.x pandas matplotlib

I have 3 csv files:

file1.csv

,DATE,DAY,OPEN,2PM,CLOSE,STATUS
0,2021-05-18,Tuesday,538.8,530.45,530.8,0
1,2021-05-19,Wednesday,530.65,532.6,536.85,0
2,2021-05-20,Thursday,536.95,537.05,536.35,1
3,2021-05-21,Friday,538.0,538.2,537.55,1
4,2021-05-24,Monday,537.3,535.05,532.85,1
5,2021-05-25,Tuesday,535.9,531.35,529.65,1
6,2021-05-26,Wednesday,532.95,530.55,532.1,0
7,2021-05-27,Thursday,532.95,529.65,529.85,0

file2.csv

,DATE,DAY,OPEN,2PM,CLOSE,STATUS
0,2021-05-18,Tuesday,538.8,530.45,530.8,1
1,2021-05-19,Wednesday,530.65,532.6,536.85,0
2,2021-05-20,Thursday,536.95,537.05,536.35,1
3,2021-05-21,Friday,538.0,538.2,537.55,1
4,2021-05-24,Monday,537.3,535.05,532.85,2
5,2021-05-25,Tuesday,535.9,531.35,529.65,1
6,2021-05-26,Wednesday,532.95,530.55,532.1,0
7,2021-05-27,Thursday,532.95,529.65,529.85,0

file3.csv

,DATE,DAY,OPEN,2PM,CLOSE,STATUS
0,2021-05-18,Tuesday,538.8,530.45,530.9,0
1,2021-05-19,Wednesday,530.65,532.6,536.85,1
2,2021-05-20,Thursday,536.95,537.05,536.35,0
3,2021-05-21,Friday,538.0,538.2,537.55,1
4,2021-05-24,Monday,537.3,535.05,532.85,1
5,2021-05-25,Tuesday,535.9,531.35,529.65,0
6,2021-05-26,Wednesday,532.95,530.55,532.1,0
7,2021-05-27,Thursday,532.95,529.65,529.85,1

I can plot a graph for a single csv file with:

import pandas as pd
df = pd.read_csv("file1.csv")
df.groupby('DAY')['STATUS'].value_counts(normalize=True).unstack().plot.bar()

This displays a grouped bar chart.

The plot has 5 pairs of bars (Monday, Tuesday, Wednesday, etc.), all from one file.
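To see where the five groups of bars come from, it helps to look at the table behind the plot. The sketch below inlines the contents of file1.csv through `io.StringIO` (an assumption made only so the snippet runs without files on disk) and stops before the `.plot.bar()` call. The names `csv_text`, `shares`, and `table` are illustrative.

```python
import io

import pandas as pd

# Contents of file1.csv from the question, inlined for a self-contained run.
csv_text = """,DATE,DAY,OPEN,2PM,CLOSE,STATUS
0,2021-05-18,Tuesday,538.8,530.45,530.8,0
1,2021-05-19,Wednesday,530.65,532.6,536.85,0
2,2021-05-20,Thursday,536.95,537.05,536.35,1
3,2021-05-21,Friday,538.0,538.2,537.55,1
4,2021-05-24,Monday,537.3,535.05,532.85,1
5,2021-05-25,Tuesday,535.9,531.35,529.65,1
6,2021-05-26,Wednesday,532.95,530.55,532.1,0
7,2021-05-27,Thursday,532.95,529.65,529.85,0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Share of each STATUS value within every weekday (long format).
shares = df.groupby("DAY")["STATUS"].value_counts(normalize=True)

# One row per DAY, one column per STATUS value; combinations that
# never occur become NaN and are drawn as missing bars.
table = shares.unstack()
print(table)
```

Each row of `table` becomes one group on the x-axis and each column becomes one bar within that group, so a weekday with only one STATUS value shows a single bar.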

However, I want to plot the "Monday" data from all 3 files in a single figure. Can anyone tell me how to handle multiple files?

That is, the plot should have 3 pairs of bars, each pair representing Monday from one file, for example:

Monday from file1.csv
Monday from file2.csv
Monday from file3.csv

I want to draw this Monday plot across all 3 files.

1 answer:

Answer 0: (score: 1)

Add a FILE column to each df before concatenating. Then filter by the desired day (Tuesday in this example) and group by DAY and FILE:

import matplotlib.pyplot as plt
import pandas as pd

# tag every row with the file it came from
df1 = pd.read_csv('file1.csv').assign(FILE=1)
df2 = pd.read_csv('file2.csv').assign(FILE=2)
df3 = pd.read_csv('file3.csv').assign(FILE=3)
df = pd.concat([df1, df2, df3]).reset_index(drop=True)

# or concat via generator
# df = pd.concat(pd.read_csv(f'file{i}.csv').assign(FILE=i) for i in (1, 2, 3)).reset_index(drop=True)

(df[df.DAY.eq('Tuesday')]
    .groupby(['DAY', 'FILE'])['STATUS']
    .value_counts(normalize=True)
    .unstack().plot.bar())
plt.xticks(rotation=0)

(Plot: Tuesdays per file)
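The answer filters on Tuesday, but the same pattern should carry over to the Monday case the question asks about. Here is a minimal sketch, assuming small in-memory frames in place of the three csv files (only the DAY and STATUS columns, with the Monday and Tuesday rows copied from the question). The helper name `status_shares` is hypothetical and not part of pandas.

```python
import pandas as pd

# Stand-ins for file1.csv, file2.csv, file3.csv: Monday and Tuesday rows only.
frames = {
    1: pd.DataFrame({"DAY": ["Tuesday", "Monday", "Tuesday"], "STATUS": [0, 1, 1]}),
    2: pd.DataFrame({"DAY": ["Tuesday", "Monday", "Tuesday"], "STATUS": [1, 2, 1]}),
    3: pd.DataFrame({"DAY": ["Tuesday", "Monday", "Tuesday"], "STATUS": [0, 1, 0]}),
}

# Tag each frame with its source, then stack them into one frame.
df = pd.concat(
    [frame.assign(FILE=label) for label, frame in frames.items()],
    ignore_index=True,
)


def status_shares(data, day):
    """Return the share of each STATUS value per FILE for one weekday."""
    return (
        data[data["DAY"].eq(day)]
        .groupby("FILE")["STATUS"]
        .value_counts(normalize=True)
        .unstack()
    )


monday = status_shares(df, "Monday")
print(monday)
```

Calling `monday.plot.bar()` would then draw one group of bars per file. Since each file in the question contains a single Monday row, every file ends up with exactly one bar at height 1.0.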


To filter by a given threshold, save the value counts to an intermediate counts df and use it for filtering:

day, threshold = 'Tuesday', 0.8
counts = df[df.DAY.eq(day)].groupby(['DAY', 'FILE'])['STATUS'].value_counts(normalize=True).unstack()
counts[counts > threshold].plot.bar()

(Plot: Tuesdays per file above threshold)
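Note that `counts[counts > threshold]` masks rather than drops: cells at or below the threshold become NaN, and a file with no surviving value still keeps its (empty) slot on the x-axis. A small sketch of that behaviour, assuming the Tuesday shares are typed in by hand from the question's data; the `dropna(how="all")` step is an optional addition, not part of the original answer.

```python
import pandas as pd

nan = float("nan")

# Tuesday shares per file, entered by hand from the question's data:
# file 1 is split 50/50, file 2 is all STATUS 1, file 3 is all STATUS 0.
counts = pd.DataFrame(
    {0: [0.5, nan, 1.0], 1: [0.5, 1.0, nan]},
    index=pd.Index([1, 2, 3], name="FILE"),
)
counts.columns.name = "STATUS"

threshold = 0.8

# Boolean-mask indexing keeps the shape and blanks out failing cells.
masked = counts[counts > threshold]

# Optional: drop files that have no bar left after masking.
kept = masked.dropna(how="all")
print(kept)
```

With these numbers, file 1 loses both bars at a threshold of 0.8, so `kept` retains only files 2 and 3.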