Filter a column in a CSV file and plot the result

Time: 2019-09-17 22:15:16

Tags: python pandas csv matplotlib seaborn

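I am trying to filter a column in a CSV the same way I would in Excel. Then, based on that filter, I want to pick out another column and plot the data from that column.

I tried printing the expression on its own and it printed correctly; I'm just not sure about the syntax. The printout shows that I can search a column correctly: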

data.head()
print('banana',
      data[('Sub-Dept')] == 'Stow Each') #and data[('Sub-Dept')] == 'Stow Each Nike', 'Each Stow to Prime', 'Each Stow to Prime E', 'Each Stow to Prime W', 'Stow to Prime LeadPA')

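But I don't know how to filter on it first and then plot from the filtered result. I'm very new to this.

I have one column that contains many different names I could filter on. I want to select rows by the names listed above.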

import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 

x = []
y = []

data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6)


new_data = data.loc[(data['Sub-Dept'] == 'Stow Each')]
sns.set(style="whitegrid") #this is strictly cosmetic, you can change it any time
ax = sns.countplot(x='U.S. OSHA Recordable?', data=new_data)
plt.bar(x, y, label='Loaded from file!')
plt.ylabel('Quantity of Injuries')
plt.title('Injuries (past 4 weeks)')
plt.show() 

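Right now I want it to produce one chart with 2 bars. Problem: one bar shows a count of 80 and the other shows a count of 20. Desired result: after filtering on the other column, one bar in the same chart should show 21 and the other should show 7.

The plotting part works fine, and so does pulling the data out of the Excel file. The only part I can't figure out is how to filter the column and then draw the plot based on that filter. I'm not sure what the code should look like or where it should go. Please help.

The CSV file is here: https://drive.google.com/open?id=1yJ6iQL-bOvGSLAKPcPXqgk9aKLqUEmPK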

2 answers:

Answer 0: (score: 0)

Try pandas DataFrame.query()

The pandas query method may be useful here.

data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6)

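# Backticks quote the column name because it contains a hyphen;
# the values being compared are string literals, so they need quotes.
new_data = data.query("`Sub-Dept` == 'Stow Each' or `Sub-Dept` == 'RF_Pick'")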
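
When several sub-department names need to match, the same idea can be written with `in` and a Python list referenced through `@`. This is only a minimal sketch: the small DataFrame below is a made-up stand-in for the real CSV, and the `wanted` list is an illustrative assumption. Backtick quoting of a column name that contains a hyphen needs a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; only the column names come from the question.
data = pd.DataFrame({
    "Sub-Dept": ["Stow Each", "RF_Pick", "Stow Each Nike", "Pack", "Stow Each"],
    "U.S. OSHA Recordable?": ["Yes", "No", "No", "Yes", "No"],
})

# Illustrative list of names to keep.
wanted = ["Stow Each", "Stow Each Nike"]

# Backticks quote the hyphenated column name;
# @wanted refers to the local Python variable of that name.
new_data = data.query("`Sub-Dept` in @wanted")
print(new_data)
```

The filtered frame can then be passed as `data=new_data` to the plotting call from the question.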

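Answer 1: (score: 0)

I'm happy to say I figured it out. I couldn't find the answer on the internet, so I hope this helps someone in the future. Thanks to Datanovice for suggesting .loc, which gave me the initial idea and got me to the next step. The rest of the answer came from here.

Sorry about the comments I left in the code.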

import pandas as pd  # data analysis library, used here to load and filter the CSV
import matplotlib.pyplot as plt  # allows us to plot things
import seaborn as sns  # https://seaborn.pydata.org/generated/seaborn.boxplot.html
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available

# skiprows skips the comment lines at the top of the file, encoding lets pandas read this CSV,
# and index_col makes "Sub-Dept" the index so that .loc can select rows by sub-department name
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6, index_col="Sub-Dept")
# keep only the rows whose Sub-Dept is one of the stow sub-departments
new_data = data.loc[["Each Stow to Prime", "Each Stow to Prime E", "Each Stow to Prime W", "Stow Each", "Stow Each Nike", "Stow to Prime LeadPA"]]
sns.set(style="whitegrid")  # this is strictly cosmetic, you can change it any time
# the key is that countplot does the counting for you: one bar per distinct value in the column
ax = sns.countplot(x='U.S. OSHA Recordable?', data=new_data)
plt.ylabel('Quantity of Injuries')
plt.title('Stow Injuries (past 4 weeks)')
plt.show()  # shows the plot to the user
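
An alternative that leaves "Sub-Dept" as an ordinary column is a boolean mask built with `isin()`. The sketch below is not the code from either answer: the rows are made up to stand in for the real CSV, and only the column names and the list of sub-department names come from the thread. It also prints `value_counts()`, which is a quick way to check the numbers each bar should show before plotting.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV from the question.
data = pd.DataFrame({
    "Sub-Dept": ["Stow Each", "Each Stow to Prime", "RF_Pick",
                 "Stow Each Nike", "Pack", "Stow Each"],
    "U.S. OSHA Recordable?": ["Yes", "No", "No", "No", "Yes", "No"],
})

stow_names = ["Each Stow to Prime", "Each Stow to Prime E", "Each Stow to Prime W",
              "Stow Each", "Stow Each Nike", "Stow to Prime LeadPA"]

# isin() returns True for rows whose Sub-Dept is in the list;
# names in the list that never occur in the data are simply ignored.
new_data = data[data["Sub-Dept"].isin(stow_names)]

# Count the rows per distinct value of the column that will be plotted.
counts = new_data["U.S. OSHA Recordable?"].value_counts()
print(counts)
```

Passing this `new_data` to `sns.countplot(x='U.S. OSHA Recordable?', data=new_data)` should draw bars with the same heights as the printed counts.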