使用散点图绘制数据集中的多列

时间:2020-05-18 21:54:50

标签: python pandas dataframe matplotlib plot

import plotly.offline as pyo
import plotly.express as px
import matplotlib.pyplot as pls

pyo.init_notebook_mode()


data = pd.read_csv(r'C:.......Coronovirus Datasets\time_series_covid19_deaths_global.csv')

countries = ['US']
filtered_data = data[data['Country/Region'].isin(countries)]


wanted_values = filtered_data[['Country/Region','1/22/2020','1/23/2020','1/24/2020', '1/25/2020','1/26/2020','1/27/2020','1/28/2020','1/28/2020','1/29/2020',
      '1/30/2020','1/31/2020','2/1/2020','2/2/2020','2/3/2020','2/4/2020','2/5/2020','2/6/2020','2/7/2020','2/8/2020','2/9/2020','2/10/2020',
    '2/11/2020','2/12/2020','2/13/2020','2/14/2020','2/15/2020','2/16/2020','2/17/2020','2/18/2020','2/19/2020','2/20/2020','2/21/2020','2/22/2020','2/23/2020',
    '2/24/2020','2/25/2020','2/26/2020','2/27/2020','2/28/2020','2/29/2020','3/1/2020','3/2/2020','3/3/2020','3/4/2020','3/5/2020','3/6/2020','3/7/2020',
    '3/8/2020','3/9/2020','3/10/2020','3/11/2020','3/12/2020','3/13/2020','3/14/2020','3/15/2020','3/16/2020','3/17/2020','3/18/2020','3/19/2020',
    '3/20/2020','3/21/2020','4/1/2020','4/2/2020','4/3/2020','4/4/2020','4/5/2020','4/6/2020','4/7/2020','4/8/2020','4/9/2020','4/10/2020',
    '4/11/2020','4/12/2020','4/13/2020','4/14/2020','4/15/2020','4/16/2020','4/17/2020','4/18/2020','4/19/2020','4/20/2020','4/21/2020','4/22/2020','4/23/2020',
    '4/24/2020','4/25/2020','4/26/2020','4/27/2020','4/28/2020','4/29/2020','5/1/2020','5/2/2020','5/3/2020','5/4/2020','5/5/2020','5/6/2020','5/7/2020','5/8/2020','5/9/2020']]



fig = px.scatter(wanted_values, x ='Country/Region', y = 'dates' , title = 'Number of Deaths Per Day')
fig.show()


#wanted_values.plot(x="5/9/2020, 5/8/2020", y = 'filtered_data' kind = 'bar')
#pls.show()

如何将所有日期及其对应的死亡绘制为散点图?我计划使用线性回归来预测自1月1日以来的死亡人数。由于我真的是Python的新手,在绘制这些值时遇到了很多麻烦。

数据集可以在这里找到:https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

1 个答案:

答案 0 :(得分:0)

这是您的数据的样子:

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("time_series_covid19_deaths_global.csv")
data.iloc[:2,:7]

Province/State  Country/Region  Lat Long    1/22/20 1/23/20 1/24/20
0   NaN Afghanistan 33.0000 65.0000 0   0   0
1   NaN Albania 41.1533 20.1683 0   0   0

首先,通过为日期赋予开始和结束日期(与列名称匹配)并将其融化以提供长格式,将其子集化:

data = data[data['Country/Region']=='US']
data = data.loc[:,'1/22/20':'5/9/20'].melt(var_name="date")
data['date'] = pd.to_datetime(data['date'])

现在看起来像这样:

    date    value
0   2020-01-22  0
1   2020-01-23  0
2   2020-01-24  0

绘图很简单:

data.plot.scatter(x="date",y="value",rot=45)

enter image description here