Question

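As an exercise, I'm trying to plot the great COVID-19 data provided by Johns Hopkins CSSE. I'm confused because the time series are organized in columns (each day sits in its own column next to the previous one; see the picture below). Preferably, I'd like to avoid transposing columns into rows and vice versa. My intention is to plot the COVID-19 evolution day by day as lines for all countries/regions (yes, it will get messy).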

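I was thinking I could use a for loop over the columns to fill a list and use it as my y-axis, but is there a more "direct" way to get this plot? Lately I've been using Plotly more, but I could also use matplotlib or seaborn.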

Answer 1

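I don't think this particular dataset is well suited to the long data format that plotly.express prefers, especially since Province/State is missing many observations. And since your intention is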

to plot the COVID-19 evolution as lines for all countries/regions

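...you don't need Province/State, Lat or Long. So I would just aggregate the data per country and use one go.Scatter trace per country. And no, it won't get too messy, because plotly is interactive: you can easily toggle individual traces in the legend or zoom in on different parts of the chart. Anyway, I hope the setup suits your taste. Don't hesitate to let me know if you need anything else.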
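A minimal sketch of the aggregation step, run on a tiny made-up frame that mimics the JHU wide layout (the country names, dates and counts are invented for illustration). Summing over Province/State leaves one row per country, and transposing turns each country into a column, which is what a per-country go.Scatter loop iterates over.

```python
import pandas as pd

# Made-up rows in the JHU wide layout: one column per date.
wide = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["Xland", "Xland", "Yland"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "1/22/20": [1, 2, 0],
    "1/23/20": [3, 4, 5],
})

# Drop the columns a per-country line plot does not need,
# then sum the provinces/states of each country.
per_country = (
    wide.drop(columns=["Province/State", "Lat", "Long"])
        .groupby("Country/Region")
        .sum()
)

# Parse the date labels, then transpose: dates become the row index
# and each country becomes one column (one future trace).
per_country.columns = pd.to_datetime(per_country.columns, format="%m/%d/%y")
by_date = per_country.T

print(by_date)
```

Here the two Xland provinces are added together, while Yland keeps its single row.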

[Figure: plot of all countries]

[Figure: zoomed-in plot]

Edit - version 2: development by day since first occurrence

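One way to make the plot a bit less messy is to measure each region's development starting from the day of its first occurrence, so that day 0 on the x-axis is the first day with reported cases in every region.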
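A minimal sketch of the "days since first occurrence" alignment on made-up cumulative counts (rows are dates, columns are countries; names and numbers are invented). It uses a boolean mask per column rather than the zero-to-NaN shift used in the answer's code, so zeros after the first case would be kept; treat it as an alternative formulation, not the answer's exact method.

```python
import pandas as pd

# Made-up cumulative counts: rows are dates, columns are countries.
by_date = pd.DataFrame(
    {"Xland": [0, 0, 2, 5], "Yland": [1, 3, 4, 8]},
    index=pd.date_range("2020-01-22", periods=4),
)


def since_first_case(s: pd.Series) -> pd.Series:
    # True from the first day with a positive count onwards.
    started = s.gt(0).cumsum().gt(0)
    # Keep only those days and renumber them 0, 1, 2, ...
    return s[started].reset_index(drop=True)


# Concatenating columns of different length pads the shorter ones
# with NaN at the bottom, so row i means "day i since first case".
aligned = pd.concat(
    {col: since_first_case(by_date[col]) for col in by_date.columns}, axis=1
)
print(aligned)
```

Each column of `aligned` can then be added as one go.Scatter trace with `x=aligned.index`.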

To reproduce the first plot, just copy the data from the link and save it as covid.csv in a folder named C:\data.

Complete code for the first plot:

import pandas as pd
import plotly.graph_objects as go

# read the JHU CSSE time series (one column per date)
dfi = pd.read_csv(r'C:\data\covid.csv', sep=',', header=0)

# drop province, latitude and longitude
df = dfi.drop(['Province/State', 'Lat', 'Long'], axis=1)

# sum the provinces/states of each country
df_gr = df.groupby('Country/Region').sum()

# parse the date column labels
df_gr.columns = pd.to_datetime(df_gr.columns)

# transpose so that dates are the row index and countries the columns
df = df_gr.T

# order the columns descending by the latest number of cases
order = df.iloc[-1].sort_values(ascending=False).index
df = df[order]

fig = go.Figure()

# add one trace per country
for col in df.columns:
    fig.add_trace(go.Scatter(x=df.index, y=df[col].values, name=col))
fig.show()

Code for the last plot:

This builds on the df from the first snippet.

import numpy as np
import plotly.graph_objects as go

# replace zeros with NaN; in cumulative counts the zeros
# are the days before a region's first case
df2 = df.replace({'0': np.nan, 0: np.nan})

# shift each column up by its number of leading NaNs, leaving
# NaNs in the last rows for some regions
df2 = df2.apply(lambda x: x.shift(-x.isna().sum()))

# replace the date index with a day counter 0, 1, 2, ...
df2 = df2.reset_index(drop=True)

fig2 = go.Figure()

# add one trace per country
for col in df2.columns:
    fig2.add_trace(go.Scatter(x=df2.index, y=df2[col].values, name=col))
fig2.update_layout(showlegend=True,
                   xaxis=dict(title='Days from first occurrence'))
fig2.show()

Answer 2

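plotly works with tidy data, which requires you to turn the dates into a single column. I would use pandas melt to convert the date columns into a single column and then plot. In my experience with plotly, it's better to understand how plotly likes its data structured (a tidy dataframe) and convert my dataset into that form, rather than trying to build the plot some other way.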

I think that if your data is as simple as shown in the picture, the following will get it into the right format:

pd.melt(df, id_vars=['Country/Region'])
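A minimal sketch of the melt step on a small made-up frame (the country names, dates and counts are invented). The `var_name` and `value_name` arguments are optional; they just give the new columns readable names instead of the defaults `variable` and `value`.

```python
import pandas as pd

# Made-up wide data: one row per country, one column per date.
wide = pd.DataFrame({
    "Country/Region": ["Xland", "Yland"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 5],
})

# Turn the date columns into one "date" column and one "cases" column.
long_df = pd.melt(
    wide,
    id_vars=["Country/Region"],
    var_name="date",
    value_name="cases",
)

# Parse the date strings so the x-axis is a real time axis.
long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
print(long_df)
```

With plotly installed, `px.line(long_df, x="date", y="cases", color="Country/Region")` then draws one line per country. On the full JHU file, drop Province/State, Lat and Long and sum per country first; otherwise melt treats those columns as values too.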

More on how plotly likes its data: {@ 3}}

More on pandas here: https://plotly.com/python/px-arguments/

Plot the global COVID-19 evolution as lines with Python (matplotlib, seaborn or plotly)
