转换数据

Question

我的数据框具有这样的两列（日期作为索引）：

我的目标是绘制带有seaborn这样的条形图（使用excel）：

而且我知道我必须使用melt。但是，当我输入以下代码时，结果是索引（日期）消失（由数字替换），并且数据帧结构更改如下：

# pd.melt(df, id_vars=['A'], value_vars=['B'])
premier_melt = pd.melt(final_mada_df,id_vars=["Confirmed"],value_vars = ["Recovered"])

我们如何解决此类问题以正确绘制Seaborn栏

预先感谢

我按照以下建议将代码放在下面：

# main dataframe 
  df2
       Recovered Confirmed
3/20/20   0          3
3/21/20   0          0
3/22/20   0          0
3/23/20   0          9

df2.stack()

出：

3/20/20  Recovered    0
         Confirmed    3
3/21/20  Recovered    0
         Confirmed    0
3/22/20  Recovered    0
                     ..
5/4/20   Confirmed    0
5/5/20   Recovered    2
         Confirmed    2
5/6/20   Recovered    0
         Confirmed    7
Length: 96, dtype: int64

df2.rename(columns={'level_1':'Status',0:'Values'})

出：

       Recovered Confirmed
3/20/20   0         3
3/21/20   0         0
3/22/20   0         0
3/23/20   0         9
3/24/20   0         5

但是当我输入以下代码时，出现错误：

# plot 
ax = sns.barplot(x=df2.index,y='Values',data=df2,hue='Status')

ValueError: Could not interpret input 'Values'

Answer 1

使用`.stack()`，如下所示。

import pandas as pd
import seaborn as sns
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# optional graph format parameters
plt.rcParams['figure.figsize'] = (16.0, 10.0)
plt.style.use('ggplot')

# data
np.random.seed(365)
data = {'Confirmed': [np.random.randint(10) for _ in range(25)],
        'date': pd.bdate_range(datetime.today(), freq='d', periods=25).tolist()}

# dataframe
df = pd.DataFrame(data)

# add recovered
df['Recovered'] = df['Confirmed'].div(2)

| date                |   Confirmed |   Recovered |
|:--------------------|------------:|------------:|
| 2020-05-12 00:00:00 |           4 |         2   |
| 2020-05-13 00:00:00 |           1 |         0.5 |
| 2020-05-14 00:00:00 |           5 |         2.5 |
| 2020-05-15 00:00:00 |           1 |         0.5 |
| 2020-05-16 00:00:00 |           9 |         4.5 |

# verify datetime format and set index
df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)

转换数据

需要这种转换才能从seaborn获得所需的情节

df1 = df.stack().reset_index().set_index('date').rename(columns={'level_1': 'Status', 0: 'Values'})

| date                | Status    |   Values |
|:--------------------|:----------|---------:|
| 2020-05-23 00:00:00 | Confirmed |        2 |
| 2020-05-23 00:00:00 | Recovered |        1 |
| 2020-05-24 00:00:00 | Confirmed |        4 |
| 2020-05-24 00:00:00 | Recovered |        2 |
| 2020-05-25 00:00:00 | Confirmed |        1 |

季节性剧情

格式化x轴刻度标签需要使用df而不是df1。如上所示，每个日期都会重复，因此df1.index.to_series()将产生一个包含重复日期的列表。

ax = sns.barplot(x=df1.index, y='Values', data=df1, hue='Status')

# format the x-axis tick labels uses df, not df1
ax.xaxis.set_major_formatter(plt.FixedFormatter(df.index.to_series().dt.strftime("%Y-%m-%d")))

# alternative use the following to format the labels
# _, labels = plt.xticks()
# labels = [label.get_text()[:10] for label in labels]
# ax.xaxis.set_major_formatter(plt.FixedFormatter(labels))

plt.xticks(rotation=90)
plt.show()

或者`df.plot.bar()`

产生与上述相同的图，而无需转换为df1
df具有一个日期时间索引，该索引被识别为x轴，所有列均绘制在y轴上。

ax = df.plot.bar()
ax.xaxis.set_major_formatter(plt.FixedFormatter(df.index.to_series().dt.strftime("%Y-%m-%d")))
plt.show()

pandas dataframe：具有多列和日期时间作为索引的seaborn绘图栏

1 个答案:

使用`.stack()`，如下所示。

转换数据

季节性剧情

或者`df.plot.bar()`

pandas dataframe：具有多列和日期时间作为索引的seaborn绘图栏

1 个答案:

使用.stack()，如下所示。

转换数据

季节性剧情

或者df.plot.bar()

使用`.stack()`，如下所示。

或者`df.plot.bar()`