Pandas: explode by year-month, then fill the gaps

Time: 2020-10-14 20:05:30

Tags: python pandas

I have a DataFrame in which each row is uniquely identified by {city name, mayor name, mayor's term of office}. For example:

    city       mayor    month-year start    month-year end
0   New York    A          2000.01              2001.01
1   New York    B          2001.07              2003.05
2   New York    C          2003.12              2004.10
3   Seattle     D          2000.02              2002.03
4   Seattle     E          2002.03              2005.09
5   Seattle     A          2005.10              2006.12

Code to generate it:
import pandas as pd

d = {'city': ['New York', 'New York', 'New York', 'Seattle', 'Seattle', 'Seattle'], 'mayor': ['A','B', 'C', 'D', 'E', 'A'], 'month-year start': ['2000.01', '2001.07', '2003.12', '2000.02', '2002.03', '2005.10'], 'month-year end': ['2001.01', '2003.05', '2004.10', '2002.03', '2005.09', '2006.12']}
df = pd.DataFrame(data=d)

I want to transform this dataset so that it shows each city's mayor for every month:

city         year-month     mayor
New York       2000.01       A
New York       2000.02       A
...
New York       2001.01       A
New York       2001.02       Empty
New York       2001.03       Empty
...
New York       2001.06       Empty
New York       2001.07       B
New York       2001.08       B
...

Question 1. I know how to explode between two numbers, but is there a way to explode between the two columns month-year start and month-year end?

Question 2. Since there may be gaps between two mayors' terms, how do I fill those gaps after exploding?

Thanks!

1 answer:

Answer 0 (score: 1)

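First, we use melt to unpivot the month-year start and month-year end columns into a single column. Then we group by city and mayor and resample to daily frequency: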

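df = (
    df.melt(id_vars=['city', 'mayor'])
    .sort_values(['city', 'mayor'])
    .drop(columns='variable')
    .reset_index(drop=True)
)
# Parse the 'YYYY.MM' strings first; resample needs a datetime-like column.
df['value'] = pd.to_datetime(df['value'], format='%Y.%m')
# One row per day between each mayor's start and end.
df = df.groupby(['city', 'mayor']).resample('D', on='value').first()
# Keep only the index levels (city, mayor, value) as columns.
df = df.drop(columns=df.columns).reset_index()

Result: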
          city mayor      value
0     New York     A 2000-01-01
1     New York     A 2000-01-02
2     New York     A 2000-01-03
3     New York     A 2000-01-04
4     New York     A 2000-01-05
...        ...   ...        ...
3806   Seattle     E 2005-08-28
3807   Seattle     E 2005-08-29
3808   Seattle     E 2005-08-30
3809   Seattle     E 2005-08-31
3810   Seattle     E 2005-09-01

[3811 rows x 3 columns]
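
The result above is one row per day and does not yet mark the gaps. For the monthly table the question asks for, one possible approach is sketched below. It is not part of the original answer: the names terms, monthly, bounds, calendar and result are illustrative, and it assumes that a handover month shared by two mayors (Seattle, 2002.03) should keep a row for each of them.

```python
import pandas as pd

d = {
    'city': ['New York', 'New York', 'New York', 'Seattle', 'Seattle', 'Seattle'],
    'mayor': ['A', 'B', 'C', 'D', 'E', 'A'],
    'month-year start': ['2000.01', '2001.07', '2003.12', '2000.02', '2002.03', '2005.10'],
    'month-year end': ['2001.01', '2003.05', '2004.10', '2002.03', '2005.09', '2006.12'],
}
df = pd.DataFrame(data=d)

# Parse the 'YYYY.MM' strings into timestamps on the first day of each month.
start = pd.to_datetime(df['month-year start'], format='%Y.%m')
end = pd.to_datetime(df['month-year end'], format='%Y.%m')

# Question 1: build the list of month starts covered by each term,
# then explode so that every month gets its own row.
terms = df[['city', 'mayor']].copy()
terms['month'] = [list(pd.date_range(s, e, freq='MS')) for s, e in zip(start, end)]
monthly = terms.explode('month').reset_index(drop=True)
monthly['month'] = pd.to_datetime(monthly['month'])

# Question 2: list every month between each city's first and last month.
bounds = monthly.groupby('city')['month'].agg(['min', 'max'])
calendar = pd.concat(
    [
        pd.DataFrame({'city': city, 'month': pd.date_range(lo, hi, freq='MS')})
        for city, lo, hi in bounds.itertuples()
    ],
    ignore_index=True,
)

# A left join keeps the months that have no mayor; label those 'Empty'.
# Assumption: a month shared by two mayors keeps one row per mayor.
result = calendar.merge(monthly, on=['city', 'month'], how='left')
result['mayor'] = result['mayor'].fillna('Empty')
result['year-month'] = result['month'].dt.strftime('%Y.%m')
result = result[['city', 'year-month', 'mayor']]
```

A merge is used here rather than reindex because reindexing would fail on the duplicated Seattle month. If only one mayor per month is wanted, dropping duplicates on city and year-month before the merge would be one way to decide which row wins.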