I have a DataFrame in which each row is uniquely identified by {city name, mayor name, mayor term}. For example:
       city mayor month-year start month-year end
0  New York     A          2000.01        2001.01
1  New York     B          2001.07        2003.05
2  New York     C          2003.12        2004.10
3   Seattle     D          2000.02        2002.03
4   Seattle     E          2002.03        2005.09
5   Seattle     A          2005.10        2006.12
Generated by:
import pandas as pd

d = {
    'city': ['New York', 'New York', 'New York', 'Seattle', 'Seattle', 'Seattle'],
    'mayor': ['A', 'B', 'C', 'D', 'E', 'A'],
    'month-year start': ['2000.01', '2001.07', '2003.12', '2000.02', '2002.03', '2005.10'],
    'month-year end': ['2001.01', '2003.05', '2004.10', '2002.03', '2005.09', '2006.12'],
}
df = pd.DataFrame(data=d)
I want to transform this dataset so that it shows the mayor of each city for every month:
city year-month mayor
New York 2000.01 A
New York 2000.02 A
...
New York 2001.01 A
New York 2001.02 Empty
New York 2001.03 Empty
...
New York 2001.06 Empty
New York 2001.07 B
New York 2001.08 B
...
Question 1. I know how to explode between two numbers, but is there a way to explode between the two columns month-year start and month-year end?
Question 2. Since there may be a gap between two mayors' terms, how can I fill those gaps after exploding?
Thanks!
Answer 0 (score: 1)
First, we use melt to unpivot the start and end columns into a single column. Then we group by city, mayor and resample each group to daily frequency:
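df = (
    df.melt(id_vars=['city', 'mayor'])
    .sort_values(['city', 'mayor'])
    .drop(columns='variable')
    .reset_index(drop=True)
)
# resample needs datetimes, so parse the 'YYYY.MM' strings first
df['value'] = pd.to_datetime(df['value'], format='%Y.%m')
# one row per day between each mayor's start and end date
df = df.groupby(['city', 'mayor']).resample('D', on='value').first()
# keep only the (city, mayor, value) index levels as columns
df = df.drop(columns=df.columns).reset_index()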
city mayor value
0 New York A 2000-01-01
1 New York A 2000-01-02
2 New York A 2000-01-03
3 New York A 2000-01-04
4 New York A 2000-01-05
... ... ... ...
3806 Seattle E 2005-08-28
3807 Seattle E 2005-08-29
3808 Seattle E 2005-08-30
3809 Seattle E 2005-08-31
3810 Seattle E 2005-09-01
[3811 rows x 3 columns]
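The result above has one row per day, and months in which no mayor was in office are simply absent. As a complement, here is a minimal sketch of a monthly variant that also addresses Question 2. It is not the answerer's method: it swaps resample for period_range plus explode, then left-joins onto a full month grid per city. The names start, end, months, exploded, grid and monthly are illustrative, and the rule for overlapping months (Seattle has two mayors in 2002.03) is an assumption: the incoming mayor is kept.

```python
import pandas as pd

d = {
    'city': ['New York', 'New York', 'New York', 'Seattle', 'Seattle', 'Seattle'],
    'mayor': ['A', 'B', 'C', 'D', 'E', 'A'],
    'month-year start': ['2000.01', '2001.07', '2003.12', '2000.02', '2002.03', '2005.10'],
    'month-year end': ['2001.01', '2003.05', '2004.10', '2002.03', '2005.09', '2006.12'],
}
df = pd.DataFrame(data=d)

# Parse the 'YYYY.MM' strings into monthly periods.
start = pd.to_datetime(df['month-year start'], format='%Y.%m').dt.to_period('M')
end = pd.to_datetime(df['month-year end'], format='%Y.%m').dt.to_period('M')

# Question 1: build one list of months per term, then explode it
# into one row per (city, month, mayor).
months = pd.Series(
    [pd.period_range(s, e, freq='M').strftime('%Y.%m').tolist()
     for s, e in zip(start, end)],
    index=df.index,
)
exploded = (
    pd.DataFrame({'city': df['city'], 'year-month': months, 'mayor': df['mayor']})
    .explode('year-month')
    .astype({'year-month': str})
    # Assumption: if one term ends in the month the next begins,
    # keep the later row, i.e. the incoming mayor.
    .drop_duplicates(['city', 'year-month'], keep='last')
)

# Question 2: build every month between each city's first start and
# last end, then left-join so uncovered months show up as missing.
grid = pd.DataFrame(
    [
        (city, month)
        for city in df['city'].unique()
        for month in pd.period_range(
            start[df['city'] == city].min(),
            end[df['city'] == city].max(),
            freq='M',
        ).strftime('%Y.%m')
    ],
    columns=['city', 'year-month'],
)
monthly = grid.merge(exploded, on=['city', 'year-month'], how='left')
monthly['mayor'] = monthly['mayor'].fillna('Empty')

print(monthly.head(15))
```

Keeping year-month as a 'YYYY.MM' string matches the format in the question; if further date arithmetic is needed, it may be more convenient to keep the Period values instead.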