我想按时间顺序重新填充df中的行。
原始df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 XYZ 1 B 1954
3 123 1 C 1954
4 X12 1 B 1956
5 123 1 D 1956
6 124 1 D 1956
所需的df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 XYZ 1 B 1954
9 123 1 C 1954
10 XYZ 1 B 1955
11 123 1 C 1955
12 X12 1 B 1956
13 123 1 D 1956
14 124 1 D 1956
我尝试将Year列转换为datetime,并使用逐年重采样进行向前填充。 但这不起作用,因为如果按年份重新抽样,则每年仅重新抽样一次。
df.resample('YS').first().ffill().reset_index()
所需的df:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 XYZ 1 B 1954
9 123 1 C 1954
10 XYZ 1 B 1955
11 123 1 C 1955
12 X12 1 B 1956
13 123 1 D 1956
14 124 1 D 1956
答案 0 :(得分:3)
我觉得这是一个unnesting
问题
s=df.astype(str).groupby('Year').agg(list)
s.index=s.index.astype(int)
s1=s.reindex(np.arange(s.index.min(),s.index.max()+1),method='ffill')
yourdf=unnesting(s1,list('ABC')).reset_index()
yourdf
Out[117]:
Year A B C
0 1950 ABC 0 A
1 1950 CDE 1 A
2 1951 ABC 0 A
3 1951 CDE 1 A
4 1952 ABC 0 A
5 1952 CDE 1 A
6 1953 ABC 0 A
7 1953 CDE 1 A
8 1954 XYZ 1 B
9 1954 123 1 C
10 1955 XYZ 1 B
11 1955 123 1 C
12 1956 X12 1 B
13 1956 123 1 D
14 1956 124 1 D
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
答案 1 :(得分:2)
您可以尝试以下方法:
df_out = df.set_index([pd.to_datetime(df['Year'], format='%Y'),'A','B','C'])\
.unstack([1,2,3]).resample('A').ffill()\
.stack([1,2,3]).reset_index([1,2,3])
df_out = df_out.assign(Year=pd.to_datetime(df_out.index).year).reset_index(drop=True)
df_out
输出:
A B C Year
0 ABC 0 A 1950
1 CDE 1 A 1950
2 ABC 0 A 1951
3 CDE 1 A 1951
4 ABC 0 A 1952
5 CDE 1 A 1952
6 ABC 0 A 1953
7 CDE 1 A 1953
8 123 1 C 1954
9 XYZ 1 B 1954
10 123 1 C 1955
11 XYZ 1 B 1955
12 123 1 D 1956
13 X12 1 B 1956
答案 2 :(得分:0)
我通过旋转和融化采取了不同的方法。 似乎在工作。任何人都看到了问题。?
data = {'year': ['2000', '2000', '2005', '2005', '2007', '2007', '2007', '2009'],
'country':['UK', 'US', 'FR','US','UK','FR','US','UK'],
'sales': [10, 21, 20, 10,12,20, 10,12],
'rep': ['john', 'john', 'claire','claire', 'kyle','kyle','kyle','amy']
}
df=pd.DataFrame(data)
year country sales rep
0 2000 UK 10 john
1 2000 US 21 john
2 2005 FR 20 claire
3 2005 US 10 claire
4 2007 UK 12 kyle
5 2007 FR 20 kyle
6 2007 US 10 kyle
7 2009 UK 12 amy
首先做一个枢轴...
dfp=pd.pivot_table(df,index=['country','rep'],values=['sales'],columns=['year']).fillna(0)
dfp=dfp.xs('sales', axis=1, drop_level=True)
year 2000 2005 2007 2009
country rep
FR claire 0.0 20.0 0.0 0.0
kyle 0.0 0.0 20.0 0.0
UK amy 0.0 0.0 0.0 12.0
john 10.0 0.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0
US claire 0.0 10.0 0.0 0.0
john 21.0 0.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0
然后是一些逻辑,以复制列。
cols=dfp.columns.astype(int).values
dft=dfp.copy()
i=0
for col in cols :
if col != cols[-1]:
for newcol in range(col+1,cols[i+1]):
dft[str(newcol)]=dft[str(col)]
i+=1
year 2000 2005 2007 2009 2001 2002 2003 2004 2006 2008
country rep
FR claire 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 0.0
kyle 0.0 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0
UK amy 0.0 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0
john 10.0 0.0 0.0 0.0 10.0 10.0 10.0 10.0 0.0 0.0
kyle 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0 12.0
US claire 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0 0.0
john 21.0 0.0 0.0 0.0 21.0 21.0 21.0 21.0 0.0 0.0
kyle 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 10.0
然后融化了它们,使其恢复为原始格式。
dfm=dft.reset_index()
dfm=dfm.melt(id_vars=['country','rep'],value_vars=dfm.columns.values[2:],var_name='Year',value_name='sales')
dfm=dfm.loc[dfm.sales>0].reset_index(drop='True')
country rep Year sales
0 UK john 2000 10.0
1 US john 2000 21.0
2 FR claire 2005 20.0
3 US claire 2005 10.0
4 FR kyle 2007 20.0
5 UK kyle 2007 12.0
6 US kyle 2007 10.0
7 UK amy 2009 12.0
8 UK john 2001 10.0
9 US john 2001 21.0
10 UK john 2002 10.0
11 US john 2002 21.0
12 UK john 2003 10.0
13 US john 2003 21.0
14 UK john 2004 10.0
15 US john 2004 21.0
16 FR claire 2006 20.0
17 US claire 2006 10.0
18 FR kyle 2008 20.0
19 UK kyle 2008 12.0
20 US kyle 2008 10.0