我试图拆开两列:
cols = res.columns[:31]
res[cols] = res[cols].ffill()
res = res.set_index(cols + [31])[32].unstack().reset_index().rename_axis(None, 1)
但是我遇到一个错误:
TypeError: can only perform ops with scalar values
我应该怎么避免呢?
我原来的问题:LINK
答案 0 :(得分:1)
我认为需要将列转换为列表:
cols = res.columns[:31].tolist()
编辑:
索引包含重复的条目,无法调整
意味着重复,这里是前6列,因此不可能创建新的DataFrame,因为前6列创建新索引,而7列创建新列,而8列是2个值:
0 1 2 3 4 5 6 7
0 xx s 1 d f df f 54
1 xx s 1 d f df f g4
新数据框:
index = xx s 1 d f df
column = f
value = 54
index = xx s 1 d f df
column = f
value = g4
所以解决方案是聚合的,这里使用字符串,因此需要.apply(', '.join)
:
index = xx s 1 d f df
column = f
value = 54, g4
或删除重复项,并通过drop_duplicates
保留重复行的第一个或最后一个值:
index = xx s 1 d f df
column = f
value = 54
index = xx s 1 d f df
column = f
value = g4
res = pd.DataFrame({0: ['xx',np.nan,np.nan,np.nan,'ds', np.nan, np.nan, np.nan, np.nan, 'as'],
1: ['s',np.nan,np.nan,np.nan,'a', np.nan, np.nan, np.nan, np.nan, 't'],
2: ['1',np.nan,np.nan,np.nan,'s', np.nan, np.nan, np.nan, np.nan, 'r'],
3: ['d',np.nan, np.nan, np.nan,'d', np.nan, np.nan, np.nan, np.nan, 'a'],
4: ['f',np.nan, np.nan, np.nan,'f', np.nan, np.nan, np.nan, np.nan, '2'],
5: ['df',np.nan,np.nan,np.nan,'ds',np.nan, np.nan, np.nan, np.nan, 'ds'],
6: ['f','f', 'x', 'r', 'f', 'd', 's', '1', '3', 'k'],
7: ['54','g4', 'r4', '43', '64', '43', 'se', 'gf', 's3', 's4']})
cols = res.columns[:6].tolist()
res[cols] = res[cols].ffill()
print (res)
0 1 2 3 4 5 6 7
0 xx s 1 d f df f 54
1 xx s 1 d f df f g4
2 xx s 1 d f df x r4
3 xx s 1 d f df r 43
4 ds a s d f ds f 64
5 ds a s d f ds d 43
6 ds a s d f ds s se
7 ds a s d f ds 1 gf
8 ds a s d f ds 3 s3
9 as t r a 2 ds k s4
res =res.groupby(cols + [6])[7].apply(', '.join).unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN 54, g4 NaN 43 NaN r4 <-54, g4
另一种解决方案是删除重复项:
res = res.drop_duplicates(cols + [6])
res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN 54 NaN 43 NaN r4 <- 54
res = res.drop_duplicates(cols + [6], keep='last')
res = res.set_index(cols + [6])[7].unstack().reset_index().rename_axis(None, 1)
print (res)
0 1 2 3 4 5 1 3 d f k r s x
0 as t r a 2 ds NaN NaN NaN NaN s4 NaN NaN NaN
1 ds a s d f ds gf s3 43 64 NaN NaN se NaN
2 xx s 1 d f df NaN NaN NaN g4 NaN 43 NaN r4 <- g4