My dataframe looks like this:
import pandas as pd

df = pd.DataFrame({"B.count": [0, 0, 1, 0, 0],
                   "B.score": [0, 0, 87, 0, 0],
                   "C.count": [0, 1, 0, 1, 0],
                   "C.score": [0, 91, 0, 14, 0],
                   "D.count": [1, 0, 10, 0, 11],
                   "D.score": [93, 0, 3, 0, 4]},
                  index=[1, 2, 3, 4, 5])
I would like to convert it into a long, sparse format that keeps only the non-zero entries:
pd.DataFrame({"id": [1, 2, 3, 3, 4, 5],
              "taste": ["D", "C", "B", "D", "C", "D"],
              "count": [1, 1, 1, 10, 1, 11],
              "score": [93, 91, 87, 3, 14, 4]})
The solution seems to require the wide_to_long function, but unfortunately I can't get it to work.
Answer 0 (score: 0)
Let's build a custom wide_to_long:
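import pandas as pd

# mask the 0s with NaN, stack, and drop the NaNs
# (newer pandas stack() implementations keep NaNs, so drop them explicitly)
s = df.where(df > 0).stack().dropna().reset_index()
# output dataframe: split "B.count" into taste="B" and type="count", then pivot
(pd.concat((s.rename(columns={'level_0': 'id'}),
            s.level_1.str.extract(r'(?P<taste>.+)\.(?P<type>count|score)$')),
           axis=1)
   .pivot_table(index=['id', 'taste'], columns='type', values=0)
   .reset_index()
)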
Output:
type  id taste  count  score
0      1     D    1.0   93.0
1      2     C    1.0   91.0
2      3     B    1.0   87.0
3      3     D   10.0    3.0
4      4     C    1.0   14.0
5      5     D   11.0    4.0
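The floats in this output are expected: df.where(df > 0) inserts NaN, which turns the integer columns into float64, and pivot_table aggregates with aggfunc='mean' by default. If integer columns like the question's expected frame are wanted, a minimal tidy-up sketch could look like this. The frame `out` is an assumption: it is built by hand here to mirror the printed result above, including the 'type' name on the columns axis.

```python
import pandas as pd

# Hand-built stand-in for the pivot_table result shown above (assumed shape):
# float value columns and a columns axis named 'type'
out = pd.DataFrame({"id": [1, 2, 3, 3, 4, 5],
                    "taste": ["D", "C", "B", "D", "C", "D"],
                    "count": [1.0, 1.0, 1.0, 10.0, 1.0, 11.0],
                    "score": [93.0, 91.0, 87.0, 3.0, 14.0, 4.0]}).rename_axis("type", axis=1)

# Remove the leftover 'type' axis name and cast the value columns back to integers
tidy = (out.rename_axis(None, axis=1)
           .astype({"count": "int64", "score": "int64"}))
print(tidy)
```

If a row could have a non-zero count with a zero score (or the reverse), where() would leave a NaN in it and astype("int64") would raise; that does not happen with the sample data.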
Answer 1 (score: 0)
Yes, you can use pandas' wide_to_long; however, you first have to swap the two parts of each column name:
import pandas as pd

# swap the parts around the dot ("B.count" -> "count.B") so that the stub
# names (count, score) come first, which is what pandas.wide_to_long expects
# (i[2:] and i[0] assume a single-character prefix such as "B")
df.columns = [f'{i[2:]}.{i[0]}' for i in df.columns]
# create an id column from the index
df = df.assign(id=df.index)
# convert from wide to long
(pd.wide_to_long(df,
                 stubnames=['count', 'score'],
                 sep='.',
                 i='id',
                 j='taste',
                 suffix='[A-Z]')
   # remove rows with a zero count
   .query('count != 0')
   .sort_index()
   .reset_index()
)
   id taste  count  score
0   1     D      1     93
1   2     C      1     91
2   3     B      1     87
3   3     D     10      3
4   4     C      1     14
5   5     D     11      4
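The same wide_to_long idea can be wrapped in a small helper that works on a copy and does not rely on one-character prefixes. This is a sketch under assumptions: the name to_long_sparse is hypothetical, and splitting on the first dot plus the suffix pattern r"\w+" replace the i[2:] / i[0] slicing and '[A-Z]' used above.

```python
import pandas as pd


def to_long_sparse(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape 'X.count'/'X.score' columns into long rows, dropping zero counts."""
    flipped = df.copy()
    # "B.count" -> "count.B": wide_to_long needs the stub name as the prefix
    flipped.columns = [".".join(reversed(c.split(".", 1))) for c in df.columns]
    # turn the index into the 'id' column used as wide_to_long's i
    flipped = flipped.rename_axis("id").reset_index()
    long = pd.wide_to_long(flipped, stubnames=["count", "score"],
                           i="id", j="taste", sep=".", suffix=r"\w+")
    # keep only the non-zero entries, ordered by id, then taste
    long = long[long["count"] != 0]
    return long.sort_index().reset_index()


df = pd.DataFrame({"B.count": [0, 0, 1, 0, 0],
                   "B.score": [0, 0, 87, 0, 0],
                   "C.count": [0, 1, 0, 1, 0],
                   "C.score": [0, 91, 0, 14, 0],
                   "D.count": [1, 0, 10, 0, 11],
                   "D.score": [93, 0, 3, 0, 4]},
                  index=[1, 2, 3, 4, 5])

result = to_long_sparse(df)
print(result)
```

Because split(".", 1) splits on the first dot only, this assumes the taste prefix itself contains no dot.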