My DataFrame looks like this:
import pandas as pd

df = pd.DataFrame([["t1","d2","e3","r4"],
                   ["t1","d2","e2","r4"],
                   ["t1","d2","e1","r4"]],columns=["a","b","c","d"])
I want:
pd.DataFrame([["t1","d2","e3","r4","e1","e2"]],
columns=["a","b","c","d","c1","c2"])
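That is, only one column has differing values, and I want to create a new DataFrame that adds a column each time a new value is observed. Is there a simple way to do this?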
Answer 0 (score: 7)
Ucols = df.columns[(df.nunique() == 1)].tolist()
df_out = df.set_index(Ucols).set_index(df.groupby(Ucols).cumcount(), append=True).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]
print(df_out.reset_index())
Output:
a b d c c1 c2
0 t1 d2 r4 e3 e2 e1
Or, naming the key columns explicitly:
df_out = df.set_index(['a','b','d',df.groupby(['a','b','d']).cumcount()]).unstack()
df_out.columns = [f'{i}{j}' if j != 0 else f'{i}' for i,j in df_out.columns]
df_out.reset_index()
Output:
a b d c c1 c2
0 t1 d2 r4 e3 e2 e1
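To make the one-liner above easier to follow, here is a step-by-step sketch of the same cumcount-plus-unstack idea. The names key_cols, position and wide are only illustrative, and the sample frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"]],
    columns=["a", "b", "c", "d"],
)

# Columns holding a single distinct value act as the grouping key.
key_cols = [col for col in df.columns if df[col].nunique() == 1]

# cumcount() numbers the rows inside each key group: 0, 1, 2, ...
position = df.groupby(key_cols).cumcount()

# Index by the key columns plus the position, then pivot the position
# level (the innermost one) into the columns with unstack().
wide = df.set_index(key_cols + [position]).unstack()

# Flatten the (column, position) pairs: position 0 keeps the bare name.
wide.columns = [f"{col}{pos}" if pos != 0 else col for col, pos in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the position comes from cumcount(), duplicate values in c are not collapsed here; every row of a group gets its own column.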
Answer 1 (score: 6)
You can use a dictionary comprehension. For consistency, I added integer labels to all columns.
res = pd.DataFrame({f'{col}{idx}': val for col in df for idx, val in \
enumerate(df[col].unique(), 1)}, index=[0])
print(res)
a1 b1 c1 c2 c3 d1
0 t1 d2 e3 e2 e1 r4
An alternative to df[col].unique() is df[col].drop_duplicates(), though the latter may incur some overhead from iterating a pd.Series rather than an np.ndarray.
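As a small illustration of that difference, the sketch below compares the two calls on a made-up object-dtype column. The variable names are hypothetical, and the dtype is set explicitly so the return type does not depend on the pandas version's default string handling.

```python
import numpy as np
import pandas as pd

# Assumed sample column; dtype=object is set explicitly for this sketch.
col = pd.Series(["e3", "e2", "e1", "e3"], dtype=object, name="c")

# unique() returns the distinct values in first-seen order as a NumPy array.
as_array = col.unique()

# drop_duplicates() returns a Series and keeps the original index labels.
as_series = col.drop_duplicates()

print(type(as_array).__name__, list(as_array))
print(type(as_series).__name__, as_series.index.tolist())
```

Both give the same values in the same order, so either can feed enumerate(); the Series simply carries an index that the comprehension never uses.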
Answer 2 (score: 4)
Not as pretty as Scott's answer, but the logic you are looking for is:
out = pd.DataFrame()
for col in df.columns:
    values = df[col].unique()
    if len(values) == 1:
        out[col] = values
    else:
        for i, value in enumerate(values):
            # Wrap in a list so the assignment also works while `out` is still empty
            out[col + str(i + 1)] = [value]
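The loop above labels the varying column c1, c2, c3, whereas the question asks for c, c1, c2. A minimal sketch of a variant that keeps the bare name for the first value and builds the row in one go is shown below. The helper name widen_unique is hypothetical, and the sketch assumes the generated names do not collide with existing columns.

```python
import pandas as pd

def widen_unique(df):
    """Hypothetical helper: one row, one column per distinct value."""
    row = {}
    for col in df.columns:
        for i, value in enumerate(df[col].unique()):
            # The first distinct value keeps the bare column name;
            # later ones get the suffixes 1, 2, ...
            name = col if i == 0 else f"{col}{i}"
            row[name] = value
    # Build the frame once instead of adding columns one at a time.
    return pd.DataFrame([row])

df = pd.DataFrame(
    [["t1", "d2", "e3", "r4"],
     ["t1", "d2", "e2", "r4"],
     ["t1", "d2", "e1", "r4"]],
    columns=["a", "b", "c", "d"],
)
out = widen_unique(df)
print(out)
```

The columns come out in the original column order (a, b, c, c1, c2, d) rather than with the new columns at the end.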
Answer 3 (score: 2)
Using drop_duplicates:
s=df.reset_index().melt('index').drop_duplicates(['variable','value'],keep='first')
pd.DataFrame([s.value.values.tolist()],columns=s['variable']+s['index'].astype(str))
Out[1151]:
a0 b0 c0 c1 c2 d0
0 t1 d2 e3 e2 e1 r4
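One detail of the melt approach is that the numeric suffix is the original row label of a value's first occurrence, not a running counter. The sketch below uses a made-up two-column frame, not the one from the question, where the two happen to differ.

```python
import pandas as pd

# Assumed sample data: "e3" repeats, so "e1" first appears in row 2.
df = pd.DataFrame(
    [["t1", "e3"],
     ["t1", "e3"],
     ["t1", "e1"]],
    columns=["a", "c"],
)

# Long format: one row per (original row label, column name, value).
long = df.reset_index().melt("index")

# Keep the first occurrence of each (column, value) pair.
s = long.drop_duplicates(["variable", "value"], keep="first")

# Column name plus the row label where the value was first seen.
labels = (s["variable"] + s["index"].astype(str)).tolist()
res = pd.DataFrame([s["value"].tolist()], columns=labels)
print(res)
```

Here the labels are a0, c0 and c2. If consecutive suffixes are needed, a groupby("variable").cumcount() on s could replace the index column.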