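I have a df in which many columns share the same column name. I want to combine the columns that have the same name, much like a UNION in SQL.
Here is some sample data: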
import pandas as pd

cie = ['y', 'n', 'y', 'n']
words = [['bank', 'payment'], ['student', 'loan', 'payment'], ['bank', 'payment'], ['student', 'loan']]
df = pd.DataFrame(data=words, index=cie)
df:
         0        1        2
y     bank  payment     None
n  student     loan  payment
y     bank  payment     None
n  student     loan     None
df.T:
         y        n        y        n
0     bank  student     bank  student
1  payment     loan  payment     loan
2     None  payment     None     None
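I need to combine the two y columns, because I want to count how many times each word appears under y, i.e. how often a word goes with a positive outcome. Ideally, the result would be: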
         y        n
0     bank  student
1  payment     loan
2     None  payment
3     bank  student
4  payment     loan
5     None     None
I have tried many approaches, but none of them worked. Can anyone help? Thanks.
Answer 0: (score: 1)
IIUC, first melt the frame, then use cumcount to create an additional key; at that point the problem becomes a pivot:
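s = df.reset_index().melt('index')
# number the values within each label so every (label, position) pair is unique
s.variable = s.groupby('index').cumcount()
# pivot only accepts keyword arguments in pandas 2.0 and later
s.pivot(index='index', columns='variable', values='value').T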
Out[43]:
index           n        y
variable
0         student     bank
1         student     bank
2            loan  payment
3            loan  payment
4         payment     None
5            None     None
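The same three steps can be spelled out one at a time, which may make it easier to see what each one contributes. This is only a sketch against the sample data from the question; the names long_df, pos, word and stacked are made up for illustration and are not part of the answer above.

```python
import pandas as pd

cie = ['y', 'n', 'y', 'n']
words = [['bank', 'payment'],
         ['student', 'loan', 'payment'],
         ['bank', 'payment'],
         ['student', 'loan']]
df = pd.DataFrame(data=words, index=cie)

# Step 1: long format, one row per (label, word) pair.
# reset_index() moves the unnamed index into a column called 'index'.
long_df = df.reset_index().melt(id_vars='index', value_name='word')

# Step 2: number the words within each label, so the repeated
# labels 'y' and 'n' get a distinct position key per word.
long_df['pos'] = long_df.groupby('index').cumcount()

# Step 3: pivot so that each label becomes a single column.
stacked = long_df.pivot(index='pos', columns='index', values='word')
print(stacked)
```

Note that melt walks the original columns one after another, so the words come out grouped by original column (bank, bank, payment, payment, ...) instead of row by row. For counting words per label that ordering should not matter.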
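Answer 1: (score: 1)
Try the following: take the columns that share a name, combine and flatten them into one list, do this for both names, and then build a new DataFrame: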
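import numpy as np
# df.T['y'] returns both 'y' columns; zip(*...) regroups the values per original row before flattening
df = pd.DataFrame({'y': np.array(list(zip(*df.T['y'].values.tolist()))).flatten().tolist(),
                   'n': np.array(list(zip(*df.T['n'].values.tolist()))).flatten().tolist()})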
Now:
print(df)
prints:
         n        y
0  student     bank
1     loan  payment
2  payment     None
3  student     bank
4     loan  payment
5     None     None
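A possibly shorter variant of the same idea, working on the untransposed frame: df.loc['y'] selects every row labelled 'y', and ravel() reads those rows one after another. This is a sketch, not a drop-in replacement. It assumes every label occurs the same number of times (otherwise the columns would have different lengths), and the names combined and y_counts are made up for illustration.

```python
import pandas as pd

cie = ['y', 'n', 'y', 'n']
words = [['bank', 'payment'],
         ['student', 'loan', 'payment'],
         ['bank', 'payment'],
         ['student', 'loan']]
df = pd.DataFrame(data=words, index=cie)

# For each label, take all rows with that label and flatten them row by row.
combined = pd.DataFrame({label: df.loc[label].to_numpy().ravel()
                         for label in ['y', 'n']})
print(combined)

# Count how often each word occurs under 'y';
# value_counts() skips missing values by default.
y_counts = combined['y'].value_counts()
print(y_counts)
```

With the combined frame in hand, the count the question asks for is a single value_counts() call per column.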