My table looks like this:
name  A  B  C
Tom   1  2  3
Jack  2  5  9
Joe   4  7  1
I want to convert it into a new table with only 3 columns:
name  letter  value
Tom   A       1
Tom   B       2
Tom   C       3
Jack  A       2
Jack  B       5
Jack  C       9
Joe   A       4
Joe   B       7
Joe   C       1
Right now I am doing this with a for loop.
Does anyone know a more elegant way to do this?
Thanks!
Answer 0 (score: 1)
You are looking for melt:
df.melt('name')
Out[5]:
   name variable  value
0   Tom        A      1
1  Jack        A      2
2   Joe        A      4
3   Tom        B      2
4  Jack        B      5
5   Joe        B      7
6   Tom        C      3
7  Jack        C      9
8   Joe        C      1
To name the 'variable' column as the OP specified, pass the var_name parameter:
df.melt(id_vars='name', var_name='letter')
   name letter  value
0   Tom      A      1
1  Jack      A      2
2   Joe      A      4
3   Tom      B      2
4  Jack      B      5
5   Joe      B      7
6   Tom      C      3
7  Jack      C      9
8   Joe      C      1
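Note that melt emits its rows column by column (all A values, then all B, then all C), while the question lists them grouped by name. The following is a minimal, self-contained sketch of one way to get the grouped order. It assumes the question's table is built by hand as a DataFrame called df (the question does not show how df was created), and it relies on the ignore_index parameter of melt, which needs pandas 1.1 or later. The variable name long_df is only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame(
    {
        "name": ["Tom", "Jack", "Joe"],
        "A": [1, 2, 4],
        "B": [2, 5, 7],
        "C": [3, 9, 1],
    }
)

long_df = (
    # ignore_index=False keeps the original row labels (0, 1, 2),
    # so every melted row remembers which source row it came from.
    df.melt(id_vars="name", var_name="letter", ignore_index=False)
    # A stable sort on those labels groups rows by source row while
    # keeping the A, B, C order inside each group.
    .sort_index(kind="mergesort")
    # Replace the repeated labels with a fresh 0..n-1 index.
    .reset_index(drop=True)
)

print(long_df)
```

If the row order does not matter, the plain df.melt call above is enough and the sort can be dropped.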
Answer 1 (score: 0)
Use stack() after setting name as the index:
In [397]: df.set_index(df.name)[['A','B','C']].stack()
Out[397]:
name
Tom   A    1
      B    2
      C    3
Jack  A    2
      B    5
      C    9
Joe   A    4
      B    7
      C    1
dtype: int64
If you want three regular columns instead of a MultiIndex, reset the index and rename the columns:
In [412]: u=df.set_index(df.name)[['A','B','C']].stack().reset_index()
In [413]: u.columns=['name','letter','value']
In [414]: u
Out[414]:
   name letter  value
0   Tom      A      1
1   Tom      B      2
2   Tom      C      3
3  Jack      A      2
4  Jack      B      5
5  Jack      C      9
6   Joe      A      4
7   Joe      B      7
8   Joe      C      1
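The same stack() approach can also be written as a single chain that names the columns along the way, which avoids assigning u.columns afterwards. This is only a sketch under the assumption that df is built by hand as below (the question does not show its construction); long_df is an illustrative name.

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame(
    {
        "name": ["Tom", "Jack", "Joe"],
        "A": [1, 2, 4],
        "B": [2, 5, 7],
        "C": [3, 9, 1],
    }
)

long_df = (
    df.set_index("name")              # name becomes the row index
    .rename_axis(columns="letter")    # label the column axis so stack() names the new level
    .stack()                          # move A, B, C into an inner index level
    .reset_index(name="value")        # back to flat columns; the data column is called value
)

print(long_df)
```

Because stack() walks the frame row by row, the result is already grouped by name, in the order the question asks for.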