I have the following data frame:
import pandas as pd
from numpy.random import randn
df = pd.DataFrame({'Probe' : ['a', 'b', 'c', 'd','e'],
                   'Gene' : ['one', 'two','three','four','five'],
                   'X' : randn(5), 'Y' : randn(5)})
It looks like this:
In [20]: df
Out[20]:
Gene Probe X Y
0 one a 0.104504 1.089442
1 two b 0.030071 0.696786
2 three c 1.224704 1.077867
3 four d -0.052333 0.034292
4 five e -0.283872 0.602743
What I want to do is split the data frame by columns X and Y while keeping the first and second columns, producing:
Gene Probe X
0 one a 0.104504
1 two b 0.030071
2 three c 1.224704
3 four d -0.052333
4 five e -0.283872
and
Gene Probe Y
0 one a 1.089442
1 two b 0.696786
2 three c 1.077867
3 four d 0.034292
4 five e 0.602743
I tried this, but it does not give me what I want:
for dfs in df.groupby(['Probe','Gene']):
    print(dfs)
What is the right way to do this?
Answer 0: (score: 1)
This would be a start:
df_x = df.loc[:, ['Gene', 'Probe', 'X']]
df_y = df.loc[:, ['Gene', 'Probe', 'Y']]
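If there are more than two value columns, the same selection can be written as a loop. The following is only a minimal sketch: it assumes Gene and Probe are the columns to keep in every piece, and the names key_cols, value_cols and frames are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'Probe': ['a', 'b', 'c', 'd', 'e'],
                   'Gene': ['one', 'two', 'three', 'four', 'five'],
                   'X': rng.standard_normal(5),
                   'Y': rng.standard_normal(5)})

# Columns kept in every piece (an assumption for this sketch).
key_cols = ['Gene', 'Probe']
# Every remaining column gets its own data frame.
value_cols = [c for c in df.columns if c not in key_cols]

# Map each value column name to a frame holding the key columns plus that column.
frames = {col: df.loc[:, key_cols + [col]] for col in value_cols}

print(frames['X'])
print(frames['Y'])
```

frames['X'] and frames['Y'] then correspond to df_x and df_y above.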
Answer 1: (score: 1)
You can use difference to select columns by excluding the ones you are not interested in:
In [9]:
X = df[df.columns.difference(['Y'])]
Y = df[df.columns.difference(['X'])]
print(X)
Y
Gene Probe X
0 one a 1.231749
1 two b 0.519425
2 three c 0.849960
3 four d -0.077796
4 five e 1.224163
Out[9]:
Gene Probe Y
0 one a 0.022695
1 two b 0.500311
2 three c -0.163624
3 four d 0.411491
4 five e 1.305214
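One thing to keep in mind: by default, Index.difference returns its result sorted, so the columns may come back in a different order than in df. If the original order matters, DataFrame.drop is an alternative. A small sketch comparing the two, with arbitrary placeholder values for X and Y (not the numbers shown above):

```python
import pandas as pd

# Placeholder values, chosen only for illustration.
df = pd.DataFrame({'Probe': ['a', 'b', 'c', 'd', 'e'],
                   'Gene': ['one', 'two', 'three', 'four', 'five'],
                   'X': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'Y': [1.1, 1.2, 1.3, 1.4, 1.5]})

# difference() sorts the remaining column labels.
by_difference = df[df.columns.difference(['Y'])]
# drop() keeps the remaining columns in their original order.
by_drop = df.drop(columns=['Y'])

print(list(by_difference.columns))
print(list(by_drop.columns))
```

On a recent pandas, where the column order follows the dict order, the first list is alphabetical (Gene, Probe, X) while the second keeps Probe before Gene.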