My .csv file looks like this:
Res X XB XC O P
A312 76.55 - - - -
B313 175.4 62.28 32.62 8.189 121.2
J314 176.5 53.34 40.77 8.277 124.6
L315 177.9 55.29 41.44 8.427 125.5
T316 174.7 59.47 63.43 8.264 116.1
...
G378 10.2 58.91 40.13 7.646 126.7
I would like to reshape it like this:
312 A X 76.55
313 B X 175.4
313 B XB 62.28
313 B XC 32.62
...
378 G O 7.646
378 G P 126.7
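This is what I have so far:
import pandas as pd
# sep=r"\s+" splits on any run of whitespace (delim_whitespace is deprecated in recent pandas);
# na_values="-" turns the "-" placeholders into NaN
df1 = pd.read_csv("my_file.csv", sep=r"\s+", index_col=False, na_values="-")
df2 = pd.read_csv("my_file.csv", sep=r"\s+", index_col=False, na_values="-")
# Split e.g. "A312" into the position (312) and the residue letter ("A")
df1['Pos'] = df1['Res'].str[1:].astype(int)
df1['AA'] = df1['Res'].str[0]
df2.drop('Res', axis=1, inplace=True)
# Stack the value columns into one long Series indexed by (row number, column name)
a = df2.stack(level=-1)
b = df1[["Pos", "AA"]]
print(a)
print(b)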
This produces the following. Output of print(a):
0 X 76.550
1 X 175.400
XB 62.280
XC 32.620
O 8.189
P 121.200
...
62 X 10.200
XB 58.910
XC 40.130
O 7.646
P 126.700
Output of print(b):
0 312 A
1 313 B
2 314 J
3 315 L
...
62 378 G
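Any ideas on how to do the last step, i.e. joining the two objects a and b so that I end up with the format I want? I have tried several pandas functions, such as pd.merge, pd.join and pd.concat. None of them seem to work...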
Answer 0 (score: 1)
You want melt:
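import pandas as pd
# na_values="-" turns the "-" placeholders into NaN so that dropna() can remove them
df = pd.read_csv("my_file.csv", sep=r"\s+", index_col=False, na_values="-")
# Keep only the residue letter (the position number is discarded here)
df['Res'] = df['Res'].str[0]
reshaped = df.melt(id_vars='Res', value_vars=['X', 'XB', 'XC', 'O', 'P'])
# Sort by residue and then by variable name so the row order is deterministic
print(reshaped.dropna().sort_values(['Res', 'variable']).reset_index(drop=True))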
Output:
Res variable value
0 A X 76.55
1 B O 8.189
2 B P 121.2
3 B X 175.4
4 B XB 62.28
5 B XC 32.62
6 J O 8.277
7 J P 124.6
8 J X 176.5
9 J XB 53.34
10 J XC 40.77
11 L O 8.427
12 L P 125.5
13 L X 177.9
14 L XB 55.29
15 L XC 41.44
16 T O 8.264
17 T P 116.1
18 T X 174.7
19 T XB 59.47
20 T XC 63.43
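The melt call above keeps only the residue letter. If the position is needed as well, as in the format requested in the question, one possible variation is to split Res into two id columns before melting. This is only a sketch: the inline sample string is a hypothetical stand-in for my_file.csv, and the column names variable and value are my own choices.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for my_file.csv (first rows of the question's data).
sample = """Res X XB XC O P
A312 76.55 - - - -
B313 175.4 62.28 32.62 8.189 121.2
J314 176.5 53.34 40.77 8.277 124.6
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", na_values="-")

# Split e.g. "A312" into the residue letter and the integer position.
df["AA"] = df["Res"].str[0]
df["Pos"] = df["Res"].str[1:].astype(int)

value_cols = ["X", "XB", "XC", "O", "P"]
melted = (
    df.melt(id_vars=["Pos", "AA"], value_vars=value_cols,
            var_name="variable", value_name="value")
      .dropna(subset=["value"])
)

# An ordered categorical makes the sort follow the original column order
# (X, XB, XC, O, P) instead of alphabetical order.
melted = melted.assign(
    variable=pd.Categorical(melted["variable"], categories=value_cols, ordered=True)
)
melted = melted.sort_values(["Pos", "variable"]).reset_index(drop=True)
print(melted)
```

Sorting on an ordered categorical is just one way to get the column order from the question; if alphabetical order is acceptable, the assign step can be dropped.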
Answer 1 (score: 1)
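Your solution needs only a small change. First use DataFrame.pop to extract the column, so the separate drop('Res', axis = 1, inplace = True) call is no longer needed. Then create a MultiIndex with DataFrame.set_index and call DataFrame.stack. Finally, clean up the result with reset_index and rename: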
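import pandas as pd
df1 = pd.read_csv("my_file.csv", sep=r"\s+", index_col=False, na_values="-")
df1['Pos'] = df1['Res'].str[1:].astype(int)
# pop removes 'Res' from df1 and returns it, so no separate drop is needed
df1['AA'] = df1.pop('Res').str[0]
df = (df1.set_index(['Pos', 'AA'])
         .stack()
         .dropna()  # drop missing values explicitly; harmless if stack() already removed them
         .reset_index(name='new')
         .rename(columns={'level_2': 'cat'}))
print(df)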
Pos AA cat new
0 312 A X 76.550
1 313 B X 175.400
2 313 B XB 62.280
3 313 B XC 32.620
4 313 B O 8.189
5 313 B P 121.200
6 314 J X 176.500
7 314 J XB 53.340
8 314 J XC 40.770
9 314 J O 8.277
10 314 J P 124.600
11 315 L X 177.900
12 315 L XB 55.290
13 315 L XC 41.440
14 315 L O 8.427
15 315 L P 125.500
16 316 T X 174.700
17 316 T XB 59.470
18 316 T XC 63.430
19 316 T O 8.264
20 316 T P 116.100
21 378 G X 10.200
22 378 G XB 58.910
23 378 G XC 40.130
24 378 G O 7.646
25 378 G P 126.700
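For completeness, the literal question (how to join a and b) can also be answered directly: both objects share the original row number, so once the column-name level of a is moved out of the MultiIndex they can be aligned on the index. The following is a rough sketch rather than a recommendation over the approaches above; the sample string is a hypothetical stand-in for my_file.csv, and the names row, variable and value are assumptions of mine.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for my_file.csv (first rows of the question's data).
sample = """Res X XB XC O P
A312 76.55 - - - -
B313 175.4 62.28 32.62 8.189 121.2
J314 176.5 53.34 40.77 8.277 124.6
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+", na_values="-")

# b: one row per residue, indexed by the original row number.
b = pd.DataFrame({"Pos": df["Res"].str[1:].astype(int), "AA": df["Res"].str[0]})

# a: long-format values with a (row number, column name) MultiIndex.
a = df.drop(columns="Res").stack().dropna()

# Name the Series and its index levels, then move the column-name level into
# a regular column so that only the row number remains in the index.
a_flat = (
    a.rename("value")
     .rename_axis(["row", "variable"])
     .reset_index(level="variable")
)

# Index-on-index join: every row of b is repeated once per matching value in a_flat.
result = b.join(a_flat, how="inner").reset_index(drop=True)
print(result)
```

pd.concat does not help here because a has several rows per row of b, which is why a one-to-many join on the shared row number is used instead.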