I have a dataframe whose head looks like this:
df.head()
Out[660]:
Samples variable value Type
0 PE01I 267N12.3_Beta 0.066517 Beta
1 PE01R R267N12.3_Beta 0.061617 Beta
2 PE02I 267N12.3_Beta 0.071013 Beta
3 PE02R 267N12.3_Beta 0.056623 Beta
4 PE03I 267N12.3_Beta 0.071633 Beta
5 PE01I 267N12.3_FPKM 0.000000 FPKM
6 PE01R 267N12.3_FPKM 0.003430 FPKM
7 PE02I 267N12.3_FPKM 0.272144 FPKM
8 PE02R 267N12.3_FPKM 0.005753 FPKM
9 PE03I 267N12.3_FPKM 0.078708 FPKM
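I want to add new columns named Beta and FPKM, filled with the corresponding entries from the "value" column according to the "Type" column. So far I have tried the following one-liner:
df['Beta'] = df['Type'].map(lambda x: df.value if x == "Beta" else "FPKM")
which gives the following output: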
Samples variable value Type Beta
0 PE01I 267N12.3_Beta 0.066517 Beta 0 0.066517 1 0.061617 2 0.07...
1 PE01R 267N12.3_Beta 0.061617 Beta 0 0.066517 1 0.061617 2 0.07...
2 PE02I 267N12.3_Beta 0.071013 Beta 0 0.066517 1 0.061617 2 0.07...
3 PE02R 267N12.3_Beta 0.056623 Beta 0 0.066517 1 0.061617 2 0.07...
4 PE03I 267N12.3_Beta 0.071633 Beta 0 0.066517 1 0.061617 2 0.07...
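Every row of the Beta column contains the same repeated run of values instead of a single number. My goal is a dataframe that looks like this: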
Samples variable Beta FPKM
PE01I 267N12.3_Beta 0.066517 0
PE01R 267N12.3_Beta 0.061617 0.00343
PE02I 267N12.3_Beta 0.071013 0.272144
PE02R 267N12.3_Beta 0.056623 0.005753
PE03I 267N12.3_Beta 0.071633 0.078708
Any help would be great. Thanks.
Answer 0: (score: 1)
I think you need unstack:
df1 = df.set_index(['Samples','Type']).unstack()
print (df1)
variable value
Type Beta FPKM Beta FPKM
Samples
PE01I 267N12.3_Beta 267N12.3_FPKM 0.066517 0.000000
PE01R R267N12.3_Beta 267N12.3_FPKM 0.061617 0.003430
PE02I 267N12.3_Beta 267N12.3_FPKM 0.071013 0.272144
PE02R 267N12.3_Beta 267N12.3_FPKM 0.056623 0.005753
PE03I 267N12.3_Beta 267N12.3_FPKM 0.071633 0.078708
# flatten the MultiIndex in the columns
df1.columns = ['_'.join(col) for col in df1.columns]
df1.reset_index(inplace=True)
print (df1)
Samples variable_Beta variable_FPKM value_Beta value_FPKM
0 PE01I 267N12.3_Beta 267N12.3_FPKM 0.066517 0.000000
1 PE01R R267N12.3_Beta 267N12.3_FPKM 0.061617 0.003430
2 PE02I 267N12.3_Beta 267N12.3_FPKM 0.071013 0.272144
3 PE02R 267N12.3_Beta 267N12.3_FPKM 0.056623 0.005753
4 PE03I 267N12.3_Beta 267N12.3_FPKM 0.071633 0.078708
# drop the column if it is not needed
print (df1.drop('variable_FPKM', axis=1))
Samples variable_Beta value_Beta value_FPKM
0 PE01I 267N12.3_Beta 0.066517 0.000000
1 PE01R R267N12.3_Beta 0.061617 0.003430
2 PE02I 267N12.3_Beta 0.071013 0.272144
3 PE02R 267N12.3_Beta 0.056623 0.005753
4 PE03I 267N12.3_Beta 0.071633 0.078708
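If only the numeric columns are wanted, the same reshape can be sketched with pivot, which yields columns named Beta and FPKM directly. This is a minimal sketch rather than part of the original answer: the data is copied from the head shown in the question, the name wide is made up for illustration, and the variable column is left out on purpose.

```python
import pandas as pd

# Data copied from the head shown in the question.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01R", "PE02I", "PE02R", "PE03I"] * 2,
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_Beta", "267N12.3_Beta"] + ["267N12.3_FPKM"] * 5,
    "value": [0.066517, 0.061617, 0.071013, 0.056623, 0.071633,
              0.000000, 0.003430, 0.272144, 0.005753, 0.078708],
    "Type": ["Beta"] * 5 + ["FPKM"] * 5,
})

# One row per sample, one column per Type, filled from "value".
wide = df.pivot(index="Samples", columns="Type", values="value")
wide.columns.name = None  # remove the "Type" label from the column axis
wide = wide.reset_index()
print(wide)
```

Like unstack, pivot raises an error when a (Samples, Type) pair occurs more than once, so it only fits data without duplicates.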
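Edit by comment:
If you get the error:
ValueError: Index contains duplicate entries, cannot reshape
it means you have duplicate values in the index and have to aggregate.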
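To see which rows cause the error, it may help to list the duplicated (Samples, Type) pairs first. The sketch below uses a small illustrative subset of the modified sample data; the name dupes is hypothetical.

```python
import pandas as pd

# Illustrative subset: the first two rows share the same Samples and Type.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01I", "PE02I", "PE01I"],
    "Type": ["Beta", "Beta", "Beta", "FPKM"],
    "value": [0.066517, 0.061617, 0.071013, 0.0],
})

# keep=False marks every member of a duplicated (Samples, Type) pair,
# not just the second and later occurrences.
dupes = df[df.duplicated(subset=["Samples", "Type"], keep=False)]
print(dupes)
```

If dupes is empty, unstack should work without aggregation.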
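You need pivot_table. If aggfunc is np.sum or np.mean (which work on numbers), the string columns are omitted; the function ','.join works only on string values, so the numeric columns are omitted.
Call the function twice with a different aggfunc each time, then use concat: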
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type': {0: 'Beta', 1: 'Beta', 2: 'Beta', 3: 'Beta', 4: 'Beta', 5: 'FPKM', 6: 'FPKM', 7: 'FPKM', 8: 'FPKM', 9: 'FPKM'}, 'value': {0: 0.066516999999999993, 1: 0.061616999999999998, 2: 0.071012999999999993, 3: 0.056623, 4: 0.071633000000000002, 5: 0.0, 6: 0.0034299999999999999, 7: 0.272144, 8: 0.0057530000000000003, 9: 0.078708}, 'variable': {0: '267N12.3_Beta', 1: 'R267N12.3_Beta', 2: '267N12.3_Beta', 3: '267N12.3_Beta', 4: '267N12.3_Beta', 5: '267N12.3_FPKM', 6: '267N12.3_FPKM', 7: '267N12.3_FPKM', 8: '267N12.3_FPKM', 9: '267N12.3_FPKM'}, 'Samples': {0: 'PE01I', 1: 'PE01I', 2: 'PE02I', 3: 'PE02R', 4: 'PE03I', 5: 'PE01I', 6: 'PE01R', 7: 'PE02I', 8: 'PE02R', 9: 'PE03I'}})
# the value in the second row of column Samples was changed to create a duplicate
print (df)
Samples Type value variable
0 PE01I Beta 0.066517 267N12.3_Beta
1 PE01I Beta 0.061617 R267N12.3_Beta
2 PE02I Beta 0.071013 267N12.3_Beta
3 PE02R Beta 0.056623 267N12.3_Beta
4 PE03I Beta 0.071633 267N12.3_Beta
5 PE01I FPKM 0.000000 267N12.3_FPKM
6 PE01R FPKM 0.003430 267N12.3_FPKM
7 PE02I FPKM 0.272144 267N12.3_FPKM
8 PE02R FPKM 0.005753 267N12.3_FPKM
9 PE03I FPKM 0.078708 267N12.3_FPKM
df1 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=','.join)
print (df1)
variable
Type Beta FPKM
Samples
PE01I 267N12.3_Beta,R267N12.3_Beta 267N12.3_FPKM
PE01R None 267N12.3_FPKM
PE02I 267N12.3_Beta 267N12.3_FPKM
PE02R 267N12.3_Beta 267N12.3_FPKM
PE03I 267N12.3_Beta 267N12.3_FPKM
df2 = df.pivot_table(index='Samples', columns=['Type'], aggfunc=np.mean)
print (df2)
value
Type Beta FPKM
Samples
PE01I 0.064067 0.000000
PE01R NaN 0.003430
PE02I 0.071013 0.272144
PE02R 0.056623 0.005753
PE03I 0.071633 0.078708
df3 = pd.concat([df1, df2], axis=1)
df3.columns = ['_'.join(col) for col in df3.columns]
df3.reset_index(inplace=True)
print (df3)
Samples variable_Beta variable_FPKM value_Beta value_FPKM
0 PE01I 267N12.3_Beta,R267N12.3_Beta 267N12.3_FPKM 0.064067 0.000000
1 PE01R None 267N12.3_FPKM NaN 0.003430
2 PE02I 267N12.3_Beta 267N12.3_FPKM 0.071013 0.272144
3 PE02R 267N12.3_Beta 267N12.3_FPKM 0.056623 0.005753
4 PE03I 267N12.3_Beta 267N12.3_FPKM 0.071633 0.078708
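Depending on the pandas version, columns that the aggregation function cannot handle may raise an error instead of being dropped silently. A variant that names the column for each call avoids relying on that behavior. This is a sketch under that assumption, using a subset of the sample data above; names, means and out are illustrative names, and add_prefix replaces the '_'.join step.

```python
import pandas as pd

# Subset of the sample above; PE01I has two Beta rows.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01I", "PE02I", "PE01I", "PE01R", "PE02I"],
    "Type": ["Beta", "Beta", "Beta", "FPKM", "FPKM", "FPKM"],
    "value": [0.066517, 0.061617, 0.071013, 0.0, 0.00343, 0.272144],
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_FPKM", "267N12.3_FPKM", "267N12.3_FPKM"],
})

# Join the strings of duplicated pairs; only the string column is passed.
names = df.pivot_table(index="Samples", columns="Type",
                       values="variable", aggfunc=",".join)
# Average the numbers of duplicated pairs; only the numeric column is passed.
means = df.pivot_table(index="Samples", columns="Type",
                       values="value", aggfunc="mean")

# Prefix the column names so both blocks can sit side by side.
out = pd.concat([names.add_prefix("variable_"),
                 means.add_prefix("value_")], axis=1).reset_index()
out.columns.name = None
print(out)
```

Missing combinations, such as PE01R with Beta, come out as NaN in both blocks.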
Answer 1: (score: 1)
After splitting the data into two dataframes by the Type column, you can use merge.
In [14]: df_1 = df.loc[(df['Type'] == "Beta"), ['Samples', 'variable', 'value']]
In [15]: df_2 = df.loc[(df['Type'] == "FPKM"), ['Samples', 'value']]
In [16]: df_1['Beta'] = df_1['value']
In [17]: df_2['FPKM'] = df_2['value']
In [18]: df_1[['Samples', 'variable', 'Beta']].merge(df_2[['Samples', 'FPKM']], on="Samples")
Out[18]:
Samples variable Beta FPKM
0 PE01I 267N12.3_Beta 0.066517 0.000000
1 PE01R R267N12.3_Beta 0.061617 0.003430
2 PE02I 267N12.3_Beta 0.071013 0.272144
3 PE02R 267N12.3_Beta 0.056623 0.005753
4 PE03I 267N12.3_Beta 0.071633 0.078708
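The same split-and-merge idea can also be written with rename, which avoids assigning new columns into slices of df (a pattern that can trigger SettingWithCopyWarning). A minimal sketch on the data from the question; beta, fpkm and result are illustrative names.

```python
import pandas as pd

# Data copied from the head shown in the question.
df = pd.DataFrame({
    "Samples": ["PE01I", "PE01R", "PE02I", "PE02R", "PE03I"] * 2,
    "variable": ["267N12.3_Beta", "R267N12.3_Beta", "267N12.3_Beta",
                 "267N12.3_Beta", "267N12.3_Beta"] + ["267N12.3_FPKM"] * 5,
    "value": [0.066517, 0.061617, 0.071013, 0.056623, 0.071633,
              0.000000, 0.003430, 0.272144, 0.005753, 0.078708],
    "Type": ["Beta"] * 5 + ["FPKM"] * 5,
})

# Select each Type and rename "value" to the target column name in one step.
beta = (df.loc[df["Type"] == "Beta", ["Samples", "variable", "value"]]
          .rename(columns={"value": "Beta"}))
fpkm = (df.loc[df["Type"] == "FPKM", ["Samples", "value"]]
          .rename(columns={"value": "FPKM"}))

# Inner join on Samples; samples missing from either side are dropped.
result = beta.merge(fpkm, on="Samples")
print(result)
```

Passing how="outer" to merge would keep samples that appear under only one Type, with NaN in the missing column.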