How do I delete certain columns and write the result to another file with pandas?

Time: 2017-07-29 23:42:34

Tags: python pandas dictionary

My file, peaks_ee, is a text file that looks like this:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

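My code is supposed to take columns 1, 2, 8, and 9 and write them to a text file. However, I want columns 1 and 8 merged into a single column, and columns 2 and 9 merged into a single column, and then I want all duplicates removed. I also want to add a third column that outputs "0.3" on every row.

Here is the current code: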

import pandas as pd

result={}
df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

shift1 = df["1H.P"]
shift2 = df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]

for col in result.columns:
    if col == ("1H.L") or col==( "1H_2.L"):
         result[col]=result[col].str.strip("{} ")

res = pd.lreshape(df, {'atom_name':['1H.L','1H_2.L'], 'ppm':['1H.P','1H_2.P']}).drop_duplicates()
res['new']=0.3

result.drop_duplicates(keep='first',inplace=True)

tclust_atom=open("tclust_ppm.txt","w+")

res.to_string(tclust_atom, header=False)

tclust_atom.close()

I would like my output to look like this:

1.H1'  5.82020 0.3
2.H8  7.61004 0.3  
1.H8  8.13712 0.3
2.H1'  5.90291 0.3   
4.H1'  5.74125 0.3   
3.H6  7.53261 0.3
3.H1'  5.54935 0.3   
4.H8  7.49932 0.3
3.H1'  5.54935 0.3  
3.H6  7.53261 0.3 
6.H1'  5.54297 0.3   
5.H6  7.72158 0.3

But with this code, my current output is:

0    0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0   {1.H1'}  5.82020  0.3
1    0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0    {2.H8}  7.61004  0.3
2    0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0    {1.H8}  8.13712  0.3
5    0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0   {2.H1'}  5.90291  0.3
10   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0    {3.H6}  7.53261  0.3
11   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0   {4.H1'}  5.74125  0.3
12   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0   {3.H1'}  5.54935  0.3
13   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0    {4.H8}  7.49932  0.3
26   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0    {5.H6}  7.72158  0.3
27   0.1  ++  {0.0}  {}  0.05  0.1  ++  {0.0}  {}  0.05  {}  0  0  0  100.0  0  0.0   {6.H1'}  5.54297  0.3

The last three columns are what I want, but how do I get rid of the other columns, and of the braces in this column:

{1.H1'}  
{2.H8}   
{1.H8}  
{2.H1'}  
{4.H1'}    
{3.H6}  
{3.H1'}    
{4.H8}  
{3.H1'}    
{3.H6}   
{6.H1'}     
{5.H6} 

How can I get rid of the curly braces?

1 answer:

Answer 0: (score: 1)

You can use this approach:

import pandas as pd

df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

# Create two DataFrames, each holding one label/shift column pair
df1 = df[['1H.L', '1H.P']].copy()
df2 = df[['1H_2.L', '1H_2.P']].copy()

# Give both the same column names so they stack into the same columns
df2 = df2.rename(columns={'1H_2.L': '1H.L', '1H_2.P': '1H.P'})

# Stack the DataFrames
df = pd.concat([df1, df2])

# Keep only shifts between 5 and 6 ppm (inclusive). To also keep the
# 7 to 8.25 ppm shifts, apply the question's mask to df before splitting it.
df = df[(df['1H.P'] <= 6) & (df['1H.P'] >= 5)].copy()

# Remove curly braces
df['1H.L'] = df['1H.L'].str.strip('{}')

# Add a column of 0.3
df['new'] = 0.3

# Drop duplicates
df = df.drop_duplicates(keep='first')
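
For completeness, here is a minimal end-to-end sketch that also writes the three columns to a text file. It assumes the mask from the question (first shift between 5.1 and 6, second shift between 7 and 8.25) should be applied to whole rows before stacking, and it reads four sample rows copied from the question rather than the real file. The names `stack_peaks`, `write_peaks`, `atom`, `ppm`, and `SAMPLE` are made up for illustration, and the row order may differ from the desired output shown in the question.

```python
import io

import pandas as pd

# Column header plus four peak rows copied from the question's file
# (the five header lines that skiprows=5 would skip are left out here).
SAMPLE = """\
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
"""


def stack_peaks(df):
    """Filter rows, stack both label/shift pairs, strip braces, drop duplicates."""
    p1, p2 = df["1H.P"], df["1H_2.P"]
    # Assumption: same row filter as the mask in the question.
    mask = (p1 > 5.1) & (p1 < 6) & (p2 > 7) & (p2 < 8.25)
    sel = df.loc[mask]

    # Select only the wanted columns and give both pairs the same names.
    first = sel[["1H.L", "1H.P"]].rename(columns={"1H.L": "atom", "1H.P": "ppm"})
    second = sel[["1H_2.L", "1H_2.P"]].rename(columns={"1H_2.L": "atom", "1H_2.P": "ppm"})

    out = pd.concat([first, second], ignore_index=True)
    out["atom"] = out["atom"].str.strip("{}")  # "{1.H1'}" becomes "1.H1'"
    out = out.drop_duplicates().reset_index(drop=True)
    out["new"] = 0.3
    return out


def write_peaks(out, path_or_buffer):
    """Write space-separated rows without the index or a header line."""
    # Format ppm as text so trailing zeros such as 5.82020 are preserved.
    formatted = out.assign(ppm=out["ppm"].map("{:.5f}".format))
    formatted.to_csv(path_or_buffer, sep=" ", header=False, index=False)


# The header has one name fewer than the data rows have fields, so pandas
# uses the leading peak number as the index.
df = pd.read_csv(io.StringIO(SAMPLE), sep=" ")
peaks = stack_peaks(df)
write_peaks(peaks, "tclust_ppm.txt")
```

With the real file, the read step would be `pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)` as in the question. Selecting the columns before stacking is what keeps the unwanted columns out of the output, and passing `index=False` drops the leading row numbers.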

Hope this helps.