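I want to split data that sits in a single column into two separate columns, based on the characters each line starts with. The data looks like this: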
3C-assembly|contig_93
ptg000037l
3C-assembly|contig_94
ptg000039l
3C-assembly|contig_95
ptg000043l
3C-assembly|contig_96
ptg000196l
ptg000060l
3C-assembly|contig_97
ptg000083l
ptg000083l
3C-assembly|contig_98
ptg000117l
ptg000005l
3C-assembly|contig_99
ptg000123l
ptg000123l
ptg0001232
ptg0001233
I need all the 3C-assembly|contig_ entries in the first column and all the corresponding ptg000 entries in the second column:
3C-assembly|contig_93 ptg000037l
3C-assembly|contig_94 ptg000039l
3C-assembly|contig_95 ptg000043l
3C-assembly|contig_96 ptg000196l
3C-assembly|contig_96 ptg000060l
3C-assembly|contig_97 ptg000083l
3C-assembly|contig_97 ptg000083l
3C-assembly|contig_98 ptg000117l
3C-assembly|contig_98 ptg000005l
3C-assembly|contig_99 ptg000123l
3C-assembly|contig_99 ptg000123l
3C-assembly|contig_99 ptg0001232
3C-assembly|contig_99 ptg0001233
...........
Answer 0 (score: 0)
In Python:
# Assume the data is in a pandas DataFrame. I just created it here:
import pandas as pd
a=[
"3C-assembly|contig_93 ptg000037l",
"3C-assembly|contig_94 ptg000039l",
"3C-assembly|contig_95 ptg000043l",
"3C-assembly|contig_96 ptg000196l",
"3C-assembly|contig_96 ptg000060l",
"3C-assembly|contig_97 ptg000083l",
"3C-assembly|contig_97 ptg000083l",
"3C-assembly|contig_98 ptg000117l",
"3C-assembly|contig_98 ptg000005l",
"3C-assembly|contig_99 ptg000123l",
"3C-assembly|contig_99 ptg000123l",
"3C-assembly|contig_99 ptg0001232",
"3C-assembly|contig_99 ptg0001233"]
a=pd.DataFrame(a, columns=["data"])
# Define a function that splits one row and extracts both identifiers
def ExtractContig(Name):
    # Split on the space between the two fields
    splitgroup = Name.strip().split(' ')
    contigselect = splitgroup[0]
    ptgselect = splitgroup[1]
    # Split on the underscore and keep the contig number after it
    contig = contigselect.strip().split('_')[-1]
    # Split on the "g" of ptgxxxxxx and keep the part after it
    ptg = ptgselect.strip().split('g')[-1]
    return contig, ptg

# Call the function on every row and collect the (contig, ptg) tuples
a['data'].apply(ExtractContig)
You can store the result and carry out further analysis. The output in this case is:
0 (93, 000037l)
1 (94, 000039l)
2 (95, 000043l)
3 (96, 000196l)
4 (96, 000060l)
5 (97, 000083l)
6 (97, 000083l)
7 (98, 000117l)
8 (98, 000005l)
9 (99, 000123l)
10 (99, 000123l)
11 (99, 0001232)
12 (99, 0001233)
Name: data, dtype: object
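The snippet above starts from lines that already hold both values. For the single-column layout shown in the question, one possible approach is to mark the contig lines and forward-fill them onto the ptg lines that follow. This is only a minimal sketch: it assumes every header line starts with `3C-assembly|contig_`, and the sample list and the column names `raw`, `contig` and `ptg` are placeholders chosen for illustration.

```python
import pandas as pd

# Hypothetical sample in the question's single-column layout
lines = [
    "3C-assembly|contig_96",
    "ptg000196l",
    "ptg000060l",
    "3C-assembly|contig_97",
    "ptg000083l",
]
df = pd.DataFrame({"raw": lines})

# Assumption: header rows are the ones starting with the contig prefix
is_contig = df["raw"].str.startswith("3C-assembly|contig_")

# Keep the value on header rows, NaN elsewhere, then carry it downward
df["contig"] = df["raw"].where(is_contig).ffill()

# Drop the header rows; each remaining row pairs a ptg with its contig
result = (
    df.loc[~is_contig, ["contig", "raw"]]
    .rename(columns={"raw": "ptg"})
    .reset_index(drop=True)
)

print(result.to_string(index=False, header=False))
```

If the real data lives in a file, reading it line by line into the `raw` column should work the same way, and `result.to_csv(...)` with a space or tab separator would write the two-column form back out.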