Add new columns in pandas based on other column values

Time: 2020-09-20 14:43:10

Tags: python python-3.x pandas

Hello, I need help adding two new columns to a dataframe such as:

Name           start1  end1
OK0100087.1_0  0      375
OK0100087.1_1  376    750
OK0100087.1_2  751    1000
OK0100088.1    0      87766  
OK0100089.1    0      66778
OK0100090.1_0  0      47519
OK0100090.1_1  47520  73733

The idea is to add start2 and end2 columns, such as:

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250
OK0100087.1_2  751    1000  249    0
OK0100088.1    0      87766 87766  0      
OK0100089.1    0      66778 66778  0
OK0100090.1_0  0      47519 73733  26214
OK0100090.1_1  47520  73733 26213  0

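To find the new start2 and end2 values, the idea is to work within each group of names of the form content_number, that is, rows that share the same content part.

For example, for OK0100087.1: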

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375    
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000 

Take the highest end1 value in the group, which is 1000.

Then the first start2 will be 1000.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000  

Then the first end2 is start2 - (end1 - start1), so 1000 - (375 - 0) = 625.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   
OK0100087.1_2  751    1000  

Then the second start2 is the previous end2 - 1, so 625 - 1 = 624.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624   

Then, again, end2 is start2 - (end1 - start1), so 624 - (750 - 376) = 250.

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250 

In the end we should get:

Name           start1 end1  start2 end2 
OK0100087.1_0  0      375   1000   625 
OK0100087.1_1  376    750   624    250
OK0100087.1_2  751    1000  249    0
OK0100088.1    0      87766 87766  0      
OK0100089.1    0      66778 66778  0
OK0100090.1_0  0      47519 73733  26214
OK0100090.1_1  47520  73733 26213  0

Does anyone have an idea how to do this? Thank you very much for your help.

2 answers:
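The rule walked through above can be sketched as a small loop over one group. This is a minimal illustration in plain Python, not part of the original post; the function name `reverse_coords` and the list-of-tuples input are assumptions made for the example.

```python
def reverse_coords(rows):
    """Apply the rule from the question to a single group.

    rows: list of (start1, end1) tuples, in row order, for one Name prefix.
    Returns a list of (start2, end2) tuples in the same order.
    """
    # The first start2 is the highest end1 in the group.
    start2 = max(end1 for _, end1 in rows)
    result = []
    for start1, end1 in rows:
        end2 = start2 - (end1 - start1)
        result.append((start2, end2))
        # The next start2 sits just below the end2 we have just computed.
        start2 = end2 - 1
    return result


print(reverse_coords([(0, 375), (376, 750), (751, 1000)]))
# [(1000, 625), (624, 250), (249, 0)]
```

Each row depends only on the previous row's end2, so a single pass per group is enough.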

Answer 0: (score: 3)

This is just groupby().transform(), since you can extract the unique name from Name:

total = df.groupby(df.Name.str.extract(r'^([^.]+)')[0])['end1'].transform('max')

df['start2'] = total - df['start1']

df['end2'] = total - df['end1']

Output:

            Name  start1   end1  start2   end2
0  OK0100087.1_0       0    375    1000    625
1  OK0100087.1_1     376    750     624    250
2  OK0100087.1_2     751   1000     249      0
3    OK0100088.1       0  87766   87766      0
4    OK0100089.1       0  66778   66778      0
5  OK0100090.1_0       0  47519   73733  26214
6  OK0100090.1_1   47520  73733   26213      0
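For anyone who wants to run this end to end, here is a self-contained sketch of the same approach using the sample data from the question. The closed form `total - start1` matches the step-by-step rule only under two assumptions that hold for this sample: each group starts at 0, and its intervals are contiguous (every start1 equals the previous end1 + 1). Also note that the pattern keeps everything before the first dot, so names that differ only after the dot would land in the same group.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["OK0100087.1_0", "OK0100087.1_1", "OK0100087.1_2",
             "OK0100088.1", "OK0100089.1",
             "OK0100090.1_0", "OK0100090.1_1"],
    "start1": [0, 376, 751, 0, 0, 0, 47520],
    "end1": [375, 750, 1000, 87766, 66778, 47519, 73733],
})

# Group key: everything before the first dot, e.g. "OK0100087".
key = df["Name"].str.extract(r"^([^.]+)")[0]

# Largest end1 of each group, broadcast back to every row of that group.
total = df.groupby(key)["end1"].transform("max")

# Mirror the coordinates around the group total.
df["start2"] = total - df["start1"]
df["end2"] = total - df["end1"]

print(df)
```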

Answer 1: (score: 1)


Code:

import pandas as pd

df = pd.DataFrame({'Name': ['OK0100087.1_0',
  'OK0100087.1_1',
  'OK0100087.1_2',
  'OK0100088.1',
  'OK0100089.1',
  'OK0100090.1_0',
  'OK0100090.1_1'],
 'start1': [0, 376, 751, 0, 0, 0, 47520],
 'end1': [375, 750, 1000, 87766, 66778, 47519, 73733]})

# Group key: the part of Name before the "_<number>" suffix
df['base'] = df['Name'].str.split('_').str[0]
# Seed every row's start2 with the largest end1 of its group
df['start2'] = df.groupby('base')['end1'].transform('max')

parts = []
for _, group in df.groupby('base'):
    t = group.copy()
    # Each pass pushes the correct values one row further down the group
    for _ in range(len(t)):
        t['end2'] = t['start2'] - (t['end1'] - t['start1'])
        prev = (t['end2'] - 1).shift(1)
        # Row 0 keeps the group max; later rows take the previous end2 - 1
        t['start2'] = prev.fillna(t['start2']).astype(int)
    parts.append(t)

# DataFrame.append was removed in pandas 2.0, so collect the groups and concat
output = pd.concat(parts).drop(columns='base')
output['end2'] = output['end2'].astype(int)
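The nested loop can also be replaced by a grouped cumulative sum. The sketch below is an alternative, not the answerer's code. It follows the rule from the question directly: each row uses up end1 - start1 + 1 positions, so a row's start2 is the group maximum minus the positions used by the rows before it. The names `base`, `length`, `used_before` and `result` are chosen for this example, and rows are assumed to be already in order within each group.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["OK0100087.1_0", "OK0100087.1_1", "OK0100087.1_2",
             "OK0100088.1", "OK0100089.1",
             "OK0100090.1_0", "OK0100090.1_1"],
    "start1": [0, 376, 751, 0, 0, 0, 47520],
    "end1": [375, 750, 1000, 87766, 66778, 47519, 73733],
})

# Group key: the part of Name before the "_<number>" suffix.
base = df["Name"].str.split("_").str[0]
length = df["end1"] - df["start1"]

# Positions used by the earlier rows of the same group
# (running total within the group, minus the current row's own share).
step = length + 1
used_before = step.groupby(base).cumsum() - step

result = df.copy()
result["start2"] = df.groupby(base)["end1"].transform("max") - used_before
result["end2"] = result["start2"] - length

print(result)
```

Unlike the closed form `total - start1`, this version does not require each group to start at 0 or to have contiguous intervals.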