Iterating over a DataFrame

Time: 2016-02-04 18:47:57

Tags: python pandas

I have a pandas DataFrame df:

name  e_count  e_start    e_end
aaaa  3        13,14,15,  18,20,25,
bbbb  2        90,94,     100,102,

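The e_count field gives the number of elements in e_start and e_end. I want to create a new DataFrame with an added column e_end-e_start that holds the element-wise differences. For example: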

name  e_count  e_start    e_end      e_end-e_start
aaaa  3        13,14,15,  18,20,25,  5,6,10,
bbbb  2        90,94,     100,102,   10,8,

I tried the following:

df['e_end-e_start'] = ""
new_frame = pd.DataFrame(columns = df.columns)
new_frame['e_end-e_start'] = ""
new_frame_idx = -1
for idx,row in df.iterrows():
            new_frame_idx = new_frame_idx + 1
            new_row = df.ix[idx]
            new_frame = new_frame.append(new_row,ignore_index = True)      
            df.ix[idx,'e_end-e_start'] =df.ix[idx,'e_end']-df.ix[idx,'target_end']
            new_frame.ix[new_frame_idx,'e_end-e_start'] =df.ix[idx,'e_end-e_start'] =df.ix[idx,'e_end']-df.ix[idx,'target_end']
print new_frame 

But I get an error. Can you help?

1 Answer:

Answer 0: (score: 0)

In general, you are better off storing the data as integers rather than as strings of comma-separated numbers. A flat, long format such as

In [73]: df
Out[73]: 
   name  e_start  e_end
0  aaaa       13     18
0  aaaa       14     20
0  aaaa       15     25
1   bbb       90    100
1   bbb       94    102

makes the calculation easier. Here is how to convert the DataFrame to the flat format:

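import pandas as pd
df = pd.DataFrame({'e_count': [3, 2],
                   'e_end': ['18,20,25,', '100,102,'],
                   'e_start': ['13,14,15,', '90,94,'],
                   'name': ['aaaa', 'bbb']})

dfs = []
for col in ['e_start', 'e_end']:
    # '13,14,15,' -> one column per element; shorter rows are padded with NaN
    tmp = df[col].str.strip(',').str.split(',').apply(pd.Series)
    # one row per element; dropna() removes the padding if stack() keeps it
    tmp = tmp.stack().dropna()
    # drop the element-position level so only the original row label remains
    tmp.index = tmp.index.droplevel(-1)
    tmp.name = col
    tmp = tmp.astype(int)
    dfs.append(tmp)

# Combine the two value columns first (their indexes are identical), then
# join on the row label; recent pandas versions may refuse to concat
# objects whose duplicated indexes differ from one another.
df = df[['name']].join(pd.concat(dfs, axis=1))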
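
On newer pandas versions the same reshaping can probably be written more compactly with DataFrame.explode. The following is only a sketch, assuming pandas 1.3 or later (where explode accepts a list of columns) and assuming that e_start and e_end always hold the same number of entries in each row. The names wide, flat and cols are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
wide = pd.DataFrame({'e_count': [3, 2],
                     'e_end': ['18,20,25,', '100,102,'],
                     'e_start': ['13,14,15,', '90,94,'],
                     'name': ['aaaa', 'bbb']})

cols = ['e_start', 'e_end']
flat = wide[['name'] + cols].copy()
for col in cols:
    # '13,14,15,' -> ['13', '14', '15']
    flat[col] = flat[col].str.strip(',').str.split(',')

# Explode both list columns in lockstep, one output row per list element.
# Assumption: multi-column explode is available (pandas >= 1.3).
flat = flat.explode(cols)

# The exploded values are still strings, so convert them to integers.
flat = flat.astype({col: int for col in cols})
print(flat)
```

The resulting frame has the same long layout as the one produced by the loop above, with the original row label repeated once per element.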

Then, to compute the differences, you can use

df['diff'] = df['e_end'] - df['e_start']

To convert back to comma-separated strings,

In [76]: df.groupby('name').agg(lambda x: ','.join(x.astype(str)))
Out[76]: 
       e_start     e_end    diff
name                            
aaaa  13,14,15  18,20,25  5,6,10
bbb      90,94   100,102    10,8
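
If the comma-separated layout from the question has to be kept as it is, the difference column can also be computed row by row without reshaping. This is a minimal sketch, not the approach recommended above. It assumes every e_start/e_end pair has the same number of entries, and elementwise_diff is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

# Sample data in the layout used in the question (trailing commas included).
df = pd.DataFrame({'name': ['aaaa', 'bbbb'],
                   'e_count': [3, 2],
                   'e_start': ['13,14,15,', '90,94,'],
                   'e_end': ['18,20,25,', '100,102,']})


def elementwise_diff(starts, ends):
    """Hypothetical helper: subtract two comma-separated integer strings element by element."""
    s = [int(v) for v in starts.strip(',').split(',')]
    e = [int(v) for v in ends.strip(',').split(',')]
    # Keep the trailing comma so the new column matches the existing ones.
    return ''.join('{},'.format(b - a) for a, b in zip(s, e))


# Build the new column one row at a time from the two string columns.
df['e_end-e_start'] = [elementwise_diff(s, e)
                       for s, e in zip(df['e_start'], df['e_end'])]
print(df)
```

The strings cannot be subtracted directly, which is why each one is split and converted to integers first.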