Creating multiple pandas DataFrames from a single DataFrame by iteration

Time: 2018-07-03 06:01:28

Tags: python-2.7 pandas iteration

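I am trying to create multiple DataFrames from a single DataFrame (tr), one for each column in a list of columns (cat_col). Each new DataFrame must be named tr_'colname'. Can someone help me with the code below?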


for col in cat_col:
    tr_ = tr[[col,'TARGET']].groupby([col,'TARGET']).size().reset_index(name='Counts')
    tr_ = pivot_table(tr_,values='Counts',index=[col],columns=['TARGET'])
    print tr_.shape

Output: (3,2) (7,2) (8,2) (5,2) (6,2) (6,2) (18,2) (7,2) (58,2) (4,2) (3,2) (7,2)

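tr[['col1','TARGET']].head(10)

              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0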

TARGET              0      1
col1
Family          37140   3009
Spouse          10475    895
Unaccompanied  228189  20337

1 Answer:

Answer 0: (score: 1)

I think you need:

tr = pd.DataFrame({'A':list('abcdefabcd'),
                   'B':list('abcdeabffe'),
                   'TARGET':[1,1,0,0,1,0,1,1,0,1]})

print (tr)
   A  B  TARGET
0  a  a       1
1  b  b       1
2  c  c       0
3  d  d       0
4  e  e       1
5  f  a       0
6  a  b       1
7  b  f       1
8  c  f       0
9  d  e       1

cat_col = ['A','B']

d = {}
for col in cat_col:
    tr_ = (tr[[col,'TARGET']].groupby([col,'TARGET'])
                            .size()
                            .unstack()
                            .reset_index()
                            .rename_axis(None, axis=1))
    # any further processing, if necessary

    # check that the output is a DataFrame
    print (type(tr_))

    print (tr_)
    # store the result in the dict, if necessary
    d[col] = tr_

# select a DataFrame from the dict
print (d['A'])
   A    0    1
0  a  NaN  2.0
1  b  NaN  2.0
2  c  2.0  NaN
3  d  1.0  1.0
4  e  NaN  1.0
5  f  1.0  NaN
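
The output above contains NaN wherever a category/TARGET combination never occurs, which also turns the counts into floats. If integer counts with zeros are preferred, and the dict keys should follow the tr_'colname' naming from the question, one possible variation is to build each table with pd.crosstab. This is only a sketch: the helper name count_tables and the 'tr_' key prefix are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ['A', 'B']


def count_tables(df, cols, target='TARGET'):
    """Hypothetical helper: one count table per column, keyed 'tr_<col>'."""
    tables = {}
    for col in cols:
        # crosstab counts each (category, target) pair; missing pairs become 0
        counts = pd.crosstab(df[col], df[target])
        # flatten to the same layout as in the answer: category column, then 0 and 1
        tables['tr_' + col] = counts.reset_index().rename_axis(None, axis=1)
    return tables


tables = count_tables(tr, cat_col)
print(tables['tr_A'])
```

Keeping the tables in a dict, as the answer does, avoids creating variable names dynamically; each table is then looked up as tables['tr_A'], tables['tr_B'], and so on.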