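I am trying to create multiple DataFrames from a single DataFrame (tr), one for each column in a list of columns (cat_col). Each new DataFrame must be named tr_'colname'. Can someone help me with the code below?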
for col in cat_col:
    # count rows for every (category, TARGET) combination
    tr_ = tr[[col, 'TARGET']].groupby([col, 'TARGET']).size().reset_index(name='Counts')
    # one row per category, one column per TARGET value
    tr_ = pd.pivot_table(tr_, values='Counts', index=[col], columns=['TARGET'])
    print(tr_.shape)

Output: (3,2) (7,2) (8,2) (5,2) (6,2) (6,2) (18,2) (7,2) (58,2) (4,2) (3,2) (7,2)
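tr[['col1','TARGET']].head(10)

              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0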
TARGET              0      1
col1
Family          37140   3009
Spouse          10475    895
Unaccompanied  228189  20337
Answer 0 (score: 1)
I think you need:
import pandas as pd

tr = pd.DataFrame({'A':list('abcdefabcd'),
                   'B':list('abcdeabffe'),
                   'TARGET':[1,1,0,0,1,0,1,1,0,1]})
print (tr)
   A  B  TARGET
0  a  a       1
1  b  b       1
2  c  c       0
3  d  d       0
4  e  e       1
5  f  a       0
6  a  b       1
7  b  f       1
8  c  f       0
9  d  e       1
cat_col = ['A','B']
d = {}
for col in cat_col:
    tr_ = (tr[[col,'TARGET']].groupby([col,'TARGET'])
                             .size()
                             .unstack()
                             .reset_index()
                             .rename_axis(None, axis=1))
    # any other processing, if necessary
    # check that the output is a DataFrame
    print (type(tr_))
    print (tr_)
    # if necessary, store it in a dict
    d[col] = tr_

# select a DataFrame from the dict
print (d['A'])
   A    0    1
0  a  NaN  2.0
1  b  NaN  2.0
2  c  2.0  NaN
3  d  1.0  1.0
4  e  NaN  1.0
5  f  1.0  NaN
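
As a possible variation on the dict approach above, the keys can follow the tr_'colname' naming from the question. This is only a sketch and not part of the original answer: it swaps the groupby/unstack chain for pd.crosstab, which fills missing combinations with 0 instead of NaN, and the name frames is an assumption introduced here for illustration.

```python
import pandas as pd

# same sample data as in the answer above
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ['A', 'B']

# 'frames' is a hypothetical name; each key is "tr_" plus the column name.
# pd.crosstab counts rows per (category, TARGET) pair and uses 0 where a
# pair never occurs.
frames = {
    f"tr_{col}": pd.crosstab(tr[col], tr['TARGET'])
    for col in cat_col
}

print(frames['tr_A'])
```

Looking frames up by key, for example frames['tr_B'], avoids creating variables dynamically while still giving each result a name derived from its column.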