I have a DataFrame df1 and a list containing the names of several of its columns.
df1:
User_id month day Age year CVI ZIP sex wgt
0 1 7 16 1977 2 NA M NaN
1 2 7 16 1977 3 NA M NaN
2 3 7 16 1977 2 DM F NaN
3 4 7 16 1977 7 DM M NaN
4 5 7 16 1977 3 DM M NaN
... ... ... ... ... ... ... ... ...
35544 35545 12 31 2002 15 AH NaN NaN
35545 35546 12 31 2002 15 AH NaN NaN
35546 35547 12 31 2002 10 RM F 14
35547 35548 12 31 2002 7 DO M 51
35548 35549 12 31 2002 5 NaN NaN NaN
list= [u"User_id", u"day", u"ZIP", u"sex"]
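I want to create a new DataFrame df2 containing the columns named in the list, and a DataFrame df3 containing the columns that are not in the list.
Here I found that I need to do this: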
df2=df1[df1[df1.columns[1]].isin(list)]
But as a result I get:
Empty DataFrame
Columns: []
Index: []
[0 rows x 9 columns]
What is my mistake, and how can I get the desired result? Why does it say "9 columns" when there should be 4?
Answer 0 (score: 2)
A solution with Index.difference:
L = [u"User_id", u"day", u"ZIP", u"sex"]
df2 = df1[L]
df3 = df1[df1.columns.difference(df2.columns)]
print (df2)
User_id day ZIP sex
0 0 7 NaN M
1 1 7 NaN M
2 2 7 DM F
3 3 7 DM M
4 4 7 DM M
print (df3)
Age CVI month wgt year
0 16 2 1 NaN 1977
1 16 3 2 NaN 1977
2 16 2 3 NaN 1977
3 16 7 4 NaN 1977
4 16 3 5 NaN 1977
Or:
df2 = df1[L]
df3 = df1[df1.columns.difference(pd.Index(L))]
print (df2)
User_id day ZIP sex
0 0 7 NaN M
1 1 7 NaN M
2 2 7 DM F
3 3 7 DM M
4 4 7 DM M
print (df3)
Age CVI month wgt year
0 16 2 1 NaN 1977
1 16 3 2 NaN 1977
2 16 2 3 NaN 1977
3 16 7 4 NaN 1977
4 16 3 5 NaN 1977
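As for why the original attempt printed an empty frame that still reported 9 columns, the following is a minimal sketch on a small made-up frame (the values and the variable names cols, row_mask and filtered are illustrative assumptions, not the asker's data). It suggests that the expression in the question builds a row filter rather than a column selection:

```python
import pandas as pd

# Small made-up frame reusing some of the question's column names
# (the values are illustrative only).
df1 = pd.DataFrame({
    "User_id": [1, 2, 3],
    "month": [7, 7, 12],
    "day": [16, 16, 31],
    "year": [1977, 1977, 2002],
    "ZIP": [None, "DM", "AH"],
    "sex": ["M", "F", None],
})
cols = ["User_id", "day", "ZIP", "sex"]

# df1.columns[1] is the label "month", so this compares the *values* of the
# month column against the column names; nothing matches, so the mask is all False.
row_mask = df1[df1.columns[1]].isin(cols)

# Indexing with a boolean Series filters rows and keeps every column.
filtered = df1[row_mask]
print(filtered.shape)  # (0, 6)

# Passing the list of labels selects columns instead.
df2 = df1[cols]
print(df2.shape)  # (3, 4)
```

With the asker's 9-column frame the same mechanism would give 0 rows and all 9 columns, which matches the output shown in the question.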
Answer 1 (score: 1)
You can try:
df2 = df1[list] # it does a projection on the columns contained in the list
df3 = df1[[col for col in df1.columns if col not in list]]
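One practical difference from the Index.difference approach in Answer 0 is column order: by default Index.difference tries to sort its result, while the list comprehension keeps the columns in the order they appear in df1. A small sketch, assuming an empty frame that only carries the question's column names (the names cols, by_difference and by_comprehension are introduced here for illustration):

```python
import pandas as pd

# Empty frame carrying only the column names from the question.
df1 = pd.DataFrame(
    columns=["User_id", "month", "day", "Age", "year", "CVI", "ZIP", "sex", "wgt"]
)
cols = ["User_id", "day", "ZIP", "sex"]

# Index.difference sorts the remaining labels by default.
by_difference = list(df1.columns.difference(cols))

# The comprehension walks df1.columns in order, so the original order survives.
by_comprehension = [c for c in df1.columns if c not in cols]

print(by_difference)     # ['Age', 'CVI', 'month', 'wgt', 'year']
print(by_comprehension)  # ['month', 'Age', 'year', 'CVI', 'wgt']
```

If the original column order matters for df3, the comprehension is probably the safer of the two.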
Answer 3 (score: 1)
Never name a list "list"; doing so shadows the built-in list type.
my_list= [u"User_id", u"day", u"ZIP", u"sex"]
df2 = df1[df1.keys()[df1.keys().isin(my_list)]]
or:
df2 = df1[df1.columns[df1.columns.isin(my_list)]]
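The same isin mask can also produce df3 by negating it with ~. The sketch below assumes a small made-up frame (values are illustrative, and the name in_list is introduced here) and uses .loc with a boolean array over the columns:

```python
import pandas as pd

# Small made-up frame; only the column names echo the question.
df1 = pd.DataFrame({
    "User_id": [1, 2],
    "month": [7, 7],
    "day": [16, 16],
    "ZIP": [None, "DM"],
    "sex": ["M", "F"],
    "wgt": [float("nan"), 14.0],
})
my_list = ["User_id", "day", "ZIP", "sex"]

# Boolean array with one entry per column of df1.
in_list = df1.columns.isin(my_list)

df2 = df1.loc[:, in_list]    # columns named in my_list
df3 = df1.loc[:, ~in_list]   # every other column

print(list(df2.columns))  # ['User_id', 'day', 'ZIP', 'sex']
print(list(df3.columns))  # ['month', 'wgt']
```

Note that isin silently skips any name in my_list that is not a column of df1, whereas df1[my_list] raises a KeyError in that case, so which one is preferable depends on whether a missing column should be treated as an error.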