Using isin to select DataFrame columns from a list

Time: 2016-10-03 09:01:31

Tags: python list pandas conditional-statements multiple-columns

I have a DataFrame df1 and a list containing the names of several of df1's columns.

df1:
User_id  month  day  Age   year    CVI    ZIP    sex  wgt
0           1    7   16    1977     2      NA    M    NaN
1           2    7   16    1977     3      NA    M    NaN
2           3    7   16    1977     2      DM    F    NaN
3           4    7   16    1977     7      DM    M    NaN
4           5    7   16    1977     3      DM    M    NaN
...        ...    ...  ...   ...   ...     ...  ...  ...
35544      35545   12   31  2002    15      AH  NaN  NaN
35545      35546   12   31  2002    15      AH  NaN  NaN
35546      35547   12   31  2002    10      RM    F   14
35547      35548   12   31  2002     7      DO    M   51
35548      35549   12   31  2002     5     NaN  NaN  NaN

 list= [u"User_id", u"day", u"ZIP", u"sex"]

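I want to create a new DataFrame df2 containing the columns named in the list, and a DataFrame df3 containing the columns that are not in the list.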

Here I found that I need to do this:

df2=df1[df1[df1.columns[1]].isin(list)]

But as a result I get:

Empty DataFrame
Columns: []
Index: []
[0 rows x 9 columns]

What is my mistake? How can I get the desired result? And why does it say "9 columns" when it should be 4?

4 answers:

Answer 0: (score: 2)

A solution using Index.difference:

L = [u"User_id", u"day", u"ZIP", u"sex"]

df2 = df1[L] 
df3 = df1[df1.columns.difference(df2.columns)]
print (df2)
   User_id  day  ZIP sex
0        0    7  NaN   M
1        1    7  NaN   M
2        2    7   DM   F
3        3    7   DM   M
4        4    7   DM   M

print (df3)
   Age  CVI  month  wgt  year
0   16    2      1  NaN  1977
1   16    3      2  NaN  1977
2   16    2      3  NaN  1977
3   16    7      4  NaN  1977
4   16    3      5  NaN  1977

Or:

df2 = df1[L] 
df3 = df1[df1.columns.difference(pd.Index(L))]
print (df2)
   User_id  day  ZIP sex
0        0    7  NaN   M
1        1    7  NaN   M
2        2    7   DM   F
3        3    7   DM   M
4        4    7   DM   M

print (df3)
   Age  CVI  month  wgt  year
0   16    2      1  NaN  1977
1   16    3      2  NaN  1977
2   16    2      3  NaN  1977
3   16    7      4  NaN  1977
4   16    3      5  NaN  1977
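
One thing to watch with this approach: `Index.difference` returns its labels sorted by default, which is why `df3` above lists `Age, CVI, month, wgt, year` rather than the original column order. A minimal sketch of two ways to keep the original order, assuming a small made-up frame in place of `df1` (the values are illustrative only, and `sort=False` needs a reasonably recent pandas):

```python
import pandas as pd

# Small stand-in for df1; the values are made up for illustration.
df1 = pd.DataFrame({
    "User_id": [1, 2, 3],
    "month": [7, 7, 7],
    "day": [16, 16, 16],
    "Age": [30, 41, 25],
    "year": [1977, 1977, 1977],
    "CVI": [2, 3, 2],
    "ZIP": [None, "DM", "DM"],
    "sex": ["M", "F", "M"],
})
L = ["User_id", "day", "ZIP", "sex"]

df2 = df1[L]

# Default behaviour: the remaining labels come back sorted.
sorted_rest = df1.columns.difference(L)

# sort=False keeps the labels in the order they appear in df1.columns.
ordered_rest = df1.columns.difference(L, sort=False)

# DataFrame.drop also preserves the original column order.
df3 = df1.drop(columns=L)

print(list(sorted_rest))
print(list(ordered_rest))
print(list(df3.columns))
```

If the column order of `df3` matters, `df1.drop(columns=L)` is probably the simplest choice.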

Answer 1: (score: 1)

You can try:

df2 = df1[list] # it does a projection on the columns contained in the list
df3 = df1[[col for col in df1.columns if col not in list]]
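
As for why the original attempt came back empty: `df1[df1.columns[1]]` is the `month` column, so `.isin(list)` tests each month value against the column names and yields a boolean Series that is `False` everywhere. Indexing `df1` with a boolean Series filters rows, not columns, which would explain zero rows with all 9 columns still reported. A small sketch with made-up data (the frame and the name `cols` are assumptions for illustration):

```python
import pandas as pd

# Small stand-in for df1; the values are made up for illustration.
df1 = pd.DataFrame({
    "User_id": [1, 2, 3],
    "month": [7, 7, 7],
    "day": [16, 16, 16],
    "sex": ["M", "M", "F"],
})
cols = ["User_id", "day", "sex"]

# df1.columns[1] is the label "month", so this compares the month
# values (7, 7, 7) with the column names: every entry is False.
row_mask = df1[df1.columns[1]].isin(cols)

# A boolean Series used as an indexer selects rows, so no row
# survives while every column is kept.
empty = df1[row_mask]
print(empty.shape)

# Passing the list of labels selects columns instead.
df2 = df1[cols]
df3 = df1[[c for c in df1.columns if c not in cols]]
print(list(df2.columns), list(df3.columns))
```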

Answer 2: (score: 1)

Never name a list "list"; doing so shadows Python's built-in list type.

my_list= [u"User_id", u"day", u"ZIP", u"sex"]
df2 = df1[df1.keys()[df1.keys().isin(my_list)]]

Answer 3: (score: 1)

Never name a list "list"

my_list= [u"User_id", u"day", u"ZIP", u"sex"]
df2 = df1[df1.keys()[df1.keys().isin(my_list)]]

# Equivalent: for a DataFrame, keys() returns the same Index as .columns
df2 = df1[df1.columns[df1.columns.isin(my_list)]]
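
The same boolean mask can also produce `df3` by negating it. A minimal sketch, assuming a small made-up frame in place of `df1`; `.loc[:, mask]` is used here as another way to select columns with a boolean array, not as something taken from the answer above:

```python
import pandas as pd

# Small stand-in for df1; the values are made up for illustration.
df1 = pd.DataFrame({
    "User_id": [1, 2, 3],
    "month": [7, 7, 7],
    "day": [16, 16, 16],
    "sex": ["M", "M", "F"],
})
my_list = ["sex", "User_id", "day"]

# One boolean per column: True where the column name is in my_list.
mask = df1.columns.isin(my_list)

df2 = df1.loc[:, mask]    # columns named in my_list
df3 = df1.loc[:, ~mask]   # all remaining columns

print(list(df2.columns))
print(list(df3.columns))
```

Note that a mask-based selection keeps the column order of `df1`, whereas `df1[my_list]` returns the columns in the order given by the list.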