Create columns based on a condition

Time: 2018-05-22 10:16:04

Tags: python python-3.x pandas

I have a dataframe with the following column names:

Dataframe1:

df1_columns = ['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6']

Dataframe2 has a subset of dataframe1's columns, and I want to add the columns from dataframe1 that it is missing.

Dataframe2:

adult   adultold    old2    old5
0   0   1   0
1   0   0   0
1   0   0   0
0   0   0   0
1   0   0   0
1   0   0   0
0   0   0   1
0   0   0   0
0   0   0   0
0   0   0   0
0   0   0   0

Based on the column names of dataframe1, I want to do something like the following, adding each missing column to dataframe2 filled with zeros:

dataframe1.columns  # ['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6']

dataframe2.columns  # ['adult', 'adultold', 'old2', 'old5']

for col in dataframe1.columns:
    if col in dataframe2.columns:
        pass
    else:
        dataframe2[col] = 0  # add the missing column, filled with zeros
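A corrected version of this loop can be run as-is on a small stand-in for the data. This is a minimal sketch: the empty `dataframe1` (columns only) and the three-row `dataframe2` are assumptions built from the column names and the first rows shown above. Note that the loop appends each new column at the end, so `old5` is kept and the result is not in dataframe1's column order.

```python
import pandas as pd

# Assumed stand-in: only dataframe1's column names matter here
dataframe1 = pd.DataFrame(
    columns=['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6']
)

# Assumed stand-in: the first three rows of the question's dataframe2
dataframe2 = pd.DataFrame(
    {'adult': [0, 1, 1], 'adultold': [0, 0, 0], 'old2': [1, 0, 0], 'old5': [0, 0, 0]}
)

for col in dataframe1.columns:
    if col not in dataframe2.columns:
        # Assigning a scalar broadcasts it down the whole new column
        dataframe2[col] = 0

# New columns are appended after the existing ones, in dataframe1's order
print(dataframe2.columns.tolist())
# ['adult', 'adultold', 'old2', 'old5', 'old', 'old1', 'old3', 'old4', 'old6']
```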

Expected output:

adult   adultold    old old1    old2    old3    old4    old6
0   0   1   0   0   0   0   0
1   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
1   0   0   0   0   0   0   0
0   0   0   1   0   0   0   0
0   0   0   0   0   0   0   0

Can we get the new dataframe with its columns in the same order as in df1?

2 answers:

Answer 0 (score: 3)

If the inputs are lists of column names, use assign with a Series of zeros whose index holds the new column names, found with numpy.setdiff1d:

import numpy as np
import pandas as pd

s = pd.Series(0, index=np.setdiff1d(dataframe1_columns, dataframe2_columns))
print (s)
old2    0
old3    0
old4    0
old6    0
dtype: int64

df = dataframe2.assign(**s)
print (df)
    adult  adultold  old  old1  old2  old3  old4  old6
0       0         0    1     0     0     0     0     0
1       1         0    0     0     0     0     0     0
2       1         0    0     0     0     0     0     0
3       0         0    0     0     0     0     0     0
4       1         0    0     0     0     0     0     0
5       1         0    0     0     0     0     0     0
6       0         0    0     1     0     0     0     0
7       0         0    0     0     0     0     0     0
8       0         0    0     0     0     0     0     0
9       0         0    0     0     0     0     0     0
10      0         0    0     0     0     0     0     0
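Run against the column names exactly as stated in the question (`old2` and `old5` in dataframe2), the same approach behaves slightly differently from the printed output above. This is a minimal sketch; the three-row `dataframe2` is an assumed stand-in built from the question's first rows. `np.setdiff1d` returns the missing names sorted, `assign` appends them at the end, and selecting with `dataframe1_columns` afterwards is one way to get df1's order (it drops `old5`, which df1 does not have).

```python
import numpy as np
import pandas as pd

dataframe1_columns = ['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6']

# Assumed stand-in: the first three rows of the question's dataframe2
dataframe2 = pd.DataFrame(
    {'adult': [0, 1, 1], 'adultold': [0, 0, 0], 'old2': [1, 0, 0], 'old5': [0, 0, 0]}
)
dataframe2_columns = dataframe2.columns.tolist()

# Sorted names that are in dataframe1_columns but not in dataframe2_columns
missing = np.setdiff1d(dataframe1_columns, dataframe2_columns)
s = pd.Series(0, index=missing)

# assign(**s) adds one zero-filled column per missing name, appended at the end
df = dataframe2.assign(**s)

# Selecting with df1's list puts the columns in df1's order (and drops old5)
ordered = df[dataframe1_columns]
print(ordered)
```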

Answer 1 (score: 1)

Use:

import pandas as pd

list_of_cols_not_in_df2 = df.columns.difference(df2.columns) # @jez thanks for teaching difference
pd.concat([df2, pd.DataFrame(0, df2.index, list_of_cols_not_in_df2)], axis=1)

Or use join instead of concat:

df2.join(pd.DataFrame(0, df2.index, list_of_cols_not_in_df2 ))

Output:

    adult   adultold    old old1    old2    old3    old4    old6
0   0   0   1   0   0   0   0   0
1   1   0   0   0   0   0   0   0
2   1   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0
4   1   0   0   0   0   0   0   0
5   1   0   0   0   0   0   0   0
6   0   0   0   1   0   0   0   0
7   0   0   0   0   0   0   0   0
8   0   0   0   0   0   0   0   0
9   0   0   0   0   0   0   0   0
10  0   0   0   0   0   0   0   0
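The concat and join variants can be checked against each other on a small stand-in. This is a minimal sketch: `df` (columns only) and the three-row `df2` are assumptions built from the question's column names and first rows. `Index.difference` returns the missing labels sorted, so both variants append them after df2's existing columns. For the ordering part of the question, `DataFrame.reindex` with `fill_value=0` is shown as a swapped-in alternative (not part of either answer): it creates the missing columns as zeros and returns them in `df`'s order, dropping columns such as `old5` that `df` does not have.

```python
import pandas as pd

# Assumed stand-ins for the question's data
df = pd.DataFrame(columns=['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6'])
df2 = pd.DataFrame(
    {'adult': [0, 1, 1], 'adultold': [0, 0, 0], 'old2': [1, 0, 0], 'old5': [0, 0, 0]}
)

# Sorted labels of df.columns that are missing from df2.columns
list_of_cols_not_in_df2 = df.columns.difference(df2.columns)
zeros = pd.DataFrame(0, index=df2.index, columns=list_of_cols_not_in_df2)

# Both append the zero columns after df2's existing columns
via_concat = pd.concat([df2, zeros], axis=1)
via_join = df2.join(zeros)

# Alternative: reindex to df's column order, filling missing columns with 0
ordered = df2.reindex(columns=df.columns, fill_value=0)
print(ordered)
```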