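I have a dataframe with the column names shown below.
Dataset 1:
df1_columns = ['adult', 'adultold', 'old', 'old1', 'old2', 'old3', 'old4', 'old6']
The columns of dataframe2 are a subset of the columns of dataframe1. I want to add to dataframe2 the columns that dataframe1 has and dataframe2 is missing.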
Dataframe2:
adult  adultold  old2  old5
    0         0     1     0
    1         0     0     0
    1         0     0     0
    0         0     0     0
    1         0     0     0
    1         0     0     0
    0         0     0     1
    0         0     0     0
    0         0     0     0
    0         0     0     0
    0         0     0     0
I want to do something like the following: go through the column names of dataframe1 and add any that are missing to dataset 2, filled with zeros.
# dataframe1.columns: adult, adultold, old, old1, old2, old3, old4, old6
# dataframe2.columns: adult, adultold, old2, old5
for col in dataframe1.columns:
    if col not in dataframe2.columns:
        dataframe2[col] = 0  # new column filled with zeros
Output:
adult  adultold  old  old1  old2  old3  old4  old6
    0         0    1     0     0     0     0     0
    1         0    0     0     0     0     0     0
    1         0    0     0     0     0     0     0
    0         0    0     0     0     0     0     0
    1         0    0     0     0     0     0     0
    1         0    0     0     0     0     0     0
    0         0    0     1     0     0     0     0
    0         0    0     0     0     0     0     0
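Can we get the new dataframe with its columns in the same order as df1?
Answer 0 (score: 3)
If the inputs are lists of column names, use numpy.setdiff1d to find the new column names, build a Series of zeros indexed by them, and then pass that Series to assign: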
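import numpy as np
import pandas as pd

# Names in dataframe1_columns that are absent from dataframe2_columns, each mapped to 0
s = pd.Series(0, index=np.setdiff1d(dataframe1_columns, dataframe2_columns))
print(s)
old2    0
old3    0
old4    0
old6    0
dtype: int64

# assign appends one zero-filled column per entry in s
df = dataframe2.assign(**s)
print(df)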
    adult  adultold  old  old1  old2  old3  old4  old6
0       0         0    1     0     0     0     0     0
1       1         0    0     0     0     0     0     0
2       1         0    0     0     0     0     0     0
3       0         0    0     0     0     0     0     0
4       1         0    0     0     0     0     0     0
5       1         0    0     0     0     0     0     0
6       0         0    0     1     0     0     0     0
7       0         0    0     0     0     0     0     0
8       0         0    0     0     0     0     0     0
9       0         0    0     0     0     0     0     0
10      0         0    0     0     0     0     0     0
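On the follow-up about column order: assign appends the new columns at the end, so the result follows dataframe2's order rather than df1's. The printed output in this answer lists old2 among the added columns, which suggests it was produced from a dataframe2 with columns adult, adultold, old, old1. The sketch below instead rebuilds the data exactly as posted in the question (an assumption about what the real frame looks like). It then shows one possible alternative that is not part of the answer above: DataFrame.reindex with fill_value=0. Note that reindex keeps only the names you pass, so old5 is dropped here, which matches the output the question asks for.

```python
import numpy as np
import pandas as pd

# Column names and data copied from the question.
dataframe1_columns = ["adult", "adultold", "old", "old1", "old2", "old3", "old4", "old6"]
dataframe2 = pd.DataFrame({
    "adult":    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    "adultold": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "old2":     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "old5":     [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})
dataframe2_columns = list(dataframe2.columns)

# The answer's approach: a Series of zeros keyed by the missing names, then assign.
s = pd.Series(0, index=np.setdiff1d(dataframe1_columns, dataframe2_columns))
appended = dataframe2.assign(**s)
print(list(appended.columns))  # dataframe2's columns first, new ones at the end

# Alternative sketch: reindex adds the missing columns filled with 0,
# orders everything like dataframe1_columns, and drops names not in that list.
ordered = dataframe2.reindex(columns=dataframe1_columns, fill_value=0)
print(ordered)
```

If the extra column old5 has to survive, reindex over a longer list (df1's names followed by the extra names) instead of dataframe1_columns alone.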
Answer 1 (score: 1)
Use:
# df is the first dataframe; difference returns the columns of df that are not in df2
list_of_cols_not_in_df2 = df.columns.difference(df2.columns)  # @jez thanks for teaching difference
pd.concat([df2, pd.DataFrame(0, df2.index, list_of_cols_not_in_df2)], axis=1)
Using join instead of concat:
df2.join(pd.DataFrame(0, df2.index, list_of_cols_not_in_df2))
Output:
    adult  adultold  old  old1  old2  old3  old4  old6
0       0         0    1     0     0     0     0     0
1       1         0    0     0     0     0     0     0
2       1         0    0     0     0     0     0     0
3       0         0    0     0     0     0     0     0
4       1         0    0     0     0     0     0     0
5       1         0    0     0     0     0     0     0
6       0         0    0     1     0     0     0     0
7       0         0    0     0     0     0     0     0
8       0         0    0     0     0     0     0     0
9       0         0    0     0     0     0     0     0
10      0         0    0     0     0     0     0     0
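For anyone who wants to run the join variant end to end, here is a minimal sketch. It assumes the data from the question (including the old5 column that df1 does not have) and uses a plain list, df1_columns, in place of the first dataframe. Index.difference by default tries to return its result sorted, and join puts the new columns after df2's own, so the last step reorders explicitly: df1's order first, then any columns that exist only in df2. The names missing, zeros, extra and ordered are just illustrative.

```python
import pandas as pd

# Column names and data copied from the question.
df1_columns = ["adult", "adultold", "old", "old1", "old2", "old3", "old4", "old6"]
df2 = pd.DataFrame({
    "adult":    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    "adultold": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "old2":     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "old5":     [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Columns that df1 has and df2 lacks.
missing = pd.Index(df1_columns).difference(df2.columns)

# A zero-filled frame with df2's index, joined onto df2.
zeros = pd.DataFrame(0, index=df2.index, columns=missing)
joined = df2.join(zeros)

# Reorder: df1's column order first, then columns that only df2 has (old5 here).
extra = [c for c in df2.columns if c not in df1_columns]
ordered = joined[df1_columns + extra]
print(ordered)
```

Unlike a reindex over df1's names alone, this keeps old5 in the result. Drop the extra part if it is not wanted.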