I have two DataFrames, df1 and df2.
This is df1:
       Name      Sector
0  Company1          3D
1  Company2  Accounting
2  Company3    Wireless
This is df2:
         Name  Automotive&Sports  Cleantech  Entertainment  Health  Manufacturing  Finance
0          3D                  0          0              0       0              1        0
1    wireless                  0          0              1       0              0        0
2  Accounting                  0          0              0       0              0        1
For each value in df1['Sector'], I want to get the name of the column in df2 whose value is 1 in the matching row.
Answer 0: (score: 1)
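What you have in df2 is called a one-hot encoding, and one way to reverse it is idxmax, which returns the label of the column holding each row's maximum (here, the 1). Let's add a column:
df2['result'] = df2.iloc[:, 1:].idxmax(axis=1)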
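To make the idxmax step concrete, here is a minimal sketch on a trimmed-down copy of df2 (only three of the flag columns are kept, purely for brevity). The names `onehot`, `decoded`, and `is_valid` are illustrative and not part of the original answer. The sanity check is an added assumption: idxmax only decodes correctly when every row contains exactly one 1, because an all-zero row would silently return the first column.

```python
import pandas as pd

# Trimmed-down copy of df2 from the question (three flag columns only).
df2 = pd.DataFrame({
    "Name": ["3D", "wireless", "Accounting"],
    "Entertainment": [0, 1, 0],
    "Manufacturing": [1, 0, 0],
    "Finance": [0, 0, 1],
})

# Keep only the 0/1 flag columns.
onehot = df2.drop(columns="Name")

# For each row, idxmax(axis=1) returns the label of the first column
# that holds the row maximum, i.e. the column containing the 1.
decoded = onehot.idxmax(axis=1)

# Assumption check: each row should contain exactly one 1.
is_valid = bool((onehot.sum(axis=1) == 1).all())

print(decoded.tolist())
print(is_valid)
```

If `is_valid` comes back False, it is probably safer to inspect the offending rows before trusting the decoded labels.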
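Now you just need to merge and do some cleanup. Both frames have a Name column, so the merge suffixes them as Name_x and Name_y:
import pandas as pd
df = pd.merge(df1, df2[['Name', 'result']], left_on='Sector', right_on='Name')
# Pass columns= explicitly; recent pandas versions reject a positional axis argument in drop()
df = df.drop(columns='Name_y').rename(columns={'Name_x': 'Name'})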
You get the desired output:
In [102]: df
Out[102]:
       Name      Sector         result
0  Company1          3D  Manufacturing
1  Company2  Accounting        Finance
2  Company3    Wireless  Entertainment
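One caveat with the sample data as printed in the question: df2 spells the key "wireless" while df1 has "Wireless", so an exact-match inner merge would likely leave Company3 out. The sketch below assumes case-insensitive matching is acceptable (that is an assumption, not something the question states) and uses Series.map with a lookup Series instead of merge, which also avoids the Name_x/Name_y cleanup. The names `lookup` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Company1", "Company2", "Company3"],
    "Sector": ["3D", "Accounting", "Wireless"],
})
df2 = pd.DataFrame({
    "Name": ["3D", "wireless", "Accounting"],
    "Automotive&Sports": [0, 0, 0],
    "Cleantech": [0, 0, 0],
    "Entertainment": [0, 1, 0],
    "Health": [0, 0, 0],
    "Manufacturing": [1, 0, 0],
    "Finance": [0, 0, 1],
})

# Decode the one-hot rows, then index the labels by lower-cased sector name.
lookup = df2.drop(columns="Name").idxmax(axis=1)
lookup.index = df2["Name"].str.lower()

# Map each (lower-cased) sector in df1 to its decoded column name.
# Sectors with no match in df2 would come back as NaN rather than being dropped.
result = df1.assign(result=df1["Sector"].str.lower().map(lookup))

print(result)
```

Compared with the merge, this keeps every row of df1, which makes unmatched sectors easier to spot.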
Answer 1: (score: 0)
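Try the code below (note that it uses PySpark DataFrames rather than pandas); I think this is what you are looking for: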
from pyspark.sql.functions import col

df3 = df1.join(df2, df1.Sector == df2.Name, 'inner') \
    .drop(df2.Name)
# Drop every column whose values are all 0
for col_name in df3.columns:
    if df3.filter(col(col_name) == 0).count() == df3.count():
        df3 = df3.drop(col_name)
df3.show()
+--------+----------+-------------+-------------+-------+
| Name| Sector|Entertainment|Manufacturing|Finance|
+--------+----------+-------------+-------------+-------+
|Company3| Wireless| 1| 0| 0|
|Company2|Accounting| 0| 0| 1|
|Company1| 3D| 0| 1| 0|
+--------+----------+-------------+-------------+-------+
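Since the question is about pandas, a rough pandas counterpart of the same idea (join, then drop the flag columns that are 0 in every row) might look like the sketch below. It assumes the sector names are spelled identically in both frames, so "Wireless" is capitalized in df2 here, unlike the sample data in the question. The names `merged`, `flags`, `keep`, and `trimmed` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Company1", "Company2", "Company3"],
    "Sector": ["3D", "Accounting", "Wireless"],
})
# Assumption: keys are spelled consistently ("Wireless", not "wireless").
df2 = pd.DataFrame({
    "Name": ["3D", "Wireless", "Accounting"],
    "Automotive&Sports": [0, 0, 0],
    "Cleantech": [0, 0, 0],
    "Entertainment": [0, 1, 0],
    "Health": [0, 0, 0],
    "Manufacturing": [1, 0, 0],
    "Finance": [0, 0, 1],
})

# Rename df2's key so the join does not produce duplicate Name columns.
merged = df1.merge(df2.rename(columns={"Name": "Sector"}), on="Sector", how="inner")

# Keep only the flag columns that contain at least one non-zero value.
flags = merged.drop(columns=["Name", "Sector"])
keep = flags.columns[(flags != 0).any(axis=0)]
trimmed = merged[["Name", "Sector", *keep]]

print(trimmed)
```

This returns the one-hot flags themselves rather than a single label column, so it fits best when a sector can legitimately map to more than one category.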