I have three DataFrames in Spark and want to pull values from one DataFrame into another based on some conditions. My scenario is below. Can anyone help?
DF1:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;border-color:#aaa;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#aaa;color:#333;background-color:#fff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#aaa;color:#fff;background-color:#f38630;}
.tg .tg-j2zy{background-color:#FCFBE3;vertical-align:top}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
.tg .tg-yq6s{background-color:#FCFBE3;text-align:center;vertical-align:top}
</style>
<table class="tg">
<tr>
<th class="tg-baqh">person_id</th>
<th class="tg-yw4l">criterion_name_1</th>
<th class="tg-yw4l">criterion_id_1</th>
<th class="tg-yw4l">criterion_name_2</th>
<th class="tg-yw4l">criterion_id_2</th>
<th class="tg-baqh">criterion_name_3</th>
<th class="tg-yw4l">criterion_id_3</th>
<th class="tg-yw4l">criterion_name_4</th>
<th class="tg-yw4l">criterion_id_4</th>
<th class="tg-yw4l">criterion_name_5</th>
<th class="tg-yw4l">criterion_id_5</th>
</tr>
<tr>
<td class="tg-yq6s">100</td>
<td class="tg-j2zy">Condition</td>
<td class="tg-j2zy">A-363-3015</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-yq6s">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
</tr>
<tr>
<td class="tg-baqh">101</td>
<td class="tg-yw4l">Condition</td>
<td class="tg-yw4l">D-229-3007</td>
<td class="tg-yw4l">Condition</td>
<td class="tg-yw4l">A-229-3008</td>
<td class="tg-baqh">Condition</td>
<td class="tg-yw4l">D-229-3008</td>
<td class="tg-yw4l">Condition</td>
<td class="tg-yw4l">A-229-3009</td>
<td class="tg-yw4l">Condition</td>
<td class="tg-yw4l">D-229-3009</td>
</tr>
<tr>
<td class="tg-yq6s">102</td>
<td class="tg-j2zy">Condition</td>
<td class="tg-j2zy">A-229-3012</td>
<td class="tg-j2zy">Observation</td>
<td class="tg-j2zy">PZXC</td>
<td class="tg-yq6s">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
<td class="tg-j2zy">null</td>
</tr>
</table>
Besides this DF, I have two lookup DataFrames: 1. the Condition DF and 2. the Observation DF.
Condition DF:
+-----+--------------+------+
| id | condition_id | code |
+-----+--------------+------+
| 100 | A-363-3015 | xyz |
+-----+--------------+------+
| 101 | A-334-3015 | pqr |
+-----+--------------+------+
Observation DF:
+-----+----------------+------+
| id  | observation_id | code |
+-----+----------------+------+
| 100 | PZXC           | 123  |
+-----+----------------+------+
| 101 | P2WZX          | pw32 |
+-----+----------------+------+
I want the final DF to have the following structure, with the values coming from the lookup DFs.
|person_id|criterion_name_1|criterion_id_1|criterion_value_1|criterion_name_2|criterion_id_2|criterion_value_2|criterion_name_3|criterion_id_3|criterion_value_3|criterion_name_4|criterion_id_4|criterion_value_4|criterion_name_5|criterion_id_5|criterion_value_5|
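In the structure above, the columns criterion_value_1, criterion_value_2, criterion_value_3 ..... criterion_value_5 should be populated as follows.
If criterion_name_1 = Condition, it should look up the Condition DF, match the value of criterion_id_1 against the Condition DF's condition_id column, and put the value of the code column into criterion_value_1. It should do this for every corresponding criterion_name, up to 5.
The same applies when criterion_name_1 = Observation, using the Observation lookup DF.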
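
To pin down the expected output, here is the rule written as a plain-Python sketch over two of the sample rows. It is not Spark code and not a proposed solution, only a statement of the semantics I am after. The names `add_criterion_values`, `LOOKUPS`, and `MAX_CRITERIA` are made up for this illustration, and it assumes that criterion names are compared case-insensitively, that codes are strings, and that an id with no match in the lookup gives null.

```python
# Plain-Python model of the lookup rule (no Spark), using the sample data above.
# Each lookup maps condition_id / observation_id -> code.
CONDITION_CODES = {"A-363-3015": "xyz", "A-334-3015": "pqr"}
OBSERVATION_CODES = {"PZXC": "123", "P2WZX": "pw32"}

# Assumption: criterion names are matched case-insensitively.
LOOKUPS = {"condition": CONDITION_CODES, "observation": OBSERVATION_CODES}

MAX_CRITERIA = 5


def add_criterion_values(row):
    """Return a new row with criterion_value_1..5 filled from the lookups."""
    out = {"person_id": row["person_id"]}
    for i in range(1, MAX_CRITERIA + 1):
        name = row.get(f"criterion_name_{i}")   # None stands for null
        crit_id = row.get(f"criterion_id_{i}")
        lookup = LOOKUPS.get(name.lower()) if name else None
        out[f"criterion_name_{i}"] = name
        out[f"criterion_id_{i}"] = crit_id
        # Unknown criterion name or unmatched id -> None (null).
        out[f"criterion_value_{i}"] = (
            lookup.get(crit_id) if lookup is not None else None
        )
    return out


# Rows 100 and 102 from DF1; null columns are simply left out.
df1_rows = [
    {"person_id": 100,
     "criterion_name_1": "Condition", "criterion_id_1": "A-363-3015"},
    {"person_id": 102,
     "criterion_name_1": "Condition", "criterion_id_1": "A-229-3012",
     "criterion_name_2": "Observation", "criterion_id_2": "PZXC"},
]

result = [add_criterion_values(r) for r in df1_rows]
for r in result:
    values = [r[f"criterion_value_{i}"] for i in range(1, MAX_CRITERIA + 1)]
    print(r["person_id"], values)
```

With the sample lookups, person 100 gets `xyz` in criterion_value_1, and person 102 gets null in criterion_value_1 (A-229-3012 is not in the Condition DF) and `123` in criterion_value_2.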