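I have two different DataFrames, A and B. The "event" column holds similar data in both, and I will use it to compare the two DataFrames. I want to give DataFrame A a new column, dfA.newContext#.
To do this, I need to use the "event" column. I want to iterate through DataFrame A to find matching events and assign dfB.context# to dfA.newContext#.
I think a loop is the best approach, because I need to check a few conditions.
This may be a lot to ask, but I'm really stuck. I want to do something like the following, because I need to take an offset into account to determine which "Special" event corresponds to which name: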
offset = 0
Iterate through dfA:
    get dfA.event
    get dfA.context#
    Iterate through dfB:
        if dfB.event == dfA.event:
            dfA.newContext# = dfB.context#
            offset = dfA.context# - dfA.newContext#
        if dfB.event == "Special":
            dfA.newContext# = dfA.context# - offset
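For the uniquely named events, the inner loop over dfB could probably be replaced by a dictionary lookup. The following is only a sketch in plain Python, using a handful of (context, event) rows copied from the tables further down; the names `rows_a`, `rows_b`, `b_context_by_event` and `offsets` are made up for illustration, and it deliberately ignores the repeated "Special" rows.

```python
# A few (context, event) rows copied from DataFrame A and DataFrame B.
rows_a = [(1465, "Bobcat"), (1466, "Turkey"), (1503, "Toucan"),
          (1505, "Zebra"), (1507, "Shark")]
rows_b = [(1459, "Dog"), (1463, "Bobcat"), (1495, "Toucan"),
          (1497, "Zebra"), (1499, "Shark")]

# One pass over dfB builds the lookup; this replaces the inner loop.
# Assumption: each non-"Special" event appears at most once in dfB.
b_context_by_event = {event: ctx for ctx, event in rows_b}

offsets = {}
for ctx_a, event in rows_a:
    ctx_b = b_context_by_event.get(event)
    if ctx_b is not None:
        # Offset as annotated in the desired output: dfA context minus dfB context.
        offsets[event] = ctx_a - ctx_b

print(offsets)  # {'Bobcat': 2, 'Toucan': 8, 'Zebra': 8, 'Shark': 8}
```

Events with no counterpart ("Turkey" in dfA, "Dog" in dfB) simply produce no entry.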
DataFrame A
+-------------+---------+------+
|dfA.context# |dfA.event| Name |
+-------------+---------+------+
| 1465 | Bobcat | Allie|
| 1466 | Turkey |Duncan|
| 1467 | Cobra |Taylor|
| 1468 | Horse |Heath |
| 1469 | Whale | Hank |
| 1470 | Special | Emma |
| 1471 | Special | Mark |
| 1472 | Special | John |
| 1473 | Special | Terr |
| 1474 | Special | Bryl |
| 1475 | Special | Joe |
| 1476 | Special | Jamie|
| 1477 | Special | Alex |
| 1478 | Special | Sandy|
| 1479 | Special | Bruce|
| 1480 | Special | Jared|
| 1481 | Special |Hayley|
| 1482 | Special |George|
| 1483 | Special | Anna |
| 1484 | Special | Greg |
| 1485 | Special | Hans |
| 1486 | Special | Ellie|
| 1487 | Special | Tere |
| 1488 | Special | Sara |
| 1490 | Special | Mike |
| 1492 | Special | Joel |
| 1494 | Special | Tony |
| 1496 | Special | Bryce|
| 1498 | Special | Cynth|
| 1500 | Special | Kyle |
| 1501 | Special |Taylor|
| 1502 | Special | Belle|
| 1503 | Toucan | Dale |
| 1504 | Dolphin | Tim |
| 1505 | Zebra | Ham |
| 1506 | Hyena | Griff|
| 1507 | Shark |Charli|
| 1508 | Rat | Blake|
+-------------+---------+------+
DataFrame B
+-------------+---------+
|dfB.context# |dfB.event|
+-------------+---------+
| 1459 | Dog |
| 1463 | Bobcat |
| 1469 | Special |
| 1479 | Special |
| 1486 | Special |
| 1489 | Special |
| 1491 | Special |
| 1493 | Special |
| 1495 | Toucan |
| 1497 | Zebra |
| 1499 | Shark |
+-------------+---------+
Desired DataFrame
+-------------+---------+------+---------------+
|dfA.context# |dfA.event| Name |dfA.newContext#|
+-------------+---------+------+---------------+
| 1465 | Bobcat | Allie| 1463 | (offset of 2)
| 1466 | Turkey |Duncan| |
| 1467 | Cobra |Taylor| |
| 1468 | Horse |Heath | |
| 1469 | Whale | Hank | |
| 1470 | Special | Emma | |
| 1471 | Special | Mark | 1469 | (offset of 2)
| 1472 | Special | John | |
| 1473 | Special | Terr | |
| 1474 | Special | Bryl | |
| 1475 | Special | Joe | |
| 1476 | Special | Jamie| |
| 1477 | Special | Alex | |
| 1478 | Special | Sandy| |
| 1479 | Special | Bruce| |
| 1480 | Special | Jared| |
| 1481 | Special |Hayley| 1479 | (offset of 2)
| 1482 | Special |George| |
| 1483 | Special | Anna | |
| 1484 | Special | Greg | |
| 1485 | Special | Hans | |
| 1486 | Special | Ellie| |
| 1487 | Special | Tere | |
| 1488 | Special | Sara | 1486 | (offset of 2)
| 1490 | Special | Mike | 1489 | (offset of 1)
| 1492 | Special | Joel | 1491 | (offset of 1)
| 1494 | Special | Tony | 1493 | (offset of 1)
| 1496 | Special | Bryce| |
| 1498 | Special | Cynth| |
| 1500 | Special | Kyle | |
| 1501 | Special |Taylor| |
| 1502 | Special | Belle| |
| 1503 | Toucan | Dale | 1495 | (offset of 8)
| 1504 | Dolphin | Tim | |
| 1505 | Zebra | Ham | 1497 | (offset of 8)
| 1506 | Hyena | Griff| |
| 1507 | Shark |Charli| 1499 | (offset of 8)
| 1508 | Rat | Blake| |
+-------------+---------+------+---------------+
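The only way I can think of is to use an offset, but I'm not sure how to handle it other than with two loops. I need to refer to the previous offset to pick the correct "Special" event.
You can see that for dfB.context# = 1489, applying the previous offset of 2 would point at dfA.context# = 1491, which does not exist, so it defaults to the next available event at dfA.context# = 1490, records an offset of 1, and then uses that offset for the next match. This continues until dfA.context# = 1503, where dfB.event matches dfA.event (Toucan), at which point the offset becomes 8.
Basically, we match on the event, but when the same event occurs multiple times, we choose the row using the offset recorded from the previous match.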
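One possible reading of this rule can be sketched with a single pass over dfB and dictionary lookups into dfA, instead of two nested loops. This is a minimal sketch in plain Python, not a definitive implementation. It assumes that every event other than "Special" is unique in dfA, that dfB is sorted by context, and that "next available event" means the first unused "Special" row in dfA after the most recent match. The function name `assign_new_context` and all variable names are hypothetical.

```python
# (context, event) rows copied from the tables above; names are omitted.
special_contexts = list(range(1470, 1489)) + [1490, 1492, 1494, 1496,
                                              1498, 1500, 1501, 1502]
df_a = (
    [(1465, "Bobcat"), (1466, "Turkey"), (1467, "Cobra"),
     (1468, "Horse"), (1469, "Whale")]
    + [(ctx, "Special") for ctx in special_contexts]
    + [(1503, "Toucan"), (1504, "Dolphin"), (1505, "Zebra"),
       (1506, "Hyena"), (1507, "Shark"), (1508, "Rat")]
)
df_b = [(1459, "Dog"), (1463, "Bobcat"), (1469, "Special"), (1479, "Special"),
        (1486, "Special"), (1489, "Special"), (1491, "Special"),
        (1493, "Special"), (1495, "Toucan"), (1497, "Zebra"), (1499, "Shark")]


def assign_new_context(rows_a, rows_b, repeated="Special"):
    """Return {dfA context: dfB context} for every dfA row that gets a match."""
    event_by_context = {ctx: event for ctx, event in rows_a}
    # Assumption: events other than `repeated` occur only once in dfA.
    context_by_event = {event: ctx for ctx, event in rows_a if event != repeated}
    repeated_contexts = sorted(ctx for ctx, event in rows_a if event == repeated)

    result = {}
    offset = 0
    last_a = None  # dfA context of the most recent match
    for ctx_b, event_b in rows_b:
        if event_b != repeated:
            # Unique event: match directly on the event name.
            ctx_a = context_by_event.get(event_b)
        else:
            # Repeated event: first try the previously recorded offset.
            ctx_a = ctx_b + offset
            if event_by_context.get(ctx_a) != repeated or ctx_a in result:
                # Fall back to the next unused repeated row after the last match.
                ctx_a = next(
                    (c for c in repeated_contexts
                     if c not in result and (last_a is None or c > last_a)),
                    None,
                )
        if ctx_a is None:
            continue  # no counterpart in dfA, e.g. "Dog"
        result[ctx_a] = ctx_b
        offset = ctx_a - ctx_b  # remembered for the next repeated event
        last_a = ctx_a
    return result


new_context = assign_new_context(df_a, df_b)
for ctx_a, event in df_a:
    if ctx_a in new_context:
        ctx_b = new_context[ctx_a]
        print(ctx_a, event, ctx_b, f"(offset of {ctx_a - ctx_b})")
```

On the sample data this fills the same ten rows as the desired table, including the switch to an offset of 1 at dfA.context# = 1490. If the data lives in pandas, the resulting dictionary could presumably be applied with something like `dfA["newContext#"] = dfA["context#"].map(new_context)`, where the column names are assumptions; rows without a match would come out as NaN.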