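I have two different dataframes, A and B. Their event columns hold similar data, which I want to use to compare the two dataframes. I want to give dataframe A a new column, dfA.newContext#.
To do that, I need to use the event column: I want to iterate through dataframe A, find the matching event in dataframe B, and assign dfB.context# to dfA.newContext#.
I think a loop is the best approach because I need to check some conditions.
This may be a lot to ask, but I'm really stuck. I'd like to do something like this: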
offset = 0
Iterate through dfA:
    extract event
    extract context#
    Iterate through dfB:
        if dfB.event == dfA.event:
            dfA.newContext# = dfB.context#
            offset = dfA.newContext# - dfA.context#
        if dfB.event == "Special":
            dfA.newContext# = dfA.context# - offset
DataFrame A
+-------------+---------+------+
|dfA.context# |dfA.event| Name |
+-------------+---------+------+
| 0 | Special | Bob |
| 2 | Special | Joan |
| 4 | Bird | Susie|
| 5 | Special | Alice|
| 6 | Special | Tom |
| 7 | Special | Luis |
| 8 | Parrot | Jill |
| 9 | Special | Reed |
| 10 | Special | Lucas|
| 11 | Snake | Kat |
| 12 | Special | Bill |
| 13 | Special | Leo |
| 14 | Special | Peter|
| 15 | Special | Mark |
| 16 | Special | Joe |
| 17 | Special | Lora |
| 18 | Special | Care |
| 19 |Elephant | David|
| 20 | Special | Ann |
| 21 | Special | Larry|
| 22 | Skunk | Tony |
+-------------+---------+------+
DataFrame B
+-------------+---------+
|dfB.context# |dfB.event|
+-------------+---------+
| 0 | Special |
| 0 | Special |
| 0 | Special |
| 1 | Special |
| 1 | Special |
| 1 | Special |
| 1 | Special |
| 2 | Bird |
| 2 | Bird |
| 3 | Special |
| 6 | Parrot |
| 6 | Parrot |
| 6 | Parrot |
| 6 | Parrot |
| 7 | Special |
| 7 | Special |
| 9 | Snake |
| 9 | Snake |
| 9 | Snake |
| 10 | Special |
| 17 |Elephant |
| 17 |Elephant |
| 17 |Elephant |
| 18 | Special |
| 18 | Special |
| 20 | Skunk |
| 20 | Skunk |
| 21 | Special |
| 26 | Antelope|
+-------------+---------+
Desired DataFrame
+-------------+---------+------+-------------+
|dfA.context# |dfA.event| Name |dfA.newContext#|
+-------------+---------+------+-------------+
| 0 | Special | Bob | 0 |
| 2 | Special | Joan | 1 |
| 4 | Bird | Susie| 2 |
| 5 | Special | Alice| 3 |
| 6 | Special | Tom | |
| 7 | Special | Luis | |
| 8 | Parrot | Jill | 6 |
| 9 | Special | Reed | 7 |
| 10 | Special | Lucas| |
| 11 | Snake | Kat | 9 |
| 12 | Special | Bill | 10 |
| 13 | Special | Leo | |
| 14 | Special | Peter| |
| 15 | Special | Mark | |
| 16 | Special | Joe | |
| 17 | Special | Lora | |
| 18 | Special | Care | |
| 19 |Elephant | David| 17 |
| 20 | Special | Ann | 18 |
| 21 | Special | Larry| |
| 22 | Skunk | Tony | 20 |
+-------------+---------+------+-------------+
How can I iterate through both dataframes at once and access their data?
Answer 0 (score: 1)
95% of the time you can use pandas' vectorized methods instead of a loop. In this case, pd.merge is a simple, clean, and efficient alternative to a long loop.
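To make the idea concrete, here is a minimal sketch of a left join with pd.merge. The two toy frames and their column names are hypothetical and are not the data from the question; they only show that a left merge keeps every row of the left frame and fills unmatched rows with NaN, which is what replaces the nested loop.

```python
import pandas as pd

# Hypothetical toy frames for illustration only (not the question's data).
left = pd.DataFrame({"event": ["Bird", "Special", "Snake"],
                     "name": ["Susie", "Bob", "Kat"]})
right = pd.DataFrame({"event": ["Bird", "Snake"],
                      "context": [2, 9]})

# how="left" keeps all rows of `left`, in order; rows of `left` with no
# matching event in `right` get NaN in the columns coming from `right`.
merged = pd.merge(left, right, how="left", on="event")
print(merged)
```

The unmatched "Special" row survives the merge with a missing context, so any conditional logic can then be applied to the merged frame with vectorized operations rather than inside a loop.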
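Edit (answer #1): you can actually do a more advanced merge with left_on=dfA.index, right_on='context' and chain the post-merge cleanup onto the same statement, but see the more complete answer below, which takes a similar approach: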
df = (pd.merge(dfA, dfB['context'], how='left', left_on=dfA.index, right_on='context')
.drop_duplicates()
.dropna(subset=['Name'])
.drop('context', axis=1)
.rename({'context_x' : 'context', 'context_y' : 'newContext'}, axis=1).fillna(''))
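Answer #2: You can merge the two dataframes after manipulating both of them to prepare for the merge:
dfA - set the context column of dfA equal to the index, but before changing it, save it as a series s for later use.
dfB - in preparation for the merge, drop duplicates, reset the index, and rename the resulting index column to newContext.
Merge on event and context, then replace the newContext values with the context values wherever a match was found (newContext is not null).
Restore context to the original data with df['context'] = s.
s = dfA['context']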
dfA['context'] = dfA.index.astype(str)
dfB = dfB.drop_duplicates().reset_index().rename({'index' :'newContext'}, axis=1).astype(str)
df = pd.merge(dfA, dfB, how='left', on=['event', 'context'])
df['newContext'] = df['newContext'].where(df['newContext'].isnull(), df['context']).fillna('')
df['context'] = s
df
Out[9]:
context event Name newContext
0 0 Special Bob 0
1 2 Special Joan 1
2 4 Bird Susie 2
3 5 Special Alice 3
4 6 Special Tom
5 7 Special Luis
6 8 Parrot Jill 6
7 9 Special Reed 7
8 10 Special Lucas
9 11 Snake Kat 9
10 12 Special Bill 10
11 13 Special Leo
12 14 Special Peter
13 15 Special Mark
14 16 Special Joe
15 17 Special Lora
16 18 Special Care
17 19 Elephant David 17
18 20 Special Ann 18
19 21 Special Larry
20 22 Skunk Tony 20
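For anyone who wants to reproduce this, below is a self-contained sketch that rebuilds both frames from the tables in the question and runs the same steps as answer #2. It assumes the columns are named context, event, and Name, as in the output above; the helper names rows_b, dfB_unique, and expected are introduced here for illustration only. Note what the merge actually matches on: a row of dfA gets a newContext when dfB contains a row whose context equals that row's position in dfA and whose event is the same.

```python
import pandas as pd

# dfA rebuilt from the question's table.
dfA = pd.DataFrame({
    "context": [0, 2, 4] + list(range(5, 23)),
    "event": ["Special", "Special", "Bird", "Special", "Special", "Special",
              "Parrot", "Special", "Special", "Snake"] + ["Special"] * 7
             + ["Elephant", "Special", "Special", "Skunk"],
    "Name": ["Bob", "Joan", "Susie", "Alice", "Tom", "Luis", "Jill", "Reed",
             "Lucas", "Kat", "Bill", "Leo", "Peter", "Mark", "Joe", "Lora",
             "Care", "David", "Ann", "Larry", "Tony"],
})

# dfB rebuilt from the question's table as (context, event, repeat count).
rows_b = [(0, "Special", 3), (1, "Special", 4), (2, "Bird", 2),
          (3, "Special", 1), (6, "Parrot", 4), (7, "Special", 2),
          (9, "Snake", 3), (10, "Special", 1), (17, "Elephant", 3),
          (18, "Special", 2), (20, "Skunk", 2), (21, "Special", 1),
          (26, "Antelope", 1)]
dfB = pd.DataFrame([(c, e) for c, e, n in rows_b for _ in range(n)],
                   columns=["context", "event"])

# Save the original context, then use dfA's row position as the merge key.
s = dfA["context"].copy()
dfA["context"] = dfA.index.astype(str)

# Deduplicate dfB; the old index becomes a column that only serves as a
# "match found" marker after the merge.
dfB_unique = (dfB.drop_duplicates()
                 .reset_index()
                 .rename({"index": "newContext"}, axis=1)
                 .astype(str))

df = pd.merge(dfA, dfB_unique, how="left", on=["event", "context"])

# Keep NaN where there was no match; otherwise take the merge key value.
df["newContext"] = (df["newContext"]
                    .where(df["newContext"].isnull(), df["context"])
                    .fillna(""))
df["context"] = s  # restore the original context values

expected = ["0", "1", "2", "3", "", "", "6", "7", "", "9", "10",
            "", "", "", "", "", "", "17", "18", "", "20"]
print(df["newContext"].tolist() == expected)
```

Because everything is cast to str before the merge, newContext comes back as strings with empty strings for unmatched rows; converting it to a numeric column would be a separate step.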