这是我要完成的工作:我想在df1和df2中找到匹配的“记录ID”,找到匹配项后,在df1中创建一个新列“ New Record ID”,然后复制“ New”记录ID”从df2到具有匹配“记录ID”的行。
df1
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| | ID | Start time | Completion time | Email | Name | Client Name | Date Due | Narrative | Role | Name ID | Role ID | Record ID |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 0 | 1 | 2020-05-07 19:35:13 | 2020-05-07 19:36:29 | a@a.com | John Smith | HPOT | 2020-05-01 | HPOT TS Narrative | Supervisor | JohnSmith | Supervisor | hpotjohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 1 | 2 | 2020-05-07 19:36:31 | 2020-05-07 19:37:23 | b@b.com | John Smith | SMAN | 2020-05-01 | SMAN IP Narrative | Line Worker | JohnSmith | LineWorker | smanjohnsmithlineworker2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 2 | 3 | 2020-05-07 19:37:25 | 2020-05-07 19:38:57 | c@c.com | John Smith | THUL | 2020-05-01 | THUL TS Narrative | Supervisor | JohnSmith | Supervisor | thuljohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 3 | 4 | 2020-05-07 19:38:59 | 2020-05-07 19:40:30 | d@d.com | John Smith | HPOT | 2020-05-01 | HPOT IP Narrative | Line Worker | JohnSmith | LineWorker | hpotjohnsmithlineworker2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
| 4 | 5 | 2020-05-07 19:40:32 | 2020-05-07 19:41:11 | e@e.com | John Smith | SMAN | 2020-05-01 | SMAN TS Narrative | Supervisor | JohnSmith | Supervisor | smanjohnsmithsupervisor2020-05-01 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+
df2
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| | ID | Start time | Completion time | Email | Name | Record ID | Client Name | Due Date | Revised Narrative | Role | New Record ID |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| 0 | 1 | 2020-05-09 | 2020-05-09 10:57:24 | a@a.com | John Smith | hpotjohnsmithsupervisor2020-05-01 | HPOT | 2020-05-01 | HPOT Revised Summary Narrative | Supervisor | hpotjohnsmithsupervisor2020-05-01-2020-05-06 |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
| 1 | 2 | 2020-05-09 | 2020-05-07 19:38:57 | c@c.com | John Smith | thuljohnsmithsupervisor2020-05-01 | THUL | 2020-05-01 | THUL Revised Summary Narrative | Supervisor | thuljohnsmithsupervisor2020-05-01-2020-05-06 |
+---+----+------------+---------------------+---------+------------+-----------------------------------+-------------+------------+--------------------------------+------------+----------------------------------------------+
所需的输出:df1
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| | ID | Start time | Completion time | Email | Name | Client Name | Date Due | Narrative | Role | Name ID | Role ID | Record ID | New Record ID |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 0 | 1 | 2020-05-07 19:35:13 | 2020-05-07 19:36:29 | a@a.com | John Smith | HPOT | 2020-05-01 | HPOT TS Narrative | Supervisor | JohnSmith | Supervisor | hpotjohnsmithsupervisor2020-05-01 | hpotjohnsmith2020-05-01-2020-05-06 |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 1 | 2 | 2020-05-07 19:36:31 | 2020-05-07 19:37:23 | b@b.com | John Smith | SMAN | 2020-05-01 | SMAN IP Narrative | Line Worker | JohnSmith | LineWorker | smanjohnsmithlineworker2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 2 | 3 | 2020-05-07 19:37:25 | 2020-05-07 19:38:57 | c@c.com | John Smith | THUL | 2020-05-01 | THUL TS Narrative | Supervisor | JohnSmith | Supervisor | thuljohnsmithsupervisor2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 3 | 4 | 2020-05-07 19:38:59 | 2020-05-07 19:40:30 | d@d.com | John Smith | HPOT | 2020-05-01 | HPOT IP Narrative | Line Worker | JohnSmith | LineWorker | hpotjohnsmithlineworker2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
| 4 | 5 | 2020-05-07 19:40:32 | 2020-05-07 19:41:11 | e@e.com | John Smith | SMAN | 2020-05-01 | SMAN TS Narrative | Supervisor | JohnSmith | Supervisor | smanjohnsmithsupervisor2020-05-01 | |
+---+----+---------------------+---------------------+---------+------------+-------------+------------+--------------------+-------------+-----------+------------+-----------------------------------+------------------------------------+
我曾尝试使用isin和loc,但无法使其正常工作。最近的尝试是尝试此操作:
df2.loc[df2['Record ID'] == df1['Record ID']]
但是出现以下错误:“ ValueError:只能比较标记相同的Series对象”
在此先感谢您提供的任何帮助!
根据要求,以下是剪贴板中的数据: df1
ID,Start time,Completion time,Email,Name,Client Name,Date Due,Narrative,Role,Name ID,Role ID,Record ID
1,2020-05-07 19:35:13,2020-05-07 19:36:29,johns@abcdefcompany.com,John Smith,HPOT,2020-05-01,"HPOT TS Narrative
Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus.",Supervisor,JohnSmith,Supervisor,hpotjohnsmithsupervisor2020-05-01
2,2020-05-07 19:36:31,2020-05-07 19:37:23,johns@abcdefcompany.com,John Smith,SMAN,2020-05-01,"SMAN IP Narrative - Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus.
",Worker,JohnSmith,Worker,smanJohnSmithWorker2020-05-01
3,2020-05-07 19:37:25,2020-05-07 19:38:57,johns@abcdefcompany.com,John Smith,THUL,2020-05-01,"THUL TS Narrative
Entry 3 - Sed pretium, nunc sit amet finibus finibus, sem tortor fringilla arcu, ac vehicula massa tellus id purus. Proin luctus sodales lacus sed tempus..
",Supervisor,JohnSmith,Supervisor,thulJohnSmithSupervisor2020-05-01
4,2020-05-07 19:38:59,2020-05-07 19:40:30,johns@abcdefcompany.com,John Smith,HPOT,2020-05-01,HPOT IP Narrative,Worker,JohnSmith,Worker,hpotJohnSmithWorker2020-05-01
5,2020-05-07 19:40:32,2020-05-07 19:41:11,johns@abcdefcompany.com,John Smith,SMAN,2020-05-01,SMAN TS Narrative,Supervisor,JohnSmith,Supervisor,smanJohnSmithSupervisor2020-05-01
6,2020-05-07 19:41:12,2020-05-07 19:41:40,johns@abcdefcompany.com,John Smith,THUL,2020-05-01,THUL IP Narrative,Worker,JohnSmith,Worker,thulJohnSmithWorker2020-05-01
df2
ID,Start time,Completion time,Email,Name,Record ID,Client Name,Due Date,Revised Narrative,Role,New Record ID
1,2020-05-09,2020-05-09 10:57:24,johns@abcdefcompany.com,John Smith,hpotjohnsmithsupervisor2020-05-01,HPOT,2020-05-01,HPOT Revised Summary Narrative,Supervisor,hpotjohnsmithsupervisor2020-05-012020-05-09
这可以由试图回答我的问题的人复制到剪贴板,然后跟着
df = pd.read_clipboard(sep=',')
尝试:所以我尝试了“加入”建议:
df1.set_index('Record ID').join(df2.set_index('Record ID'))
,发生此错误:“ ValueError:列重叠,但未指定后缀:Index(['ID','开始时间','完成时间','电子邮件','名称','客户名称', '角色'], dtype ='object')“
答案 0 :(得分:0)
df1['Records ID']
和df2['Records ID']
之间没有匹配的值,因为一列包含大写字母,另一列没有大写字母。必须解决该问题才能找到匹配项。# fix Record ID for test data, this may not be required with the real data
df2['Record ID'] = df2['Record ID'].str.lower()
df['Record ID'] = df['Record ID'].str.lower()
# removed Narrative for cosmetic reasons on SO. Not necessary otherwise
df.drop(columns=['Narrative'], inplace=True)
# merge the two dataframes is the main thing
df3 = df.merge(df2[['Record ID', 'New Record ID']], how='left', on='Record ID')
print(df3)
ID Start time Completion time Email Name Client Name Date Due Role Name ID Role ID Record ID New Record ID
1 2020-05-07 19:35:13 2020-05-07 19:36:29 johns@abcdefcompany.com John Smith HPOT 2020-05-01 Supervisor JohnSmith Supervisor hpotjohnsmithsupervisor2020-05-01 hpotjohnsmithsupervisor2020-05-012020-05-09
2 2020-05-07 19:36:31 2020-05-07 19:37:23 johns@abcdefcompany.com John Smith SMAN 2020-05-01 Worker JohnSmith Worker smanJohnSmithWorker2020-05-01 NaN
3 2020-05-07 19:37:25 2020-05-07 19:38:57 johns@abcdefcompany.com John Smith THUL 2020-05-01 Supervisor JohnSmith Supervisor thulJohnSmithSupervisor2020-05-01 NaN
4 2020-05-07 19:38:59 2020-05-07 19:40:30 johns@abcdefcompany.com John Smith HPOT 2020-05-01 Worker JohnSmith Worker hpotJohnSmithWorker2020-05-01 NaN
5 2020-05-07 19:40:32 2020-05-07 19:41:11 johns@abcdefcompany.com John Smith SMAN 2020-05-01 Supervisor JohnSmith Supervisor smanJohnSmithSupervisor2020-05-01 NaN
6 2020-05-07 19:41:12 2020-05-07 19:41:40 johns@abcdefcompany.com John Smith THUL 2020-05-01 Worker JohnSmith Worker thulJohnSmithWorker2020-05-01 NaN NaN