Data preparation - Python

Time: 2018-05-17 01:35:42

Tags: python

I have a first dataframe containing a column (series) named 'Date' and a variable number n of columns named 'People_1' to 'People_n':

Id    Date                          People_1       People_2        People_3         People_4     People_5      People_6      People_7
12.0  Sat Dec 19 00:00:00 EST 1970  Loretta Lynn   Owen Bradley
13.0  Sat Jun 07 00:00:00 EDT 1980  Sissy Spacek   Loretta Lynn    Owen Bradley
14.0  Sat Dec 04 00:00:00 EST 2010  Loretta Lynn   Sheryl Crow     Miranda Lambert
15.0  Sat Aug 09 00:00:00 EDT 1969  Charley Pride  Dallas Frazier  A.L. "Doodle"    Chet Atkins  Jack Clement  Bob Ferguson  Felton Jarvis

I also have another dataframe containing a list of names and their biographical data:

People            Birth_date                    Birth_state  Sex     Ethnicity
Charles Kelley    Fri Sep 11 00:00:00 EDT 1981  GA           Male    Caucasian
Hillary Scott     Tue Apr 01 00:00:00 EST 1986  TN           Female  Caucasian
Reba McEntire     Mon Mar 28 00:00:00 EST 1955  OK           Female  Caucasian
Wanda Jackson     Wed Oct 20 00:00:00 EST 1937  OK           Female  Caucasian
Carrie Underwood  Thu Mar 10 00:00:00 EST 1983  OK           Female  Caucasian
Toby Keith        Sat Jul 08 00:00:00 EDT 1961  OK           Male    Caucasian
David Bellamy     Sat Sep 16 00:00:00 EDT 1950  FL           Male    Caucasian
Howard Bellamy    Sat Feb 02 00:00:00 EST 1946  FL           Male    Caucasian
Keith Urban       Thu Oct 26 00:00:00 EDT 1967  Northland    Male    Caucasian
Miranda Lambert   Thu Nov 10 00:00:00 EST 1983  TX           Female  Caucasian
Sam Hunt          Sat Dec 08 00:00:00 EST 1984  GA           Male    Caucasian
Johnny Cash       Fri Feb 26 00:00:00 EST 1932  AR           Male    Caucasian
June Carter       Sun Jun 23 00:00:00 EDT 1929  VA           Female  Caucasian
Merle Haggard     Tue Apr 06 00:00:00 EST 1937  CA           Male    Caucasian
Waylon Jennings   Tue Jun 15 00:00:00 EDT 1937  TX           Male    Caucasian
Willie Nelson     Sat Apr 29 00:00:00 EST 1933  TX           Male    Caucasian
Loretta Lynn      Thu Apr 14 00:00:00 EST 1932  KY           Female  Caucasian
Sissy Spacek      Sun Dec 25 00:00:00 EST 1949  TX           Female  Caucasian
Sheryl Crow       Sun Feb 11 00:00:00 EST 1962  MO           Female  Caucasian
Charley Pride     Sun Mar 18 00:00:00 EST 1934  MS           Male    African American
Rodney Clawon     ?                             TX           Male    Caucasian
Nathan Chapman    ?                             TN           Male    Caucasian

For each date, I want to get the biographical data of every person who took part on that day:

Date                          Birth_state  Sex     Ethnicity
Sat Dec 19 00:00:00 EST 1970  KY           Female  Caucasian
Sat Jun 07 00:00:00 EDT 1980  TX           Female  Caucasian
Sat Jun 07 00:00:00 EDT 1980  KY           Female  Caucasian
Sat Dec 04 00:00:00 EST 2010  KY           Female  Caucasian
Sat Dec 04 00:00:00 EST 2010  MO           Female  Caucasian
Sat Dec 04 00:00:00 EST 2010  TX           Female  Caucasian
Sat Aug 09 00:00:00 EDT 1969  MS           Male    African American

Note: my biographical data is not complete yet and some names are missing, which is why not every person gets a row in the output.

So is there a way to do this in Python? Thanks!

1 answer:

Answer 0 (score: 1)

You can use a left join in pandas: first join the two tables on the person's name, then select the columns you need.
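A minimal sketch of that join, assuming the people have already been put into a long table with one row per (Date, person) pair, and that both tables share a column named People. The variable names participants, bio, joined and result are hypothetical. DataFrame.merge with how="left" keeps every participant row; anyone missing from the biographical table gets NaN:

```python
import pandas as pd

# Hypothetical long-format table: one row per (Date, person) pair.
participants = pd.DataFrame({
    "Date": ["Sat Dec 19 00:00:00 EST 1970", "Sat Dec 19 00:00:00 EST 1970"],
    "People": ["Loretta Lynn", "Owen Bradley"],
})

# Small excerpt of the biographical table from the question.
bio = pd.DataFrame({
    "People": ["Loretta Lynn", "Sissy Spacek"],
    "Birth_state": ["KY", "TX"],
    "Sex": ["Female", "Female"],
    "Ethnicity": ["Caucasian", "Caucasian"],
})

# Left join on the name: every participant row is kept, and people
# missing from `bio` (here Owen Bradley) get NaN biographical columns.
joined = participants.merge(bio, on="People", how="left")

# Keep only the columns wanted in the output.
result = joined[["Date", "Birth_state", "Sex", "Ethnicity"]]
print(result)
```

Because a left join keeps unmatched rows, Owen Bradley shows up with NaN values. The expected output in the question leaves such people out; how="inner", or dropping the NaN rows after the left join, gives that instead.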

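For example, you can first gather all the people into a single new column, named e.g. 'People', so that there is one row per person. Once this is done, left-join the two dataframes as described above.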
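The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the answerer's exact code: it uses DataFrame.melt to turn the People_1 ... People_n columns into one row per person, assumes empty cells are None/NaN or empty strings, and keeps only the matched rows (via merge's indicator column) so the result lines up with the expected output. The data is a two-row excerpt of the question's tables, and events, bio, long, merged and result are hypothetical names.

```python
import pandas as pd

# Two-row excerpt of the wide table; empty cells assumed to be None/NaN.
events = pd.DataFrame({
    "Id": [12.0, 13.0],
    "Date": ["Sat Dec 19 00:00:00 EST 1970", "Sat Jun 07 00:00:00 EDT 1980"],
    "People_1": ["Loretta Lynn", "Sissy Spacek"],
    "People_2": ["Owen Bradley", "Loretta Lynn"],
    "People_3": [None, "Owen Bradley"],
})

bio = pd.DataFrame({
    "People": ["Loretta Lynn", "Sissy Spacek", "Sheryl Crow"],
    "Birth_state": ["KY", "TX", "MO"],
    "Sex": ["Female", "Female", "Female"],
    "Ethnicity": ["Caucasian", "Caucasian", "Caucasian"],
})

# 1. Pick up every People_1 ... People_n column, whatever n is.
people_cols = [c for c in events.columns if c.startswith("People_")]

# 2. Wide -> long: one row per (Id, Date, person); drop empty slots
#    and strip stray whitespace from the names.
long = (
    events.melt(id_vars=["Id", "Date"], value_vars=people_cols,
                var_name="Slot", value_name="People")
    .dropna(subset=["People"])
    .assign(People=lambda d: d["People"].str.strip())
)
long = long[long["People"] != ""]

# 3. melt stacks column by column; a stable sort on Id regroups rows
#    per event while keeping People_1, People_2, ... order inside each.
long = long.sort_values("Id", kind="mergesort")

# 4. Left join with an indicator column, then keep only the people
#    that were found in `bio` (the question's expected output).
merged = long.merge(bio, on="People", how="left", indicator=True)
result = (
    merged.loc[merged["_merge"] == "both",
               ["Date", "Birth_state", "Sex", "Ethnicity"]]
    .reset_index(drop=True)
)
print(result)
```

Building the column list from the People_ prefix means the code does not depend on n. On the full data, the people without biographical rows (Owen Bradley, Dallas Frazier and the others) should drop out at the _merge filter, leaving the seven rows shown in the question.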