Pandas: get some data from a dataframe

Time: 2016-08-17 12:39:40

Tags: python pandas

I have a dataframe

ID,"month","type"
0896cbe25bb8aec86ff93dd1bf20fa80,2013-12,desktop
0896cbe25bb8aec86ff93dd1bf20fa80,2014-01,desktop
0896cbe25bb8aec86ff93dd1bf20fa80,2014-02,desktop
0896cbe25bb8aec86ff93dd1bf20fa80,2014-03,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-06,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-07,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-08,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-09,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-10,desktop

and another dataframe

idp year    month
5663b84ee164ed2628f4df6ed6ffe89b    2015    11
d156e747fb3e715a13ac850ca3e4c0e5    2014    7
0ce926c4c33e63aeef04a55dc204cb1a    2014    10
142068cd70ec3541698c919b023caf1c    2014    3
24fa9c75cc86187937f4fea0c06a6513    2014    12
3e3906343b235e6eac743be65da1dcbb    2014    6
757bf2f08a1de8383e24509d5f105ce7    2015    8

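I need to look up each idp in the first dataframe, and if the date from the second df equals the date in the first (or the date in the first df is one month before the date in the second), I should get the data for this ID. I need to get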

ID, month, type
0ce926c4c33e63aeef04a55dc204cb1a,2014-06,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-07,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-08,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-09,desktop
0ce926c4c33e63aeef04a55dc204cb1a,2014-10,desktop

How can I write this condition?

1 answer:

Answer 0: (score: 0)

I am not sure it will be very efficient on a large dataframe, but you can add a special "merge" column to df2:

    >>> df2['year_month'] =  df2.year.astype(str) + '-'  + df2.month.astype(int).apply(lambda s: '%02d' % s)
    >>> df2
                            idp  year  month year_month
    0  5663b84ee164ed2628f4df6ed6ffe89b  2015     11    2015-11
    1  d156e747fb3e715a13ac850ca3e4c0e5  2014      7    2014-07
    2  0ce926c4c33e63aeef04a55dc204cb1a  2014     10    2014-10
    3  142068cd70ec3541698c919b023caf1c  2014      3    2014-03
    4  24fa9c75cc86187937f4fea0c06a6513  2014     12    2014-12
    5  3e3906343b235e6eac743be65da1dcbb  2014      6    2014-06
    6  757bf2f08a1de8383e24509d5f105ce7  2015      8    2015-08
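
The zero padding matters because the later comparison is done on strings, and strings compare lexicographically: without padding, "2014-7" would sort after "2014-10". As a minimal sketch of the same step without `apply`, assuming `year` and `month` are integer columns as in the sample, the padding can be done with the vectorized `str.zfill`. The two rows below are taken from the sample data.

```python
import pandas as pd

# Two rows copied from the sample df2 in the question
df2 = pd.DataFrame({
    "idp": ["d156e747fb3e715a13ac850ca3e4c0e5",
            "5663b84ee164ed2628f4df6ed6ffe89b"],
    "year": [2014, 2015],
    "month": [7, 11],
})

# Build a zero-padded "YYYY-MM" string so it compares correctly as text
df2["year_month"] = (
    df2["year"].astype(str) + "-" + df2["month"].astype(str).str.zfill(2)
)

print(df2["year_month"].tolist())
```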

Then merge and query to keep only the interesting rows and columns:

    >>> pd.merge(df1, df2, how='left', left_on='ID', right_on='idp', suffixes=('', '_df2')) \
          .query('month <= year_month') \
          [df1.columns]
                                     ID    month     type
    4  0ce926c4c33e63aeef04a55dc204cb1a    2014-06  desktop
    5  0ce926c4c33e63aeef04a55dc204cb1a    2014-07  desktop
    6  0ce926c4c33e63aeef04a55dc204cb1a    2014-08  desktop
    7  0ce926c4c33e63aeef04a55dc204cb1a    2014-09  desktop
    8  0ce926c4c33e63aeef04a55dc204cb1a    2014-10  desktop
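
Note that `month <= year_month` only sets an upper bound on the date. If the question is read as "keep every row of an ID when the df2 date for that ID equals one of its df1 months, or is one month later", a rough sketch could look like the following. This reading is an assumption, and the helper columns `ord1` and `ord2` (a month count computed as year * 12 + month) are names made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["0896cbe25bb8aec86ff93dd1bf20fa80"] * 4
          + ["0ce926c4c33e63aeef04a55dc204cb1a"] * 5,
    "month": ["2013-12", "2014-01", "2014-02", "2014-03",
              "2014-06", "2014-07", "2014-08", "2014-09", "2014-10"],
    "type": ["desktop"] * 9,
})
df2 = pd.DataFrame({
    "idp": ["5663b84ee164ed2628f4df6ed6ffe89b",
            "d156e747fb3e715a13ac850ca3e4c0e5",
            "0ce926c4c33e63aeef04a55dc204cb1a",
            "142068cd70ec3541698c919b023caf1c",
            "24fa9c75cc86187937f4fea0c06a6513",
            "3e3906343b235e6eac743be65da1dcbb",
            "757bf2f08a1de8383e24509d5f105ce7"],
    "year": [2015, 2014, 2014, 2014, 2014, 2014, 2015],
    "month": [11, 7, 10, 3, 12, 6, 8],
})

# Turn both date representations into a plain month count (hypothetical helpers)
ym1 = df1["month"].str.split("-", expand=True).astype(int)
left = df1.assign(ord1=ym1[0] * 12 + ym1[1])
right = df2.assign(ord2=df2["year"] * 12 + df2["month"])[["idp", "ord2"]]

# Inner merge keeps only IDs present in both frames
merged = left.merge(right, left_on="ID", right_on="idp", how="inner")

# Assumed condition: df2 date is equal to, or one month after, a df1 date
diff = merged["ord2"] - merged["ord1"]
matching_ids = merged.loc[diff.between(0, 1), "ID"].unique()

# Return all df1 rows for the IDs that satisfied the condition
result = df1[df1["ID"].isin(matching_ids)]
print(result)
```

On the sample data this selects the five rows of 0ce926c4c33e63aeef04a55dc204cb1a, which matches the output requested in the question. Widening or narrowing the window is a matter of changing the bounds passed to `between`.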