Pandas: find value in interval

Time: 2015-01-04 21:44:57

Tags: python pandas

In pandas, suppose I have transaction data in a dataframe (transdf) like this:

OrderId, ShippmentSegmentsDays
1      , 1
2      , 3
3      , 4
4      , 10

I also have another df (segmentdf) that specifies the intervals:

ShippmentSegmentDaysStart , ShippmentSegmentDaysEnd , ShippmentSegment
-9999999                  , 0                       , 'On-Time'
0                         , 1                       , '1 day late'
1                         , 2                       , '2 days late'
2                         , 3                       , '3 days late'
3                         , 9999999                 , '>3 days late'

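I need to add a "ShippmentSegment" column based on "ShippmentSegmentsDays". So basically, for each row in "transdf", I need to take its "ShippmentSegmentsDays" value and find the interval in "segmentdf" that contains it.

As a result, "transdf" should look like this: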

OrderId, ShippmentSegmentsDays, ShippmentSegment
1      , 1                    , '1 day late'
2      , 0                    , 'On-Time'
3      , 4                    , '>3 days late'
4      , 10                   , '>3 days late'

Can anyone suggest how to handle this situation?

Thanks! Stephen

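2 answers:

Answer 0: (score: 2)

If you know that the rules set in segmentdf are static and will not change, you can use pandas' apply to run a function on every value of the relevant column in the transdf data frame. The following snippet may help. I have not tested it, so treat it with caution, but it should get you started in the right direction.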

# Utility function encoding the rules set in the 'segmentdf' data frame.
# It must be defined before it is passed to apply().
def calc_ship_segment(num):
    if num <= 0:
        return 'On-Time'
    elif num <= 1:
        return '1 day late'
    elif num <= 2:
        return '2 days late'
    elif num <= 3:
        return '3 days late'
    else:
        return '>3 days late'

# Select just the 'ShippmentSegmentsDays' column of 'transdf' as a Series
seg_days = transdf['ShippmentSegmentsDays']

# Create a new column, 'ShippmentSegment', in the 'transdf' data frame by
# calling the utility function on each value of the Series created above.
# Series.apply works element-wise, so no axis argument is passed.
transdf['ShippmentSegment'] = seg_days.apply(calc_ship_segment)

Answer 1: (score: 1)

Old post, but I ran into the same problem. Pandas provides an Interval function that worked for me.
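The answer does not show code, so the following is only one possible way to use pandas intervals here, not necessarily what the answerer did. It builds a pd.IntervalIndex from the start and end columns of segmentdf and looks up each day count with get_indexer. The choice closed="right" (upper bound inclusive) is an assumption inferred from the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question
transdf = pd.DataFrame({
    "OrderId": [1, 2, 3, 4],
    "ShippmentSegmentsDays": [1, 3, 4, 10],
})
segmentdf = pd.DataFrame({
    "ShippmentSegmentDaysStart": [-9999999, 0, 1, 2, 3],
    "ShippmentSegmentDaysEnd": [0, 1, 2, 3, 9999999],
    "ShippmentSegment": ["On-Time", "1 day late", "2 days late",
                         "3 days late", ">3 days late"],
})

# One interval (start, end] per row of segmentdf
intervals = pd.IntervalIndex.from_arrays(
    segmentdf["ShippmentSegmentDaysStart"],
    segmentdf["ShippmentSegmentDaysEnd"],
    closed="right",  # assumption: the upper bound is inclusive
)

# Position of the interval containing each value; -1 means no match
positions = intervals.get_indexer(transdf["ShippmentSegmentsDays"].to_numpy())
if (positions == -1).any():
    raise ValueError("some values fall outside every interval")

# Map the positions back to the segment labels
transdf["ShippmentSegment"] = segmentdf["ShippmentSegment"].to_numpy()[positions]

print(transdf)
```

Unlike a bin-edge approach, this does not require the intervals to be contiguous, although get_indexer does require that they do not overlap.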