How to calculate the percentage of ordered items across all orders in Pandas

Posted: 2021-01-18 09:38:06

Tags: python pandas

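I'm having trouble calculating the percentage of items that appear across all order portfolios. The items are toys that people commonly buy: bear, rabbit, moose, dog, horse, cat, mouse, pig, chicken, eagle, raccoon, dolphin, shark and whale.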

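I have an order_portfolio_num column that identifies each person who bought toys, and columns order_position_X, where X is the position of the ordered item, with 8 positions in total (the sample below shows 7). People who order toys never buy the same toy twice, so items never repeat within a portfolio/row. Note that my original data frame contains NaN, so I have included them here as well.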

>>> import pandas as pd
>>> from numpy import nan
>>> 
>>> data = pd.DataFrame({'order_portfolio_num': [1,2,3,4,5,6,7,8],
...                     'order_position_1':['dog', 'horse', 'cat','shark', 'dog', 'rabbit', 'rabbit', 'cat'],
...                     'order_position_2':['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
...                     'order_position_3':['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
...                     'order_position_4':['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
...                     'order_position_5':['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan], 
...                     'order_position_6':[nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
...                     'order_position_7':[nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan]})
>>> 
>>> data
   order_portfolio_num order_position_1 order_position_2 order_position_3 order_position_4 order_position_5 order_position_6 order_position_7
0                    1              dog            mouse             bear          dolphin              pig              NaN              NaN
1                    2            horse             bear              dog              cat          chicken            whale          dolphin
2                    3              cat              dog          raccoon          chicken            eagle              NaN              NaN
3                    4            shark          dolphin              dog              NaN              NaN              NaN              NaN
4                    5              dog              cat            whale            horse             bear            eagle              NaN
5                    6           rabbit             bear            mouse              pig          raccoon            moose          chicken
6                    7           rabbit            eagle              cat              dog            whale              NaN              NaN
7                    8              cat            shark            moose          chicken              NaN              NaN              NaN
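The question states that no toy repeats within a portfolio, which is what lets a count of occurrences double as a count of portfolios. A minimal sketch to confirm that assumption on the sample data; the names position_cols and no_repeats are illustrative, not from the question.

```python
import pandas as pd
from numpy import nan

# Sample data copied from the question
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})

# All toy columns (order_position_1 ... order_position_7)
position_cols = data.filter(like='order_position').columns

# Per portfolio (row): drop the NaN padding, then check the remaining toys are unique
no_repeats = data[position_cols].apply(lambda row: row.dropna().is_unique, axis=1)

print(no_repeats.all())  # True for the sample data
```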

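I want to find the 5 most common toys across all portfolios, expressed as percentages. For example, if I have 10 order portfolios and 4 of them contain a bear, the bear's value would be 40%. My goal is something like this (the numbers are only illustrative):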

toy      percent
dog        60%
cat        48%
mouse      36%
bear       28%
shark      19%

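I tried summing all the toys in the data frame, but that gives me the number of occurrences of each toy across all portfolios, and I'm not sure how to turn that into a percentage (which value represents 100%?), or whether it is even what I'm looking for, since it would give each toy's share of all toy occurrences rather than its share of portfolios. So I'm not sure how to proceed. Here is what I tried: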

>>> cols = ['order_position_1', 'order_position_2', 'order_position_3', 'order_position_4',
...        'order_position_5', 'order_position_6', 'order_position_7'] 
>>> 
>>> position_values = data[cols].melt().groupby('value').size().reset_index(name='count')
>>> 
>>> position_values.sort_values(by = 'count', ascending = False)
      value  count
3       dog      6
1       cat      5
0      bear      4
2   chicken      4
4   dolphin      3
5     eagle      3
13    whale      3
6     horse      2
7     moose      2
8     mouse      2
9       pig      2
10   rabbit      2
11  raccoon      2
12    shark      2
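On the "which value represents 100%?" question: the desired metric ("4 of 10 portfolios contain a bear, so 40%") uses the number of portfolios (rows) as the denominator, not the total number of toy occurrences. A sketch contrasting the two denominators, assuming no toy repeats within a row as stated above; the variable names are illustrative.

```python
import pandas as pd
from numpy import nan

# Sample data copied from the question
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})
position_cols = data.filter(like='order_position').columns

# One long column with every ordered toy; NaN padding is dropped
toys = data[position_cols].melt()['value'].dropna()

# Occurrences per toy; with no repeats inside a row, this is also
# the number of portfolios that contain the toy
counts = toys.value_counts()

# Denominator = all filled toy slots (42 here): share of all toys ordered
share_of_occurrences = counts / counts.sum()

# Denominator = number of portfolios (8 rows): the "4 of 10 -> 40%" metric
share_of_portfolios = counts / len(data)

print(share_of_portfolios['dog'])  # 0.75, since dog appears in 6 of 8 portfolios
```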

Any ideas?

3 answers:

Answer 0 (score: 2)

Use DataFrame.melt with Series.value_counts and divide by the original number of rows:
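A minimal sketch of that approach on the question's data. Selecting the columns with filter(like='order_position') and naming the output columns toy and percent are my own choices to match the desired output, not necessarily the answerer's exact code.

```python
import pandas as pd
from numpy import nan

# Sample data copied from the question
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})
position_cols = data.filter(like='order_position').columns

top5 = (data[position_cols]
        .melt()['value']           # one column with every ordered toy
        .value_counts()            # counts per toy, NaN dropped by default
        .div(len(data))            # divide by the number of portfolios (rows)
        .mul(100)                  # fraction -> percent
        .head(5)                   # value_counts is already sorted descending
        .rename_axis('toy')
        .reset_index(name='percent'))
print(top5)
```

Ties (bear/chicken at 50%, and dolphin/eagle/whale at 37.5%) may come out in either order, so the fifth row is not uniquely defined for this sample.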


Answer 1 (score: 1)

Here is the general idea:

  1. First, get the names of all the toys
  2. Check whether each toy appears in each row and store that count
  3. Compute the frequency
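The three steps could be sketched in pandas as follows. The names toy_names, rows_with_toy and freq are illustrative, and using eq(...).any(axis=1) for step 2 is my own choice of technique, not the answerer's code.

```python
import pandas as pd
from numpy import nan

# Sample data copied from the question
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})
position_cols = data.filter(like='order_position').columns

# Step 1: every distinct toy name, ignoring the NaN padding
toy_names = data[position_cols].melt()['value'].dropna().unique()

# Step 2: for each toy, count the portfolios (rows) that contain it
rows_with_toy = {
    toy: int(data[position_cols].eq(toy).any(axis=1).sum())
    for toy in toy_names
}

# Step 3: turn the counts into a frequency (percent of portfolios)
freq = (pd.Series(rows_with_toy, name='percent')
        .div(len(data))
        .mul(100)
        .sort_values(ascending=False))
print(freq.head(5))
```

Because step 2 asks whether a row contains the toy at all, this version would stay correct even if a toy could repeat within a portfolio, at the cost of one pass over the data per toy.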

Answer 2 (score: 0)

Divide each count by the number of order portfolios (N):

position_values['percent']=position_values['count']/data['order_portfolio_num'].count()

Then sort the result and take the top rows with head().
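Put together, this answer's steps might look like the sketch below, which reuses the question's own position_values table. Multiplying by 100 and formatting the result as strings such as '75.0%' are additions to match the desired output, not part of the original answer.

```python
import pandas as pd
from numpy import nan

# Sample data copied from the question
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})
cols = [f'order_position_{i}' for i in range(1, 8)]

# The question's count table (groupby drops NaN keys by default)
position_values = data[cols].melt().groupby('value').size().reset_index(name='count')

# Divide by N, the number of order portfolios, and scale to percent
position_values['percent'] = position_values['count'] / data['order_portfolio_num'].count() * 100

# Then sort and take the head
top5 = position_values.sort_values('percent', ascending=False).head(5)

# Optional: render as strings like '75.0%' for display
top5 = top5.assign(percent=top5['percent'].map('{:.1f}%'.format))
print(top5[['value', 'percent']])
```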