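I'm having trouble calculating the percentage of order portfolios that contain each item. The items are toys that people commonly buy: bear, rabbit, moose, dog, horse, cat, mouse, pig, chicken, eagle, raccoon, dolphin, shark, and whale.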
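I have an order_portfolio_num that identifies the person who bought the toys, and columns order_position_X, where X is the position number of the item ordered, with 8 positions in total. A person never buys the same toy twice, so an item never repeats within one portfolio/row. Note that my original DataFrame contains NaN, so I have included them here as well.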
>>> import pandas as pd
>>> from numpy import nan
>>>
>>> data = pd.DataFrame({'order_portfolio_num': [1,2,3,4,5,6,7,8],
... 'order_position_1':['dog', 'horse', 'cat','shark', 'dog', 'rabbit', 'rabbit', 'cat'],
... 'order_position_2':['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
... 'order_position_3':['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
... 'order_position_4':['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
... 'order_position_5':['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
... 'order_position_6':[nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
... 'order_position_7':[nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan]})
>>>
>>> data
   order_portfolio_num  order_position_1  order_position_2  order_position_3  order_position_4  order_position_5  order_position_6  order_position_7
0                    1               dog             mouse              bear           dolphin               pig               NaN               NaN
1                    2             horse              bear               dog               cat           chicken             whale           dolphin
2                    3               cat               dog           raccoon           chicken             eagle               NaN               NaN
3                    4             shark           dolphin               dog               NaN               NaN               NaN               NaN
4                    5               dog               cat             whale             horse              bear             eagle               NaN
5                    6            rabbit              bear             mouse               pig           raccoon             moose           chicken
6                    7            rabbit             eagle               cat               dog             whale               NaN               NaN
7                    8               cat             shark             moose           chicken               NaN               NaN               NaN
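I want to find the top 5 most common toys across all order portfolios, expressed as percentages. For example, if I have 10 order portfolios and 4 of them contain a bear, the value for bear would be 40%. My goal is something like this: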
toy    percent
dog        60%
cat        48%
mouse      36%
bear       28%
shark      19%
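The percentage defined above is the share of order portfolios (rows) in which a toy appears at any position. A minimal sketch of that definition using a row-wise membership check; it rebuilds the question's data frame so it runs on its own, and position_cols is a name introduced here for illustration:

```python
import pandas as pd
from numpy import nan

# Sample frame from the question.
data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})

position_cols = [c for c in data.columns if c.startswith('order_position_')]

# Every distinct toy that appears anywhere (NaN dropped).
toys = data[position_cols].melt()['value'].dropna().unique()

# For each toy: one boolean per row (is it in any position?); the mean of
# those booleans is the fraction of portfolios that contain the toy.
percent = pd.Series(
    {toy: data[position_cols].eq(toy).any(axis=1).mean() * 100 for toy in toys},
    name='percent',
).sort_values(ascending=False)

print(percent.head(5))
```

On this sample a bear appears in 4 of the 8 portfolios, so it comes out at 50%. Dividing the raw counts by the total number of ordered items (42 here) would instead give each toy's share of all items, which is why the number of portfolios is the 100% reference.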
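I tried summing all the toys in the DataFrame, but that gives me the number of occurrences of each toy across all portfolios, and I'm not sure how to turn that into a percentage (which value represents 100%?), or whether it's even what I'm looking for, since it seems to give each toy's share of all toys rather than of portfolios. So I'm not sure how to proceed. Here's what I tried: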
>>> cols = ['order_position_1', 'order_position_2', 'order_position_3', 'order_position_4',
... 'order_position_5', 'order_position_6', 'order_position_7']
>>>
>>> position_values = data[cols].melt().groupby('value').size().reset_index(name='count')
>>>
>>> position_values.sort_values(by = 'count', ascending = False)
      value  count
 3      dog      6
 1      cat      5
 0     bear      4
 2  chicken      4
 4  dolphin      3
 5    eagle      3
13    whale      3
 6    horse      2
 7    moose      2
 8    mouse      2
 9      pig      2
10   rabbit      2
11  raccoon      2
12    shark      2
Any ideas?
Answer 0 (score: 2)
Use DataFrame.melt and Series.value_counts, and divide by the original number of rows:
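A minimal sketch of that approach, written from the description above; it assumes the question's data frame (rebuilt here), selects the position columns by their order_position_ prefix, and uses counts, percent and top5 as illustrative names:

```python
import pandas as pd
from numpy import nan

data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})

cols = [c for c in data.columns if c.startswith('order_position_')]

# melt() stacks every position column into a single 'value' column;
# value_counts() drops NaN by default and sorts by frequency, highest first.
counts = data[cols].melt()['value'].value_counts()

# Toys never repeat within a row, so each count is the number of portfolios
# containing that toy; dividing by the number of rows gives the fraction.
percent = counts.div(len(data)).mul(100)

top5 = percent.head(5).rename_axis('toy').reset_index(name='percent')
print(top5)
```

To display strings such as "75.0%", something like top5['percent'].map('{:.1f}%'.format) could be applied afterwards.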
Answer 2 (score: 0)
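Divide each value by the number of order portfolios (N). Because a toy never appears twice in the same portfolio, each count is already the number of portfolios that contain that toy, so N is the 100% reference: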
position_values['percent'] = position_values['count'] / data['order_portfolio_num'].count()
Then sort and take the head.
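Putting the steps of this answer together, as a sketch that reuses the question's position_values computation (the frame is rebuilt so the block runs standalone) and scales the fraction to a percentage; the rename of the value column to toy is an addition to match the desired output:

```python
import pandas as pd
from numpy import nan

data = pd.DataFrame({
    'order_portfolio_num': [1, 2, 3, 4, 5, 6, 7, 8],
    'order_position_1': ['dog', 'horse', 'cat', 'shark', 'dog', 'rabbit', 'rabbit', 'cat'],
    'order_position_2': ['mouse', 'bear', 'dog', 'dolphin', 'cat', 'bear', 'eagle', 'shark'],
    'order_position_3': ['bear', 'dog', 'raccoon', 'dog', 'whale', 'mouse', 'cat', 'moose'],
    'order_position_4': ['dolphin', 'cat', 'chicken', nan, 'horse', 'pig', 'dog', 'chicken'],
    'order_position_5': ['pig', 'chicken', 'eagle', nan, 'bear', 'raccoon', 'whale', nan],
    'order_position_6': [nan, 'whale', nan, nan, 'eagle', 'moose', nan, nan],
    'order_position_7': [nan, 'dolphin', nan, nan, nan, 'chicken', nan, nan],
})
cols = ['order_position_1', 'order_position_2', 'order_position_3', 'order_position_4',
        'order_position_5', 'order_position_6', 'order_position_7']

# Per-toy counts, as in the question (groupby drops the NaN values).
position_values = data[cols].melt().groupby('value').size().reset_index(name='count')

# N = number of order portfolios; count() skips NaN, so it equals the row count here.
position_values['percent'] = position_values['count'] / data['order_portfolio_num'].count() * 100

# Sort by percentage and keep the five most common toys.
top5 = (position_values.sort_values('percent', ascending=False)
        .head(5)
        .rename(columns={'value': 'toy'})[['toy', 'percent']])
print(top5.to_string(index=False))
```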