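My end goal is to use d3 to create a Force-Directed graph that shows clusters of users who use certain features in my application. To do that, I need to create a set of "links" in the following format (taken from the d3 example):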
{"source": "Napoleon", "target": "Myriel", "value": 1}
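To get there, I am starting from a pandas DataFrame that looks like the one below. How can I generate a list of permutations of the FEAT_ID / USER_ID combinations for each APP_NAME?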
     APP_NAME    FEAT_ID USER_ID  CNT
280      app1   feature1   user1  114
2622     app2   feature2   user1    8
1698     app2   feature3   user1   15
184      app3   feature4   user1  157
2879     app2   feature5   user1    7
3579     app2   feature6   user1    5
232      app2   feature7   user1  136
295      app2   feature8   user1  111
2620     app2   feature9   user1    8
2047     app3  feature10   user2   11
3395     app2   feature2   user2    5
3044     app2  feature11   user2    6
3400     app2  feature12   user2    5
Expected result:
Based on the DataFrame above, I would expect user1 and user2 to generate the following permutations:
user1:
app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9
app2-feature6 -> app2-feature7, app2-feature8, app2-feature9
app2-feature7 -> app2-feature8, app2-feature9
app2-feature8 -> app2-feature9
user2:
app3-feature10 -> app2-feature2, app2-feature11, app2-feature12
app2-feature2 -> app2-feature11, app2-feature12
app2-feature11 -> app2-feature12
From this, I would like to be able to generate the input D3 expects, which for user2 would look like this:
{"source": "app3-feature10", "target": "app2-feature2"}
{"source": "app3-feature10", "target": "app2-feature11"}
{"source": "app3-feature10", "target": "app2-feature12"}
{"source": "app2-feature2", "target": "app2-feature11"}
{"source": "app2-feature2", "target": "app2-feature12"}
{"source": "app2-feature11", "target": "app2-feature12"}
How can I generate a list of permutations of the FEAT_ID / USER_ID combinations for each APP_NAME in the DataFrame?
Answer 0 (score: 1)
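I would consider building tuples from the rows of your DataFrame, then using something like itertools.permutations to create all the permutations, and then building the dictionaries you need from there: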
import itertools

allUserPermutations = {}
groupedByUser = df.groupby('USER_ID')
for k, g in groupedByUser:
    requisiteColumns = g[['APP_NAME', 'FEAT_ID']]
    # tuples out of dataframe rows
    userCombos = [tuple(x) for x in requisiteColumns.values]
    # this is a generator obj; permutations yields both (a, b) and (b, a)
    userPermutations = itertools.permutations(userCombos, 2)
    # create a list of specified dicts for the current user
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations]
    # store the current user's specified dicts
    allUserPermutations[k] = userPermutations
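If permutations does not give the desired behaviour, you can try one of the other combinatoric generators in itertools. Hopefully this strategy works (I don't have a pandas-enabled REPL at hand to test it right now). Good luck!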
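The expected output in the question lists each pair only once and in row order, which looks closer to itertools.combinations than to itertools.permutations. Below is a minimal sketch of that variant, under a few assumptions: the helper name build_links is hypothetical, node labels are built by joining APP_NAME and FEAT_ID with a hyphen, and the rows are plain (USER_ID, APP_NAME, FEAT_ID) tuples so the example runs without pandas.

```python
import itertools
import json
from collections import defaultdict

# Sample rows for user2, copied from the question as (USER_ID, APP_NAME, FEAT_ID).
# With the question's DataFrame, the same tuples could be obtained with:
#   df[['USER_ID', 'APP_NAME', 'FEAT_ID']].itertuples(index=False, name=None)
rows = [
    ("user2", "app3", "feature10"),
    ("user2", "app2", "feature2"),
    ("user2", "app2", "feature11"),
    ("user2", "app2", "feature12"),
]


def build_links(rows):
    """Return {user_id: [{'source': ..., 'target': ...}, ...]} (hypothetical helper)."""
    # Collect "app-feature" node labels per user, keeping the original row order.
    nodes_by_user = defaultdict(list)
    for user_id, app_name, feat_id in rows:
        nodes_by_user[user_id].append(f"{app_name}-{feat_id}")

    # combinations(nodes, 2) yields each unordered pair exactly once,
    # in the order the nodes appear in the list.
    return {
        user_id: [
            {"source": source, "target": target}
            for source, target in itertools.combinations(nodes, 2)
        ]
        for user_id, nodes in nodes_by_user.items()
    }


links = build_links(rows)
for link in links["user2"]:
    print(json.dumps(link))
```

For the four user2 rows this produces six links, starting with app3-feature10 -> app2-feature2 and ending with app2-feature11 -> app2-feature12, in the same order as the list in the question.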