How to get a list of all permutations of a two-column combination based on the values of another column?

Time: 2016-09-01 02:47:35

Tags: python pandas

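My end goal is to use d3 to create a Force-Directed graph that shows the groups of users who use certain features in my application. To do that, I need to create a set of "links" in the following format (taken from the example linked above):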

{"source": "Napoleon", "target": "Myriel", "value": 1}

To get there, I'm starting from a pandas dataframe that looks like this. How can I generate a list of permutations of FEAT_ID / USER_ID combinations for each APP_NAME?

        APP_NAME      FEAT_ID   USER_ID  CNT  
280     app1          feature1  user1    114  
2622    app2          feature2  user1    8  
1698    app2          feature3  user1    15  
184     app3          feature4  user1    157  
2879    app2          feature5  user1    7  
3579    app2          feature6  user1    5  
232     app2          feature7  user1    136  
295     app2          feature8  user1    111  
2620    app2          feature9  user1    8  
2047    app3         feature10  user2    11  
3395    app2          feature2  user2    5  
3044    app2         feature11  user2    6  
3400    app2         feature12  user2    5  
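To experiment with the question, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that copies the rows above; the NODE column and the nodes_by_user dictionary are hypothetical names added for illustration, and the "APP_NAME-FEAT_ID" label format is taken from the expected result below.

```python
import pandas as pd

# Sample data copied from the table in the question (original row index kept).
df = pd.DataFrame(
    {
        "APP_NAME": ["app1", "app2", "app2", "app3", "app2", "app2", "app2",
                     "app2", "app2", "app3", "app2", "app2", "app2"],
        "FEAT_ID": ["feature1", "feature2", "feature3", "feature4", "feature5",
                    "feature6", "feature7", "feature8", "feature9", "feature10",
                    "feature2", "feature11", "feature12"],
        "USER_ID": ["user1"] * 9 + ["user2"] * 4,
        "CNT": [114, 8, 15, 157, 7, 5, 136, 111, 8, 11, 5, 6, 5],
    },
    index=[280, 2622, 1698, 184, 2879, 3579, 232, 295, 2620, 2047, 3395, 3044, 3400],
)

# Hypothetical helper column: the "APP_NAME-FEAT_ID" node label.
df["NODE"] = df["APP_NAME"] + "-" + df["FEAT_ID"]

# Node labels per user; groupby keeps the table's row order inside each group.
nodes_by_user = df.groupby("USER_ID", sort=False)["NODE"].apply(list).to_dict()
print(nodes_by_user["user2"])
# ['app3-feature10', 'app2-feature2', 'app2-feature11', 'app2-feature12']
```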

Expected result:

Based on the dataframe above, I would like to generate the following permutations for user1 and user2:

user1:
    app1-feature1 -> app2-feature2, app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature2 -> app2-feature3, app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature3 -> app3-feature4, app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app3-feature4 -> app2-feature5, app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature5 -> app2-feature6, app2-feature7, app2-feature8, app2-feature9
    app2-feature6 -> app2-feature7, app2-feature8, app2-feature9
    app2-feature7 -> app2-feature8, app2-feature9
    app2-feature8 -> app2-feature9

user2:
    app3-feature10 -> app2-feature2, app2-feature11, app2-feature12
    app2-feature2  -> app2-feature11, app2-feature12
    app2-feature11 -> app2-feature12
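Note that the expected result lists every pair exactly once, with each item pointing only to the items that come after it. That is the behavior of itertools.combinations; itertools.permutations would additionally yield each reversed pair. A small check on user2's labels (hard-coded here from the table, as an assumption about row order):

```python
from itertools import combinations

# user2's node labels, in the order the rows appear in the question's table.
user2_nodes = ["app3-feature10", "app2-feature2", "app2-feature11", "app2-feature12"]

# combinations(seq, 2) yields each unordered pair once and keeps input order,
# which matches the "item -> later items" pattern of the expected result.
pairs = list(combinations(user2_nodes, 2))
for source, target in pairs:
    print(source, "->", target)

# n items give n * (n - 1) / 2 pairs: 4 nodes -> 6 pairs, 9 nodes (user1) -> 36.
print(len(pairs))  # 6
```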

From this, I'd like to be able to generate the input D3 expects, which for user2 would look like this:

{"source": "app3-feature10", "target": "app2-feature2"}
{"source": "app3-feature10", "target": "app2-feature11"}
{"source": "app3-feature10", "target": "app2-feature12"}
{"source": "app2-feature2", "target": "app2-feature11"}
{"source": "app2-feature2", "target": "app2-feature12"}
{"source": "app2-feature11", "target": "app2-feature12"}
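Turning those pairs into D3 link objects then comes down to building one dict per pair and serializing it with the standard json module. A sketch, assuming a constant "value" of 1 as in the example format at the top (the question does not say what value should hold):

```python
import json
from itertools import combinations

user2_nodes = ["app3-feature10", "app2-feature2", "app2-feature11", "app2-feature12"]

# One link object per unordered pair. The constant "value": 1 is an assumption;
# a weight such as CNT could be used instead.
links = [
    {"source": source, "target": target, "value": 1}
    for source, target in combinations(user2_nodes, 2)
]

# One JSON object per line, like the expected output in the question.
for link in links:
    print(json.dumps(link))
```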

How can I generate a list of permutations of FEAT_ID / USER_ID combinations for each APP_NAME in the dataframe?

1 answer:

Answer 0 (score: 1)

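I would consider building some tuples from your dataframe, then using something like itertools.permutations to create all the permutations, and then building the dicts you need from there: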

import itertools

# assumes df is the pandas DataFrame shown in the question
allUserPermutations = {}

groupedByUser = df.groupby('USER_ID')
for k, g in groupedByUser:

    requisiteColumns = g[['APP_NAME', 'FEAT_ID']]

    # one "APP_NAME-FEAT_ID" label per dataframe row
    userCombos = ['-'.join(x) for x in requisiteColumns.values]

    # this is a generator object of ordered pairs; it yields both
    # (a, b) and (b, a) for every pair of rows
    userPermutations = itertools.permutations(userCombos, 2)

    # create a list of source/target dicts for the current user
    userPermutations = [{'source': s, 'target': tar} for s, tar in userPermutations]

    # store the current user's dicts
    allUserPermutations[k] = userPermutations

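If permutations doesn't give you the behavior you want, you can try some of the other combinatoric generators found here. Hopefully this strategy works (I don't have a pandas-enabled REPL at hand to test it right now). Good luck!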
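Because the expected output contains each pair only once, swapping itertools.permutations for itertools.combinations (another generator in the same module) in the loop above may match the question more closely. A minimal, self-contained sketch using the question's data; all_user_links is a hypothetical name, and the node labels are built by joining APP_NAME and FEAT_ID with "-".

```python
from itertools import combinations

import pandas as pd

# The sample table from the question (original row index kept).
df = pd.DataFrame(
    {
        "APP_NAME": ["app1", "app2", "app2", "app3", "app2", "app2", "app2",
                     "app2", "app2", "app3", "app2", "app2", "app2"],
        "FEAT_ID": ["feature1", "feature2", "feature3", "feature4", "feature5",
                    "feature6", "feature7", "feature8", "feature9", "feature10",
                    "feature2", "feature11", "feature12"],
        "USER_ID": ["user1"] * 9 + ["user2"] * 4,
        "CNT": [114, 8, 15, 157, 7, 5, 136, 111, 8, 11, 5, 6, 5],
    },
    index=[280, 2622, 1698, 184, 2879, 3579, 232, 295, 2620, 2047, 3395, 3044, 3400],
)

all_user_links = {}
for user, group in df.groupby("USER_ID"):
    # One "APP_NAME-FEAT_ID" label per row, in the group's row order.
    nodes = (group["APP_NAME"] + "-" + group["FEAT_ID"]).tolist()
    # combinations() yields each unordered pair once; permutations() would
    # also yield every reversed (target, source) pair.
    all_user_links[user] = [
        {"source": s, "target": t} for s, t in combinations(nodes, 2)
    ]

print(len(all_user_links["user1"]), len(all_user_links["user2"]))  # 36 6
```

With the full table, user1 gets 36 links from its 9 rows and user2 gets 6 from its 4 rows, the same pairs as in the expected result.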