Get unique combinations from a pandas column containing lists of integers

Time: 2020-06-07 23:11:54

Tags: python python-3.x pandas

I have a pandas column:

[1, 1539, 21]
[1, 636, 83]
[1, 636, 84]

Code to recreate the column:

import pandas as pd

x = pd.DataFrame({
    'array' : [[1, 1539, 21],[1, 636, 83],[1, 636, 84]]
})

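Take the value 1 as an example:

backward_connections = [] (empty, because 1 has no backward connection in any row)

forward_connections = [1539, 636] (636 follows 1 twice, but since we are looking for unique connections it is counted only once)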
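Reading a connection as two items that sit next to each other in the same list (which is what the example for 1 implies), the backward and forward connections of a single value can be sketched in plain Python. This is a minimal sketch under that assumption; `connections` is a hypothetical helper name, and the results are sorted, so the forward list comes out as [636, 1539] rather than [1539, 636]:

```python
# Assumption: a "connection" is two adjacent items in the same list.
lists = [[1, 1539, 21], [1, 636, 83], [1, 636, 84]]


def connections(lists, value):
    """Hypothetical helper: unique backward/forward neighbours of `value`."""
    backward, forward = set(), set()
    for lst in lists:
        # zip(lst, lst[1:]) yields every adjacent (left, right) pair
        for left, right in zip(lst, lst[1:]):
            if right == value:
                backward.add(left)   # item directly before `value`
            if left == value:
                forward.add(right)   # item directly after `value`
    # sets drop the duplicate 636; sort for a stable order
    return sorted(backward), sorted(forward)


backward, forward = connections(lists, 1)
print(backward, forward)  # [] [636, 1539]
```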

As output, I want to list the unique backward and forward connections for each value.

Here is the complete expected output:

   Value backward_connections forward_connections  unique_connections
0      1                   []         [1539, 636]                   2
1     21               [1539]                  []                   1
2     83                [636]                  []                   1
3     84                [636]                  []                   1
4    636                  [1]            [83, 84]                   3
5   1539                  [1]                [21]                   2

1 answer:

Answer 0 (score: 1)

This was a fun one:

import pandas as pd

# create a set of all unique values in the column
unique_values = {v for t in x['array'] for v in t}

# create a dictionary with empty 'previous'/'forward' lists for each value
result_dic = {value: {'previous': [], 'forward': []} for value in unique_values}

for value in unique_values:
    for list_ in x['array']:
        if value in list_:

            # get the value's index in the list
            # (list.index only finds the first occurrence)
            value_index = list_.index(value)

            # the item before the value, if there is one
            if value_index != 0:
                result_dic[value]['previous'].append(list_[value_index - 1])

            # the item after the value, if there is one
            if value_index != len(list_) - 1:
                result_dic[value]['forward'].append(list_[value_index + 1])


# back to a DataFrame
result_df = pd.DataFrame.from_dict(result_dic, orient='index').reset_index()

# remove duplicate values in the lists
# (per-column Series.map; DataFrame.applymap is deprecated since pandas 2.1)
for col in ['previous', 'forward']:
    result_df[col] = result_df[col].map(lambda lst: list(set(lst)))

# count unique connections
result_df['unique_connections'] = result_df['previous'].map(len) + result_df['forward'].map(len)

result_df

Output:

    index     previous    forward     unique_connections
0   1         []          [1539, 636]       2
1   1539      [1]         [21]              2
2   83        [636]       []                1
3   84        [636]       []                1
4   21        [1539]      []                1
5   636       [1]         [83, 84]          3
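An alternative sketch that does not rely on `list.index`: that method returns only the first position of a value, so the loop above misses neighbours of a value that appears more than once in the same list. Assuming the same `x` DataFrame as in the question, this builds a table of unique adjacent (src, dst) pairs and groups it in both directions. The column names follow the question's expected output; lists are sorted, so the order inside a list may differ from the tables above.

```python
import pandas as pd

x = pd.DataFrame({'array': [[1, 1539, 21], [1, 636, 83], [1, 636, 84]]})

# One row per adjacent (src, dst) pair; drop_duplicates keeps connections unique.
pairs = pd.DataFrame(
    [(a, b) for lst in x['array'] for a, b in zip(lst, lst[1:])],
    columns=['src', 'dst'],
).drop_duplicates()

# Every value that appears anywhere in the column, in ascending order.
values = sorted({v for lst in x['array'] for v in lst})

# Forward neighbours: group by the left item. Backward: group by the right item.
forward = pairs.groupby('src')['dst'].apply(lambda s: sorted(s.tolist()))
backward = pairs.groupby('dst')['src'].apply(lambda s: sorted(s.tolist()))

result = pd.DataFrame({'Value': values})
result['backward_connections'] = result['Value'].map(backward)
result['forward_connections'] = result['Value'].map(forward)

# Values with no neighbour on one side come back as NaN from .map; use [].
for col in ['backward_connections', 'forward_connections']:
    result[col] = result[col].apply(lambda v: v if isinstance(v, list) else [])

result['unique_connections'] = (
    result['backward_connections'].map(len) + result['forward_connections'].map(len)
)
print(result)
```

Because every adjacent pair is considered, a row such as [5, 1, 5] gives 5 both a backward and a forward connection to 1, whereas `list_.index(5)` in the loop above would only report the forward one.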