Python Pandas - unique combinations of variables across multiple specific columns

Time: 2016-06-09 20:39:59

Tags: python pandas unique reshape

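I'm trying to get the unique combinations of phone number and value, where the phone number can be in either of two columns and the value can be in either of two other columns.

For example: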

import pandas as pd

df = pd.DataFrame({'phone1':[4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2':[4567890876, 4567890876, 9178889999, 2139990000],
                   'num1':[1,2,3,3],
                   'num2':[5,2,3,1]})

The unique combinations would be:

phone         num
4567890876    1
4567890876    2
4567890876    5
9178889999    3
2139990000    1
2139990000    3
3237800876    1
3237800876    3

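I've found two ways to do this, but both feel really clumsy/wrong:

1) Copy the df four times (phone1/num1, phone1/num2, phone2/num1, phone2/num2), concatenate, and drop duplicates

2) Index by the phone fields, stack, then index by the num fields and stack again, and drop duplicates

If anyone has a better/cleaner/faster idea, it would be much appreciated!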
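For reference, the first approach can be sketched roughly as follows. This is only one possible reading of it, not necessarily the code the asker used; the names `pieces` and `combos` are made up for illustration.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})

# Build one two-column frame per (phone column, num column) pairing,
# renaming both columns to common names so the frames line up.
pieces = [
    df[[phone_col, num_col]].rename(columns={phone_col: 'phone', num_col: 'num'})
    for phone_col, num_col in product(['phone1', 'phone2'], ['num1', 'num2'])
]

# Stack the four frames and keep each (phone, num) pair only once.
combos = pd.concat(pieces, ignore_index=True).drop_duplicates()
print(combos)
```

The number of intermediate frames grows with the product of the two column counts, which is part of why this feels clumsy.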

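1 answer:

Answer 0: (score: 2)

pd.melt can coalesce multiple columns into a single value column (plus a variable column). You can use it once to coalesce the num1 and num2 columns, and a second time to coalesce the phone1 and phone2 columns: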

import pandas as pd
df = pd.DataFrame({'phone1':[4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2':[4567890876, 4567890876, 9178889999, 2139990000],
                   'num1':[1,2,3,3],
                   'num2':[5,2,3,1]})

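# First melt: fold num1/num2 into one 'num' column; phone1/phone2 stay as id columns.
melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num')
# Second melt: fold phone1/phone2 into one 'phone' column.
melted = pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
# Keep only the value columns, then drop repeated (num, phone) pairs.
melted = melted[['num', 'phone']]
melted = melted.drop_duplicates()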
print(melted)

yields

    num       phone
0     1  4567890876
1     2  4567890876
2     3  9178889999
3     3  3237800876
4     5  4567890876
7     1  3237800876
11    3  2139990000
15    1  2139990000

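Explanation: passing id_vars keeps the phone1 and phone2 columns from being melted. The following shows the result of melting the num1 and num2 columns: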

In [166]: melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num'); melted
Out[166]: 
       phone1      phone2 numvar  num
0  4567890876  4567890876   num1    1
1  4567890876  4567890876   num1    2
2  9178889999  9178889999   num1    3
3  3237800876  2139990000   num1    3
4  4567890876  4567890876   num2    5
5  4567890876  4567890876   num2    2
6  9178889999  9178889999   num2    3
7  3237800876  2139990000   num2    1

Then apply pd.melt again to coalesce the phone1 and phone2 columns into one:

In [168]: pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
Out[168]: 
   numvar  num variable       phone
0    num1    1   phone1  4567890876
1    num1    2   phone1  4567890876
2    num1    3   phone1  9178889999
3    num1    3   phone1  3237800876
4    num2    5   phone1  4567890876
5    num2    2   phone1  4567890876
6    num2    3   phone1  9178889999
7    num2    1   phone1  3237800876
8    num1    1   phone2  4567890876
9    num1    2   phone2  4567890876
10   num1    3   phone2  9178889999
11   num1    3   phone2  2139990000
12   num2    5   phone2  4567890876
13   num2    2   phone2  4567890876
14   num2    3   phone2  9178889999
15   num2    1   phone2  2139990000

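Drop the numvar and variable columns, then drop the duplicates, and you get the desired result (albeit in a different order). The columns have to go first: while numvar and variable are still present, rows with the same (num, phone) pair but different source columns do not count as duplicates.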
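
If this is needed for more than one frame, the two melts could be wrapped in a small helper. The sketch below is an illustration only: the function name `unique_pairs` and its parameters are hypothetical, and the final sort is an added convenience to make the row order predictable (it is not the order shown in the question).

```python
import pandas as pd


def unique_pairs(df, phone_cols, num_cols):
    """Return the unique (phone, num) pairs found across the given columns."""
    # First melt: fold the num columns into 'num', keeping the phone columns as ids.
    melted = pd.melt(df, id_vars=phone_cols, value_vars=num_cols,
                     var_name='numvar', value_name='num')
    # Second melt: fold the phone columns into 'phone'. Columns that are neither
    # id_vars nor value_vars (here 'numvar') are left out of the result.
    melted = pd.melt(melted, id_vars=['num'], value_vars=phone_cols,
                     var_name='phonevar', value_name='phone')
    # Keep only the value columns, drop repeated pairs, and sort for a stable order.
    return (melted[['phone', 'num']]
            .drop_duplicates()
            .sort_values(['phone', 'num'])
            .reset_index(drop=True))


df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})

result = unique_pairs(df, ['phone1', 'phone2'], ['num1', 'num2'])
print(result)
```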