Question

My goal is to get the combinations of column values. For example,

   UT    Fruit_1 Fruit_2 Fruit_3
0  I1      Apple  Orange   Peach
1  I2      Apple   Lemon     NaN
2  I3  Starfruit   Apple  Orange

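In this DataFrame, I want to build pairwise combinations of the values in the Fruit_* columns within each row. So the result would be (Apple, Orange), (Apple, Peach), (Orange, Peach)...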

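As you can see, the DataFrame contains NaN. So after building the combinations, I want to drop the rows whose pair contains the specific text "nan". After reading a few posts related to this task, I wrote the following code.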

import pandas as pd
import numpy as np
from itertools import combinations

df = pd.DataFrame([['I1', 'Apple', 'Orange', 'Peach'],
                   ['I2', 'Apple', 'Lemon', np.nan],
                   ['I3', 'Starfruit', 'Apple', 'Orange']],
                  columns=['UT', 'Fruit_1', 'Fruit_2', 'Fruit_3'])

temp1 = df.set_index('UT')
temp2 = temp1.apply(lambda x: list(combinations(x, 2)), 1)
temp3 = temp2.apply(lambda x: pd.Series(x))
temp4 = temp3.stack().reset_index(level=[0, 1])
del temp4['level_1']
temp4.columns = ['UT', 'pair']
temp4[~temp4.pair.str.contains('nan')]

However, after running this code, I get an error message:

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

How can I fix this error?

Answer 1

With pandas 0.25 or later, you can use Series.explode and filter out the NaNs inside combinations with a list comprehension, relying on the fact that np.nan != np.nan by definition:

df = pd.DataFrame([['I1', 'Apple', 'Orange', 'Peach'],
                   ['I2', 'Apple', 'Lemon', np.nan],
                   ['I3', 'Starfruit', 'Apple', 'Orange']],
                  columns=['UT', 'Fruit_1', 'Fruit_2', 'Fruit_3'])

# Drop NaN from each row (NaN != NaN), build the pairs, then give each pair its own row
temp4 = (df.set_index('UT')
           .apply(lambda x: list(combinations([y for y in x if y == y], 2)), axis=1)
           .explode()
           .reset_index(name='pair'))

print(temp4)
   UT                 pair
0  I1      (Apple, Orange)
1  I1       (Apple, Peach)
2  I1      (Orange, Peach)
3  I2       (Apple, Lemon)
4  I3   (Starfruit, Apple)
5  I3  (Starfruit, Orange)
6  I3      (Apple, Orange)
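
The `y == y` test above may look odd, so here is a minimal pandas-free sketch of what it does to a single row. The `row` list is just an illustrative stand-in for row I2, not part of the original code:

```python
from itertools import combinations

# Illustrative stand-in for row I2: two fruits and one missing value
row = ['Apple', 'Lemon', float('nan')]

# NaN is not equal to itself, so `y == y` is False only for NaN
clean = [y for y in row if y == y]

# Pairs are built only from the values that survived the filter
pairs = list(combinations(clean, 2))

print(clean)
print(pairs)
```

Filtering before calling combinations means no pair containing NaN is ever created, so there is nothing to remove afterwards.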

For older pandas versions:

temp4 = (df.set_index('UT')
          .stack()
          .groupby(level=0)
          .apply(lambda x: pd.Series(list(combinations(x, 2))))
          .reset_index(level=1, drop=True)
          .reset_index(name='pair'))

print(temp4)
   UT                 pair
0  I1      (Apple, Orange)
1  I1       (Apple, Peach)
2  I1      (Orange, Peach)
3  I2       (Apple, Lemon)
4  I3   (Starfruit, Apple)
5  I3  (Starfruit, Orange)
6  I3      (Apple, Orange)
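
As for the original error: as far as I can tell, `.str.contains` does not treat the tuples in `pair` as strings, so it returns missing values instead of booleans, and `~` has nothing it can invert. If you would rather keep your original pipeline and filter afterwards, one possible approach is to check each tuple element by element. This is only a sketch; the small `temp4` frame and the helper `has_no_nan` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like temp4 in the question (pairs built without filtering NaN)
temp4 = pd.DataFrame({
    'UT': ['I2', 'I2', 'I2'],
    'pair': [('Apple', 'Lemon'), ('Apple', np.nan), ('Lemon', np.nan)],
})

def has_no_nan(pair):
    """Return True only if every element of the tuple is a non-missing value."""
    return all(pd.notna(value) for value in pair)

# Series.apply yields a real boolean mask, so no `~` on missing values is needed
mask = temp4['pair'].apply(has_no_nan)
result = temp4[mask].reset_index(drop=True)

print(result)
```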
