Aggregating categorical variables into a single variable in Python 3 pandas

Time: 2019-05-30 18:46:17

Tags: python-3.x pandas

I have a pandas DataFrame with 4 columns, like this:

ID  col1               col2               col3
1   Strongly Positive  Strongly Positive  Weekly Positive
2   Strongly Positive  Strongly Positive  Neutral
3   Strongly Negative  Strongly Negative  Weekly Negative
4   Weekly Negative    Strongly Negative  Neutral
5   Neutral            Neutral            Neutral
6   Strongly Positive  Strongly Negative  Strongly Negative
7   Strongly Negative  Weekly Positive    Neutral
8   Neutral            Weekly Negative    Weekly Positive

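Apart from the ID column, each column can take one of the values Strongly Positive, Weekly Positive, Neutral, Weekly Negative, and Strongly Negative. I need to create a new column using the following logic:

  1. If all columns have positive values, or there is at least one positive value and two neutral values, label the record as Positive in the new column.
  2. If all three columns are Neutral, label it as Neutral.
  3. If all columns have negative values, or there is at least one negative value and two neutral values, label it as Negative.
  4. If both positive and negative values occur among the three columns, label it as Both.

Positive means either Strongly Positive or Weekly Positive, and likewise for Negative.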

I need the final DataFrame to look like this:

ID  col1               col2               col3               Aggregated_Col
1   Strongly Positive  Strongly Positive  Weekly Positive    Positive
2   Strongly Positive  Strongly Positive  Neutral            Positive
3   Strongly Negative  Strongly Negative  Weekly Negative    Negative
4   Weekly Negative    Strongly Negative  Neutral            Negative
5   Neutral            Neutral            Neutral            Neutral
6   Strongly Positive  Strongly Negative  Strongly Negative  Both
7   Strongly Negative  Weekly Positive    Neutral            Both
8   Neutral            Weekly Negative    Weekly Positive    Both

I can't work out the logic for this.

4 answers:

Answer 0: (score: 0)

I would suggest recoding the values as integers, for example:

# keys must match the strings in the data exactly; the question's data spells it "Weekly"
recode = {"Strongly Positive": 2, "Weekly Positive": 1, "Neutral": 0, "Weekly Negative": -1, "Strongly Negative": -2}

Then you can write a function like this:

def interpret(values):
  if min(values) >= 0:
    return 1
  elif ...

and call it with df.apply(interpret, axis=1).
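The recode-then-interpret idea above can be sketched end to end as follows. This is one possible completion, not the answerer's own code: the branches after the first one, the string labels returned, and the name RECODE are assumptions, and all-neutral rows are checked first so that they are not counted as positive.

```python
import pandas as pd

# Assumed mapping; keys follow the spelling in the question's data ("Weekly").
RECODE = {
    "Strongly Positive": 2, "Weekly Positive": 1, "Neutral": 0,
    "Weekly Negative": -1, "Strongly Negative": -2,
}

def interpret(values):
    """Hypothetical completion: classify one row of integer codes."""
    lo, hi = min(values), max(values)
    if lo == 0 and hi == 0:
        return "Neutral"      # every value is neutral
    if lo >= 0:
        return "Positive"     # no negatives, at least one positive
    if hi <= 0:
        return "Negative"     # no positives, at least one negative
    return "Both"             # positives and negatives mixed

# A few rows taken from the question's sample data.
df = pd.DataFrame({
    "ID": [1, 2, 3, 5, 7],
    "col1": ["Strongly Positive", "Strongly Positive", "Strongly Negative",
             "Neutral", "Strongly Negative"],
    "col2": ["Strongly Positive", "Strongly Positive", "Strongly Negative",
             "Neutral", "Weekly Positive"],
    "col3": ["Weekly Positive", "Neutral", "Weekly Negative",
             "Neutral", "Neutral"],
}).set_index("ID")

codes = df.apply(lambda col: col.map(RECODE))   # recode each column to integers
df["Aggregated_Col"] = codes.apply(interpret, axis=1)
```

Because the magnitudes (2 versus 1) are not needed for this classification, only the sign of the minimum and maximum is inspected.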

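Answer 1: (score: 0)

You have 3 columns, so you can use

DF.apply(YourCustomFunction, axis=1)

axis=1 tells pandas that you want the function to operate on rows. Write your custom logic as a function, so that passing it as

DF['NewCol'] = DF.apply(MyFunction, axis=1)

will do the trick. Note that the x passed to the function is a single row (an array-like pandas Series), so you have to index it correctly inside the function.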
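As an illustration of indexing the row inside the function, here is a minimal sketch. The function body is an assumption (the answer does not show one); it reads the three rating columns by label and applies the rules from the question directly to the strings.

```python
import pandas as pd

def my_function(row):
    # `row` is a pandas Series indexed by column name, so cells are read by label.
    ratings = [row["col1"], row["col2"], row["col3"]]
    has_pos = any(r.endswith("Positive") for r in ratings)
    has_neg = any(r.endswith("Negative") for r in ratings)
    if has_pos and has_neg:
        return "Both"
    if has_pos:
        return "Positive"
    if has_neg:
        return "Negative"
    return "Neutral"

# Rows 4, 6 and 8 from the question's sample data; ID stays an ordinary column.
DF = pd.DataFrame({
    "ID": [4, 6, 8],
    "col1": ["Weekly Negative", "Strongly Positive", "Neutral"],
    "col2": ["Strongly Negative", "Strongly Negative", "Weekly Negative"],
    "col3": ["Neutral", "Strongly Negative", "Weekly Positive"],
})
DF["NewCol"] = DF.apply(my_function, axis=1)
```

Reading by label rather than by position means the ID column can remain in the frame without affecting the result.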

Answer 2: (score: 0)

You can build masks over the individual elements like this:

# set index as ID:
df.set_index('ID', inplace=True)

has_pos = df.apply(lambda x: x.str.contains('Positive')).any(axis=1)
has_neg = df.apply(lambda x: x.str.contains('Negative')).any(axis=1)
has_both = has_pos & has_neg

# update
df['Agg_Col'] = 'Neutral'
df.loc[has_pos,'Agg_Col'] = 'Positive'
df.loc[has_neg,'Agg_Col'] = 'Negative'

df.loc[has_both,'Agg_Col'] = 'Both'
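The same masking idea can also be wrapped in a function that leaves ID as a regular column and does not modify the frame in place. This is a sketch under assumptions: the helper name aggregate and its cols parameter are hypothetical, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

def aggregate(df, cols):
    """Hypothetical helper: one label per row, based on the given rating columns."""
    has_pos = df[cols].apply(lambda s: s.str.contains("Positive")).any(axis=1)
    has_neg = df[cols].apply(lambda s: s.str.contains("Negative")).any(axis=1)
    out = pd.Series("Neutral", index=df.index)
    out[has_pos] = "Positive"
    out[has_neg] = "Negative"
    out[has_pos & has_neg] = "Both"   # assigned last so it overrides the two above
    return out

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "col1": ["Strongly Positive", "Strongly Positive", "Strongly Negative",
             "Weekly Negative", "Neutral", "Strongly Positive",
             "Strongly Negative", "Neutral"],
    "col2": ["Strongly Positive", "Strongly Positive", "Strongly Negative",
             "Strongly Negative", "Neutral", "Strongly Negative",
             "Weekly Positive", "Weekly Negative"],
    "col3": ["Weekly Positive", "Neutral", "Weekly Negative", "Neutral",
             "Neutral", "Strongly Negative", "Neutral", "Weekly Positive"],
})
df["Agg_Col"] = aggregate(df, ["col1", "col2", "col3"])
```

The order of the assignments matters: rows that contain both kinds of value are first marked Positive, then Negative, and only the final assignment sets them to Both.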

Answer 3: (score: 0)

A fun, out-of-the-box solution

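import numpy as np

# df is assumed to have ID as its index, so only the rating columns remain
a = df.to_numpy().astype(str)
# 1 where a cell contains 'Pos', -1 where it contains 'Neg', 0 otherwise
b = np.select([np.char.find(a, 'Pos') >= 0, np.char.find(a, 'Neg') >= 0], [1, -1], 0)

c = np.select(
    [(b == 0).all(1), (b >= 0).all(1), (b <= 0).all(1)],
    ['Neutral', 'Positive', 'Negative'],
    'Both'
)

df.assign(Agg=c)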

                 col1               col2               col3       Agg
ID                                                                   
1   Strongly Positive  Strongly Positive    Weekly Positive  Positive
2   Strongly Positive  Strongly Positive            Neutral  Positive
3   Strongly Negative  Strongly Negative    Weekly Negative  Negative
4     Weekly Negative  Strongly Negative            Neutral  Negative
5             Neutral            Neutral            Neutral   Neutral
6   Strongly Positive  Strongly Negative  Strongly Negative      Both
7   Strongly Negative    Weekly Positive            Neutral      Both
8             Neutral    Weekly Negative    Weekly Positive      Both

A slightly different take

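import numpy as np

# df is assumed to have ID as its index, so only the rating columns remain
a = df.to_numpy().astype(str)
# 1 where a cell contains 'Pos', -1 where it contains 'Neg', 0 otherwise
b = np.select([np.char.find(a, 'Pos') >= 0, np.char.find(a, 'Neg') >= 0], [1, -1], 0)

# keys are (row maximum, row minimum) of the sign codes
m = {
    (0, 0): 'Neutral', (1, -1): 'Both',
    (1, 1): 'Positive', (1, 0): 'Positive',
    (-1, -1): 'Negative', (0, -1): 'Negative',
}

df.assign(Agg=[*map(m.get, zip(b.max(1), b.min(1)))])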

                 col1               col2               col3       Agg
ID                                                                   
1   Strongly Positive  Strongly Positive    Weekly Positive  Positive
2   Strongly Positive  Strongly Positive            Neutral  Positive
3   Strongly Negative  Strongly Negative    Weekly Negative  Negative
4     Weekly Negative  Strongly Negative            Neutral  Negative
5             Neutral            Neutral            Neutral   Neutral
6   Strongly Positive  Strongly Negative  Strongly Negative      Both
7   Strongly Negative    Weekly Positive            Neutral      Both
8             Neutral    Weekly Negative    Weekly Positive      Both