How do I label unique values?

Time: 2019-10-07 22:30:36

Tags: python pandas dataframe label pandas-groupby

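I have a dataset of 48,000 rows consisting of input text and answers. There are 89 unique answer values. How can I derive a label from each text answer, so that the first unique value maps to answer1, the second to answer2, and so on up to answer89?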


> x           y                      y_val
> hello       please push button 1   answer1
> what's up   please push button 1   answer1
> be cool     please push button 1   answer1
> smth        please push button 1   answer1
> write num   please push button 1   answer1
> hello       please push button 1   answer1
> what's up   please push button 1   answer1
> be cool     sure                   answer2
> smth        sure                   answer2
> write num   sure                   answer2
> hello       sure                   answer2
> what's up   perfect                answer3
> be cool     perfect                answer3
> smth        call me                answer89
> write num   call me                answer89

=======================================================================

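I want "please push button 1" to become answer1, "sure" to become answer2, and so on. I have 89 unique values, so I need all of them converted so that y_values becomes a column containing only answer1 through answer89.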

1 Answer:

Answer 0: (score: 0)

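I'm a little confused: do you just want to append a recoded column to the dataframe that labels the values in your "y" column as answer1 through answer89?

If so, this code will do that for you: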

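labels = {}   # maps each distinct "y" text to its answer label
y_val = []

for i in range(len(data)):
    text = str(data.iloc[i, 1])   # column position 1 is assumed to hold "y"
    if text not in labels:
        # first occurrence: assign the next unused label number
        labels[text] = 'answer' + str(len(labels) + 1)
    y_val.append(labels[text])

data['y_values'] = y_val
print(data)

Labels are numbered in the order in which each distinct "y" value first appears, so the data does not have to be sorted by "y"; sort it first if you want the numbering to follow alphabetical order. Just replace "data" with the name of your pandas DataFrame and make sure the iloc column index points at your "y" column. I'm sure there is a more efficient or more Pythonic way to do this, but this is what I came up with.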
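As one possible more compact alternative to an explicit loop, pandas' `factorize` can assign the integer codes in a single call. This is only a sketch: the small `data` frame below is a made-up stand-in for the real 48,000-row dataset, and it assumes the "y" column has no missing values.

```python
import pandas as pd

# Hypothetical miniature of the asker's data; only the "y" column matters here.
data = pd.DataFrame({
    "x": ["hello", "what's up", "be cool", "smth", "write num", "hello"],
    "y": ["please push button 1", "please push button 1", "sure",
          "sure", "perfect", "call me"],
})

# factorize returns one integer code per row (0, 1, 2, ...), numbered in
# order of first appearance, plus the array of distinct values.
codes, uniques = pd.factorize(data["y"])

# Shift the codes to start at 1 and prepend the "answer" prefix.
data["y_values"] = ["answer" + str(code + 1) for code in codes]

print(data)
```

Passing `sort=True` to `pd.factorize` numbers the labels by the sorted order of the distinct values instead of by first appearance. By default, missing values receive the code -1, which would come out as answer0 here, so they would need separate handling.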

Hope this helps!