Question

I'm working through some machine learning exercises, but the ID column of my DataFrame is giving me trouble. I have this:

0    LP001002
1    LP001003
2    LP001005
3    LP001006
4    LP001008

I want this:

0    001002
1    001003
2    001005
3    001006
4    001008

My idea was to use the replace function, ID.replace('[LP]', '', inplace=True), but that doesn't actually change the Series. Does anyone know a good way to transform this column?
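A likely reason the call above had no effect: without regex=True, Series.replace only matches whole cell values, so the string '[LP]' is never found. The minimal sketch below assumes the column is named "ID". Note that '[LP]' is a regex character class, so with regex=True it removes every 'L' and 'P' anywhere in the value, not only the leading "LP".

```python
import pandas as pd

# Sample data shaped like the question's ID column (column name "ID" is assumed)
df = pd.DataFrame({"ID": ["LP001002", "LP001003", "LP001005"]})

# Without regex=True, replace() compares whole values, so nothing matches '[LP]'
unchanged = df["ID"].replace("[LP]", "")

# With regex=True the pattern is applied to substrings of each value.
# '[LP]' is a character class: it strips every 'L' and 'P', wherever they occur.
stripped = df["ID"].replace("[LP]", "", regex=True)

print(unchanged.tolist())  # ['LP001002', 'LP001003', 'LP001005']
print(stripped.tolist())   # ['001002', '001003', '001005']
```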

Answer 1

You can use replace with regex=True:

df
Out[656]: 
        Val
0  LP001002
1  LP001003
2  LP001005
3  LP001006
4  LP001008
df.Val.replace({'LP':''},regex=True)
Out[657]: 
0    001002
1    001003
2    001005
3    001006
4    001008
Name: Val, dtype: object
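replace returns a new Series rather than modifying df, so the result has to be assigned back to keep it. The sketch below does that and anchors the pattern with '^' so that only a leading "LP" is stripped; the anchoring is an added precaution, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Val": ["LP001002", "LP001003", "LP001005", "LP001006", "LP001008"]})

# replace() returns a new Series; assign it back to the column to keep the change.
# '^LP' only matches "LP" at the very start of each value.
df["Val"] = df["Val"].replace({r"^LP": ""}, regex=True)

print(df["Val"].tolist())  # ['001002', '001003', '001005', '001006', '001008']
```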

Answer 2

Here is an example:

import pandas as pd
df = pd.DataFrame({'colname': ['LP001002', 'LP001003']})

# Slice off the 0th and 1st character of the string
df['colname'] = [x[2:] for x in df['colname']]
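The same slice can also be written without an explicit list comprehension through the .str accessor, which applies the slice to every string in the column. A minimal sketch using the same sample data:

```python
import pandas as pd

df = pd.DataFrame({"colname": ["LP001002", "LP001003"]})

# .str[2:] drops the first two characters of every string in the column
df["colname"] = df["colname"].str[2:]

print(df["colname"].tolist())  # ['001002', '001003']
```

Both versions assume every ID starts with exactly two characters to discard.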

If this is your index, you can copy it into a column with df['my_index'] = df.index and then follow the rest of the instructions.
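A sketch of the index case, assuming a hypothetical frame whose index holds the IDs. Copying the index into a column works as described; alternatively, an Index of strings also has a .str accessor, so it can be sliced directly.

```python
import pandas as pd

# Hypothetical frame whose index holds the IDs
df = pd.DataFrame({"value": [1, 2]}, index=["LP001002", "LP001003"])

# Option 1: copy the index into a column, then slice the column
df["my_index"] = df.index
df["my_index"] = df["my_index"].str[2:]

# Option 2: slice the index itself through Index.str
df.index = df.index.str[2:]

print(df["my_index"].tolist())  # ['001002', '001003']
print(df.index.tolist())        # ['001002', '001003']
```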

More generally, you may want to look into something like scikit-learn's LabelEncoder, which learns a mapping from non-numeric values to numeric ones.
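For a pandas-only sketch of the same idea, pd.factorize (swapped in here in place of LabelEncoder) maps each distinct value to an integer code. One difference worth knowing: factorize numbers values in order of first appearance, while LabelEncoder assigns codes in sorted order of the classes.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["LP001002", "LP001003", "LP001002"]})

# factorize returns integer codes plus the unique values they refer to;
# codes follow the order in which each value first appears
codes, uniques = pd.factorize(df["ID"])
df["ID_code"] = codes

print(df["ID_code"].tolist())  # [0, 1, 0]
print(list(uniques))           # ['LP001002', 'LP001003']
```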
