Removing words that contain a colon from a column - why doesn't it work?

Time: 2018-03-13 20:54:23

Tags: python regex string pandas

This is my pandas DataFrame:

Description                        ID       Date
wa119:d Here comes the first row   id_112   2018/03/02
ax21:3 Here comes the second row   id_115   2018/03/02
bC230:13 Here comes the third row  id_234   2018/03/02

The data types are:

print(df.dtypes)

Description             object
ID                      object
Date                    datetime64[ns]
dtype: object

I want to remove the words that contain a colon. In this case, that would be wa119:d, ax21:3 and bC230:13, so that my new dataset looks like this:

Description                ID      Date
Here comes the first row   id_112  2018/03/02
Here comes the second row  id_115  2018/03/02
Here comes the third row   id_234  2018/03/02

I tried the following, but neither attempt works:

re.sub('^\\w+:\\w+', '', df["Description"].astype(str))
re.sub('^\\w+:\\w+', '', df["Description"].astype("str"))

I get the following error message:

Traceback (most recent call last):
  File "C:/Users/fff/PycharmProjects/Test/Test.py", line 17, in <module>
    re.sub('^\\w+:\\w+', '', df["Description"].astype("str"))
  File "C:\Users\fff\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Can anyone help?

1 answer:

Answer 0: (score: 4)

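re.sub expects a single string, not a pandas Series, which is why it raises the TypeError. The vectorized Series.str.replace works instead (on recent pandas versions, pass regex=True explicitly, because the default there is a literal match):

df['Description'] = df["Description"].str.replace(r'^\w+:\w+', '', regex=True)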


>>> df
                  Description      ID        Date
0    Here comes the first row  id_112  2018/03/02
1   Here comes the second row  id_115  2018/03/02
2    Here comes the third row  id_234  2018/03/02
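
For reference, here is a minimal, self-contained sketch of the above. It rebuilds the sample data from the question, reproduces the TypeError, and then applies the vectorized replacement. Two things are my own assumptions rather than part of the original answer: the trailing \s* in the pattern, added so the space after the removed word is dropped as well, and the variable names raised, cleaned, pattern and cleaned_apply, which are only illustrative.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Description": [
        "wa119:d Here comes the first row",
        "ax21:3 Here comes the second row",
        "bC230:13 Here comes the third row",
    ],
    "ID": ["id_112", "id_115", "id_234"],
    "Date": pd.to_datetime(["2018/03/02"] * 3),
})

# re.sub handles one string at a time, so passing a whole Series fails.
try:
    re.sub(r"^\w+:\w+", "", df["Description"])
    raised = False
except TypeError:
    raised = True

# Vectorized fix: Series.str.replace applies the pattern to every element.
# The trailing \s* (an assumption, not in the original answer) also removes
# the whitespace left behind after the deleted word.
cleaned = df["Description"].str.replace(r"^\w+:\w+\s*", "", regex=True)

# Element-wise alternative with a precompiled pattern; gives the same result.
pattern = re.compile(r"^\w+:\w+\s*")
cleaned_apply = df["Description"].apply(lambda text: pattern.sub("", text))

print(cleaned.tolist())
```

The str.replace form is usually the more idiomatic choice in pandas. The apply variant is shown only to make clear that re works fine once it is given one string per call.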