Question

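I want to remove the digits from strings in a column, while keeping the values in that same column that consist only of digits. The data looks like this: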

df=
id       description
1         XG154LU
2         4562689
3         556
4         LE896E
5         65KKL4

This is what I want the output to look like:

id       description
1         XGLU
2         4562689
3         556
4         LEE
5         KKL

I used the code below, but when I run it, it wipes every entry in the description column and replaces it with an empty string:

import re

def clean_text_round1(text):
    text = re.sub(r'\w*\d\w*', '', text)
    text = re.sub('[‘’“”…]', '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\r', '', text)
    return text

round1 = lambda x: clean_text_round1(x)
df['description'] = df['description'].apply(round1)
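
The same behaviour can be reproduced on plain strings without pandas. The sketch below assumes only the standard `re` module and the five sample values from the table above. It suggests the first pattern is the culprit: `\w*\d\w*` matches a digit together with all the word characters around it, so any value containing a digit is consumed whole.

```python
import re

samples = ["XG154LU", "4562689", "556", "LE896E", "65KKL4"]

# \w*\d\w* matches one digit plus every word character on either side,
# so a value with no spaces in it is matched (and removed) in full.
cleaned = [re.sub(r"\w*\d\w*", "", s) for s in samples]
print(cleaned)  # ['', '', '', '', '']
```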

Answer 1

Try:

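import numpy as np

# Keep values made up only of digits; strip the digits from everything else.
# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal by default.
df['description'] = np.where(df.description.str.contains(r'^\d+$'), df.description, df.description.str.replace(r'\d+', '', regex=True))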

Output:

id       description
1         XGLU
2         4562689
3         556
4         LEE
5         KKL

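Logic:

Check whether the string contains only digits; if it does, do nothing and simply copy the value over. If the digits are mixed in with other characters, replace them with an empty string '', so that only the non-digit characters remain.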
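
The two branches of that logic can also be sketched on plain strings, which may make them easier to check by hand. This is only an illustration: the helper name `strip_digits_unless_numeric` is made up here, and `re.fullmatch` stands in for the `'^\d+$'` test (the two behave roughly the same for these values).

```python
import re

def strip_digits_unless_numeric(value: str) -> str:
    # Branch 1: the value is digits only, so it is returned unchanged.
    if re.fullmatch(r"\d+", value):
        return value
    # Branch 2: digits are mixed with other characters, so every run
    # of digits is replaced with the empty string.
    return re.sub(r"\d+", "", value)

samples = ["XG154LU", "4562689", "556", "LE896E", "65KKL4"]
result = [strip_digits_unless_numeric(s) for s in samples]
print(result)  # ['XGLU', '4562689', '556', 'LEE', 'KKL']
```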

Answer 2

This should solve it for you.

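def clean_text_round1(text):
    # Leave real integers, and strings made up only of digits, unchanged.
    if isinstance(text, int) or text.isdigit():
        return text
    # Otherwise drop every digit character from the string.
    return ''.join(ch for ch in text if not ch.isdigit())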

df['description'] = df['description'].apply(clean_text_round1)

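Let me know whether this works for you. I'm not sure how it performs speed-wise. You could use a regular expression instead of the join.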
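
A regular-expression version might look roughly like the sketch below. The name `clean_text_regex` is made up for illustration, and it assumes the numeric entries may be stored either as `int` or as all-digit strings.

```python
import re

def clean_text_regex(text):
    # Assumption: purely numeric entries are ints or all-digit strings;
    # both are returned unchanged.
    if isinstance(text, int) or text.isdigit():
        return text
    # Remove every digit character from mixed strings.
    return re.sub(r"\d", "", text)

print(clean_text_regex("XG154LU"))  # XGLU
print(clean_text_regex("4562689"))  # 4562689
```

It can be passed to `apply` in the same way as the function above.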

Answer 3

def convert(v):
    # If the string contains at least one letter, keep only its letters
    if any(char.isalpha() for char in v):
        return ''.join(char for char in v if char.isalpha())
    # Otherwise (no letters, e.g. digits only) return it unchanged
    return v

# apply() the function to a single column
df['description'] = df['description'].apply(convert)
print(df)

id  description
0        XGLU
1     4562689
2         556
3         LEE
4         KKL

Remove digits from strings in a DataFrame column
