Pandas: remove words from a string that are not made up entirely of letters

Time: 2019-11-01 19:06:02

Tags: pandas

I have a pandas Series:

pd.Series({'products':['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white' ]})

I want to remove every word that does not consist entirely of letters. The output I want is:

pd.Series({'products':['deskjet all in one wireless inkjet printer', ' wireless optical mouse white' ]})

What is the most efficient way to do this?

3 answers:

Answer 0: (score: 0)


import pandas as pd

# remove every digit character from a string
# (letters that share a word with digits are kept)
def replace_nums(x):
    no_digits = ""
    for i in x:
        if not i.isdigit():
            no_digits += i
    return no_digits

# example Series
exdict = {'Geeks' : '10abc', 
        'for' : '204f5', 
        'geeks' : '30rew'} 


ser = pd.Series(exdict) 

# apply function to series
ser = ser.apply(replace_nums)

Output:


Geeks    abc
for        f
geeks    rew
dtype: object
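
Note that this approach strips digit characters rather than whole words. As a minimal sketch of what that means for the product strings from the question (the helper name `strip_digits` and the flat, one-string-per-row Series are assumptions made for illustration, not part of the answer):

```python
import pandas as pd

def strip_digits(text):
    # Drop digit characters only; letters that shared a word with digits stay.
    return "".join(ch for ch in text if not ch.isdigit())

# Assumption: one product string per row, rather than a list in a single cell.
products = pd.Series([
    "deskjet 2620 all in one wireless inkjet printer",
    "z3700 wireless optical mouse white",
])

stripped = products.apply(strip_digits)
print(stripped.tolist())
# ['deskjet  all in one wireless inkjet printer', 'z wireless optical mouse white']
```

The first string keeps a double space where "2620" was, and the second keeps the "z" from "z3700", so the result differs from the output requested in the question.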

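Answer 1: (score: 0)

Here is another approach you can use. Let me know how its speed compares with the other methods you have tried. Admittedly, it may be a bit pedantic. I will look into a version that uses only NumPy, which may turn out to be faster.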

import pandas as pd
import numpy as np

unclean = pd.Series({'products':['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white']})
clean = pd.Series({'products':['deskjet all in one wireless inkjet printer', ' wireless optical mouse white']})

convert = lambda product: list(filter(lambda good_val: len(set([str(i) for i in good_val]).intersection(set([str(i) for i in range(10)]))) < 1,product))

edited = unclean.apply(lambda val: [' '.join(convert(i.split())) for i in val])

print(unclean.values)
print(edited.values)

Output:

[list(['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white'])]
[list(['deskjet all in one wireless inkjet printer', 'wireless optical mouse white'])]
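
The nested lambdas above keep a word only when none of its characters is one of the digits 0 to 9. A more readable sketch of the same filter might look like the following; the function name `drop_words_with_digits` and the flat Series layout are assumptions for illustration, not part of the answer:

```python
import pandas as pd

DIGITS = "0123456789"

def drop_words_with_digits(text):
    # Keep only whitespace-separated words that share no character with DIGITS.
    return " ".join(word for word in text.split() if set(word).isdisjoint(DIGITS))

# Assumption: one product string per row, rather than a list in a single cell.
unclean = pd.Series([
    "deskjet 2620 all in one wireless inkjet printer",
    "z3700 wireless optical mouse white",
])

edited = unclean.apply(drop_words_with_digits)
print(edited.tolist())
# ['deskjet all in one wireless inkjet printer', 'wireless optical mouse white']
```

Because the words are re-joined with single spaces, the leading space shown in the question's expected output does not appear.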

Answer 2: (score: 0)

I figured it out. My dataset is not very large. I have not tried the other answers, but what I ended up with works well:

import re
import pandas as pd

df = pd.DataFrame({'products':['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white']})

regex = re.compile(r'^[a-z]+$')

df['products_clean'] = [' '.join([j for j in i if regex.match(j)]) for i in df['products'].str.split(' ')]

df
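
The pattern `^[a-z]+$` keeps only words made of lowercase ASCII letters. If uppercase or non-ASCII letters should also count as letters, one possible variant is to test each word with `str.isalpha()` instead. This is a sketch under that assumption; the helper name `keep_alpha_words` and the third row of sample data are hypothetical additions for illustration:

```python
import pandas as pd

def keep_alpha_words(text):
    # str.isalpha() is True only when every character in the word is a letter,
    # so words containing digits or punctuation are dropped.
    return " ".join(word for word in text.split() if word.isalpha())

df = pd.DataFrame({"products": [
    "deskjet 2620 all in one wireless inkjet printer",
    "z3700 wireless optical mouse white",
    "all-in-one printer 3rd gen",  # hypothetical row, not from the question
]})

df["products_clean"] = df["products"].map(keep_alpha_words)
print(df["products_clean"].tolist())
# ['deskjet all in one wireless inkjet printer', 'wireless optical mouse white', 'printer gen']
```

The last row shows a side effect of the all-letters rule: a hyphenated word such as "all-in-one" is removed as well, which a digits-only filter would keep.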