I have a pandas Series:
pd.Series({'products':['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white' ]})
I want to remove every word that is not purely alphabetic, anywhere in each string. The output I want is:
pd.Series({'products':['deskjet all in one wireless inkjet printer', ' wireless optical mouse white' ]})
What is the most efficient way to do this?
Answer 0 (score: 0)
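One vectorized option is sketched below. It assumes the data is stored as one product string per row (a plain list passed to `pd.Series`) rather than as a single list inside one cell as in the question, and it treats any whitespace-separated token containing a character other than an ASCII letter as a non-alphabetic word. It has not been benchmarked here against the answers that follow.

```python
import pandas as pd

# Assumption: one product string per row, not one list stored in a single cell
products = pd.Series([
    'deskjet 2620 all in one wireless inkjet printer',
    'z3700 wireless optical mouse white',
])

# \S*[^A-Za-z\s]\S* matches a whole whitespace-delimited token that contains
# at least one character that is neither an ASCII letter nor whitespace
cleaned = (
    products.str.replace(r'\S*[^A-Za-z\s]\S*', '', regex=True)
    .str.replace(r'\s+', ' ', regex=True)  # collapse the gaps left by removed tokens
    .str.strip()                           # drop leading/trailing spaces
)
print(cleaned.tolist())
```

Without the final `.str.strip()`, the second string keeps its leading space, which matches the desired output in the question exactly.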
import pandas as pd

# strip every digit character from a string
# (note: this works per character, so 'z3700' becomes 'z' instead of being dropped)
def replace_nums(x):
    no_digits = ""
    for i in x:
        if not i.isdigit():
            no_digits += i
    return no_digits

# example series
exdict = {'Geeks': '10abc',
          'for': '204f5',
          'geeks': '30rew'}
ser = pd.Series(exdict)
# apply the function to every element of the series
ser = ser.apply(replace_nums)
print(ser)
Output:
Geeks abc
for f
geeks rew
dtype: object
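The function above removes digit characters rather than whole words. A word-level variant along the same lines is sketched below; the helper name `drop_non_alpha_words` is hypothetical, and it assumes one product string per row.

```python
import pandas as pd


def drop_non_alpha_words(text: str) -> str:
    """Keep only whitespace-separated words made entirely of letters (hypothetical helper)."""
    # str.isalpha() is Unicode-aware, so accented letters such as 'é' also count as letters
    return ' '.join(word for word in text.split() if word.isalpha())


products = pd.Series([
    'deskjet 2620 all in one wireless inkjet printer',
    'z3700 wireless optical mouse white',
])
cleaned = products.apply(drop_non_alpha_words)
print(cleaned.tolist())
```

Because `split()` with no argument splits on any run of whitespace, the rejoined strings come out with single spaces and no leading space.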
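Answer 1 (score: 0)
Here is another approach you can use. Let me know how its speed compares with the other methods you have tried. Again, this may be a little pedantic. I'll look into a version that uses only NumPy, which should be faster.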
import pandas as pd
unclean = pd.Series({'products': ['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white']})
# the desired result from the question, for comparison
clean = pd.Series({'products': ['deskjet all in one wireless inkjet printer', ' wireless optical mouse white']})

DIGITS = set('0123456789')

def convert(product):
    # keep only the tokens that contain no digit characters
    return [token for token in product if not DIGITS.intersection(token)]

# split each string into words, drop the words containing digits, and rejoin the rest
edited = unclean.apply(lambda val: [' '.join(convert(i.split())) for i in val])
print(unclean.values)
print(edited.values)
Output:
[list(['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white'])]
[list(['deskjet all in one wireless inkjet printer', 'wireless optical mouse white'])]
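Since the question's Series stores the whole list in a single cell, another option is to explode it into one string per row, clean each row, and then regroup the results into a list under the original index label. This is a sketch assuming pandas 0.25 or later (where `Series.explode` is available), and it swaps the digit check for `str.isalpha()`.

```python
import pandas as pd

unclean = pd.Series({'products': ['deskjet 2620 all in one wireless inkjet printer',
                                  'z3700 wireless optical mouse white']})

# one string per row; the index label 'products' is repeated for each element
rows = unclean.explode()

# keep only the purely alphabetic words in each string
cleaned_rows = rows.apply(lambda s: ' '.join(w for w in s.split() if w.isalpha()))

# gather the cleaned strings back into a single list per original index label
edited = cleaned_rows.groupby(level=0).apply(list)
print(edited['products'])
```

Working on the exploded rows lets the per-string cleaning use ordinary string methods instead of nested lambdas over the inner list.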
Answer 2 (score: 0)
I figured it out. My dataset isn't very large, and I haven't tried the other answers, but this works well for me:
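import re
import pandas as pd

df = pd.DataFrame({'products': ['deskjet 2620 all in one wireless inkjet printer', 'z3700 wireless optical mouse white']})
# a word is kept only if it consists entirely of lowercase ASCII letters
regex = re.compile(r'^[a-z]+$')
# split each product on spaces, keep the matching words, and join them back together
df['products_clean'] = [' '.join([j for j in i if regex.match(j)]) for i in df['products'].str.split(' ')]
print(df)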
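The same rule can be expressed without the Python-level list comprehension by letting `Series.str.findall` pick out the matching words. This is a sketch, not a measured improvement; the lookarounds are an assumption chosen so that only whole whitespace-separated tokens match, which means `z3700` is not reduced to `z`.

```python
import pandas as pd

df = pd.DataFrame({'products': ['deskjet 2620 all in one wireless inkjet printer',
                                'z3700 wireless optical mouse white']})

# (?<!\S) and (?!\S) require whitespace or a string boundary on each side, so
# [a-z]+ only matches a whole token made of lowercase letters (the same rule
# as splitting on whitespace and testing ^[a-z]+$ on each token)
pattern = r'(?<!\S)[a-z]+(?!\S)'

# findall returns a list of matching words per row; str.join glues each list back together
df['products_clean'] = df['products'].str.findall(pattern).str.join(' ')
print(df['products_clean'].tolist())
```

Like the answer above, this drops capitalized words; use `[A-Za-z]+` in the pattern if those should be kept.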