I have defined a function:
def keep_alphabets(name):
    energy[name] = energy[name].map(lambda x: ' '.join([re.sub('[^A-Za-z]', '', w) for w in x.split()]))
I have an existing DataFrame that I build with method chaining.
energy = (pd.read_excel('Energy Indicators.xls', skiprows=17, skip_footer=0, na_values='...')
          .drop(['Unnamed: 0', 'Unnamed: 1'], axis=1)
          .rename(columns={'Unnamed: 2': 'Country', 'Petajoules': 'Energy Supply',
                           'Gigajoules': 'Energy Supply per Capita',
                           '%': '% Renewable'})
          .replace({'Country': {"Republic of Korea": "South Korea",
                                "United States of America": "United States",
                                "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
                                "China, Hong Kong Special Administrative Region3": "Hong Kong"}})
          .head(227))
Can I add the keep_alphabets function to this chain?
Answer 0: (score: 0)
IIUC, the last step should be to use apply with a lambda function on each column of df:
.apply(lambda col: col.map(lambda x: ' '.join([re.sub('[^A-Za-z]', '', w) for w in x.split()])
                           if isinstance(x, str) else x))
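To see what the cleaning expression does on its own, here is a minimal sketch that runs it on plain strings, outside pandas. The helper name `clean` and the sample value "Japan10" are illustrative assumptions; the first sample is one of the country names that appears in the question.

```python
import re

def clean(text):
    # Remove every non-letter character from each whitespace-separated word,
    # then join the words back together with single spaces.
    return ' '.join(re.sub('[^A-Za-z]', '', w) for w in text.split())

samples = ["China, Hong Kong Special Administrative Region3", "Japan10"]
cleaned = [clean(s) for s in samples]
print(cleaned)
```

Because the expression calls `split()` on its argument, it only works when it receives a single string, not a whole row or column.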
Answer 1: (score: 0)
You can do the following:
import re
import pandas as pd

# For single elements: strip non-letters from each whitespace-separated word
def keep_alphabets_elem(s):
    return ' '.join([re.sub('[^A-Za-z]', '', w) for w in s.split()])

energy = (pd.read_excel('Energy Indicators.xls', skiprows=17, skip_footer=0, na_values='...')
          .drop(['Unnamed: 0', 'Unnamed: 1'], axis=1)
          .rename(columns={'Unnamed: 2': 'Country', 'Petajoules': 'Energy Supply',
                           'Gigajoules': 'Energy Supply per Capita',
                           '%': '% Renewable'})
          .replace({'Country': {"Republic of Korea": "South Korea",
                                "United States of America": "United States",
                                "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
                                "China, Hong Kong Special Administrative Region3": "Hong Kong"}})
          # NEW: COL_NAME is a placeholder; wrapping in assign keeps the result a DataFrame
          .assign(COL_NAME=lambda df: df.apply(lambda x: keep_alphabets_elem(x['COL_NAME']), axis=1))
          .head(227))
Note that you need axis=1 so that the function is applied to each row rather than to each column.
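As a rough illustration of the row-wise behavior, the sketch below runs the same idea on a tiny hand-made DataFrame. The sample data is hypothetical and only stands in for the spreadsheet. It also shows that, for a single column, `Series.map` gives the same values without going through rows.

```python
import re
import pandas as pd

def keep_alphabets_elem(s):
    # Strip non-letter characters from each whitespace-separated word.
    return ' '.join(re.sub('[^A-Za-z]', '', w) for w in s.split())

# Hypothetical sample data, not taken from the actual Excel file.
df = pd.DataFrame({'Country': ['Japan10', 'Italy9'],
                   'Energy Supply': [18984.0, 6530.0]})

# axis=1: the lambda receives one row at a time, so row['Country'] is a string.
by_row = df.apply(lambda row: keep_alphabets_elem(row['Country']), axis=1)

# Equivalent for a single column: map the function over its values.
by_map = df['Country'].map(keep_alphabets_elem)

print(by_row.tolist())
print(by_map.tolist())
```

Both calls return a Series rather than a DataFrame, so inside a chain the result still has to be written back to a column if the rest of the frame should be kept.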
Answer 2: (score: 0)
If you only want to modify a single column named 'col':
.assign(col=lambda df: df['col'].map(func))
where func is the function you defined:
def func(x):
    return ' '.join([re.sub('[^A-Za-z]', '', w) for w in x.split()])
If the column name is stored in a variable, name='col':
.assign(**{name: lambda df: df[name].map(func)})
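One thing to watch inside a chain is that `energy` does not exist until the whole expression has finished. `DataFrame.assign` also accepts a callable, which receives the intermediate DataFrame, so the step can avoid referring to `energy` at all. The following is a minimal sketch under that assumption; the `raw` frame is hypothetical sample data standing in for the result of `pd.read_excel`.

```python
import re
import pandas as pd

def func(x):
    # Strip non-letter characters from each whitespace-separated word.
    return ' '.join(re.sub('[^A-Za-z]', '', w) for w in x.split())

# Hypothetical sample data in place of the Excel sheet.
raw = pd.DataFrame({
    'Unnamed: 2': ['Japan10', 'Republic of Korea', 'Italy9'],
    'Petajoules': [18984.0, 11007.0, 6530.0],
})

name = 'Country'  # column name held in a variable
energy = (raw
          .rename(columns={'Unnamed: 2': 'Country', 'Petajoules': 'Energy Supply'})
          .replace({'Country': {'Republic of Korea': 'South Korea'}})
          # The lambda is called with the DataFrame produced by the previous step.
          .assign(**{name: lambda df: df[name].map(func)})
          .head(2))

print(energy)
```

The dictionary unpacking is only needed because the column name lives in a variable; with a fixed, identifier-like name the keyword form `.assign(Country=lambda df: ...)` does the same thing.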