I have a list of strings:
x=['llc', 'corp', 'sa']
I need to strip these strings from the end of the values in a column of my DataFrame:
import pandas as pd
df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'], columns=['Names'])
The output I want is:
list = ['Geeks', 'toto', 'tete coope', 'tete', 'tata', 'titi', 'tmtm']
Do you have any suggestions?
Answer 0: (score: 1)
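Use Series.str.replace with a regular expression: $ anchors the match to the end of the string, \s+ matches the whitespace before the suffix, and the alternatives are joined with the regex | (or):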
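# Raw f-string so that \s reaches the regex engine unchanged
pat = '|'.join(rf'\s+{y}$' for y in x)
# regex=True is required in pandas 2.0+, where the default is a literal match
df['Names'] = df['Names'].str.replace(pat, '', regex=True)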
print(df)
        Names
0       Geeks
1        toto
2  tete coope
3        tete
4        tata
5        titi
6        tmtm
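As a hedged variation on the same idea, the pattern can be built once with a single non-capturing group and with each suffix passed through re.escape, which matters if a suffix ever contains regex metacharacters (for example "s.a."). The sketch below uses only the standard re module so the pattern's behaviour can be checked without pandas; the helper name build_suffix_pattern is hypothetical and is not part of pandas or of the answer above.

```python
import re

def build_suffix_pattern(suffixes):
    """Compile a regex matching whitespace followed by any one suffix at the end of a string.

    Hypothetical helper for illustration only.
    """
    # Escape each suffix so characters such as "." are matched literally
    alternatives = "|".join(re.escape(s) for s in suffixes)
    # \s+ requires whitespace before the suffix; $ anchors the match at the end
    return re.compile(rf"\s+(?:{alternatives})$")

suffixes = ["llc", "corp", "sa"]
names = ["Geeks corp", "toto", "tete coope", "tete sa", "tata corp", "titi", "tmtm"]

pattern = build_suffix_pattern(suffixes)
cleaned = [pattern.sub("", name) for name in names]
print(cleaned)
```

The same pattern string should work on the DataFrame column via df['Names'].str.replace(pattern.pattern, '', regex=True). Because \s+ is required, a name such as "visa" is left untouched even though it ends in "sa".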
Answer 1: (score: 0)
This solution will work:
Lines <- "iso3 year UHC cata10
AFG 2010 0.3551409 NA
AFG 2011 0.3496452 NA
AFG 2012 0.3468012 NA
AFG 2013 0.3567721 14.631331
AFG 2014 0.3647436 NA
AFG 2015 0.3717983 NA
AFG 2016 0.3855273 4.837534
AFG 2017 0.3948606 NA
AGO 2011 0.3250651 12.379809
AGO 2012 0.3400455 NA
AGO 2013 0.3397722 NA
AGO 2014 0.3385741 NA
AGO 2015 0.3521086 16.902584
AGO 2016 0.3636765 NA
AGO 2017 0.3764945 NA"
DF <- read.csv(text = gsub(" +", ",", Lines), as.is = TRUE)