Question

I have a list of strings:

x=['llc', 'corp', 'sa']

I need to remove these strings from the end of the values in one column of my DataFrame:

import pandas as pd

df = pd.DataFrame(['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm'], columns=['Names'])

The output I want is:

list = ['Geeks', 'toto', 'tete coope', 'tete', 'tata', 'titi', 'tmtm']

Any suggestions?

Answer 1

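Use Series.str.replace with a regular expression: $ anchors each match to the end of the string, \s+ matches the whitespace in front of the suffix, and the per-suffix alternatives are joined with | (regex OR). Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, so pass it explicitly: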
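To make the pattern construction concrete, here is a minimal sketch that builds the same alternation with only the standard re module and applies it to plain Python strings, so it can be checked without pandas. The re.escape call is an addition not in the original answer: it makes suffixes containing regex metacharacters (for example a hypothetical "inc.") match literally.

```python
import re

x = ['llc', 'corp', 'sa']

# One alternative per suffix: whitespace, the literal (escaped) suffix, end of string.
# re.escape is an extra safety step, not part of the original answer.
pat = '|'.join(rf'\s+{re.escape(y)}$' for y in x)
print(pat)  # \s+llc$|\s+corp$|\s+sa$

# Apply the same pattern to plain strings with re.sub.
names = ['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm']
cleaned = [re.sub(pat, '', n) for n in names]
print(cleaned)
```

Note that "tete coope" is left alone: "coope" is not in the suffix list, and $ prevents partial matches elsewhere in the string.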

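# Raw f-string so '\s' reaches the regex engine without an invalid-escape warning
pat = '|'.join(rf'\s+{y}$' for y in x)
# regex=True is required on pandas >= 2.0, where the default is a literal replace
df['Names'] = df['Names'].str.replace(pat, '', regex=True)
print(df)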
        Names
0       Geeks
1        toto
2  tete coope
3        tete
4        tata
5        titi
6        tmtm
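If a regular expression is not wanted, a regex-free alternative (a swapped-in technique, not part of either answer) is to split off the last whitespace-separated word and drop it when it appears in the suffix set. The sketch below assumes a hypothetical helper name, strip_suffix, and plain Python lists; with pandas it could be applied as df['Names'].map(lambda n: strip_suffix(n, set(x))).

```python
def strip_suffix(name: str, suffixes: set[str]) -> str:
    """Drop the final whitespace-separated word if it is one of `suffixes`.

    Hypothetical helper for illustration; matching is case-sensitive and
    whole-word, like the \\s+suffix$ regex above.
    """
    parts = name.rsplit(maxsplit=1)  # split off only the last word
    if len(parts) == 2 and parts[1] in suffixes:
        return parts[0]
    return name


x = ['llc', 'corp', 'sa']
names = ['Geeks corp', 'toto', 'tete coope', 'tete sa', 'tata corp', 'titi', 'tmtm']
suffixes = set(x)  # set lookup instead of scanning the list for every name
result = [strip_suffix(n, suffixes) for n in names]
print(result)
```

One behavioral difference: a value consisting only of a suffix (for example "corp") is kept unchanged, which matches the regex version because \s+ requires whitespace before the suffix.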

Answer 2

This solution works:

Lines <- "iso3year    UHC         cata10
AFG 2010    0.3551409   NA
AFG 2011    0.3496452   NA
AFG 2012    0.3468012   NA
AFG 2013    0.3567721   14.631331
AFG 2014    0.3647436   NA
AFG 2015    0.3717983   NA
AFG 2016    0.3855273   4.837534
AFG 2017    0.3948606   NA
AGO 2011    0.3250651   12.379809
AGO 2012    0.3400455   NA
AGO 2013    0.3397722   NA
AGO 2014    0.3385741   NA
AGO 2015    0.3521086   16.902584
AGO 2016    0.3636765   NA
AGO 2017    0.3764945   NA"
DF <- read.csv(text = gsub("  +", ",", Lines), as.is = TRUE)

Remove substrings from the end of strings, based on a list of strings to remove
