Removing a specific pattern from pandas rows

Time: 2020-10-07 18:22:17

Tags: python pandas

I am trying to figure out how to count the rows that start with a number, for example:

My_col

24 was 2020 - There is a lot -
23 aka 2018 -  how many ...
23 was 2020 - wonderful!
no numbers this time

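and, only when a row starts with a number, remove the leading words up to and including the first - (three words in each of these rows):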

My_col

There is a lot -
how many ...
wonderful!
no numbers this time

In SQL, I would check for a leading number like this:

SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1 
         THEN 'yes' 
         ELSE 'no' 
       END AS StartsWithNumber
FROM my_data 

To remove the words before the -, I think I should consider np.where with a regex, and then apply.

2 answers:

Answer 0: (score: 1)

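import pandas as pd

df = pd.DataFrame({'My_col': [
          "24 was 2020 - There is a lot -", 
          "no numbers this time"] })

# If the row starts with a digit, keep only the text after the first "-";
# x[:1] (rather than x[0]) avoids an IndexError on empty strings.
df['My_col'].apply(
    lambda x: x[x.find("-")+1:].strip() if x[:1].isdigit() else x)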

Output:

0        There is a lot -
1    no numbers this time
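
The question also asks for a count of the rows that start with a number, which the snippet above does not cover. A minimal sketch of one way to do it, assuming the column holds plain strings: `Series.str.match` tests the start of each string, so it can stand in (roughly) for the `LTRIM` + `ISNUMERIC` check in the SQL. The names `starts_with_number` and `n_rows` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'My_col': [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time"]})

# True where the first non-blank character is a digit.
# str.match anchors at the start of the string; \s* skips leading blanks.
starts_with_number = df['My_col'].str.match(r'\s*\d')

# Summing a boolean Series counts its True values.
n_rows = int(starts_with_number.sum())

# Same yes/no flag as the SQL CASE expression in the question.
df['StartsWithNumber'] = starts_with_number.map({True: 'yes', False: 'no'})

print(n_rows)
print(df)
```

Note that SQL Server's `ISNUMERIC` also accepts a few non-digit characters such as signs, so the two checks are not strictly identical.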

Answer 1: (score: 0)

Use df.replace() with a regular expression. I added a - to the fourth row to show that no words are removed from it:

import pandas as pd

data = {'My_col':['24 was 2020 - There is a lot -', '23 aka 2018 -  how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)

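# ^\d anchors the match to rows that start with a digit; .*?- consumes up to the
# first "-", and \s* drops the blanks after it. Assigning the result back avoids
# relying on inplace=True on a column selection, which may not update df.
df['My_col'] = df['My_col'].replace(r'^\d.*?-\s*', '', regex=True)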
print(df)

                   My_col
0        There is a lot -
1            how many ...
2              wonderful!
3  no numbers this - time
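
For completeness, the np.where idea from the question can be sketched as well. This is only an illustration, not either answerer's method: it builds a boolean mask for rows that start with a digit, computes the stripped text for every row, and lets `np.where` pick between the stripped and the original value. The names `col`, `mask`, and `stripped` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'My_col': [
    '24 was 2020 - There is a lot -',
    '23 aka 2018 -  how many ...',
    '23 was 2020 - wonderful!',
    'no numbers this - time']})

col = df['My_col']

# Rows whose first character is a digit.
mask = col.str.match(r'\d')

# Text with everything up to the first "-" (and the blanks after it) removed.
# This is computed for all rows; the mask decides where it is actually used.
stripped = col.str.replace(r'^.*?-\s*', '', regex=True)

# Take the stripped text where the mask is True, the original text otherwise.
df['My_col'] = np.where(mask, stripped, col)

print(df['My_col'].tolist())
```

Compared with the single anchored regex, this version keeps the "starts with a digit" test separate from the removal pattern, which can be easier to adjust if the condition changes.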