Question

I'm trying to figure out how to count the rows that start with a number, for example:

My_col

24 was 2020 - There is a lot -
23 aka 2018 -  how many ...
23 was 2020 - wonderful!
no numbers this time

and, only for rows that start with a number, remove the three words before the first - (and the - itself):

My_col

There is a lot -
how many ...
wonderful!
no numbers this time

In SQL, I would check it like this:

SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1 
         THEN 'yes' 
         ELSE 'no' 
       END AS StartsWithNumber
FROM my_data

To remove the words before the -, I think I should use np.where or a regex and then apply.

Answer 1

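import pandas as pd

df = pd.DataFrame({'My_col': [
          "24 was 2020 - There is a lot -", 
          "no numbers this time"] })

# If the row starts with a digit, keep only the text after the first "-";
# otherwise return the row unchanged. Assumes non-empty strings.
df['My_col'].apply(
    lambda x: x[x.find("-")+1:].strip() if x[0].isdigit() else x)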

Output:

0        There is a lot -
1    no numbers this time
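
The lambda above assumes every value is a non-empty string with no leading whitespace. A minimal sketch of a more defensive variant follows, trimming first in the same way the LTRIM in the SQL check does. The helper name strip_numeric_prefix is hypothetical, and leaving rows without any - untouched is an assumption about the desired behavior, not something the question specifies.

```python
import pandas as pd

def strip_numeric_prefix(text: str) -> str:
    """Drop everything up to the first '-' when text starts with a digit."""
    trimmed = text.lstrip()            # mirrors LTRIM in the SQL check
    if not trimmed[:1].isdigit():      # slicing is safe on empty strings
        return text
    _, sep, tail = trimmed.partition("-")
    # Assumption: a digit-led row with no '-' is left unchanged
    return tail.strip() if sep else text

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# Assign the result back so the DataFrame itself is updated
df["My_col"] = df["My_col"].apply(strip_numeric_prefix)
print(df)
```

Using str.partition instead of str.find avoids relying on find returning -1 when no - is present.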

Answer 2

Use df.replace() with a regular expression. I added a - to the fourth row to show that no words are removed from it:

import pandas as pd

data = {'My_col':['24 was 2020 - There is a lot -', '23 aka 2018 -  how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)

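# Assign the result back: inplace=True on a selected column is deprecated
# and does not update df when pandas copy-on-write is enabled.
# The trailing \s* also drops the whitespace left after the removed prefix.
df['My_col'] = df['My_col'].replace(r'^\d.*?-\s*', '', regex=True)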
print(df)

                   My_col
0        There is a lot -
1            how many ...
2              wonderful!
3  no numbers this - time
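
The question also asks how to count the rows that start with a number, which neither answer covers. A minimal sketch, assuming the same sample data as Answer 2: Series.str.match anchors the pattern at the start of each string, and the leading \s* plays the role of LTRIM in the SQL check. The names starts_with_number and count are illustrative, and the StartsWithNumber column simply mirrors the SQL CASE expression from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 -  how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this - time",
]})

# True where the row starts with a digit, ignoring leading whitespace
starts_with_number = df["My_col"].str.match(r"\s*\d")

# Summing a boolean mask counts the True values
count = int(starts_with_number.sum())
print(count)  # 3

# Equivalent of the SQL CASE WHEN ... THEN 'yes' ELSE 'no'
df["StartsWithNumber"] = np.where(starts_with_number, "yes", "no")
print(df)
```

Run this check before stripping the prefixes, since the cleaned rows no longer start with a digit.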

Removing a specific pattern from pandas rows
