I am trying to figure out how to identify rows that start with a number, for example:
My_col
24 was 2020 - There is a lot -
23 aka 2018 - how many ...
23 was 2020 - wonderful!
no numbers this time
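and, only for rows that start with a number, remove the three words before the "-" (along with the "-" itself), so that the result is: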
My_col
There is a lot -
how many ...
wonderful!
no numbers this time
In SQL, I would check it like this:
SELECT CASE WHEN ISNUMERIC(SUBSTRING(LTRIM(My_Col), 1, 1)) = 1
THEN 'yes'
ELSE 'no'
END AS StartsWithNumber
FROM my_data
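For comparison, here is a rough pandas counterpart of the SQL check above. It is a minimal sketch that assumes pandas and NumPy are installed. The sample rows come from the question, and the StartsWithNumber column simply mirrors the SQL alias. Because the test is a boolean mask, the rows that start with a number can also be counted by summing it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 - how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

# True when the first non-space character is a digit (like LTRIM + first-character check).
starts_with_number = df["My_col"].str.lstrip().str.match(r"\d", na=False)

# Same 'yes'/'no' labels as the SQL CASE expression.
df["StartsWithNumber"] = np.where(starts_with_number, "yes", "no")

# Counting the matching rows is just the sum of the boolean mask.
n_numeric = int(starts_with_number.sum())

print(df)
print(n_numeric)
```

Note that \d only matches digits, whereas SQL Server's ISNUMERIC applies its own, broader rules, so the two checks are only roughly equivalent.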
To remove the words before the "-", I think I should use np.where or a regex, and then apply.
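The np.where plus regex idea from the question can work without apply. Below is a minimal sketch that assumes pandas and NumPy. The pattern ^[^-]*-\s* is an illustrative choice, not taken from the question: it matches everything up to and including the first "-", plus any whitespace that follows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_col": [
    "24 was 2020 - There is a lot -",
    "23 aka 2018 - how many ...",
    "23 was 2020 - wonderful!",
    "no numbers this time",
]})

s = df["My_col"]
# Rows whose first character is a digit.
mask = s.str.match(r"\d", na=False)
# Candidate value: text after the first "-" with leading spaces dropped.
stripped = s.str.replace(r"^[^-]*-\s*", "", regex=True)
# Keep the stripped text only where the row starts with a number.
df["My_col"] = np.where(mask, stripped, s)

print(df)
```

Both branches are computed for every row and np.where then picks between them. That is usually faster than a per-row apply on large frames.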
Answer 0 (score: 1)
import pandas as pd
df = pd.DataFrame({'My_col': [
"24 was 2020 - There is a lot -",
"no numbers this time"] })
# If the row starts with a digit, keep only the text after the first "-"
df['My_col'].apply(
lambda x: x[x.find("-")+1:].strip() if x[0].isdigit() else x)
Output:
0 There is a lot -
1 no numbers this time
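The lambda in this answer assumes every value is a non-empty string. x[0] raises an IndexError on "" and fails on a missing value (NaN/None). Here is a slightly more defensive sketch of the same apply approach. The helper name drop_prefix is hypothetical. Leaving rows that have no "-" unchanged is also an assumption: the original lambda would return such a row stripped of surrounding whitespace.

```python
import pandas as pd


def drop_prefix(value):
    """Hypothetical helper: drop text up to and including the first '-' if value starts with a digit."""
    # Pass through missing values, empty strings and rows that do not start with a digit.
    if not isinstance(value, str) or not value or not value[0].isdigit():
        return value
    head, sep, tail = value.partition("-")
    # Assumption: if there is no '-', leave the string as it is.
    return tail.strip() if sep else value


df = pd.DataFrame({"My_col": ["24 was 2020 - There is a lot -", "", None, "no numbers this time"]})
df["My_col"] = df["My_col"].apply(drop_prefix)
print(df)
```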
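Answer 1 (score: 0)
Use df.replace() with a regular expression. I added a "-" to the fourth row to show that rows which do not start with a number are left untouched: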
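import pandas as pd
data = {'My_col':['24 was 2020 - There is a lot -', '23 aka 2018 - how many ...', '23 was 2020 - wonderful!', 'no numbers this - time']}
df = pd.DataFrame(data)
# From a leading digit up to the first "-" (non-greedy), plus the whitespace after it.
# Assign the result back; inplace=True on a selected column may not update df in newer pandas.
df['My_col'] = df['My_col'].replace(r'^\d.*?-\s*', '', regex=True)
print(df)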
My_col
0 There is a lot -
1 how many ...
2 wonderful!
3 no numbers this - time
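The ? in .*? matters here. The non-greedy form stops at the first "-", while a greedy .* runs on to the last "-" in the string. Without a trailing \s*, the space after the dash also survives. This small sketch uses only Python's re module to compare the variants on the first sample row.

```python
import re

text = "24 was 2020 - There is a lot -"

# Non-greedy, as in the answer's original pattern: stops at the first '-' but keeps the following space.
original = re.sub(r"^\d.*?-", "", text)
# Non-greedy plus \s*: also removes the space after the dash.
lazy = re.sub(r"^\d.*?-\s*", "", text)
# Greedy: runs to the last '-', which here wipes out the whole line.
greedy = re.sub(r"^\d.*-\s*", "", text)

print(repr(original))  # ' There is a lot -'
print(repr(lazy))      # 'There is a lot -'
print(repr(greedy))    # ''
```

The leading space in the first result is easy to miss because print(df) right-aligns string columns.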