Question

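I have a substring in Python, for example "Hello_world", that may be given in any casing, such as "hello_world", "HELLO_WORLD", or "heLLo_World". The capitalization of the letters can vary. I have a string that I want to split at that substring. While going through some Stack Overflow questions I came across the re (regular expression) package. Can I use it to achieve this?

Is it possible to do the above?

Thanks in advance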

Answer 1

Use the re.IGNORECASE flag with re.split().
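A minimal sketch of that suggestion, using the sample string from Answer 4. The variable names `text` and `parts` are only illustrative. Note that the flag is passed as the keyword argument `flags`, because the third positional parameter of re.split() is `maxsplit`.

```python
import re

# Sample input; the separator appears twice with different casing.
text = "aaaa hELLo_worLd bbbb HELLo_woRld cccc"

# flags must be given by keyword: the third positional argument is maxsplit.
parts = re.split("hello_world", text, flags=re.IGNORECASE)

print(parts)  # ['aaaa ', ' bbbb ', ' cccc']
```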

Answer 2

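Yes, you can solve this with a regular expression, but Python also provides much simpler string functions for this kind of task. Here is an example at the command line: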

>>> my_string = 'There is a HeLLO_worLD in this string'
>>> 'hello_world' in my_string.lower()
True
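The check above only tells you whether the substring is present. If you also want to split without re, one possible sketch is to search a lowercased copy and slice the original string at the positions found. The helper name `split_ignore_case` is hypothetical, and the sketch assumes that lower() does not change the length of the string, which holds for ASCII text but not for every Unicode character.

```python
def split_ignore_case(text, sep):
    """Split text at every case-insensitive occurrence of sep.

    Hypothetical helper; assumes text.lower() has the same length as text.
    """
    if not sep:
        raise ValueError("empty separator")
    lowered = text.lower()
    needle = sep.lower()
    parts = []
    start = 0
    while True:
        idx = lowered.find(needle, start)
        if idx == -1:
            # No further match: keep the remainder and stop.
            parts.append(text[start:])
            return parts
        # Slice the ORIGINAL string so the pieces keep their casing.
        parts.append(text[start:idx])
        start = idx + len(needle)


my_string = 'There is a HeLLO_worLD in this string'
print(split_ignore_case(my_string, 'hello_world'))
# ['There is a ', ' in this string']
```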

Answer 3

Convert the strings to lowercase before comparing them. If s is a string, call s.lower() before the comparison.
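As a small illustration of that idea (the function name `same_ignoring_case` is made up for this sketch): lowering both sides works for ASCII input such as the examples in the question. For text outside ASCII, str.casefold() is a more aggressive variant that is worth considering.

```python
def same_ignoring_case(a, b):
    # Hypothetical helper: compare two strings after lowercasing both.
    return a.lower() == b.lower()


print(same_ignoring_case("heLLo_World", "HELLO_WORLD"))  # True

# lower() leaves the German sharp s unchanged, casefold() maps it to "ss".
print("Straße".lower() == "STRASSE".lower())        # False
print("Straße".casefold() == "STRASSE".casefold())  # True
```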

Answer 4

So, if you want to split, use re.split:

import re

s = re.split(r"(?i)hello_world", "aaaa hELLo_worLd bbbb HELLo_woRld cccc")
print(s)

['aaaa ', ' bbbb ', ' cccc']
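If the substring comes from a variable rather than being typed into the pattern, it may contain characters that are special in regular expressions. A hedged sketch of one way to handle that is to escape it with re.escape() first. The function name `split_on` and the sample text are invented for illustration.

```python
import re


def split_on(text, sub):
    # Hypothetical helper: escape sub so it is matched literally,
    # then split case-insensitively.
    pattern = re.compile(re.escape(sub), re.IGNORECASE)
    return pattern.split(text)


print(split_on("I like C++ and c++ both", "c++"))
# ['I like ', ' and ', ' both']
```

For a plain substring such as hello_world the escaping changes nothing, so the result is the same as with the inline (?i) pattern above.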
