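I have a substring in Python, for example "Hello_world", that may be given in any casing: "hello_world", "HELLO_WORLD", "heLLo_World", and so on. Only the capitalization of the letters varies. I have a string that I want to split at every occurrence of this substring. From a few Stack Overflow questions I learned about the re (regular expression) package. Can I use it to do this?
Thanks in advance.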
Answer 0 (score: 1)
Use the re.IGNORECASE flag with re.split().
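A minimal sketch of what that might look like, reusing the sample string from the last answer in this thread. The variable names text and parts are only illustrative. Note that flags is passed by keyword, because the third positional parameter of re.split() is maxsplit.

```python
import re

text = "aaaa hELLo_worLd bbbb HELLo_woRld cccc"

# Pass the flag by keyword: the third positional argument of re.split() is maxsplit.
parts = re.split("hello_world", text, flags=re.IGNORECASE)

print(parts)  # ['aaaa ', ' bbbb ', ' cccc']
```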
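Answer 1 (score: 0)
Yes, you can solve this with regular expressions, but Python also offers simpler string functions for this kind of task. Here is an example at the interactive prompt: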
>>> my_string = 'There is a HeLLO_worLD in this string'
>>> 'hello_world' in my_string.lower()
True
Answer 2 (score: 0)
Convert the string to lowercase before comparing. If s is the string, call s.lower() before doing the comparison.
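A short sketch of the lowercase approach, assuming plain str.lower() is good enough for the text involved. The names needle, found and pieces are made up for illustration. Splitting the lowercased copy does work, but the pieces come back lowercased as well, so it only fits cases where the original casing of the surrounding text does not matter.

```python
s = "There is a HeLLO_worLD in this string"
needle = "Hello_World"

# Lowercase both sides so the comparison ignores case.
found = needle.lower() in s.lower()

# Splitting the lowercased copy also lowercases the surrounding text.
pieces = s.lower().split(needle.lower())

print(found)   # True
print(pieces)  # ['there is a ', ' in this string']
```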
Answer 3 (score: 0)
So, if you want to split, use re.split. The inline (?i) prefix makes the pattern case-insensitive:
import re
s = re.split(r"(?i)hello_world", "aaaa hELLo_worLd bbbb HELLo_woRld cccc")
print(s)
['aaaa ', ' bbbb ', ' cccc']
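If the substring comes from a variable rather than being typed into the pattern, it may contain characters that are special in a regular expression (such as a dot). One possible way to handle that is to escape it first with re.escape. The helper name split_ignore_case below is hypothetical, just a sketch of the idea.

```python
import re

def split_ignore_case(text, sub):
    """Split text at every case-insensitive occurrence of the literal substring sub."""
    # re.escape makes characters such as "." match literally, not as regex syntax.
    return re.split(re.escape(sub), text, flags=re.IGNORECASE)

result = split_ignore_case("1 a.B 2 A.b 3 axb 4", "a.b")
print(result)  # ['1 ', ' 2 ', ' 3 axb 4']
```

Without the escaping step, the dot would match any character and "axb" would also be treated as a separator.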