Question

I am scraping a website and have cleaned up about half of the output so far:

[3] "2♠2:2♠2: Texas:28,,845:25,46,5:4.4%:36♠36:55,32:9,23:698,53:8.68%"*

The above is one sample row. I am trying to remove the digits before or after the ♠ symbol.

The desired output is: [3] "2:2: Texas:28,,845:25,46,5:4.4%:36:55,32:9,23:698,53:8.68%"

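Basically, I want to remove the digits between the ♠ and the colon, including the ♠ itself. I would appreciate any help. I have tried the following code, but it did not work.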

str_replace_all(dataSet, "♠*:", "", fixed = T) 
gsub("*♠", "", data, fixed = T)


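library(rvest)    # read_html(), html_nodes(), html_text(); also provides %>%
library(stringr)  # str_replace_all()

website <- read_html("https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population")

results <- website %>% html_nodes("table")

data_body <- results[1] %>% html_nodes("tbody")
rows <- data_body %>% html_nodes("tr")
# Assumption: rows_text is not defined in the original post; presumably it is the text of each row.
rows_text <- rows %>% html_text()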

clean_rows_text <- str_replace_all(rows_text,"[7000100000000000000]", "")

clean_rows_text <- str_replace_all(clean_rows_text, "\n\n", ":")

clean_rows_text <- str_replace_all(clean_rows_text, "\n", "")

The desired output is: [3] "2:2: Texas:28,,845:25,46,5:4.4%:36:55,32:9,23:698,53:8.68%"

From there, I can handle the rest myself.

Answer 1

This should do it:

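data <- "2♠2:2♠2: Texas:28,,845:25,46,5:4.4%:36♠36:55,32:9,23:698,53:8.68%*"
# Remove "♠" and the characters after it, stopping just before the next ":".
# The lookahead (?=:) requires perl = TRUE.
gsub("♠.+?(?=:)", "", data, perl = TRUE)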
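For context, the attempts in the question use fixed = T, which makes R treat the pattern as literal text, so "♠*:" only matches those three characters in sequence and "*" is not a quantifier. The following is a minimal sketch, not part of the original answer, that applies the same lookahead pattern to a character vector and compares it with a negated character class. The second vector element and the variable names are made up for illustration.

```r
# Sample row from the question, plus a hypothetical row without any "♠".
rows <- c(
  "2♠2:2♠2: Texas:28,,845:25,46,5:4.4%:36♠36:55,32:9,23:698,53:8.68%*",
  "no marker here:1:2"
)

# Lazy match plus lookahead: drop "♠" and everything up to,
# but not including, the next ":".
lookahead <- gsub("♠.+?(?=:)", "", rows, perl = TRUE)

# Alternative sketch: "♠" followed by any run of non-colon characters.
negated <- gsub("♠[^:]*", "", rows, perl = TRUE)

print(lookahead[1])
# "2:2: Texas:28,,845:25,46,5:4.4%:36:55,32:9,23:698,53:8.68%*"

# Both patterns give the same result on this input.
print(identical(lookahead, negated))
```

The two patterns differ on edge cases: the negated class would also strip a "♠" that has no colon after it, while the lookahead version leaves such text untouched.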
