Question

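I am cleaning some data, and there are footnote numbers scattered through the cells that I want to remove. Some row names also contain legitimate numbers, so I can't simply extract the words.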

data <- data.frame(Characteristic =  c('Race3 and Origin', 'Sex','Age 18 to
45', 'Age 55 and older'), Number =  c(40, 50, 60, 1), Margin4 = c(12, 22, 5,
1))

data$Characteristic <- as.character(data$Characteristic)

I have tried several patterns, most recently:

df$Characteristic <- str_extract_all(df$Characteristic, "([:alpha:]* 
[:space:]?\\d{2,})|([:alpha:]*)|[:space:]")

But this leaves me with a list of <chr [2]> elements.

Using plain str_extract gives me only the first word.

What am I missing?

Answer 1

Is this what you want?

sub("([a-zA-Z]*)[0-9]*(\\s*\\s)","\\1\\2"  , data$C)

[1] "Race and Origin"  "Sex"              "Age 18 to\n45"    "Age 55 and older"
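To make the call above easier to reproduce, here is a minimal sketch that rebuilds the sample strings as a plain character vector (the names x, pat and res are mine, not from the question) and checks the result. Note that sub() replaces only the first match in each string, and this pattern requires whitespace after the digits, so a footnote at the very end of a string would be left in place. The "Sex2" value is a hypothetical example, not part of the original data.

```r
# Sample strings from the question, including the embedded newline
x <- c("Race3 and Origin", "Sex", "Age 18 to\n45", "Age 55 and older")

# Pattern from the answer: letters, optional digits, then whitespace
pat <- "([a-zA-Z]*)[0-9]*(\\s*\\s)"

# sub() rewrites only the first match per string
res <- sub(pat, "\\1\\2", x)
print(res)

# Hypothetical edge case: no whitespace follows the digit, so nothing matches
print(sub(pat, "\\1\\2", "Sex2"))
```

If trailing footnotes can occur in your data, this pattern would need to be adjusted.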

Answer 2

You can remove all digits attached to letters (at the end of a word) using

data$Characteristic <- gsub("(?<=\\p{L})\\d+\\b", "", data$Characteristic, perl=TRUE)

or

library(stringr)
data$Characteristic <- str_replace_all(data$Characteristic, "(?<=\\p{L})\\d+\\b", "")

The pattern matches:

(?<=\\p{L}) - any position immediately preceded by a letter
\\d+ - 1 or more digits
\\b - a word boundary.
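As a rough end-to-end sketch, the base R variant can be applied to the data frame from the question like this. I am assuming stringsAsFactors = FALSE so the column is already character, and the clean_footnotes helper name is my own. Cleaning the column names as well is an extra step that the question did not ask for.

```r
# Data frame from the question (the "Age 18 to\n45" newline is kept)
data <- data.frame(
  Characteristic = c("Race3 and Origin", "Sex", "Age 18 to\n45", "Age 55 and older"),
  Number = c(40, 50, 60, 1),
  Margin4 = c(12, 22, 5, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop digits that directly follow a letter
# and end at a word boundary (requires the PCRE engine, hence perl = TRUE)
clean_footnotes <- function(x) {
  gsub("(?<=\\p{L})\\d+\\b", "", x, perl = TRUE)
}

data$Characteristic <- clean_footnotes(data$Characteristic)
print(data$Characteristic)

# Optional extra step: the same cleanup applied to the column names
names(data) <- clean_footnotes(names(data))
print(names(data))
```

Numbers that stand alone, such as the 18, 45 and 55 in the age rows, are not preceded by a letter, so they are left untouched.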

See the regex demo.