My function is as follows:
library(stringr)  # provides str_replace()

HistolMacDescrip <- function(dataframe, MacroColumn) {
  dataframe <- data.frame(dataframe)
  # Column specific cleanup
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Dd]ictated by.*", "")
  # Conversion of text numbers to allow number of biopsies to be extracted
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Oo]ne", "1")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]ingle", "1")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Tt]wo", "2")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Tt]hree", "3")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ff]our", "4")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ff]ive", "5")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]ix", "6")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]even", "7")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ee]ight", "8")
  return(dataframe)
}
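I think the amount of repetition is a bit ridiculous, and the code could be much tidier. I would like to know whether the replacements can be driven by a key-value lookup instead.
One complication is that a single input may contain several number words, so the find-and-replace must not stop after the first match.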
Example input
d<-c("There are two specimens","Three exist here","Two three four")
d<-data.frame(d)
Example output
"There are 2 specimens",
"3 exist here",
"2 3 4"
Answer 0 (score: 0)
We can use the english package together with gsubfn:
library(english)
library(gsubfn)
sub("^([a-z])", "\\U\\1", gsubfn("\\w+", setNames(as.list(2:4),
as.character(english(2:4))), tolower(as.character(d$d))), perl = TRUE)
#[1] "There are 2 specimens" "3 exist here" "2 3 4"
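For comparison, the key-value lookup the question asks about can also be sketched in base R without extra packages. This is only an illustration, not the method from the answer above: the names `number_words` and `replace_number_words` are hypothetical, and the word-boundary anchors (`\\b`) are an added assumption so that "one" is not replaced inside words such as "bone", which the original patterns would do.

```r
# Hypothetical lookup table: number words (names) mapped to digits (values).
number_words <- c(one = "1", single = "1", two = "2", three = "3",
                  four = "4", five = "5", six = "6", seven = "7",
                  eight = "8")

# Replace every whole-word occurrence of each key with its value.
replace_number_words <- function(x, lookup = number_words) {
  x <- as.character(x)
  for (word in names(lookup)) {
    # \\b restricts the match to whole words (an assumption added here);
    # ignore.case = TRUE covers "Two" as well as "two".
    pattern <- paste0("\\b", word, "\\b")
    # gsub() replaces all matches, so several number words per string work.
    x <- gsub(pattern, lookup[[word]], x, ignore.case = TRUE, perl = TRUE)
  }
  x
}

d <- data.frame(d = c("There are two specimens",
                      "Three exist here",
                      "Two three four"),
                stringsAsFactors = FALSE)
d$d <- replace_number_words(d$d)
print(d$d)
```

Because the replacement is a digit, the capitalisation of the matched word does not need to be restored, and the rest of the string keeps its original case. Adding another number word only requires a new entry in the lookup vector.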