I have a bunch of strings like this:
x <- c("4/757.1%", "0/10%", "6/1060%", "0/0-%", "11/2055%")
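They are fractions followed by the percentage value of each fraction, and somewhere along the way the two got merged together. So the first entry in the example means 4 out of 7, which is 57.1%. I can easily get the first number, the one before the / (e.g. with stringr::word(x, 1, sep = "/")), but the second number can be one or two characters long, so I'm struggling to find a way to extract it. I don't need the % value, since it's easy to recalculate once I have the numbers.
Can anyone see a way to do this?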
Answer 0: (score: 0)
As you point out, the percentage can be recalculated once you have the fraction. Can you use that fact to work out where the split is?
library(stringr)  # for word()

GuessSplit <- function(string) {
  tolerance <- 0.001  # how close the fraction must be to the stated percentage
  numerator <- as.numeric(word(string, 1, sep = "/"))
  second.half <- word(string, 2, sep = "/")
  # assuming they all end in percent signs
  digits <- strsplit(sub("%$", "", second.half), "")[[1]]
  # try every split that leaves at least one character for the percentage
  for (position in seq_len(length(digits) - 1)) {
    denom.guess <- as.numeric(paste0(digits[1:position], collapse = ""))
    percent.guess <- suppressWarnings(
      as.numeric(paste0(digits[(position + 1):length(digits)], collapse = ""))
    ) / 100
    value <- numerator / denom.guess
    # isTRUE() treats NA/NaN comparisons (e.g. for "0/0-%") as "no match"
    if (isTRUE(abs(value - percent.guess) < tolerance)) {
      return(list(numerator = numerator, denominator = denom.guess))
    }
  }
  NULL  # no consistent split found
}
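This needs a little love to handle odd cases, and it could be more graceful when none of the possible splits gives an answer. I'm also not sure which return type is best. Maybe you only need the denominator, since the numerator is easy to get, but I figured a list of both is the most general. I hope this is a reasonable start?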
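If only the denominator is needed, one way to cover the "no answer" case is to return NA and apply the function over the whole vector. The following is a minimal base R sketch, not the answer's own code: guess_denominator is a hypothetical name, and it assumes every string contains a "/" and ends in "%".

```r
# Hypothetical helper: return only the denominator, or NA when no split
# of the text after "/" is consistent with the stated percentage.
guess_denominator <- function(string, tolerance = 0.001) {
  parts <- strsplit(string, "/", fixed = TRUE)[[1]]
  numerator <- as.numeric(parts[1])
  rest <- sub("%$", "", parts[2])  # drop the trailing percent sign
  # try every split that leaves at least one character for the percentage
  for (position in seq_len(max(nchar(rest) - 1, 0))) {
    denom <- as.numeric(substr(rest, 1, position))
    percent <- suppressWarnings(
      as.numeric(substr(rest, position + 1, nchar(rest)))
    )
    # isTRUE() treats NA/NaN comparisons (e.g. "0/0-%") as "no match"
    if (isTRUE(abs(numerator / denom - percent / 100) < tolerance)) {
      return(denom)
    }
  }
  NA_real_
}

x <- c("4/757.1%", "0/10%", "6/1060%", "0/0-%", "11/2055%")
denominators <- vapply(x, guess_denominator, numeric(1), USE.NAMES = FALSE)
denominators
```

vapply is used so the result is always a plain numeric vector, which makes it easy to attach as a column next to the numerators.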
Answer 1: (score: 0)
A somewhat ugly solution that seems to do what you want:
x <- c("4/757.1%", "0/10%", "6/1060%", "0/0-%", "11/2055%")
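split_perc <- function(x, signif_digits = 1) {
  x <- gsub("%", "", x)
  # a "-" means there is no percentage (e.g. "0/0-%"), so give up early
  if (grepl("-", x)) return(list(NA, NA))
  # the denominator ends somewhere between the character after "/" ...
  index1 <- gregexpr("/", x)[[1]][1] + 1
  # ... and two characters before the decimal point, if there is one
  index2 <- gregexpr("\\.", x)[[1]][1] - 2
  if (index2 == -3) index2 <- nchar(x) - 1  # no decimal point found
  found <- FALSE
  indices <- seq(index1, index2)
  k <- 1
  while (!found && k <= length(indices)) {
    str1 <- substr(x, 1, indices[k])
    num1 <- as.numeric(strsplit(str1, "/")[[1]][1])
    num2 <- as.numeric(strsplit(str1, "/")[[1]][2])
    # compare the recomputed percentage with the rest of the string
    value1 <- round(num1 / num2 * 100, signif_digits)
    value2 <- round(as.numeric(substr(x, indices[k] + 1, nchar(x))), signif_digits)
    if (value1 == value2) {
      found <- TRUE
    } else {
      k <- k + 1
    }
  }
  if (found) list(num1, num2) else list(NA, NA)
}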
do.call(rbind,lapply(x,split_perc))
Output:
     [,1] [,2]
[1,] 4    7
[2,] 0    1
[3,] 6    10
[4,] NA   NA
[5,] 11   20
Some more examples:
y = c("11/2055.003%","11/2055.2%","40/7057.1%")
do.call(rbind,lapply(y,split_perc))
     [,1] [,2]
[1,] 11   20    # values are rounded to 1 decimal place by default, so a match is found
[2,] NA   NA    # no match found, since 55.0 != 55.2
[3,] 40   70
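Both loop-based answers stop at the first split that fits, which is worth keeping in mind when the numerator is 0, because several denominators can then be consistent with the same string. The sketch below is only an illustration under assumptions: all_splits is a hypothetical helper, not part of either answer, and "0/100%" is a made-up input. It collects every consistent denominator instead of returning the first one.

```r
# Hypothetical helper: return every denominator whose recomputed percentage
# matches the remainder of the string after rounding to `digits` decimals.
all_splits <- function(string, digits = 1) {
  parts <- strsplit(sub("%$", "", string), "/", fixed = TRUE)[[1]]
  numerator <- as.numeric(parts[1])
  rest <- parts[2]
  matches <- numeric(0)
  for (position in seq_len(max(nchar(rest) - 1, 0))) {
    denom <- as.numeric(substr(rest, 1, position))
    percent <- suppressWarnings(
      as.numeric(substr(rest, position + 1, nchar(rest)))
    )
    recomputed <- round(numerator / denom * 100, digits)
    # small tolerance instead of == to avoid floating-point surprises
    if (isTRUE(abs(recomputed - round(percent, digits)) < 1e-9)) {
      matches <- c(matches, denom)
    }
  }
  matches
}

unique_case <- all_splits("4/757.1%")   # only 4/7 fits
ambiguous_case <- all_splits("0/100%")  # 0/1 + "00%" and 0/10 + "0%" both fit
```

When more than one denominator comes back, the string alone cannot settle the split, so such rows may need to be flagged rather than guessed.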
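Answer 2: (score: 0)
A solution using tidyverse and stringr. We can define a function that splits the second number at every possible position, then calculate the percentage for each split to see which one makes sense. dt2 is the data frame showing the best split position, and the numbers you want are in column V3.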
library(tidyverse)
library(stringr)
x <- c("4/757.1%", "0/10%", "6/1060%", "0/0-%", "11/2055%")
dt <- str_split_fixed(x, pattern = "/", n = 2) %>%
  as.data.frame(stringsAsFactors = FALSE) %>%  # columns are named V1 and V2
  mutate(ID = 1:n()) %>%
  select(ID, V1, V2)
# Design a function to split the second column based on position
split_df <- function(position, dt){
  dt_temp <- dt %>%
    mutate(V3 = str_sub(V2, 1, position)) %>%
    mutate(V4 = str_sub(V2, position + 1)) %>%
    mutate(Pos = position)
  return(dt_temp)
}
# Process the data: try split positions 1 to 3 for every string
dt2 <- map_df(1:3, split_df, dt = dt) %>%
  # Remove % in V4
  mutate(V4 = str_replace(V4, "%", "")) %>%
  # Convert V1, V3 and V4 to numeric (non-numeric pieces become NA with a warning)
  mutate_at(vars(V1, V3, V4), as.numeric) %>%
  # Calculate possible percentage
  mutate(V5 = V1/V3 * 100) %>%
  # Calculate the difference between V4 and V5
  mutate(V6 = abs(V4 - V5)) %>%
  # Select the smallest difference based on V6 for each group
  group_by(ID) %>%
  arrange(ID, V6) %>%
  slice(1)
# The best split is now in V3
dt2$V3
[1] 7 1 10 0 20