Question

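I have a matrix of strings containing the industry-specific revenue contributions of certain companies. I need to extract a matrix that contains only the software revenue. The matrix is as follows: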

revenue <- data.frame(revenue = c("79% Software, 1% Hardware, 20% Services", NA, NA, "10.5% Software, 90% Services", "1.4% Software, 98.6% Services", "17% Software, 83% Services", NA, "100% Services", "47% Services, 39% Hardware, 14.32% Software"))

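I want to supply the ending pattern "Software", then look to its left for the percentage and extract the number (whether it is a decimal or an integer).

My solution works, but it is quite long. How can I extract the matrix in a single line?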

Edit

As @SabDem suggested in the comments, here is my code:

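library("stringr")
# Work on a character matrix so the string functions can be applied directly
revenue <- as.matrix(revenue)
# Split each entry into at most three "NN% Industry" pieces
rs <- str_split_fixed(revenue, ',', 3)
rs1 <- matrix("0", nrow(rs), ncol(rs))
for (i in 1:nrow(rs)) {
  for (j in 1:ncol(rs)) {
    # Keep only the piece that mentions Software; grepl() returns FALSE for NA
    if (grepl('Software', rs[i, j])) rs1[i, j] <- rs[i, j]
  }
}
# Strip the label and the percent sign, leaving only the number
rs2 <- gsub('Software|%', '', rs1)
# Convert to numeric and add up each row (rows without Software give 0)
soft.revenue <- rowSums(matrix(as.numeric(rs2), nrow(rs2)))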

Answer 1

I would use the stringr library. For your example, it would be:

library("stringr")
revenue <- data.frame(revenue = c("79% Software, 1% Hardware, 20% Services", NA, NA, "10.5% Software, 90% Services", "1.4% Software, 98.6% Services", "17% Software, 83% Services", NA, "100% Services", "47% Services, 39% Hardware, 14.32% Software"))
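# Digits with an optional decimal part, immediately followed by "% Software"
# (the dot is escaped so that it matches only a literal ".")
pattern <- "[[:digit:]]+(\\.[[:digit:]]+)?(?=% Software)"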
as.numeric(str_extract(revenue$revenue,pattern))

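The key idea is the expression (?=% Software), a lookahead that requires the match to be immediately followed by the string % Software without including that string in the result. Variable-length look-behind is (as far as I know) not possible in R.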
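A capture group is one way to sidestep look-behind altogether. The following is only a minimal sketch, assuming stringr is installed; the names x, m and software_pct are made up for illustration and the input is a shortened copy of the question's data.

```r
library(stringr)

# Sample input copied from the question (shortened to four entries)
x <- c("79% Software, 1% Hardware, 20% Services",
       NA,
       "100% Services",
       "47% Services, 39% Hardware, 14.32% Software")

# One capture group: digits with an optional decimal part,
# which must be followed by the literal text "% Software"
m <- str_match(x, "([0-9]+(?:\\.[0-9]+)?)% Software")

# Column 1 holds the whole match, column 2 the first capture group;
# entries without a match (or NA input) come back as NA
software_pct <- as.numeric(m[, 2])
software_pct
```

Here the trailing "% Software" is part of the match but sits outside the group, so only the number ends up in the second column of the result.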

Extracting a pattern of given variable length to the left in R [variable length look-behind]
