How to extract and sum character elements in a list

Time: 2017-08-28 21:38:03

Tags: r

I have a list (it is a column in a data frame) that contains strings like this:

list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
     NA_character_, NA_character_,
     "4 pieces of tissue, the largest measuring 4 x 2 x 2m",
     "2 pieces of tissue, the larger measuring 4 x 2 x 2 m",
     c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
       "4 pieces of tissue, the largest measuring 6 x 2 x 1 m",
       "4 pieces of tissue, the largest measuring 4 x 3 x 1 m"),
     NA_character_,
     c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
       "4 pieces of tissue, the largest measuring 5 x 2 x 2 m",
       "4 pieces of tissue, the largest measuring 4 x 2 x 1 m"),
     NA_character_,
     "4 pieces of tissue, the largest measuring 8 x 2 x 2m")

This list was generated by the str_extract_all line in the function shown further below.

I want to extract the sum of the number of tissue pieces for each element of the list. I have been trying:

x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")

but I get an error. The full function is:

function(x,y) {
  x<-data.frame(x)
      x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")
      x$NumbOfBx <- sapply(x$NumbOfBx, function(x) sum(as.numeric(unlist(str_extract_all(x$NumbOfBx, "^\\d+")))))

  x$NumbOfBxs <- unlist(x$NumbOfBx)
  x$NumbOfBx <- as.numeric(str_extract(x$NumbOfBx, "^.*?\\d"))
  return(x)
}

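2 answers:

Answer 0 (score: 1)

Something like this? In short, assuming your data is a list, you can extract the number that precedes the unit word (piece, sample or specimen), convert it to numeric, and then sum the counts within each vector contained in the list. This is the same strategy you proposed, with a few modifications.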

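# Assuming your list is defined as my.list

xtr.pieces <- function(ml) {
  my.sums <- lapply(ml, function(el) {
    sum(sapply(el, function(tmp) {
      if (is.na(tmp)) return(0)  # NA entries count as 0
      # A 1-2 digit count followed, within 3 characters, by the unit word.
      # An alternation group is needed here; [sample|specimen] would be a
      # character class matching any single one of those letters.
      loc <- regexpr("[0-9]{1,2}.{0,3}(piece|sample|specimen)", tmp)
      if (loc < 0) return(0)     # no count found in this string
      tmp <- substr(tmp, loc, loc + attr(loc, "match.length") - 1)
      as.numeric(gsub("[^[:digit:]]", "", tmp))
    }))
  })
  return(my.sums)
}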

Here NAs are counted as 0. Running it gives:

unlist(xtr.pieces(my.list))
[1]  4  0  0  4  2 12  0 12  0  4
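To see what the extraction step does on a single string, here is a minimal sketch. The variable names (s, loc, matched, count) are illustrative only, and the alternation group (piece|sample|specimen) is an assumption about which unit words occur in the data.

```r
# One example string taken from the question's data
s <- "4 pieces of tissue, the largest measuring 4 x 3 x 2 m"

# Locate the first "<count> <unit word>" occurrence
loc <- regexpr("[0-9]{1,2}.{0,3}(piece|sample|specimen)", s)

# Pull out the matched text, then keep only its digits
matched <- regmatches(s, loc)
count <- as.numeric(gsub("[^[:digit:]]", "", matched))

print(matched)  # "4 piece"
print(count)    # 4
```

Because regexpr returns only the first match, the dimensions later in the string (4 x 3 x 2) are not picked up.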

Answer 1 (score: 1)

Data

L <- list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
NA_character_, NA_character_, "4 pieces of tissue, the largest measuring 4 x 2 x 2m", 
"2 pieces of tissue, the larger measuring 4 x 2 x 2 m", c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m", 
"4 pieces of tissue, the largest measuring 6 x 2 x 1 m", 
"4 pieces of tissue, the largest measuring 4 x 3 x 1 m"), 
NA_character_, c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m", 
"4 pieces of tissue, the largest measuring 5 x 2 x 2 m", 
"4 pieces of tissue, the largest measuring 4 x 2 x 1 m"), 
NA_character_, "4 pieces of tissue, the largest measuring 8 x 2 x 2m")

A one-liner base R solution

# The end position is start + match.length - 1, so counts with more than one digit are extracted whole
sapply(L, function(x) { m <- regexpr("\\d+(?= pieces of tissue)", x, perl=TRUE, useBytes=TRUE)
                        sum(as.numeric(substr(x, m, m + attr(m, "match.length") - 1))) })

Output

4 NA NA  4  2 12 NA 12 NA  4
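If NA entries should count as 0 rather than NA, a variant of the same lookahead idea can use regmatches, which drops non-matching and NA elements. This is a sketch under that assumption; the helper name count_pieces and the shortened example list are hypothetical and not part of the original answer.

```r
# Shortened example data (hypothetical), including a two-digit count
L_demo <- list(
  "4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
  NA_character_,
  c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
    "12 pieces of tissue, the largest measuring 4 x 2 x 1 m")
)

# Sum the counts that precede " pieces of tissue" in each element
count_pieces <- function(x) {
  m <- regexpr("\\d+(?= pieces of tissue)", x, perl = TRUE)
  hits <- regmatches(x, m)  # NA and non-matching entries are dropped
  sum(as.numeric(hits))     # the sum of an empty vector is 0
}

res <- sapply(L_demo, count_pieces)
print(res)  # 4 0 16
```

Whether a missing description should count as 0 or stay NA depends on how the result is used downstream, so choose whichever matches your data's meaning.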