I have a list (it is a column in a data frame) that contains strings like the ones below.
This list is generated by one line of the function shown further down. The list itself is:
list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
NA_character_, NA_character_, "4 pieces of tissue, the largest measuring 4 x 2 x 2m",
"2 pieces of tissue, the larger measuring 4 x 2 x 2 m", c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
"4 pieces of tissue, the largest measuring 6 x 2 x 1 m",
"4 pieces of tissue, the largest measuring 4 x 3 x 1 m"),
NA_character_, c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
"4 pieces of tissue, the largest measuring 5 x 2 x 2 m",
"4 pieces of tissue, the largest measuring 4 x 2 x 1 m"),
NA_character_, "4 pieces of tissue, the largest measuring 8 x 2 x 2m")
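The list is generated as part of the function below.
I want to extract, for each element of the list, the sum of the number of tissue pieces. I have been trying: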
x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")
but I get an error. The function is:
function(x,y) {
x<-data.frame(x)
x$NumbOfBx <- str_extract_all(x[,y], "([A-Za-z]*|[0-9]) (specimens|pieces).*?(([0-9]).*?x.*?([0-9]).*?x.*?([0-9])).*?([a-z])")
x$NumbOfBx <- sapply(x$NumbOfBx, function(x) sum(as.numeric(unlist(str_extract_all(x$NumbOfBx, "^\\d+")))))
x$NumbOfBxs <- unlist(x$NumbOfBx)
x$NumbOfBx <- as.numeric(str_extract(x$NumbOfBx, "^.*?\\d"))
return(x)
}
Answer 0 (score: 1)
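Something like this? In short, assuming your data is a list, you can extract the number that precedes the unit word (piece, sample, or specimen), convert it to numeric, and then sum the counts within each vector contained in the list. This is the same strategy you proposed, with a few modifications...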
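# Assuming your list is defined as my.list
xtr.pieces <- function(ml) {
  my.sums <- lapply(ml, function(el) {
    sum(sapply(el, function(tmp) {
      # NA entries and strings without a match count as 0
      if (is.na(tmp)) return(0)
      # A 1-2 digit count, up to 3 characters, then the unit word.
      # Alternation needs parentheses; square brackets would form a character class.
      loc <- regexpr("[0-9]{1,2}.{0,3}(piece|sample|specimen)", tmp)
      if (loc < 0) return(0)
      tmp <- substr(tmp, loc, loc + attr(loc, "match.length") - 1)
      # Keep only the digits of the matched text
      as.numeric(gsub("[^[:digit:]]", "", tmp))
    }))
  })
  return(my.sums)
}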
NAs are counted as 0 here. Running it gives:
unlist(xtr.pieces(my.list))
[1] 4 0 0 4 2 12 0 12 0 4
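The same strategy can be written more compactly with `regmatches()`. This is only a sketch under assumptions: the helper name `count_pieces` is made up for illustration, the short `my.list` stands in for the list in the question, and every count is assumed to be followed directly by " pieces" or " specimens".

```r
# Hypothetical helper: sum the counts found in one list element.
count_pieces <- function(el) {
  # Lookahead keeps only the digits that precede the unit word
  m <- regexpr("[0-9]+(?= (pieces|specimens))", el, perl = TRUE)
  # regmatches() drops NA entries and non-matching strings,
  # so an element with no match at all sums to 0
  sum(as.numeric(regmatches(el, m)))
}

# Small stand-in for the list in the question
my.list <- list(
  "4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
  NA_character_,
  c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
    "2 pieces of tissue, the larger measuring 4 x 2 x 2 m")
)

piece.counts <- sapply(my.list, count_pieces)
print(piece.counts)
```

As in the function above, missing entries end up as 0 rather than NA, which may or may not be what you want for downstream summaries.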
Answer 1 (score: 1)
L <- list("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
NA_character_, NA_character_, "4 pieces of tissue, the largest measuring 4 x 2 x 2m",
"2 pieces of tissue, the larger measuring 4 x 2 x 2 m", c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
"4 pieces of tissue, the largest measuring 6 x 2 x 1 m",
"4 pieces of tissue, the largest measuring 4 x 3 x 1 m"),
NA_character_, c("4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
"4 pieces of tissue, the largest measuring 5 x 2 x 2 m",
"4 pieces of tissue, the largest measuring 4 x 2 x 1 m"),
NA_character_, "4 pieces of tissue, the largest measuring 8 x 2 x 2m")
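sapply(L, function(x) {
  # Position of the digits that directly precede " pieces of tissue"
  m <- regexpr("\\d+(?= pieces of tissue)", x, perl = TRUE)
  # Use the match length so that multi-digit counts are extracted in full
  sum(as.numeric(substr(x, m, m + attr(m, "match.length") - 1)))
})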
4 NA NA 4 2 12 NA 12 NA 4
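To store the result in a data frame column, as the function in the question intends, something along these lines may work. It is a sketch under assumptions: the data frame `df` and its list-column `Histology` are invented for illustration, and elements that are NA are left as NA, as in the output above.

```r
# Hypothetical data frame with a list-column, standing in for x[, y] in the question
df <- data.frame(id = 1:3)
df$Histology <- list(
  "4 pieces of tissue, the largest measuring 4 x 3 x 2 m",
  NA_character_,
  c("4 pieces of tissue, the largest measuring 5 x 4 x 2 m",
    "12 pieces of tissue, the largest measuring 6 x 2 x 1 m")
)

# One sum per row; sapply() returns a plain numeric vector,
# so the new column is numeric rather than a list
df$NumbOfBx <- sapply(df$Histology, function(el) {
  m <- regexpr("\\d+(?= pieces of tissue)", el, perl = TRUE)
  sum(as.numeric(substr(el, m, m + attr(m, "match.length") - 1)))
})

print(df$NumbOfBx)
```

If zeros are preferred over NA for missing rows, replacing them afterwards with `df$NumbOfBx[is.na(df$NumbOfBx)] <- 0` is one option.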