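I'm currently doing a motif search in a series of for loops and would like to move to nested tibbles for speed and simplicity (ish). However, I can't figure out how to store a tibble inside a tibble so that it can be nested. If that isn't possible, I'd appreciate tips on how to pass a list (along with an ID column) so that I can join it back to the original table later.
Input: a set of coordinates and the corresponding DNA sequences
Goal:
1) Find instances of the motif I care about
2) Combine them with the start or end of the range to create all start and end pairs (where a found position can be either one)
3) Determine the type of each pair
I can't figure out how to get mutate to accept a tibble (Error in mutate_impl(.data, dots) : Column `pairs` is of unsupported class data.frame). I can't call the function row by row here, because I need to send it the whole list of positions as well as the values in the other columns.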
test_input = tibble(
  start = c(1,10,15),
  end = c(9, 14, 25),
  sequence = c("GAGAGAGTC","CATTT", "TCACAGTTTCC")
)
custom_function = function(start, end, list.of.positions) {
  ## Doesn't include extra math, case specifications, and error handling here for simplicity
  starts = c(start, list.of.positions)
  ends = c(end, list.of.positions)
  pairs = expand.grid(starts, ends) %>% as_tibble %>%
    mutate(type = case_when(TRUE ~ "a_type")) #Simplified for example to one case
  return(pairs)
}
test_input %>%
  # for each set of coordinates/string
  rowwise() %>%
  # find the positions of a given motif
  mutate(match.positions = regexp.match.ends(gregexpr("AG", sequence))) %>%
  mutate(num.matches = case_when(
    is_logical(match.positions) ~ NA_integer_,
    TRUE ~ length(match.positions)
  )) %>%
  # expand and convert to real positions
  unnest %>% rowwise %>%
  mutate(true.positions = case_when(
    is.na(match.positions) ~ NA_real_, #must be a double-compatible NA
    TRUE ~ start + match.positions - 1)) %>%
  select(-match.positions) %>%
  ungroup() %>%
  # re-"nest" into a list of real positions
  group_by_at(vars(-true.positions)) %>%
  summarise(true.positions = list(true.positions)) %>%
  # pass list of real positions to a function that creates pairs of coordinates and determines the type of pair
  mutate(pairs = custom_function(start, end, true.positions))
My final tibble should look like this (after unnesting the pairs):
   start   end sequence    new.start new.end type
   <dbl> <dbl> <chr>           <dbl>   <dbl> <chr>
 1     1     9 GAGAGAGTC           1       3 a_type
 2     1     9 GAGAGAGTC           1       5 a_type
 3     1     9 GAGAGAGTC           1       7 a_type
 4     1     9 GAGAGAGTC           1       9 a_type
 5     1     9 GAGAGAGTC           3       5 a_type
...
10     1     9 GAGAGAGTC           7       9 a_type
11    10    14 CATTT              10      14 a_type
...
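One workaround I've thought of is to paste the output values into a string and pass that back as a list, which mutate tolerates, then unnest and separate it again. But surely there is a less cumbersome way to do this. Thanks very much for any help or ideas!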
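Answer 0 (score: 0)
So, I'm completely unfamiliar with motifs, but I think I can piece together what you're trying to do. I like using the stringr package because it gets a lot done with simple syntax.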
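library(dplyr)    # tibble(), mutate(), rename(), left_join(), bind_rows(), %>%
library(stringr)  # str_locate_all()

test_input <- tibble(
  start = c(1,10,15),
  end = c(9, 14, 25),
  sequence = c("GAGAGAGTC","CATTT", "TCACAGTTTCC")
)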
custom_function <- function(string, pattern, label) {
  string %>%
    str_locate_all(pattern) %>%   # get the start-end pairs (positions within the string).
    as.data.frame() %>%           # make it a data.frame
    expand.grid() %>%             # all combos. this seemed important.
    mutate(
      sequence = string,
      type = label
    ) %>%                         # add the string and label to each row.
    rename(
      new_start = start,          # rename so we don't confuse columns.
      new_end = end               # I prefer not to use dots in my names.
    ) %>%
    left_join(test_input, by = "sequence")  # add the original start and ends
  # the returned df has cols: start, end, sequence, new_start, new_end, type (not in that order).
}
final_out <- data.frame(
  start = numeric(0),
  end = numeric(0),
  sequence = character(0),
  new_start = numeric(0),
  new_end = numeric(0)
) # empty dummy DF that we'll add to.
for (string in test_input$sequence) {
  final_out <- custom_function(string = string,
                               pattern = 'AG',
                               label = 'a_type') %>%
    bind_rows(final_out)
} # add the rows of each output to the final DF we made.
print(final_out)
It looks like you're trying to label the results according to the pattern you supply, so you can pass 'a_type' or whatever label you need.
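One detail to keep in mind: str_locate_all() reports positions counted from the first character of each string, not in the coordinates of your ranges. Here is a small sketch of the shift, assuming you want the same conversion the question uses (start + match.positions - 1); the variable names are just for illustration.

```r
library(stringr)

sequence <- "TCACAGTTTCC"
range_start <- 15  # start coordinate of this sequence in the question's data

# str_locate_all() returns a list with one matrix per input string;
# each matrix has the columns "start" and "end".
hits <- str_locate_all(sequence, "AG")[[1]]

relative_ends <- hits[, "end"]                    # position within the string
absolute_ends <- range_start + relative_ends - 1  # position in range coordinates

print(absolute_ends)
```

The same shift could be applied inside custom_function if you need absolute coordinates in the output.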
It might be possible to do this without a for loop by using the map or apply functions. I'd have to tinker around to figure that out.
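For what it's worth, here is a rough sketch of how the loop-free version might look with purrr and tidyr (assuming tidyr 1.0 or later). The helper make_pairs is a made-up name, and the new_start < new_end filter is my guess at the pairing rule based on your expected output, so adjust both as needed. The key idea is that map2() and pmap() return lists, and mutate() is happy to store a list of tibbles as a list-column, which unnest() then expands.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

test_input <- tibble(
  start    = c(1, 10, 15),
  end      = c(9, 14, 25),
  sequence = c("GAGAGAGTC", "CATTT", "TCACAGTTTCC")
)

# Hypothetical helper: builds the candidate pairs for ONE row and returns a tibble.
make_pairs <- function(start, end, positions, label = "a_type") {
  expand_grid(new_start = c(start, positions),
              new_end   = c(positions, end)) %>%
    filter(new_start < new_end) %>%  # assumed rule, inferred from the expected output
    mutate(type = label)
}

result <- test_input %>%
  # list-column of match end positions, shifted into range coordinates
  mutate(positions = map2(str_locate_all(sequence, "AG"), start,
                          function(m, s) unname(m[, "end"]) + s - 1)) %>%
  # list-column of tibbles: one tibble of pairs per input row
  mutate(pairs = pmap(list(start, end, positions), make_pairs)) %>%
  select(-positions) %>%
  # expand the nested tibbles back into ordinary rows
  unnest(pairs)

print(result)
```

Because pmap() hands each row's start, end, and full position vector to the helper in one call, there is no need for rowwise() or for pasting values into strings.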
Hope that helps, or at least points you in the right direction. Like I said, I'm not familiar with motifs.