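I have three files, each containing a set of strings. File1 and File2 contain substrings of the strings in File3. I want to extract the part of each File3 string that lies between the substrings from File1 and File2. See the example below: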
File1 (substring 1):
head(fivep$V2)
[1] UGAGGUAGUAGUUUGUACAGUU UGAGGUAGUAGUUUGUGCUGUU ACAUACUUCUUUAUAUGCCCAUA UAGCAGCACAUCAUGGUUUACA
[5] GGGUUCCUGGCAUGCUGAUUU AGAGCUUAGCUGAUUGGUGAAC
File2 (substring 2):
head(threep$V2)
[1] ACUGUACAGGCCACUGCCUUGC CUGCGCAAGCUACUGCCUUGCU UGGAAUGUAAAGAAGUAUGUAU CGAAUCAUUAUUUGCUGCUCUA
[5] AUCACAUUGCCAGGGAUUACC UUCACAGUGGCUAAGUUCUGC
File3:
head(hairpin$V2)
[1] UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA
[2] AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCUGGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU
[3] AAAGUGACCGUACCGAGCUGCAUACUUCCUUACAUGCCCAUACUAUAUCAUAAAUGGAUAUGGAAUGUAAAGAAGUAUGUAGAACGGGGUGGUAGU
[4] UAAACAGUAUACAGAAAGCCAUCAAAGCGGUGGUUGAUGUGUUGCAAAUUAUGACUUUCAUAUCACAGCCAGCUUUGAUGUGCUGCCUGUUGCACUGU
[5] CGGACAAUGCUCGAGAGGCAGUGUGGUUAGCUGGUUGCAUAUUUCCUUGACAACGGCUACCUUCACUGCCACCCCGAACAUGUCGUCCAUCUUUGAA
[6] UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC
Example:
String in File1: AGGGCUUAGCUGCUUGUGAGCA
String in File2: UUCACAGUGGCUAAGUUCCGC
String in File3: CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG
Expected output for this example:
GGGUCCACACCAAGUCGUG
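The answers below each handle a single triple of strings. A minimal Python sketch of the file-level task, assuming each file holds one sequence per line (the hypothetical `read_strings` helper and any file names you pass it are placeholders) and that there is no fixed pairing between lines of File1, File2 and File3, so every 5'/3' pair is tried against each File3 string and the first hit is kept:

```python
import re


def read_strings(path):
    """Read one sequence per line, skipping blank lines (assumed file layout)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def extract_between(full, left, right):
    """Return the text between `left` and the first `right` after it, or None."""
    match = re.search(re.escape(left) + "(.*?)" + re.escape(right), full)
    return match.group(1) if match else None


def extract_all(hairpins, five_primes, three_primes):
    """For each hairpin, try every (File1, File2) pair and keep the first hit."""
    results = []
    for hairpin in hairpins:
        hits = (
            extract_between(hairpin, left, right)
            for left in five_primes
            for right in three_primes
        )
        results.append(next((h for h in hits if h is not None), None))
    return results


# Inline example data taken from the question.
five = ["AGGGCUUAGCUGCUUGUGAGCA"]
three = ["UUCACAGUGGCUAAGUUCCGC"]
hairpins = ["CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"]
results = extract_all(hairpins, five, three)
print(results)  # ['GGGUCCACACCAAGUCGUG']
```

With real files you would pass `read_strings(...)` results instead of the inline lists. Trying every pair costs len(File1) × len(File2) searches per hairpin, which may be slow for large files; a hairpin with no matching pair yields `None`.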
Answer 0 (score: 4)
In Perl, you can try the following code:
use strict;
use warnings;
my $file1 = "AGGGCUUAGCUGCUUGUGAGCA";
my $file2 = "UUCACAGUGGCUAAGUUCCGC";
my $file3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG";
my ($result) = $file3 =~ /$file1(.*?)$file2/;
print $result;
Output:
GGGUCCACACCAAGUCGUG
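The Perl regex uses the non-greedy group `(.*?)`, so when the File2 string occurs more than once after the File1 string, the capture stops at the first occurrence. Python's `re` module uses the same syntax; a small sketch with a hypothetical input that repeats the right flank:

```python
import re

left = "AGGGCUUAGCUGCUUGUGAGCA"
right = "UUCACAGUGGCUAAGUUCCGC"
# Hypothetical input in which the File2 string appears twice after the File1 string.
text = left + "GGGUCC" + right + "AAAA" + right

# The lazy group (.*?) stops at the first `right` that follows `left`.
lazy = re.search(left + "(.*?)" + right, text).group(1)
print(lazy)  # GGGUCC
```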
Answer 1 (score: 2)
Here is a solution in R:
file1 <- "AGGGCUUAGCUGCUUGUGAGCA"
file2 <- "UUCACAGUGGCUAAGUUCCGC"
file3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
# create a regular expression
pattern <- paste0(".*", file1, "(.*)", file2, ".*")
# extract the substring
sub(pattern, "\\1", file3)
# [1] "GGGUCCACACCAAGUCGUG"
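One caveat of the `sub()` approach: when file1 or file2 is absent from file3, `sub()` returns file3 unchanged rather than signalling a failed match. Python's `re.sub` behaves the same way. A sketch, assuming you would rather get `None` in that case, uses `re.subn`, which also reports how many substitutions were made (the helper name `between_or_none` is hypothetical):

```python
import re

f1 = "AGGGCUUAGCUGCUUGUGAGCA"
f2 = "UUCACAGUGGCUAAGUUCCGC"
f3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

pattern = ".*" + re.escape(f1) + "(.*)" + re.escape(f2) + ".*"

# Like R's sub(), re.sub() returns its input unchanged when nothing matches.
unchanged = re.sub(pattern, r"\1", "ACGUACGU")


def between_or_none(full):
    # re.subn returns (new_string, number_of_substitutions); 0 means no match.
    result, count = re.subn(pattern, r"\1", full)
    return result if count else None


print(unchanged)                      # ACGUACGU
print(between_or_none(f3))            # GGGUCCACACCAAGUCGUG
print(between_or_none("ACGUACGU"))    # None
```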
Answer 2 (score: 1)
In Python:
>>> import re
>>> a='AGGGCUUAGCUGCUUGUGAGCA'
>>> b='UUCACAGUGGCUAAGUUCCGC'
>>> c='CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG'
>>> regex = a + '(.*?)' + b
>>> regex
'AGGGCUUAGCUGCUUGUGAGCA(.*?)UUCACAGUGGCUAAGUUCCGC'
>>> re.findall(regex,c)
['GGGUCCACACCAAGUCGUG']
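Because `re.findall` returns every non-overlapping match, this approach also covers a File3 string that contains the flanking pair more than once: you get one list entry per occurrence. A sketch with a hypothetical sequence containing the pair twice:

```python
import re

a = "AGGGCUUAGCUGCUUGUGAGCA"
b = "UUCACAGUGGCUAAGUUCCGC"
# Hypothetical sequence containing the a...b pair twice.
c = a + "AAA" + b + "GG" + a + "CCC" + b

matches = re.findall(a + "(.*?)" + b, c)
print(matches)  # ['AAA', 'CCC']
```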
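Answer 3 (score: 1)
Use strapplyc in gsubfn. Try this. We assume there is only one instance of s1 and s2, or, if there are multiple instances, that you want the string between the first instance of s1 and the last instance of s2. If there could be multiple instances and you want something different, please add that to the question.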
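The strapplyc code itself did not survive in this copy of the answer. As a Python analogue of the semantics it describes (not the gsubfn API), a greedy group `(.*)` spans from the first s1 to the last s2, because the search starts at the leftmost s1 and the greedy group then extends as far as it can. A sketch with a hypothetical sequence holding two copies of each flank:

```python
import re

s1 = "AGGGCUUAGCUGCUUGUGAGCA"
s2 = "UUCACAGUGGCUAAGUUCCGC"
# Hypothetical sequence with two copies of each flank.
s3 = s1 + "AAA" + s2 + "GG" + s1 + "CCC" + s2

# re.search anchors at the leftmost s1; the greedy group then runs to the last s2.
greedy = re.search(s1 + "(.*)" + s2, s3).group(1)
print(greedy == "AAA" + s2 + "GG" + s1 + "CCC")  # True
```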
Answer 4 (score: 1)
In Python:
string1 = "AGGGCUUAGCUGCUUGUGAGCA"
string2 = "UUCACAGUGGCUAAGUUCCGC"
string_main = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
# Slice from the end of string1 to the start of string2
print(string_main[string_main.find(string1)+len(string1):string_main.find(string2)])
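`str.find` returns -1 when the substring is absent, so the find-based slice silently returns the wrong text if string1 is missing (the start index becomes `len(string1) - 1`), and it returns an empty string if a copy of string2 occurs before string1. A slightly more defensive sketch (the function name `slice_between` is hypothetical) checks for -1 and searches for the right flank only after the left one:

```python
def slice_between(full, left, right):
    """Return the text between `left` and the next `right`, or None if either is missing."""
    start = full.find(left)
    if start == -1:
        return None
    start += len(left)
    # Look for `right` only after `left`, so an earlier copy of `right` is ignored.
    end = full.find(right, start)
    if end == -1:
        return None
    return full[start:end]


string1 = "AGGGCUUAGCUGCUUGUGAGCA"
string2 = "UUCACAGUGGCUAAGUUCCGC"
string_main = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
print(slice_between(string_main, string1, string2))  # GGGUCCACACCAAGUCGUG
```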
Answer 5 (score: 1)
Given your input, the following works:
f1 <- "AGGGCUUAGCUGCUUGUGAGCA"
f2 <- "UUCACAGUGGCUAAGUUCCGC"
f3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
strsplit(f3, paste(f1, f2, sep='|'))[[1]][2]
# [1] "GGGUCCACACCAAGUCGUG"
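Indexing the second piece of the split assumes both flanks are present; if f1 is missing, the second piece is whatever follows f2. A split-style alternative in Python is `str.partition`, which splits on the first occurrence of a separator and also reports whether the separator was found, so a missing flank can be detected. A sketch (the name `partition_between` is hypothetical):

```python
def partition_between(full, left, right):
    """Split on `left`, then on `right`; return the middle piece or None."""
    _, found_left, rest = full.partition(left)
    if not found_left:
        return None
    middle, found_right, _ = rest.partition(right)
    return middle if found_right else None


f1 = "AGGGCUUAGCUGCUUGUGAGCA"
f2 = "UUCACAGUGGCUAAGUUCCGC"
f3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
print(partition_between(f3, f1, f2))  # GGGUCCACACCAAGUCGUG
```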
Answer 6 (score: 1)
Using qdapRegex in R:
f1 <- "AGGGCUUAGCUGCUUGUGAGCA"
f2 <- "UUCACAGUGGCUAAGUUCCGC"
f3 <- "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"
library(qdapRegex)
rm_between(f3, f1, f2, extract=TRUE)
## [[1]]
## [1] "GGGUCCACACCAAGUCGUG"
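As the name implies, rm_between removes or grabs the items between a left and a right boundary. Use extract = TRUE to grab the strings between the boundaries. The returned value is a list because each input string may yield multiple extractions. If this is undesirable, wrap the call in unlist, as in unlist(rm_between(f3, f1, f2, extract=TRUE)).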
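For comparison, a Python sketch of the same list-per-input shape (this mirrors the idea, not the qdapRegex API): one list of extractions per input string, flattened afterwards the way `unlist` does. The helper name `extract_between_all` is hypothetical, and this sketch returns an empty list for a string without the boundaries, whereas qdapRegex's own no-match value may differ:

```python
import re
from itertools import chain


def extract_between_all(strings, left, right):
    """Return one list of between-boundary extractions per input string."""
    pattern = re.compile(re.escape(left) + "(.*?)" + re.escape(right))
    return [pattern.findall(s) for s in strings]


f1 = "AGGGCUUAGCUGCUUGUGAGCA"
f2 = "UUCACAGUGGCUAAGUUCCGC"
f3 = "CUGAGGAGCAGGGCUUAGCUGCUUGUGAGCAGGGUCCACACCAAGUCGUGUUCACAGUGGCUAAGUUCCGCCCCCCAG"

nested = extract_between_all([f3, "ACGU"], f1, f2)
flat = list(chain.from_iterable(nested))  # analogous to unlist()
print(nested)  # [['GGGUCCACACCAAGUCGUG'], []]
print(flat)    # ['GGGUCCACACCAAGUCGUG']
```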