A program gave me the following pattern counts:
Counter({'CCCC': 22115, 'TTTT': 22043, 'AAAA': 22037, 'GGGG': 21930, 'AAAC': 154, 'TTAT': 152, 'CCCA': 152, 'CCTC': 152, 'GGGC': 151, 'TTTG': 150, 'GTGG': 149, 'GCCC': 148, 'CCGC': 145, 'CGGG': 145, 'TGGG': 144, 'AGAA': 144, 'TTGT': 144, 'GAAA': 142, 'CCCG': 142, 'CCCT': 142, 'TCCC': 141, 'CAAA': 139, 'ATTT': 137, 'CGCC': 134, 'GGTG': 133, 'GAGG': 133, 'TTTA': 132, 'CTTT': 131, 'TCTT': 131, 'ACCC': 130, 'AGGG': 130, 'GGAG': 129, 'AACA': 129, 'TAAA': 129, 'TATT': 128, 'TTTC': 128, 'AAGA': 127, 'GGGA': 126, 'ACAA': 126, 'TTCT': 125, 'CTCC': 124, 'GCGG': 124, 'ATAA': 123, 'GGCG': 120, 'CACC': 119, 'AAAT': 118, 'AATA': 117, 'AAAG': 114, 'GTTT': 114, 'TGTT': 112, 'GGGT': 112, 'CCAC': 110, 'CGCG': 45, 'AACC': 43, 'TTAA': 41, 'CTCT': 41, 'GGCC': 41, 'ACTC': 40, 'CTTC': 40, 'GCCG': 39, 'ATTA': 39, 'ACCT': 39, 'TGCG': 39, 'ATAT': 39, 'TCTC': 38, 'ACGG': 38, 'TATA': 37, 'ATCA': 37, 'CGGC': 37, 'CGAG': 36, 'AGAG': 36, 'GACA': 35, 'GTTG': 35, 'TGAG': 35, 'TGGT': 35, 'CCAA': 35, 'TTGG': 34, 'GTGT': 34, 'GCGC': 34, 'CACA': 34, 'GTAA': 34, 'GTAG': 34, 'TCCA': 34, 'TCCT': 34, 'AAGG': 34, 'GAGA': 34, 'GCTT': 34, 'GTGC': 33, 'CTAT': 33, 'TTGC': 33, 'CGGA': 33, 'AGGA': 32, 'GACG': 32, 'AATT': 32, 'CAAC': 32, 'CTGC': 32, 'CTAC': 32, 'ACGA': 32, 'CGAC': 32, 'CCGG': 32, 'TCTG': 32, 'GGAA': 32, 'GGAT': 32, 'TGCT': 32, 'TTAG': 32, 'GCTG': 32, 'GAGT': 31, 'AGGC': 31, 'TTCC': 31, 'ATGA': 31, 'TTCA': 31, 'CCAT': 31, 'AAGT': 31, 'GAGC': 31, 'GTAT': 31, 'CGAA': 31, 'TCAT': 31, 'ATTC': 31, 'TGTG': 30, 'AGTT': 30, 'ATCC': 30, 'AGCA': 30, 'GTCT': 30, 'TGTC': 30, 'TCAC': 30, 'CACT': 30, 'ACTA': 30, 'TAAT': 30, 'CCGT': 30, 'CCTA': 29, 'TCGG': 29, 'GGTA': 29, 'TATG': 29, 'AACG': 29, 'CACG': 29, 'GATT': 29, 'ATCT': 29, 'TGGC': 29, 'AGCC': 29, 'TATC': 29, 'GCTC': 29, 'GGCT': 29, 'TCTA': 29, 'AACT': 28, 'CCTT': 28, 'CTTA': 28, 'TGTA': 28, 'TAGT': 28, 'AGTG': 28, 'CCGA': 27, 'AATG': 27, 'CCTG': 27, 'CTGT': 27, 'AGTC': 27, 'GTCC': 27, 'GGTT': 27, 'ACAC': 26, 'TACC': 26, 'CATC': 26, 'CATA': 26, 'GTGA': 26, 'TGAA': 26, 'GGTC': 26, 'CTTG': 26, 'GCAC': 26, 'GGCA': 26, 'CGTC': 26, 'CTGG': 26, 'TAAG': 26, 'TCGT': 26, 'TGAT': 25, 'CAGA': 25, 'GAAC': 25, 'ACCA': 25, 'TTAC': 25, 'CATT': 25, 'AGAT': 25, 'CGGT': 25, 'ATTG': 25, 'TTGA': 25, 'GATA': 24, 'GGAC': 24, 'AAGC': 24, 'GTCA': 24, 'CAAT': 24, 'GCAG': 24, 'ACAT': 24, 'TGCC': 24, 'ATAG': 24, 'CGTG': 24, 'CGCA': 24, 'TAGG': 23, 'ACCG': 23, 'TTCG': 23, 'AGCG': 23, 'GTTC': 23, 'ACTT': 23, 'CGTT': 23, 'AGAC': 23, 'GCAT': 22, 'TCCG': 22, 'TAAC': 22, 'ACGC': 22, 'CAGC': 22, 'GACC': 22, 'CATG': 22, 'TCGA': 22, 'TAGA': 22, 'GCAA': 22, 'CTCG': 22, 'TACT': 22, 'AATC': 21, 'CGCT': 21, 'GAAT': 21, 'GCGT': 21, 'AGTA': 21, 'GCCA': 21, 'ATGG': 21, 'TCAA': 21, 'CTCA': 21, 'TGGA': 20, 'GAAG': 20, 'GATC': 20, 'TGCA': 20, 'GCCT': 19, 'GTCG': 19, 'CAAG': 19, 'TCGC': 19, 'CTGA': 19, 'GATG': 19, 'CTAA': 19, 'GCGA': 19, 'ATAC': 18, 'GTTA': 18, 'GCTA': 18, 'AGGT': 18, 'CCAG': 18, 'ACAG': 18, 'CTAG': 17, 'CGTA': 17, 'ACGT': 17, 'TACA': 17, 'AGCT': 16, 'CAGG': 16, 'ATGT': 16, 'ATCG': 16, 'ATGC': 15, 'TGAC': 14, 'TAGC': 14, 'ACTG': 14, 'TCAG': 14, 'CGAT': 14, 'TACG': 13, 'CAGT': 11, 'GTAC': 10, 'GACT': 9})
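I would now like to convert this into a table in which the first column, "AAAA", holds its corresponding count, and likewise one column for every other combination. Does anyone know a good way to program this?
This is how I read the data into R:
Date <- read.table("/PATTERN.txt", header = FALSE, sep = "\t");
So far I have tried reading the file in directly, but somehow that did not really work. The result should look like this: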
AAAA CCCC
1 22128 22127
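Thank you very much!
Answer 0 (score: 1)
If Lines, shown reproducibly in the Note at the end, contains the data, then replace the leading Counter( with [, the closing ) with ], and every ' with ", and read the result with fromJSON: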
library(jsonlite)
# Turn the Python repr into JSON: Counter( -> [, ) -> ], ' -> "
json <- gsub("'", '"', sub("\\)", "]", sub("Counter.", "[", Lines)))
fromJSON(json)
giving:
CCCC TTTT AAAA GGGG AAAC TTAT CCCA CCTC GGGC TTTG GTGG GCCC CCGC CGGG
1 22115 22043 22037 21930 154 152 152 152 151 150 149 148 145 145
TGGG AGAA TTGT GAAA CCCG CCCT TCCC CAAA ATTT CGCC GGTG GAGG TTTA CTTT TCTT
1 144 144 144 142 142 142 141 139 137 134 133 133 132 131 131
ACCC AGGG GGAG AACA TAAA TATT TTTC AAGA GGGA ACAA TTCT CTCC GCGG ATAA GGCG
1 130 130 129 129 129 128 128 127 126 126 125 124 124 123 120
CACC AAAT AATA AAAG GTTT TGTT GGGT CCAC CGCG AACC TTAA CTCT GGCC ACTC CTTC
1 119 118 117 114 114 112 112 110 45 43 41 41 41 40 40
GCCG ATTA ACCT TGCG ATAT TCTC ACGG TATA ATCA CGGC CGAG AGAG GACA GTTG TGAG
1 39 39 39 39 39 38 38 37 37 37 36 36 35 35 35
TGGT CCAA TTGG GTGT GCGC CACA GTAA GTAG TCCA TCCT AAGG GAGA GCTT GTGC CTAT
1 35 35 34 34 34 34 34 34 34 34 34 34 34 33 33
TTGC CGGA AGGA GACG AATT CAAC CTGC CTAC ACGA CGAC CCGG TCTG GGAA GGAT TGCT
1 33 33 32 32 32 32 32 32 32 32 32 32 32 32 32
TTAG GCTG GAGT AGGC TTCC ATGA TTCA CCAT AAGT GAGC GTAT CGAA TCAT ATTC TGTG
1 32 32 31 31 31 31 31 31 31 31 31 31 31 31 30
AGTT ATCC AGCA GTCT TGTC TCAC CACT ACTA TAAT CCGT CCTA TCGG GGTA TATG AACG
1 30 30 30 30 30 30 30 30 30 30 29 29 29 29 29
CACG GATT ATCT TGGC AGCC TATC GCTC GGCT TCTA AACT CCTT CTTA TGTA TAGT AGTG
1 29 29 29 29 29 29 29 29 29 28 28 28 28 28 28
CCGA AATG CCTG CTGT AGTC GTCC GGTT ACAC TACC CATC CATA GTGA TGAA GGTC CTTG
1 27 27 27 27 27 27 27 26 26 26 26 26 26 26 26
GCAC GGCA CGTC CTGG TAAG TCGT TGAT CAGA GAAC ACCA TTAC CATT AGAT CGGT ATTG
1 26 26 26 26 26 26 25 25 25 25 25 25 25 25 25
TTGA GATA GGAC AAGC GTCA CAAT GCAG ACAT TGCC ATAG CGTG CGCA TAGG ACCG TTCG
1 25 24 24 24 24 24 24 24 24 24 24 24 23 23 23
AGCG GTTC ACTT CGTT AGAC GCAT TCCG TAAC ACGC CAGC GACC CATG TCGA TAGA GCAA
1 23 23 23 23 23 22 22 22 22 22 22 22 22 22 22
CTCG TACT AATC CGCT GAAT GCGT AGTA GCCA ATGG TCAA CTCA TGGA GAAG GATC TGCA
1 22 22 21 21 21 21 21 21 21 21 21 20 20 20 20
GCCT GTCG CAAG TCGC CTGA GATG CTAA GCGA ATAC GTTA GCTA AGGT CCAG ACAG CTAG
1 19 19 19 19 19 19 19 19 18 18 18 18 18 18 17
CGTA ACGT TACA AGCT CAGG ATGT ATCG ATGC TGAC TAGC ACTG TCAG CGAT TACG CAGT
1 17 17 17 16 16 16 16 15 14 14 14 14 14 13 11
GTAC GACT
1 10 9
Note:
Lines <- "
Counter({'CCCC': 22115, 'TTTT': 22043, 'AAAA': 22037, 'GGGG':21930, 'AAAC': 154, 'TTAT': 152, 'CCCA': 152, 'CCTC': 152, 'GGGC': 151, 'TTTG': 150, 'GTGG': 149, 'GCCC': 148, 'CCGC': 145, 'CGGG': 145, 'TGGG': 144, 'AGAA': 144, 'TTGT': 144, 'GAAA': 142, 'CCCG': 142, 'CCCT': 142, 'TCCC': 141, 'CAAA': 139, 'ATTT': 137, 'CGCC': 134, 'GGTG': 133, 'GAGG': 133, 'TTTA': 132, 'CTTT': 131, 'TCTT': 131, 'ACCC': 130, 'AGGG': 130, 'GGAG': 129, 'AACA': 129, 'TAAA': 129, 'TATT': 128, 'TTTC': 128, 'AAGA': 127, 'GGGA': 126, 'ACAA': 126, 'TTCT': 125, 'CTCC': 124, 'GCGG': 124, 'ATAA': 123, 'GGCG': 120, 'CACC': 119, 'AAAT': 118, 'AATA': 117, 'AAAG': 114, 'GTTT': 114, 'TGTT': 112, 'GGGT': 112, 'CCAC': 110, 'CGCG': 45, 'AACC': 43, 'TTAA': 41, 'CTCT': 41, 'GGCC': 41, 'ACTC': 40, 'CTTC': 40, 'GCCG': 39, 'ATTA': 39, 'ACCT': 39, 'TGCG': 39, 'ATAT': 39, 'TCTC': 38, 'ACGG': 38, 'TATA': 37, 'ATCA': 37, 'CGGC': 37, 'CGAG': 36, 'AGAG': 36, 'GACA': 35, 'GTTG': 35, 'TGAG': 35, 'TGGT': 35, 'CCAA': 35, 'TTGG': 34, 'GTGT': 34, 'GCGC': 34, 'CACA': 34, 'GTAA': 34, 'GTAG': 34, 'TCCA': 34, 'TCCT': 34, 'AAGG': 34, 'GAGA': 34, 'GCTT': 34, 'GTGC': 33, 'CTAT': 33, 'TTGC': 33, 'CGGA': 33, 'AGGA': 32, 'GACG': 32, 'AATT': 32, 'CAAC': 32, 'CTGC': 32, 'CTAC': 32, 'ACGA': 32, 'CGAC': 32, 'CCGG': 32, 'TCTG': 32, 'GGAA': 32, 'GGAT': 32, 'TGCT': 32, 'TTAG': 32, 'GCTG': 32, 'GAGT': 31, 'AGGC': 31, 'TTCC': 31, 'ATGA': 31, 'TTCA': 31, 'CCAT': 31, 'AAGT': 31, 'GAGC': 31, 'GTAT': 31, 'CGAA': 31, 'TCAT': 31, 'ATTC': 31, 'TGTG': 30, 'AGTT': 30, 'ATCC': 30, 'AGCA': 30, 'GTCT': 30, 'TGTC': 30, 'TCAC': 30, 'CACT': 30, 'ACTA': 30, 'TAAT': 30, 'CCGT': 30, 'CCTA': 29, 'TCGG': 29, 'GGTA': 29, 'TATG': 29, 'AACG': 29, 'CACG': 29, 'GATT': 29, 'ATCT': 29, 'TGGC': 29, 'AGCC': 29, 'TATC': 29, 'GCTC': 29, 'GGCT': 29, 'TCTA': 29, 'AACT': 28, 'CCTT': 28, 'CTTA': 28, 'TGTA': 28, 'TAGT': 28, 'AGTG': 28, 'CCGA': 27, 'AATG': 27, 'CCTG': 27, 'CTGT': 27, 'AGTC': 27, 'GTCC': 27, 'GGTT': 27, 'ACAC': 26, 'TACC': 26, 'CATC': 26, 'CATA': 26, 'GTGA': 26, 
'TGAA': 26, 'GGTC': 26, 'CTTG': 26, 'GCAC': 26, 'GGCA': 26, 'CGTC': 26, 'CTGG': 26, 'TAAG': 26, 'TCGT': 26, 'TGAT': 25, 'CAGA': 25, 'GAAC': 25, 'ACCA': 25, 'TTAC': 25, 'CATT': 25, 'AGAT': 25, 'CGGT': 25, 'ATTG': 25, 'TTGA': 25, 'GATA': 24, 'GGAC': 24, 'AAGC': 24, 'GTCA': 24, 'CAAT': 24, 'GCAG': 24, 'ACAT': 24, 'TGCC': 24, 'ATAG': 24, 'CGTG': 24, 'CGCA': 24, 'TAGG': 23, 'ACCG': 23, 'TTCG': 23, 'AGCG': 23, 'GTTC': 23, 'ACTT': 23, 'CGTT': 23, 'AGAC': 23, 'GCAT': 22, 'TCCG': 22, 'TAAC': 22, 'ACGC': 22, 'CAGC': 22, 'GACC': 22, 'CATG': 22, 'TCGA': 22, 'TAGA': 22, 'GCAA': 22, 'CTCG': 22, 'TACT': 22, 'AATC': 21, 'CGCT': 21, 'GAAT': 21, 'GCGT': 21, 'AGTA': 21, 'GCCA': 21, 'ATGG': 21, 'TCAA': 21, 'CTCA': 21, 'TGGA': 20, 'GAAG': 20, 'GATC': 20, 'TGCA': 20, 'GCCT': 19, 'GTCG': 19, 'CAAG': 19, 'TCGC': 19, 'CTGA': 19, 'GATG': 19, 'CTAA': 19, 'GCGA': 19, 'ATAC': 18, 'GTTA': 18, 'GCTA': 18, 'AGGT': 18, 'CCAG': 18, 'ACAG': 18, 'CTAG': 17, 'CGTA': 17, 'ACGT': 17, 'TACA': 17, 'AGCT': 16, 'CAGG': 16, 'ATGT': 16, 'ATCG': 16, 'ATGC': 15, 'TGAC': 14, 'TAGC': 14, 'ACTG': 14, 'TCAG': 14, 'CGAT': 14, 'TACG': 13, 'CAGT': 11, 'GTAC': 10, 'GACT': 9})"
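Answer 1 (score: 0)
This answer may help you in this particular case, but you should insist that whoever produces these results exports them in a form that can be imported easily from any programming language. What you have here is the string representation of a Python object, which is definitely not a good way to exchange data.
However, you can try the following: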
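As a rough illustration of what a friendlier export could look like on the producing side, here is a minimal Python sketch. It assumes the producer holds the counts in a collections.Counter; the helper names counter_to_json and counter_to_tsv are hypothetical and not part of any existing program.

```python
import csv
import io
import json
from collections import Counter


def counter_to_json(counts):
    """Serialize the counts as a JSON object mapping pattern -> count."""
    return json.dumps(dict(counts))


def counter_to_tsv(counts):
    """Serialize the counts as a tab-separated table: a header row, then one data row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    patterns = sorted(counts)  # fixed column order, independent of insertion order
    writer.writerow(patterns)
    writer.writerow([counts[p] for p in patterns])
    return buf.getvalue()


# Small excerpt of the counts from the question, for illustration only
counts = Counter({"CCCC": 22115, "TTTT": 22043, "AAAA": 22037, "GGGG": 21930})
print(counter_to_json(counts))
print(counter_to_tsv(counts), end="")
```

The tab-separated variant can then be read in R with read.table(..., header = TRUE, sep = "\t"), and the JSON variant with jsonlite's fromJSON.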
# place the correct path to the file here
fn <- "pattern.txt"
# read the content of the file as is
filecontent <- readChar(fn, file.info(fn)$size)
# manipulate the string a bit so that it becomes an R list() call;
# note that eval(parse()) runs the text as R code, so only use it on trusted files
res <- eval(parse(text = gsub("[\\{\\}\n]", "",
                              gsub(":", "=", sub("Counter", "list", filecontent)))))
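If the file cannot be regenerated and Python is available, the text can also be parsed without evaluating it as code. The sketch below assumes the file holds exactly one Counter({...}) repr like the one in the question; parse_counter_repr is a hypothetical helper name. It relies on ast.literal_eval, which accepts only Python literals rather than arbitrary expressions.

```python
import ast
from collections import Counter


def parse_counter_repr(text):
    """Parse text of the form "Counter({...})" into a Counter without running code."""
    text = text.strip()
    prefix, suffix = "Counter(", ")"
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("expected text of the form Counter({...})")
    # Keep only the dict literal between "Counter(" and the final ")"
    parsed = ast.literal_eval(text[len(prefix):-len(suffix)])
    if not isinstance(parsed, dict):
        raise ValueError("expected a dict literal inside Counter(...)")
    return Counter(parsed)


# Short excerpt of the repr from the question, for illustration only
example = "Counter({'CCCC': 22115, 'TTTT': 22043, 'GACT': 9})"
counts = parse_counter_repr(example)
print(counts["CCCC"])
```

The resulting Counter can then be written out as JSON or as a tab-separated file and imported into R in the usual way.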