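# Create a connection to the FTP location (open = "" leaves it unopened for now)
con <- url("ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE1nnn/GSE1000/matrix/", open = "", blocking = TRUE, encoding = getOption("encoding"))
con
isOpen(con)
# Read up to 4200 lines into a character vector
t2 <- readLines(con, n = 4200)
t2[4010]
summary(t2)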
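Using the code above I can fetch the FTP file, but I can't do any further plotting with it. I can see the data, but I can't get it arranged into a table. Can anyone help?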
Answer 0 (score: 1)
The following code reads the data without any problems:
dta <- read.csv("ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid225/U00096.ptt",
header = TRUE, skip = 2, sep = "\t")
I'm guessing you're after a data frame like this:
> head(dta)
Location Strand Length PID Gene Synonym Code COG Product
1 190..255 + 21 1786182 thrL b0001 - - thr operon leader peptide
2 337..2799 + 820 1786183 thrA b0002 - - Bifunctional aspartokinase/homoserine dehydrogenase 1
3 2801..3733 + 310 1786184 thrB b0003 - - homoserine kinase
4 3734..5020 + 428 1786185 thrC b0004 - - L-threonine synthase
5 5234..5530 + 98 1786186 yaaX b0005 - - DUF2502 family putative periplasmic protein
6 5683..6459 - 258 1786187 yaaA b0006 - - peroxide resistance protein, lowers intracellular iron
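Since the question is ultimately about plotting, here is a minimal sketch of a next step once the data frame exists. It assumes the .ptt layout shown above and embeds those six rows via text= so it runs without FTP access; the Start/End column names and the choice of a bar plot are illustrative, not part of the answer.

```r
# The six rows shown above, tab-separated as in the .ptt file
ptt_lines <- c(
  "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct",
  "190..255\t+\t21\t1786182\tthrL\tb0001\t-\t-\tthr operon leader peptide",
  "337..2799\t+\t820\t1786183\tthrA\tb0002\t-\t-\tBifunctional aspartokinase/homoserine dehydrogenase 1",
  "2801..3733\t+\t310\t1786184\tthrB\tb0003\t-\t-\thomoserine kinase",
  "3734..5020\t+\t428\t1786185\tthrC\tb0004\t-\t-\tL-threonine synthase",
  "5234..5530\t+\t98\t1786186\tyaaX\tb0005\t-\t-\tDUF2502 family putative periplasmic protein",
  "5683..6459\t-\t258\t1786187\tyaaA\tb0006\t-\t-\tperoxide resistance protein, lowers intracellular iron"
)

dta <- read.delim(text = ptt_lines, header = TRUE, stringsAsFactors = FALSE)

# Split "start..end" into two integer columns (hypothetical names Start/End)
bounds <- do.call(rbind, strsplit(dta$Location, "..", fixed = TRUE))
dta$Start <- as.integer(bounds[, 1])
dta$End   <- as.integer(bounds[, 2])

# Simple summaries; only draw the plot in an interactive session
strand_counts <- table(dta$Strand)
if (interactive()) {
  barplot(dta$Length, names.arg = dta$Gene, ylab = "Length")
}
```

With the real file, replace text = ptt_lines with the FTP URL and skip = 2, as in the answer's read.csv call.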
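To simplify the import, I skip the first two lines (the genome description and the protein count). The file starts like this: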
Escherichia coli str. K-12 substr. MG1655, complete genome. - 1..4641652
4140 proteins
Location Strand Length PID Gene Synonym Code COG Product
190..255 + 21 1786182 thrL b0001 - - thr operon leader peptide
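If you want to read the whole file, I suggest you look at this post. You could consider reading the entire file, accessing the first two lines separately, and then importing the rest into a data frame.
Answer 1 (score: 0)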
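The "read everything, then split" approach suggested above might look like the following minimal sketch. It assumes the file starts with the four lines shown earlier (stored here in a hypothetical all_lines vector so the example runs offline); the variable names genome_info and n_proteins are illustrative.

```r
# Stand-in for readLines() output; with the real file you would use
# all_lines <- readLines("ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid225/U00096.ptt")
all_lines <- c(
  "Escherichia coli str. K-12 substr. MG1655, complete genome. - 1..4641652",
  "4140 proteins",
  "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct",
  "190..255\t+\t21\t1786182\tthrL\tb0001\t-\t-\tthr operon leader peptide"
)

# Keep the two descriptive lines on their own ...
genome_info <- all_lines[1:2]
n_proteins  <- as.integer(sub(" proteins$", "", genome_info[2]))

# ... and parse the remaining lines (header line + data rows) as a table
dta <- read.delim(text = all_lines[-(1:2)], header = TRUE)
```

This reads the file only once and keeps the metadata lines available, instead of discarding them with skip = 2.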
Testing what I suggested in my comment:
read.delim( text=c("4350031..4351662\t-\t543\t1790567\tdcuS\tb4125\t-\t-\tsensory histidine kinase in two-component regulatory system with DcuR, regulator of anaerobic fumarate respiration" ,
"4351843..4352073\t+\t76\t1790568\tyjdI\tb4126\t-\t-\tputative 4Fe-4S mono-cluster protein" ), header=FALSE)
#---------
V1 V2 V3 V4 V5 V6 V7 V8
1 4350031..4351662 - 543 1790567 dcuS b4125 - -
2 4351843..4352073 + 76 1790568 yjdI b4126 - -
V9
1 sensory histidine kinase in two-component regulatory system with DcuR, regulator of anaerobic fumarate respiration
2 putative 4Fe-4S mono-cluster protein
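I suspect the first line is actually a header, since that seems to be the pattern in the README files I looked at on that FTP site, so you can probably drop header=FALSE. The two lines above are just elements [3883-3884].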
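When only a slice of t2 is parsed (such as elements 3883-3884), the header line is not part of the slice, which is why header=FALSE gives V1 to V9. A minimal sketch, assuming the .ptt column names from the first answer: either pass them as col.names, or prepend a tab-joined header line and keep read.delim's default header=TRUE. The names rows, ptt_cols and header_line are illustrative.

```r
# The two rows from the test above (a slice with no header line)
rows <- c(
  "4350031..4351662\t-\t543\t1790567\tdcuS\tb4125\t-\t-\tsensory histidine kinase in two-component regulatory system with DcuR, regulator of anaerobic fumarate respiration",
  "4351843..4352073\t+\t76\t1790568\tyjdI\tb4126\t-\t-\tputative 4Fe-4S mono-cluster protein"
)

# Column names as they appear in the .ptt header line
ptt_cols <- c("Location", "Strand", "Length", "PID", "Gene",
              "Synonym", "Code", "COG", "Product")

# Option 1: no header in the text, supply the names directly
dta <- read.delim(text = rows, header = FALSE, col.names = ptt_cols)

# Option 2: prepend a header line and let read.delim use it
header_line <- paste(ptt_cols, collapse = "\t")
dta2 <- read.delim(text = c(header_line, rows))
```

Both options should yield the same data frame; option 2 mirrors what happens when the slice you pass to read.delim already starts at the file's header line.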