I want to extract all the information for a given set of samples into an xls file. For example:
library(GEOquery)
gpl <- getGEO("GPL16791")
data <- gpl@header$sample_id
gps <- getGEO(data[1])
str(gps)
which prints the following:
Formal class 'GSM' [package "GEOquery"] with 2 slots
..@ dataTable:Formal class 'GEODataTable' [package "GEOquery"] with 2 slots
.. .. ..@ columns:'data.frame': 0 obs. of 0 variables
.. .. ..@ table :'data.frame': 0 obs. of 0 variables
..@ header :List of 36
.. ..$ channel_count : chr "1"
.. ..$ characteristics_ch1 : chr "cell type: Induced endothelial cells from cultured foreskin fibroblast cells (Stegment)"
.. ..$ contact_address : chr "3333 Burnet Ave"
.. ..$ contact_city : chr "Cincinnati"
.. ..$ contact_country : chr "USA"
.. ..$ contact_department : chr "Biomedical Informatics"
.. ..$ contact_email : chr "Rebekah.Karns@cchmc.org"
.. ..$ contact_institute : chr "Cincinnati Children's Hospital Medical Center"
.. ..$ contact_laboratory : chr "Bruce Aronow, PhD"
.. ..$ contact_name : chr "Rebekah,,Karns"
.. ..$ contact_state : chr "OH"
.. ..$ contact_zip/postal_code: chr "45276"
.. ..$ data_processing : chr [1:4] "Trimmed sequences were generated as fastq outputs and analyzed based on the TopHat/Cufflinks pipeline based on reference annota"| __truncated__ "Gene-level expression was normalized and baselined to the 80th percentile of that sample's overall expression in GeneSpring v7."| __truncated__ "Genome_build: GRCh37/hg19" "Supplementary_files_format_and_content: Each sample has a corresponding .txt file with normalized FPKM"
.. ..$ data_row_count : chr "0"
.. ..$ description : chr "iECa"
.. ..$ extract_protocol_ch1 : chr [1:2] "Using RNeasy Mini Kit (Qiagen), total RNA was extracted and quantitative polymerase chain reaction was performed using Taqman g"| __truncated__ "RNA-Seq–based expression analysis was carried out using RNA samples converted into individual cDNA libraries using Illumina (Sa"| __truncated__
.. ..$ geo_accession : chr "GSM1098572"
.. ..$ growth_protocol_ch1 : chr "Fibroblasts were treated with Poly I:C (30ng/ml) and the medium changed to DMEM with 7.5% FBS and 7.5% knockout serum replaceme"| __truncated__
.. ..$ instrument_model : chr "Illumina HiSeq 2500"
.. ..$ last_update_date : chr "Apr 18 2013"
.. ..$ library_selection : chr "cDNA"
.. ..$ library_source : chr "transcriptomic"
.. ..$ library_strategy : chr "RNA-Seq"
.. ..$ molecule_ch1 : chr "total RNA"
.. ..$ organism_ch1 : chr "Homo sapiens"
.. ..$ platform_id : chr "GPL16791"
.. ..$ relation : chr [1:2] "SRA: http://www.ncbi.nlm.nih.gov/sra?term=SRX249507" "BioSample: http://www.ncbi.nlm.nih.gov/biosample/SAMN01978505"
.. ..$ series_id : chr "GSE45176"
.. ..$ source_name_ch1 : chr "Induced endothelial cell"
.. ..$ status : chr "Public on Apr 14 2013"
.. ..$ submission_date : chr "Mar 14 2013"
.. ..$ supplementary_file_1 : chr "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1098nnn/GSM1098572/suppl/GSM1098572_iECa_Processed.txt.gz"
.. ..$ supplementary_file_2 : chr "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX249/SRX249507"
.. ..$ taxid_ch1 : chr "9606"
.. ..$ title : chr "iEC: Rep1"
.. ..$ type : chr "SRA"
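I would like the output as a txt or xls file in which each row is one sample from "data" and the columns hold all of this information, for example: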
  channel_count  characteristics_ch1                          contact_address    ...
1 "1"            "cell type: Induced endothelial cells ..."   "3333 Burnet Ave"  ...
2 ...
(one row per sample, up to length(data))
Answer 0 (score: 0)
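This function now also works when some variables are missing from a sample's header. I know the nested loops are not very elegant, but it worked in my tests.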
library(GEOquery)

gpl <- getGEO("GPL18448")
data <- gpl@header$sample_id

# Download one sample and return its header as a one-row character matrix.
# unlist() splits multi-valued fields into numbered columns
# (e.g. data_processing1, data_processing2, ...).
getGpsInfo <- function(x) {
  gps <- getGEO(x)
  gps <- unlist(gps@header)
  gps <- data.frame(gps, stringsAsFactors = FALSE)
  gps <- t(gps)
  # if gps has multiple rows keep only unique ones
  gps <- unique(gps)
  return(gps)
}

dat <- lapply(data, FUN = getGpsInfo)

# dat is a list with different numbers of elements per entry,
# so collect the union of all column names first
varnames <- unique(unlist(lapply(dat, colnames)))
dat2 <- data.frame(matrix(NA, nrow = length(dat), ncol = length(varnames)))
colnames(dat2) <- varnames

# Fill the table cell by cell; fields a sample does not have stay NA
for (i in seq_along(dat)) {
  for (j in seq_along(varnames)) {
    element <- match(varnames[j], colnames(dat[[i]]))
    if (!is.na(element)) {
      dat2[i, j] <- dat[[i]][1, element]
    }
  }
}

write.table(dat2, file = "dat2.csv", row.names = TRUE, sep = ";")
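If the nested loops bother you, the same "union of column names, NA where missing" idea can be sketched with name-based indexing into a character matrix. This is only a sketch in base R: `bind_headers` is a hypothetical helper (not part of GEOquery), the two headers below are made-up stand-ins for `gps@header` so the example runs without a network connection, and joining multi-valued fields with " | " is my own choice rather than anything the question asks for.

```r
# Hypothetical helper (not a GEOquery function): turn a list of header lists
# into one data frame with one row per sample and one column per field.
bind_headers <- function(headers) {
  # Collapse multi-valued fields (e.g. relation) into a single string each
  rows <- lapply(headers, function(h) {
    vapply(h, paste, character(1), collapse = " | ")
  })
  # Union of all field names, in order of first appearance
  fields <- unique(unlist(lapply(rows, names)))
  mat <- matrix(NA_character_, nrow = length(rows), ncol = length(fields),
                dimnames = list(NULL, fields))
  # Fill each row by column name; fields a sample lacks stay NA
  for (i in seq_along(rows)) {
    mat[i, names(rows[[i]])] <- rows[[i]]
  }
  as.data.frame(mat, stringsAsFactors = FALSE)
}

# Made-up headers standing in for getGEO(id)@header (assumption, not real data)
headers <- list(
  list(geo_accession = "GSM0000001", title = "rep1",
       relation = c("SRA: x", "BioSample: y")),
  list(geo_accession = "GSM0000002", description = "no title here")
)
tab <- bind_headers(headers)
print(tab)
```

With real data, the header list could presumably be built as `headers <- lapply(data, function(id) getGEO(id)@header)` and the result written out with `write.table(tab, file = "samples.txt", sep = "\t", row.names = FALSE)`; I have not run that part against GEO.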