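I want to use pandas to extract the entire contents of one column from a multi-column data frame, but I only get part of the column.
The code I am using is: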
import pandas
import csv
data = pandas.read_csv('data1.csv', usecols = ['dbSNP RS ID'])
import sys
sys.stdout = open("data2.csv", "w")
print data
What I get looks like this:
dbSNP RS ID
0 rs4147951
1 rs2022235
2 rs6425720
3 rs12997193
4 rs9933410
5 rs7142489
... ...
934963 rs10262938
934964 rs6140985
934965 rs2704067
934966 rs2239441
934967 rs10041689
[934968 rows x 1 columns]
The first two lines of the csv file are:
"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position","Strand","ChrX pseudo-autosomal region 1","Cytoband","Flank","Allele A","Allele B","Associated Gene","Genetic Map","Microsatellite","Fragment Enzyme Type Length Start Stop","Allele Frequencies","Heterozygous Allele Frequencies","Number of individuals","In Hapmap","Strand Versus dbSNP","Copy Number Variation","Probe Count","ChrX pseudo-autosomal region 2","In Final List","Minor Allele","Minor Allele Frequency","% GC","OMIM"
"AFFX- SNP_10000979","rs4147951","17","66943738","+","0","q24.2","GGATAAGGATGGGCTA[A/G]ATTATCATTGCTGTTA","A","G","ENST00000269080 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000428549 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000541225 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000542396 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// NM_007168 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8","99.8510 // D17S795 // D17S2182 // --- // --- // deCODE /// 90.7912 // D17S1870 // D17S840 // AFM323TB1 // AFM207VF4 // Marshfield /// 82.3131 // --- // D17S1786 // 147671 // --- // SLM1","D17S795 // downstream // 265562 /// D17S1474E // upstream // 113179","NspI // ACATGT_ACATGT // 536 // 66943408 // 66943943 /// StyI // CCTTGG_CCATGG // 2334 // 66941614 // 66943947","0.3917 // 0.6083 // CEU /// 0.6444 // 0.3556 // CHB /// 0.6000 // 0.4000 // JPT /// 0.5667 // 0.4333 // YRI","0.3833 // CEU /// 0.4889 // CHB /// 0.4444 // JPT /// 0.5667 // YRI","60 // CEU /// 45 // CHB /// 45 // JPT /// 60 // YRI","YES","reverse","---","6","0","YES","A // CEU /// G // CHB /// G // JPT /// G // YRI","0.3917 // CEU /// 0.3556 // CHB /// 0.4000 // JPT /// 0.4333 // YRI","---","---"
Any ideas on how to extract 'dbSNP RS ID' for all 934968 rows? Thank you very much!
Answer 0: (score: 1)
IIUC, you should simply read the .csv file and write the selected column back out with to_csv:
data = pandas.read_csv('data1.csv', usecols = ['dbSNP RS ID'])
data.to_csv('data2.csv')
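The problem with your code is that print only writes to the file what pandas would display at the terminal prompt, which is a preview of the data frame rather than its full contents. If there are too many rows, the output is cut in the middle and the omitted rows are replaced with "...".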
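To make the difference concrete, here is a minimal sketch comparing the printed preview with what to_csv writes. It assumes pandas' default display settings and uses a small in-memory CSV as a hypothetical stand-in for data1.csv (the probe names and rs numbers are made up for illustration), with a StringIO buffer in place of data2.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for data1.csv: 1000 made-up rows, two columns.
csv_text = "Probe Set ID,dbSNP RS ID\n" + "".join(
    "probe{0},rs{0}\n".format(i) for i in range(1000)
)

# Read only the column of interest, as in the question.
data = pd.read_csv(io.StringIO(csv_text), usecols=["dbSNP RS ID"])

# str(data) is the text that print would emit. With default display
# options pandas shortens long frames and marks the gap with "...".
preview_lines = str(data).splitlines()

# to_csv writes every row. index=False leaves out the 0, 1, 2, ... index,
# which is an optional choice not made in the answer above.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
written_lines = buffer.getvalue().splitlines()

print(len(preview_lines), "lines in the printed preview")
print(len(written_lines), "lines written by to_csv")
```

With a real file, passing a path such as 'data2.csv' to to_csv instead of the buffer writes the same content to disk. Redirecting sys.stdout is not needed.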