Question

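I have a CSV file and I want to extract some specific columns from it. How can I do that?
I have a dictionary of header names and cell positions, like: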

dict = {'Col1' : [(4,5)], 'Col2' : [(4,7)], 'Col3' : [(4,9)]}

I want to extract the data starting at the positions given in the dict values and continuing to the end of the CSV file.

For example:

,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,Col0,Col1,,Col2,,Col3,Col4,
,,,bgr,abc,,efg,,hij,123,
,,,cde,klm,,nop,,qrs,123,
,,,asd,tuv,,wxy,,zzz,456,
,,,,,,,,,,
,,,,,,,,,,

I want to extract

Col1,Col2,Col3
abc,efg,hij
klm,nop,qrs
tuv,wxy,zzz

and write it to a new CSV file. Please help me do this.
I would like to handle this efficiently.

Answer 1

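Pandas is a library with powerful methods for reading CSV files.

If every column you want starts on the same row, the script below will do the job (note that only two of the Python statements do the real work):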

import pandas as pd


# Names for all 11 fields of each line, including the ones to skip
colnames = ['skip1', 'skip2', 'skip3', 'Col0', 'Col1', 'skip4', 'Col2', 'skip5', 'Col3', 'Col4', 'skip6']
# Number of lines to skip (3 blank rows plus the header row)
nbskip = 4
# Number of rows to read (you can also filter rows after reading and remove the empty ones)
nrows = 3
# List of columns to keep
keep_only = ['Col1', 'Col2', 'Col3']

# Read the csv
df = pd.read_csv('test.csv',
                 header=None,
                 skiprows=nbskip,
                 names=colnames,
                 nrows=nrows,  # Remove if you prefer to filter rows
                 usecols=keep_only)

# If the number of lines to keep is unknown,
# you can remove empty lines here, for example:
# df = df.dropna(how='all')

# Save the csv
df.to_csv('result.csv', index=False)
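
If you would rather drive the extraction directly from the dictionary in the question and not hard-code the row count, something like the following might work. It is only a sketch using the standard-library csv module, not the pandas approach above. It assumes the (row, column) pairs are 1-based (which matches the sample data), that all headers sit on the same row, and that a row whose selected cells are all empty should be dropped. The function name extract_columns and the variable positions are made up for this illustration.

```python
import csv

# Header name -> [(row, column)], copied from the question.
# Assumption: both numbers are 1-based, which matches the sample data.
positions = {'Col1': [(4, 5)], 'Col2': [(4, 7)], 'Col3': [(4, 9)]}


def extract_columns(src_path, dst_path, positions):
    """Copy the columns named in `positions` from src_path to dst_path."""
    names = list(positions)
    # Convert the 1-based column numbers to 0-based list indexes.
    col_indexes = [positions[name][0][1] - 1 for name in names]
    # Assumption: all headers sit on the same row, so the first entry is enough.
    header_row = positions[names[0]][0][0]

    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(names)
        for line_no, row in enumerate(csv.reader(src), start=1):
            if line_no <= header_row:
                continue  # skip the leading blank rows and the header row
            # Short or empty rows yield '' for any missing cell.
            cells = [row[i] if i < len(row) else '' for i in col_indexes]
            if any(cells):  # drop rows where every selected cell is empty
                writer.writerow(cells)
```

Calling extract_columns('test.csv', 'result.csv', positions) on the sample file should write the header line followed by the three data rows. The file is streamed row by row, so it never has to fit in memory as a whole.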
