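I have a CSV file and I want to extract some specific columns from it. How can I do that?
I have a dictionary mapping header names to cell positions, like this: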
dict = {'Col1' : [(4,5)], 'Col2' : [(4,7)], 'Col3' : [(4,9)]}
I want to extract the data starting at the positions given by the values of dict, down to the end of the CSV file.
For example:
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,Col0,Col1,,Col2,,Col3,Col4,
,,,bgr,abc,,efg,,hij,123,
,,,cde,klm,,nop,,qrs,123,
,,,asd,tuv,,wxy,,zzz,456,
,,,,,,,,,,
,,,,,,,,,,
I want to extract:
Col1,Col2,Col3
abc,efg,hij
klm,nop,qrs
tuv,wxy,zzz
and write it to a new CSV file. Please help me do this.
I would like to handle this case efficiently.
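Answer 0: (score: 1)
If all the columns you want start on the same row, the script below will do the job (note that only two of the Python statements do the actual work):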
import pandas as pd

# Names for all 11 fields of each row (unwanted fields get placeholder names)
colnames = ['skip1', 'skip2', 'skip3', 'Col0', 'Col1', 'skip4', 'Col2', 'skip5', 'Col3', 'Col4', 'skip6']
# Number of lines to skip: the three empty lines plus the original header line
nbskip = 4
# Number of rows to read (you can also read everything and filter out the empty rows afterwards)
nrows = 3
# List of columns to keep
keep_only = ['Col1', 'Col2', 'Col3']
# Read the CSV
df = pd.read_csv('test.csv',
                 header=None,
                 skiprows=nbskip,
                 names=colnames,
                 nrows=nrows,  # remove this if you prefer to filter rows afterwards
                 usecols=keep_only)
# If the number of rows to keep is unknown,
# drop the rows that are entirely empty here
df = df.dropna(how='all')
# Save the CSV
df.to_csv('result.csv', index=False)
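If you would rather drive the extraction from the position dictionary in the question instead of hard-coding the column names and skip count, the following is a minimal sketch using only the standard csv module rather than pandas. It assumes that the (row, column) pairs are 1-based, that all headers sit on the same row, and that rows whose selected cells are all empty should be dropped. The names positions and extract_columns are hypothetical (the dictionary is renamed to avoid shadowing the built-in dict), and in-memory buffers stand in for test.csv and result.csv so the example is self-contained.

```python
import csv
import io

# Sample data from the question; in practice use open('test.csv', newline='').
SAMPLE = """\
,,,,,,,,,,
,,,,,,,,,,
,,,,,,,,,,
,,,Col0,Col1,,Col2,,Col3,Col4,
,,,bgr,abc,,efg,,hij,123,
,,,cde,klm,,nop,,qrs,123,
,,,asd,tuv,,wxy,,zzz,456,
,,,,,,,,,,
,,,,,,,,,,
"""

# Header name -> [(row, column)], assumed to be 1-based as in the question.
positions = {'Col1': [(4, 5)], 'Col2': [(4, 7)], 'Col3': [(4, 9)]}


def extract_columns(lines, positions):
    """Return the header names plus every non-empty data row below the header."""
    names = list(positions)
    # Assumption: every header is on the same row, so the first entry is enough.
    header_row = positions[names[0]][0][0]
    # Convert the 1-based column numbers to 0-based list indices.
    col_idx = [positions[name][0][1] - 1 for name in names]
    out = [names]
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if line_no <= header_row:
            continue  # skip everything up to and including the header row
        values = [row[i] if i < len(row) else '' for i in col_idx]
        if any(values):  # keep the row only if a selected cell is non-empty
            out.append(values)
    return out


rows = extract_columns(io.StringIO(SAMPLE), positions)

# Write the result; the buffer stands in for open('result.csv', 'w', newline='').
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
result_text = buffer.getvalue()
print(result_text)
```

Because the file is read row by row, this approach does not need to know the number of data rows in advance, and it does not load the whole file into memory.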