Even though this may sound like a duplicate question, I have not found a solution. I have a large .csv file that looks like this:
prot_hit_num,prot_acc,prot_desc,pep_res_before,pep_seq,pep_res_after,ident,country
1,gi|21909,21 kDa seed protein [Theobroma cacao],A,ANSPV,L,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],A,ANSPVL,D,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],L,SSISGAGGGGLA,L,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],D,NYDNSAGKW,W,F40,EB
....
The goal is to split this .csv file into several smaller .csv files based on the last two columns ('ident' and 'country').
I used the code from an answer to a previous post, shown below:
import csv
import itertools as it
import operator as op

csv_contents = []
with open(outfile_path4, 'rb') as fin:
    dict_reader = csv.DictReader(fin)   # default delimiter is comma
    fieldnames = dict_reader.fieldnames # save for writing
    for line in dict_reader:            # read in all of your data
        csv_contents.append(line)       # gather data into a list (of dicts)

# input to itertools.groupby must be sorted by the grouping value
sorted_csv_contents = sorted(csv_contents, key=op.itemgetter('prot_desc','ident','country'))
for groupkey, groupdata in it.groupby(sorted_csv_contents,
                                      key=op.itemgetter('prot_desc','ident','country')):
    with open(outfile_path5+'slice_{:s}.csv'.format(groupkey), 'wb') as fou:
        dict_writer = csv.DictWriter(fou, fieldnames=fieldnames)
        dict_writer.writerows(groupdata)
However, I need the output .csv files to contain only the 'pep_seq' column.
What should I do?
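Answer 0 (score: 2)
Your code is almost correct. You only need to set fieldnames to the columns you want and pass extrasaction='ignore', which tells DictWriter to write only the fields you specify and silently skip the rest: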
import itertools
import operator
import csv

outfile_path4 = 'input.csv'
outfile_path5 = r'my_output_folder\output.csv'

csv_contents = []
with open(outfile_path4, 'rb') as fin:
    dict_reader = csv.DictReader(fin)   # default delimiter is comma
    fieldnames = dict_reader.fieldnames # save for writing
    for line in dict_reader:            # read in all of your data
        csv_contents.append(line)       # gather data into a list (of dicts)

group = ['prot_desc','ident','country']
# input to itertools.groupby must be sorted by the grouping value
sorted_csv_contents = sorted(csv_contents, key=operator.itemgetter(*group))
for groupkey, groupdata in itertools.groupby(sorted_csv_contents, key=operator.itemgetter(*group)):
    with open(outfile_path5+'slice_{:s}.csv'.format(groupkey), 'wb') as fou:
        dict_writer = csv.DictWriter(fou, fieldnames=['pep_seq'], extrasaction='ignore')
        dict_writer.writeheader()
        dict_writer.writerows(groupdata)
This gives you output csv files with the following content:
pep_seq
ANSPV
ANSPVL
SSISGAGGGGLA
NYDNSAGKW
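The code above opens the files in binary mode ('rb'/'wb'), which is the Python 2 convention for the csv module. As a rough sketch of the same approach for Python 3, assuming text mode with newline='' and a made-up file naming scheme (the split_csv function, its parameters, and the "slice_..." names built from sanitized key parts are illustrative assumptions, not part of the original answer):

```python
import csv
import itertools
import operator
import os
import re


def split_csv(in_path, out_dir,
              group_fields=("prot_desc", "ident", "country"),
              keep_fields=("pep_seq",)):
    """Write one CSV per distinct combination of group_fields, keeping only keep_fields."""
    # Python 3: the csv module wants text-mode files opened with newline=''
    with open(in_path, newline="") as fin:
        rows = list(csv.DictReader(fin))

    key = operator.itemgetter(*group_fields)
    rows.sort(key=key)  # groupby only merges adjacent rows, so sort first

    written = []
    for group_key, group_rows in itertools.groupby(rows, key=key):
        # itemgetter returns a plain string for one field and a tuple for several
        parts = (group_key,) if isinstance(group_key, str) else group_key
        # Hypothetical naming scheme: join the key parts, replacing unsafe characters
        name = "slice_" + "_".join(re.sub(r"[^\w.-]+", "-", p) for p in parts) + ".csv"
        out_path = os.path.join(out_dir, name)
        with open(out_path, "w", newline="") as fout:
            # extrasaction='ignore' drops every column not listed in fieldnames
            writer = csv.DictWriter(fout, fieldnames=list(keep_fields),
                                    extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group_rows)
        written.append(out_path)
    return written
```

Calling split_csv('input.csv', 'my_output_folder', group_fields=('ident', 'country')) would split on the last two columns only, as described in the question. Without extrasaction='ignore', DictWriter raises a ValueError when a row contains keys that are not in fieldnames.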
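Answer 1 (score: 1)
The following writes one csv file per country, containing only the field you need.
You could always add another step to group by the second field you want as well.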
import csv

# use a dict to store the list of pep_seqs found for each country;
# the country value will be the dict key
csv_rows_by_country = {}
with open('in.csv', 'rb') as csv_in:
    csv_reader = csv.reader(csv_in)
    next(csv_reader, None)  # skip the header row so it is not treated as data
    for row in csv_reader:
        if row[7] in csv_rows_by_country:
            # add this pep_seq to the list we already found for this country
            csv_rows_by_country[row[7]].append(row[4])
        else:
            # start a new list for this country - we haven't seen it before
            csv_rows_by_country[row[7]] = [row[4]]

for country in csv_rows_by_country:
    # create a csv output file for each country and write the pep_seqs into it
    with open('out_%s.csv' % (country, ), 'wb') as csv_out:
        csv_writer = csv.writer(csv_out)
        for pep_seq in csv_rows_by_country[country]:
            csv_writer.writerow([pep_seq])
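The extra grouping step mentioned above could look something like the following sketch for Python 3, which keys the dict on an (ident, country) tuple instead of the country alone. The function names, the group_fields and prefix parameters, and the output file naming are assumptions made for illustration, and the key values are assumed to be safe to use in file names:

```python
import csv
from collections import defaultdict


def group_pep_seqs(in_path, group_fields=("ident", "country")):
    """Return {key tuple: [pep_seq, ...]} in file order; no sorting is needed."""
    groups = defaultdict(list)
    with open(in_path, newline="") as csv_in:
        # DictReader consumes the header row and lets us address columns by name
        for row in csv.DictReader(csv_in):
            key = tuple(row[field] for field in group_fields)
            groups[key].append(row["pep_seq"])
    return groups


def write_groups(groups, prefix="out_"):
    """Write one single-column CSV per group; 'prefix' is a hypothetical naming choice."""
    for key, pep_seqs in groups.items():
        with open(prefix + "_".join(key) + ".csv", "w", newline="") as csv_out:
            writer = csv.writer(csv_out)
            writer.writerow(["pep_seq"])            # header line
            writer.writerows([seq] for seq in pep_seqs)
```

Compared with the itertools.groupby approach, a dict of lists does not require the rows to be sorted first, at the cost of holding every pep_seq in memory until the files are written.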