How to merge several CSV columns into one column using Python 2.7?

Time: 2015-06-25 18:55:00

Tags: python csv

I'm working with a large set of CSV data, and I want to merge several columns that sit in different positions into a single column, with the values separated by semicolons (;). What I have now is:

a    b    c    d
1    2    3    4
1    2    3    4
1    2    3    4

I want to change it to this, so that all my data ends up only in column d:

a    b    c    d
               a=1;b=2;c=3;d=4;
               a=1;b=2;c=3;d=4;
               a=1;b=2;c=3;d=4;

I know how to delete the empty columns a, b and c, but I just can't figure out a way to merge the data from columns a, b and c into column d. Thanks in advance.

The code that I have so far is:

# Parsing the custom formatted data with csv module.
# reads the custom format input and spits out the output in VCF format.
import csv

# input and output
with open('1-0002', 'rb') as csvin, open('converted1','wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter = '\t')
    writer = csv.writer(csvout, delimiter = '\t')
    # add headings before the for loop to prevent the heading being affected by column manipulation.
    writer.writerow(["#CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO"])

    for row in reader:
        # deleting unnecessary columns; 'del' must go from the highest index down (descending order),
        # or else the remaining indexes shift and it will give a range error
        # manually deleting columns since the input data is in custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0,'1')
        row.insert(2,'.')
        row.insert(5,'.')
        row.insert(7,'')  # inserting empty column for INFO headings.
        # change 'YES' to 'PASS' , leaving HETERO as it is.
        if row[6] == 'YES':
            row[6] = 'PASS'
        writer.writerow(row)

So, from the code above, I want to put the data from several different columns into the INFO column.

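3 Answers:

Answer 0: (score: 3)

Simple answer: don't bother deleting columns from the row. Instead, build a new row that picks out only what you want.

It would look like this: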

# leave row alone, don't bother deleting columns in it.
# The indexes below are placeholders; use the positions of your own columns.
new_row = ["a=%s;b=%s;c=%s;d=%s" % (row[12], row[13], row[14], row[15])]
# new_row has only one column, with a string constructed of what you need.

writer.writerow(new_row)

Voilà, that should do it for you. You can also copy any other columns you need into new_row, using append() for each additional one.
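As a minimal sketch of the idea above, the helper below copies the columns you want to keep and folds the others into one semicolon-separated column. The function name `build_row`, its parameters, and the example positions are assumptions made for illustration; they are not part of the original answer, so substitute the real column positions from your file.

```python
def build_row(row, keep, merge):
    """Return a new row: the kept columns followed by one merged column.

    keep  -- list of column indexes to copy unchanged (hypothetical positions)
    merge -- list of (label, index) pairs to fold into 'label=value;' form
    """
    new_row = [row[i] for i in keep]
    info = ''.join('%s=%s;' % (label, row[i]) for label, i in merge)
    new_row.append(info)
    return new_row


# Example with made-up positions: keep columns 0 and 1, merge columns 2 to 4.
row = ['chr1', '100', '1', '2', '3']
print(build_row(row, keep=[0, 1], merge=[('a', 2), ('b', 3), ('c', 4)]))
```

Inside the question's loop, the result could be passed straight to writer.writerow(), which leaves the original row untouched and avoids the index shifting that comes with del and insert.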

Answer 1: (score: 0)

Just read the file with a reader and write the data you read into another file. This example assumes you still want to keep the empty columns a, b and c.

import csv

# Python 2.7: open csv files in binary mode
with open('filename.csv', 'rb') as infile, open('outfile.csv', 'wb') as outfile:
    r = csv.reader(infile)
    w = csv.writer(outfile)
    for row in r:
        # joins the current row into a string separated by spaces
        outdata = ' '.join(row)
        # leaves first three columns empty, writes old content into fourth column
        data_to_write = ('', '', '', outdata)
        w.writerow(data_to_write)
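The join in this answer produces space-separated values rather than the `a=1;b=2;c=3;d=4;` form shown in the question. A possible variation, assuming the first line of the file holds the column names, pairs each name with its value before joining. The function name `merge_into_last` and the sample data are hypothetical.

```python
def merge_into_last(header, row):
    """Blank every column except the last, which receives 'name=value;' pairs."""
    # Pair each heading with its value, e.g. 'a=1;b=2;c=3;d=4;'
    merged = ''.join('%s=%s;' % (name, value) for name, value in zip(header, row))
    # Keep the empty leading columns so the layout stays the same width.
    return [''] * (len(row) - 1) + [merged]


# Made-up data standing in for rows read by csv.reader.
header = ['a', 'b', 'c', 'd']
rows = [['1', '2', '3', '4'], ['1', '2', '3', '4']]
for row in rows:
    print(merge_into_last(header, row))
```

With a real file, the header could be taken with next(r) before the loop, and each result written with w.writerow().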

Answer 2: (score: 0)

Try pandas:

import pandas as pd
df = pd.read_csv('1-0002.csv')
df['d_merged'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)

This gives:

>>> df

   a  b  c  d          d_merged
0  1  2  3  4  a=1;b=2;c=3;d=4;
1  1  2  3  4  a=1;b=2;c=3;d=4;
2  1  2  3  4  a=1;b=2;c=3;d=4;

Now drop the columns you don't want:

df = df.drop(['a','b','c','d'], axis=1)

>>> df

           d_merged
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

If you like, now rename d_merged:

df = df.rename(columns={'d_merged':'d'})

>>> df

                  d
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

Alternatively, skip the rename by writing the merged string directly into d (you would then drop only a, b and c):

df['d'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)

Then write to CSV:

df.to_csv('csvout.csv', index=False)
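Since the file in the question is tab-delimited, the pandas calls above would likely need sep='\t' on both the read and the write. The sketch below shows that, and also builds the merged column by concatenating whole columns instead of calling apply row by row. The in-memory sample data and the variable names are assumptions for illustration only.

```python
import io

import pandas as pd

# Stand-in for a tab-delimited input file (contents are made up).
data = io.StringIO(u"a\tb\tc\td\n1\t2\t3\t4\n1\t2\t3\t4\n")
# dtype=str keeps every value as text so it can be concatenated directly.
df = pd.read_csv(data, sep='\t', dtype=str)

cols = ['a', 'b', 'c', 'd']
# Build 'a=1;b=2;c=3;d=4;' one column at a time, for all rows at once.
merged = pd.Series('', index=df.index)
for name in cols:
    merged = merged + name + '=' + df[name] + ';'

# Keep only the merged column, named d as in the question.
out = pd.DataFrame({'d': merged})
print(out.to_csv(sep='\t', index=False))
```

For a real file, the StringIO object would be replaced by the input path, and to_csv would be given an output path as its first argument.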