How do I extract specific data from a downloaded csv file and convert it into a new csv file?

Time: 2014-06-14 18:09:30

Tags: python csv

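I am using an online survey application that lets me download survey results as a csv file. However, the downloaded csv puts each survey question and answer in a new column, whereas I need the csv file formatted with each survey question and answer on a new row. The downloaded csv file also contains a lot of data that I want to ignore completely.

How can I parse the required rows and columns of the downloaded csv file and write them to a new csv file in a specific format?

For example, I download the data and it looks like this: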

V1,V2,V3,Q1,Q2,Q3,Q4....
null,null,null,item,item,item,item....
0,0,0,4,5,4,5.... 
0,0,0,2,3,2,3....

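The first row contains the "keys" I need, but V1-V3 must be excluded. Row 2 must be excluded entirely. Row 3 is my first subject, so I need to pair the values 4,5,4,5 with the keys Q1,Q2,Q3,Q4. Row 4 is a new subject and must be excluded, because my program only handles one subject at a time.

For my script to work properly, the csv file I need to create looks like this: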

Q1,4
Q2,5
Q3,4
Q4,5

I tried using izip to transpose the data, but I don't know how to select only the rows and columns I need:

import csv
from itertools import izip

# Python 2: transposes the whole file, without selecting any rows or columns
a = izip(*csv.reader(open("CDI.csv", "rb")))
csv.writer(open("CDI_test.csv", "wb")).writerows(a)

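3 Answers:

Answer 0: (score: 1)

Here is a simple Python script that should do the job for you. It takes command-line arguments: a start:end pair giving the number of entries to skip at the beginning of each row and the column index at which to stop, followed by the input file and the output file. For example, the command looks like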

python question.py 3:7 input.txt output.txt

If you don't want to pass the arguments every time, you can also replace sys.argv[1] with 3, sys.argv[2] with "input.txt", and so on in the script.

Text file version:

import sys

# Usage: python question.py leadingRemoved pathToInput pathToOutput
# Only the number before any ":" is used, so "3" and "3:7" both skip 3 columns.
leadingRemoved = int(sys.argv[1].split(":")[0])

with open(sys.argv[2], "r") as inputFile:
    # strip surrounding whitespace from each line, then split on ","
    lines = [x.strip().split(",") for x in inputFile]

# pair every element after the first leadingRemoved in the first and third rows
zipped = zip(lines[0][leadingRemoved:], lines[2][leadingRemoved:])

with open(sys.argv[3], "w") as outputFile:
    for pair in zipped:
        # write each question/number pair on its own line
        outputFile.write(",".join(pair) + "\n")

CSV file version:

import sys
import csv

# Python 2 script. Usage: python question.py start:end pathToInput pathToOutput
leadingRemoved, endingRemoved = [int(x) for x in sys.argv[1].split(":")]

with open(sys.argv[2], "rb") as inputFile:
    # remove null bytes, which csv.reader cannot handle
    cleaned = (line.replace('\0', '') for line in inputFile)
    # delimiter="\t" assumes a tab-separated export; use "," for comma-separated data
    reader = csv.reader(cleaned, delimiter="\t")
    # build a 2D list holding the elements of every row
    lines = [row for row in reader]

# pair columns leadingRemoved..endingRemoved-1 of the first and third rows
zipped = zip(lines[0][leadingRemoved:endingRemoved], lines[2][leadingRemoved:endingRemoved])

with open(sys.argv[3], "wb") as outputFile:
    csv.writer(outputFile).writerows(zipped)
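
As a variation on the positional slicing above, the columns can also be chosen by header name. The following is a minimal Python 3 sketch, not part of the original answer: the helper name extract_subject, the "Q" prefix rule, and the in-memory sample are assumptions made for illustration, and the data is assumed to be comma-separated as in the question.

```python
import csv
import io


def extract_subject(rows, subject_row=2, prefix="Q"):
    """Pair each header that starts with `prefix` with the value
    from a single subject row (hypothetical helper)."""
    header = rows[0]
    subject = rows[subject_row]
    return [(key, value) for key, value in zip(header, subject)
            if key.startswith(prefix)]


# Sample shaped like the download in the question (comma-separated is assumed).
sample = (
    "V1,V2,V3,Q1,Q2,Q3,Q4\n"
    "null,null,null,item,item,item,item\n"
    "0,0,0,4,5,4,5\n"
    "0,0,0,2,3,2,3\n"
)

rows = list(csv.reader(io.StringIO(sample)))
pairs = extract_subject(rows)  # row index 2 is the first subject

# Write the key/value pairs, one per line, to an in-memory buffer.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(pairs)
print(out.getvalue())
```

With a real file, io.StringIO(sample) would be replaced by open(path, newline=""), and the buffer by an output file opened with "w" and newline="". Selecting by name avoids having to count columns, at the cost of assuming every wanted header shares the prefix.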

Answer 1: (score: 0)

I did something similar with multiple values, but it can be changed to handle a single value.

#!/usr/bin/env python


import csv

def dict_from_csv(filename):
    '''
    (file)->list of dictionaries
    Function to read a csv file and format it to a list of dictionaries.
    The headers are the keys with all other data becoming values
    The format of the csv file and the headers included need to be known to extract the email addresses
    '''

    #open the file and read it using csv.reader()
    #read the file. for each row that has content add it to list mf
    #the keys for our user dict are the first content line of the file mf[0]
    #the values to our user dict are the other lines in the file mf[1:]
    mf = []
    with open(filename, 'r') as f:
        my_file = csv.reader(f)
        for row in my_file:
            if any(row):
                mf.append(row)
    file_keys = mf[0]
    file_values= mf[1:]  #choose row/rows you want

    #Combine the two lists into a list of dictionaries: the keys list supplies the keys and each data row supplies the values
    my_list = []
    for value in file_values:
        #zip the header with the current row, not with the whole list of rows
        my_list.append(dict(zip(file_keys, value)))

    #return the list of dictionaries
    return my_list
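
A closely related standard-library route is csv.DictReader, which builds the header-keyed dictionaries directly. The sketch below (Python 3) is an illustration rather than the answerer's code: it assumes comma-separated data like the sample in the question, that the first data row is the "item" description row, and that the wanted keys start with "Q".

```python
import csv
import io

# Sample shaped like the download in the question (an assumption for illustration).
sample = (
    "V1,V2,V3,Q1,Q2,Q3,Q4\n"
    "null,null,null,item,item,item,item\n"
    "0,0,0,4,5,4,5\n"
    "0,0,0,2,3,2,3\n"
)

# DictReader uses the first line as keys and yields one dict per later line.
records = list(csv.DictReader(io.StringIO(sample)))

# records[0] is the "null/item" description row; records[1] is the first subject.
first_subject = records[1]

# Keep only the question columns, preserving their order in the file.
pairs = [(key, value) for key, value in first_subject.items()
         if key.startswith("Q")]

for key, value in pairs:
    print(key + "," + value)
```

Each element of records plays the same role as one dictionary in the list returned by dict_from_csv, so picking a single subject is just a matter of indexing into the list.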

Answer 2: (score: 0)

I suggest you read up on pandas for this kind of task:

http://pandas.pydata.org/pandas-docs/stable/io.html

import pandas

input_dataframe = pandas.read_csv("input.csv")
transposed_df = input_dataframe.transpose()

# delete rows and edit data easily using pandas dataframe
# this is a good library to get some experience working with

transposed_df.to_csv("output.csv")
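
To make the "delete rows and edit data" step more concrete, here is one possible sketch using the question's sample data. It is an illustration under assumptions, not the answerer's code: the question columns are assumed to start with "Q", the first data row is assumed to be the "item" row, and the sample is read from an in-memory string instead of input.csv.

```python
import io

import pandas as pd

# Sample shaped like the download in the question (an assumption for illustration).
sample = (
    "V1,V2,V3,Q1,Q2,Q3,Q4\n"
    "null,null,null,item,item,item,item\n"
    "0,0,0,4,5,4,5\n"
    "0,0,0,2,3,2,3\n"
)

# dtype=str keeps the answers as text so they are written back unchanged.
df = pd.read_csv(io.StringIO(sample), dtype=str)

# Keep only the question columns (assumed to start with "Q").
questions = df.loc[:, df.columns.str.startswith("Q")]

# Position 0 is the "item" row; position 1 is the first subject.
first_subject = questions.iloc[1]

# Writing the Series without a header gives one "key,value" line per question.
csv_text = first_subject.to_csv(header=False)
print(csv_text)
```

Selecting a single row yields a Series whose index holds the question names, so no explicit transpose is needed here. Passing a file path to to_csv would write the result to disk instead of returning a string.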