Selective CSV import

Time: 2015-06-11 17:49:33

Tags: python postgresql csv

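I export CSV files from a river temperature monitoring device, and I need to import them into a PostgreSQL database through an application that uses Jython. I know how to import the whole file. The problem is that the top of the CSV contains a bunch of device information, and there is one column I don't need. So I need to start at line 20, delete column B, and delete the last line, which is a statement saying the data has ended. If anyone knows of a Python library that could help with this, any help would be greatly appreciated!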
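A minimal sketch of the transformation described above, using only Python's standard `csv` module (Jython's standard library should provide it as well). It assumes comma-separated values, that column B is the second field (index 1), and that the device header is exactly 19 lines with no quoted line breaks inside it. The function name `clean_rows` and its parameters are hypothetical.

```python
import csv


def clean_rows(lines, header_lines=19, drop_col=1, delimiter=','):
    """Return the data rows with the device header, one column and the footer removed.

    lines:        any iterable of text lines, e.g. an open file object.
    header_lines: number of leading lines to skip (19, so data starts on line 20).
    drop_col:     zero-based index of the unwanted column (1 == column B).
    """
    rows = list(csv.reader(lines, delimiter=delimiter))
    # Drop the device-information block and the final "end of data" line.
    data = rows[header_lines:len(rows) - 1]
    # Rebuild each row without the unwanted column.
    return [row[:drop_col] + row[drop_col + 1:] for row in data]
```

In Python 3 you would pass an open file, for example one opened with `open('export.csv', newline='')`; the returned rows can then go to whatever code already performs the database import.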

2 answers:

Answer 0 (score: 0)

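If I understand correctly, you are trying to turn the file with the extra rows and columns into a new file without the extra data, which you can then import.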

import csv

I don't know how to do this with the csv library that is included by default, but you can also try this:

from __future__ import print_function   # lets Python 2.7 / Jython use print(..., file=...)

with open(inputFile) as inf:                  # both files are closed automatically
    with open(outputFile, mode='w') as outf:
        lines = inf.readlines()[19:-1]        # skip the first 19 lines and the last one
        for line in lines:
            fields = line.rstrip('\r\n').split('\t')  # replace '\t' with your file's delimiter, e.g. ','
            del fields[1]                     # assuming column B is at index 1
            print(*fields, sep='\t', file=outf)
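If you do want to use the `csv` module imported above, a streaming variant avoids reading the whole file into memory and lets the module handle quoting and delimiters. This is a minimal Python 3 sketch: the footer is dropped by holding one row back and writing it only after the next row has been read, so the final row is never written. The function name `filter_csv` is hypothetical, and under Jython 2.7 the csv files would typically be opened in binary mode ('rb'/'wb') rather than with `newline=''`.

```python
import csv
from itertools import islice


def filter_csv(src, dst, skip=19, drop_col=1, delimiter=','):
    """Copy rows from src to dst, skipping `skip` leading rows,
    removing column `drop_col` and omitting the last row.

    src: an open text file (or any iterable of lines); dst: a writable text file.
    """
    reader = csv.reader(src, delimiter=delimiter)
    # lineterminator='\n' keeps Unix line endings; the csv default is '\r\n'.
    writer = csv.writer(dst, delimiter=delimiter, lineterminator='\n')
    rows = islice(reader, skip, None)   # lazily skip the device header
    previous = next(rows, None)         # hold one row back
    for row in rows:
        # A newer row exists, so `previous` is not the last row: write it.
        writer.writerow(previous[:drop_col] + previous[drop_col + 1:])
        previous = row
    # Whatever is still in `previous` is the footer line, and it is discarded.
```

Holding back a single row is what makes it possible to drop the last line without knowing the file length in advance, which `readlines()[19:-1]` gets only by loading everything.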

Answer 1 (score: 0)

Edit: this was written and run with Python 2.7 and pandas 0.16.1.

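OK, let's say the file you're working with has 4 columns titled A, B, C and D, each containing, for example, just a list of random numbers. You said you want to remove column B, as well as the first 19 rows and the last row of every column. Assuming that, try the following code (edit it to fit your file):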

1    import pandas as pd
2    # set a variable to hold the imported csv file
3    hold = pd.read_csv('my_file.csv', header=0)
4    
5    # Now you want to take the relevant columns and add them to lists, so they can be manipulated
6    listA = hold['A']
7    listC = hold['C']
8    listD = hold['D']
9
10    # Now drop the first 19 lines of the file and the last row. Because of zero-based indexing and the header line, row index i is line i + 2 of the file, so keeping rows from index 18 starts at file line 20
11    listA = listA.iloc[18:-1]
12    listC = listC.iloc[18:-1]
13    listD = listD.iloc[18:-1]
14
15    # add each column back into what will become the new, edited csv file.
16    # pd.DataFrame(listA) turns the first Series into a one-column DataFrame, which is what pandas uses to write csv files.
17    # the other Series are added as new columns; pandas aligns them on their row index, so the rows stay matched
18    new_file = pd.DataFrame(listA)
19    new_file['C'] = listC
20    new_file['D'] = listD
21
22    # set the column names explicitly. Assigning to .columns requires one name for every column.
23    new_file.columns = ['A', 'C', 'D']
24
25    # write the edited data to a new csv file. index=False stops pandas from writing the row index as an extra column.
26    new_file.to_csv('new_file.csv', index=False)

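That will do what you want. If you have more columns, just add more lists, and make sure the column names you use match the headers in the file exactly, including capitalization. Also, if you need to change which rows are removed, just change the indices on lines 11-13.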
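For reuse, the row and column selection from the pandas answer can be parameterized instead of editing lines 11-13 by hand. The sketch below works on plain lists of rows (for example from the standard `csv` module) rather than pandas, since pandas depends on C extensions that generally cannot be loaded under Jython. It also avoids one slicing pitfall: writing the end of a slice as `-skip_bottom` breaks when `skip_bottom` is 0, because `[18:-0]` is the same as `[18:0]` and returns nothing (the same applies to `.iloc`). Computing the end from the length does not have that problem. The function `select_and_trim` is hypothetical and assumes the header row is the first row.

```python
def select_and_trim(rows, keep, skip_top, skip_bottom):
    """Keep only the columns named in `keep` and drop data rows from both ends.

    rows:        list of lists, header row first (e.g. list(csv.reader(f))).
    keep:        header names of the columns to keep, in output order.
    skip_top:    number of data rows (after the header) to drop at the start.
    skip_bottom: number of data rows to drop at the end; 0 is allowed.
    Raises ValueError if a name in `keep` is not in the header.
    """
    header, body = rows[0], rows[1:]
    positions = [header.index(name) for name in keep]  # look columns up by name
    end = max(len(body) - skip_bottom, 0)              # avoids the [n:-0] empty-slice trap
    trimmed = body[skip_top:end]
    return [list(keep)] + [[row[i] for i in positions] for row in trimmed]
```

With rows read via `list(csv.reader(f))`, calling `select_and_trim(rows, ['A', 'C', 'D'], 18, 1)` would mirror the 18 and 1 used on lines 11-13, and looking columns up by name means their order in the file does not matter.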