Adding another column to the output

Time: 2017-05-17 19:03:38

Tags: python csv

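Thank you very much for your time. Sorry to bother you programming experts, but I have spent most of the past two days testing and searching for a way to solve my problem, and I seem to be missing what I think should be a simple solution.

A friend who is now out of town wrote me a short Python script (copied below) that takes any number of .csv files as input. These files have different but overlapping indices, plus a value for each index. The script then picks the maximum value for each index across all input files and writes a compiled .csv containing every index together with its maximum value.

I would like to add a third column to the output .csv showing the name of the source file for each maximum value. I think my problem differs from this one, https://stackoverflow.com/questions/35005798/adding-another-column-to-a-csv-file-w-python, because there is no one-to-one association between the index and the filename: the filename I want in the output depends on which file held the maximum value. It does not matter to me whether the filename comes from the file name itself or from a third column, entered manually into the input .csvs, that lists the filename on every row. It also does not matter which filename is output if two files have the same value for the same index. Despite trying everything I could think of, I have not managed to add this output to the script. Thank you so much!!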
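The core of the request can be shown in isolation: instead of remembering only the largest value for each index, remember the value together with the name of the file it came from, and replace that pair only when a strictly larger value turns up. Below is a minimal Python 3 sketch, assuming the indices have already been parsed out of the BarrierID column; the helper name `update_max` and the dict layout are hypothetical, chosen only for illustration.

```python
def update_max(best, index, value, source):
    """Store (value, source) for index only if value beats the stored maximum.

    best is a dict mapping index -> (max_value, source_name) (assumed layout).
    Because the comparison is strict, a tie keeps the earlier source.
    """
    current = best.get(index)
    if current is None or value > current[0]:
        best[index] = (value, source)


best = {}
update_max(best, 12, 3.5, "All_Culverts_K.csv")
update_max(best, 12, 4.1, "All_Culverts5817.csv")
update_max(best, 12, 4.1, "All_Culverts.csv")  # tie: the earlier file is kept
update_max(best, 7, 0.9, "All_Culverts.csv")
print(best)
```

Since ties do not matter here, the strict comparison simply keeps whichever file was read first.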

import csv
import sys
import operator
import numpy

filenames = [
    "All_Culverts_K.csv",
    "All_Culverts5817.csv",
    "All_culverts_5.2.csv",
    "All_Culverts.csv",
    "All_CulvertsCopy.csv"]

output = "All_Culverts_Run_5.11_Max_Areas3.csv"

maxAreas = [None] * 3000

for filename in filenames:
    try:
        with open(filename, 'r') as csv_file:
            input_table = csv.reader(csv_file)

            # Get rid of header
            header_row = next(input_table)

            row_number = 0

            # Go through all rows in the table after the header.
            for row in input_table:

                try:
                    ws_index = row[0].index('ws')

                    index = int(row[0][:ws_index])
                    value = float(row[1])

                    if (maxAreas[index] == None):
                        maxAreas[index] = value

                    else:
                        if (maxAreas[index] < value) :
                            maxAreas[index] = value

                except ValueError:
                    print "Error, missing ws on row " + str(row_number)

                row_number += 1

        csv_file.close()

    except IOError:
        print "ERROR: Could not find file '" \
            + filename \
            + "'. Bailing out."
        sys.exit(0)

# Write the maximums.
f_out = open(output, 'wb')
csv_writer = csv.writer(f_out)
csv_writer.writerow(['BarrierID', 'Area_sqkm', 'Source_file'])

row_number = 0

for area in maxAreas:
    csv_writer.writerow([str(row_number) + 'ws', area])
    row_number += 1

print "Done! View .csv in folder."

f_out.close()

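What have I tried so far?
- adding a third column to the input .csvs showing the source file
- creating a source_file variable
- appending the source_file input to the if statement
- adding the source_file variable to the writerow command
- lots of googling and reading some of the Python documentation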
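If the source name is typed into a third column of each input .csv (one of the options the question mentions), the loop could read it from the row and fall back to the file name when that column is missing or empty. This is a sketch under that assumption only: the column position (index 2) and the helper name `row_source` are hypothetical, and in-memory data stands in for a real input file.

```python
import csv
import io


def row_source(row, filename):
    """Return the source label for a row: the third column if present and
    non-empty (assumed layout), otherwise the name of the file being read."""
    if len(row) > 2 and row[2].strip():
        return row[2].strip()
    return filename


# In-memory stand-in for an input file with an optional third column.
sample = io.StringIO("BarrierID,Area_sqkm,Source\n12ws,4.1,run_K\n7ws,0.9,\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
sources = [row_source(row, "All_Culverts_K.csv") for row in reader]
print(sources)
```

The returned label would then be stored next to the value, exactly where the file name would otherwise go.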

1 answer:

Answer 0 (score: 0)

Would this solve your problem?

import csv
import sys
import operator
import numpy

filenames = [
    "All_Culverts_K.csv",
    "All_Culverts5817.csv",
    "All_culverts_5.2.csv",
    "All_Culverts.csv",
    "All_CulvertsCopy.csv"]

output = "All_Culverts_Run_5.11_Max_Areas3.csv"

maxAreas = [None] * 3000

for filename in filenames:
    try:
        with open(filename, 'r') as csv_file:
            input_table = csv.reader(csv_file)

            # Get rid of header
            header_row = next(input_table)

            row_number = 0

            # Go through all rows in the table after the header.
            for row in input_table:

                try:
                    ws_index = row[0].index('ws')

                    index = int(row[0][:ws_index])
                    value = float(row[1])

                    ##modification nr.1: use keyword is when checking for None
                    if (maxAreas[index] is None):
                        ##modification nr.2: store a tuple instead of just the value
                        maxAreas[index] = (value, filename)

                    else:
                        ##modification nr.3: use the numerical value in the stored tuple by adding [0]
                        if (maxAreas[index][0] < value) :
                            ##modification nr.4: store a tuple instead of just the value
                            maxAreas[index] = (value, filename)

                except ValueError:
                    print "Error, missing ws on row " + str(row_number)

                row_number += 1

        csv_file.close()

    except IOError:
        print "ERROR: Could not find file '" \
            + filename \
            + "'. Bailing out."
        sys.exit(0)

# Write the maximums.
f_out = open(output, 'wb')
csv_writer = csv.writer(f_out)
csv_writer.writerow(['BarrierID', 'Area_sqkm', 'Source_file'])

row_number = 0

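#modification nr. 5: unpack the tuple into area and filename1 when iterating
#through maxAreas; use filename1 instead of filename to catch possible errors.
#Indices that never appeared in any input file are still None, so write them
#with empty Area_sqkm and Source_file columns instead of unpacking None.
for entry in maxAreas:
    if entry is None:
        csv_writer.writerow([str(row_number) + 'ws', '', ''])
    else:
        area, filename1 = entry
        #modification nr.6: store the additional filename1
        csv_writer.writerow([str(row_number) + 'ws', area, filename1])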
    row_number += 1

print "Done! View .csv in folder."

f_out.close()

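The code is almost identical to your original, but at the point where you determine the maximum for each index, I store a tuple (maxArea, filename) so that the information about which file held the maximum is kept. Then, at the end, I unpack these two values from maxAreas and, following Jean-François Fabre's comment, write the extra value as a third column in each row of the csv file. I have to admit that I have no experience with csv, so it is possible that I am completely wrong.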
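For readers on Python 3, the same idea (store each maximum together with its file name) can be sketched with a dict keyed by index instead of the fixed-size list of 3000 slots. This is a swapped-in variation, not the answer's code: a dict holds only indices that actually occur, so there is no upper limit on the index and no None entries to handle, but unlike the original it writes rows only for indices found in at least one file. The function names `collect_max_areas` and `write_max_areas` are hypothetical, and files are passed as (name, open file) pairs so the sketch can run on in-memory `io.StringIO` data.

```python
import csv
import io


def collect_max_areas(named_files):
    """named_files: iterable of (name, text_file) pairs (hypothetical interface).
    Returns {index: (max_area, source_name)}; the first file wins on ties."""
    max_areas = {}
    for name, text_file in named_files:
        reader = csv.reader(text_file)
        next(reader, None)  # skip the header row
        for row_number, row in enumerate(reader):
            try:
                index = int(row[0][:row[0].index('ws')])
                value = float(row[1])
            except (ValueError, IndexError):
                print("Error, bad row %d in %s" % (row_number, name))
                continue
            best = max_areas.get(index)
            if best is None or best[0] < value:
                max_areas[index] = (value, name)
    return max_areas


def write_max_areas(max_areas, out_file):
    """Write one row per index, in index order, with the file holding the max."""
    writer = csv.writer(out_file)
    writer.writerow(['BarrierID', 'Area_sqkm', 'Source_file'])
    for index in sorted(max_areas):
        area, source = max_areas[index]
        writer.writerow([str(index) + 'ws', area, source])


# Usage with two small in-memory inputs.
a = io.StringIO("BarrierID,Area\n1ws,2.0\n2ws,5.0\n")
b = io.StringIO("BarrierID,Area\n1ws,3.5\n3ws,1.0\n")
out = io.StringIO()
write_max_areas(collect_max_areas([("a.csv", a), ("b.csv", b)]), out)
print(out.getvalue())
```

With real files, open each input with `open(name, newline='')` and the output with `open(output, 'w', newline='')`, as the Python 3 `csv` documentation recommends, and pass a list of `(name, file)` pairs to `collect_max_areas`.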