Removing the first and last character from a CSV file using Python

Date: 2016-07-06 11:14:52

Tags: python regex csv pandas

I need to convert the following file, output1.csv, which contains results from a quantum chemistry calculation, into a single column:

 Frequencies --    18.8210                44.7624                46.9673
 Frequencies --    66.6706               102.0432               112.4930
 Frequencies --   124.4601               138.4393               180.1404
 Frequencies --   230.0306               240.4389               258.2459
 Frequencies --   282.7781               340.8302               357.7789
 Frequencies --   378.9043               384.1284               401.4285
 Frequencies --   418.0523               444.2264               447.6885
 Frequencies --   473.2391               501.0937               518.9083
 Frequencies --   559.5925               609.9256               623.7729
 Frequencies --   657.4144               672.5480               728.2009
 Frequencies --   740.5035               750.3238               757.2199
 Frequencies --   774.6343               806.7750               815.9990
 Frequencies --   839.3050               858.0716               876.1641
 Frequencies --   888.6654               942.2963               965.7888
 Frequencies --   987.3819               994.7388              1020.8724
 Frequencies --  1025.0426              1045.5129              1059.0966
 Frequencies --  1076.5127              1143.1178              1155.4200
 Frequencies --  1208.6790              1219.7513              1244.7080
 Frequencies --  1265.6108              1287.8830              1300.0463
 Frequencies --  1325.0427              1339.0678              1353.0061
 Frequencies --  1369.0614              1408.5258              1433.0543
 Frequencies --  1452.4148              1454.6319              1500.4304
 Frequencies --  1511.2305              1517.2562              1552.9189
 Frequencies --  1560.5313              1636.2290              1640.1732
 Frequencies --  1664.8747              1681.5566              1703.2026
 Frequencies --  1770.2627              3058.4143              3122.3743
 Frequencies --  3147.1828              3192.5897              3199.1398
 Frequencies --  3211.0676              3222.0033              3236.3394
 Frequencies --  3262.2119              3556.7997              3862.4791

To do this, I wrote the following code:

import os
import csv
import re
import sys
import pandas as pd

inputfile = open('output1.csv', 'r')
reader = csv.reader(inputfile)

outputfile = open('output1_f.csv', 'a')
writer = csv.writer(outputfile)

with open('output1_f.csv', 'w') as file:
    file.write('Frequencies,Frequencies,Frequencies\n')
for row in reader:   
    row = [re.sub(' +', ',', item) for item in row]
    row = [re.sub(',Frequencies,--,', '', item) for item in row]               
#    row = map(str.strip, row)
    writer.writerow(row)

inputfile.close()
outputfile.close()

I added the commented-out line below to remove the first and last character, ", from every line of output1_f.csv, but it did not work.

 row = map(str.strip, row)
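
A note on why this line appears to do nothing: with no arguments, str.strip removes only whitespace, and the strings in row do not contain any " characters in the first place. After the two re.sub calls, each row holds a single string with embedded commas, and csv.writer wraps such a field in quotes when it writes it. The sketch below (Python 3, using in-memory io.StringIO buffers instead of the real files, which is an assumption made purely for illustration) contrasts writing one comma-containing string with writing a list of three fields.

```python
import csv
import io
import re

line = " Frequencies --    18.8210                44.7624                46.9673"

# What the loop in the question effectively builds: ONE string containing commas.
joined = re.sub(' +', ',', line)
joined = re.sub(',Frequencies,--,', '', joined)

# Writing a single field that contains the delimiter: csv.writer quotes it.
quoted = io.StringIO()
csv.writer(quoted, lineterminator="\n").writerow([joined])

# Writing three separate fields: nothing needs quoting.
unquoted = io.StringIO()
csv.writer(unquoted, lineterminator="\n").writerow(line.split()[2:])

print(repr(quoted.getvalue()))
print(repr(unquoted.getvalue()))
```

Splitting each line into separate fields before calling writerow avoids the quotes, so no second clean-up pass over the file should be needed.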

I found a solution using line.replace, which creates a second file, output1_2f.csv.

inputfile = open('output1_f.csv', 'r')
outputfile = open('output1_2f.csv', 'w')
for line in inputfile:
    line = line.replace('"', '')
    outputfile.write(line)

inputfile.close()
outputfile.close()

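The transposition step below only works once the " characters have been removed, which is why I need a more efficient way to remove them than line.replace.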

ifile  = open('output1_2f.csv', "rb")
reader = csv.reader(ifile)

with open('output1_transp.csv', 'w') as out:
    rownum = 0
    for row in reader:
        # Save header row.
        if rownum == 0:
            header = row
        else:
            colnum = 0
            for col in row:
                out.write( '%s\n' % (col))
                colnum += 1

        rownum += 1

ifile.close()

I would appreciate any suggestions for shortening the code and making it more efficient and easier to use. Thank you for your time!

1 Answer:

Answer 0: (score: 0)

The user https://codereview.stackexchange.com/users/39848/edward helped with this here:

https://codereview.stackexchange.com/questions/134045/optimize-a-simple-and-quick-python-script-for-transposing-a-csv-file/134064#134064

with open('input.txt', 'r') as infile, open('out.csv', 'w') as outfile:
    print >> outfile, "Frequency"
    for line in infile:
        for freq in line.split()[2:]:
            print >> outfile, freq
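
The snippet above uses the Python 2 form print >> outfile. A minimal Python 3 sketch of the same idea follows. The function names frequencies_to_column and convert are hypothetical, and the check that skips lines not starting with "Frequencies --" is an added assumption (the sample input contains only such lines).

```python
def frequencies_to_column(lines):
    """Yield each frequency value, one at a time, from 'Frequencies --' lines."""
    for line in lines:
        fields = line.split()
        # Assumption: ignore any line that is not a frequency record.
        if fields[:2] != ["Frequencies", "--"]:
            continue
        # Everything after the 'Frequencies' and '--' tokens is a value.
        for freq in fields[2:]:
            yield freq


def convert(in_path, out_path):
    """Write a one-column file: a 'Frequency' header, then one value per line."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        print("Frequency", file=outfile)
        for freq in frequencies_to_column(infile):
            print(freq, file=outfile)


# Small in-memory demonstration using two lines from the question's input.
sample = [
    " Frequencies --    18.8210                44.7624                46.9673\n",
    " Frequencies --    66.6706               102.0432               112.4930\n",
]
column = list(frequencies_to_column(sample))
print(column)
```

Calling convert('input.txt', 'out.csv') would mirror the file names used in the answer. Because the values are written one per line, no field ever contains a comma, so no quoting and no separate transposition step are involved.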