Using DictReader

Time: 2016-05-22 00:19:05

Tags: python csv header

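I'm looking for the best way to rename headers using DictReader / DictWriter, to add to the other steps I have already completed.

This is what I'm trying to do with the source data sample below.

  1. Remove the first two rows
  2. Reorder the columns (header and data) to 2, 1, 3 relative to the source file
  3. Rename the headers to ASXCode, CompanyName, GICS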
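  4. When I

    If I use 'reader = csv.reader(inf)', the first rows are removed and the columns are reordered but, as expected, the headers are not renamed.

    When I run the DictReader line 'reader = csv.DictReader(inf, fieldnames=('ASXCode', 'CompanyName', 'GICS'))', I get the error 'dict contains fields not in fieldnames:' and it shows the first row of data rather than the header.

    I'm a bit stuck on how to solve this, so any tips would be appreciated.

    Source data sample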

    ASX listed companies as at Mon May 16 17:01:04 EST 2016     
    
    Company name    ASX code    GICS industry group
    1-PAGE LIMITED  1PG Software & Services
    1300 SMILES LIMITED ONT Health Care Equipment & Services
    1ST AVAILABLE LTD   1ST Health Care Equipment & Services
    

    My code

    import csv
    import urllib.request
    from itertools import islice
    
    local_filename = "C:\\myfile.csv"
    url = ('http://mysite/afile.csv')
    
    temp_filename, headers = urllib.request.urlretrieve(url)
    
    with open(temp_filename, 'r', newline='') as inf, \
            open(local_filename, 'w', newline='') as outf:
    
      #  reader = csv.DictReader(inf, fieldnames=('ASXCode', 'CompanyName', 'GICS'))
        reader = csv.reader(inf)
        fieldnames = ['ASX code', 'Company name', 'GICS industry group']  
        writer = csv.DictWriter(outf, fieldnames=fieldnames)
    
    # 1. Remove top 2 rows
        next(islice(reader, 2, 2), None)
    
    # 2. Reorder Columns
        writer.writeheader()  
        for row in csv.DictReader(inf):
            writer.writerow(row)        
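
The error quoted above appears to be raised by csv.DictWriter, not DictReader: writerow() rejects any row dict whose keys are missing from the writer's fieldnames. The sketch below tries to reproduce it with in-memory data. It assumes the file is comma-separated (the sample above is shown with whitespace), and the names SAMPLE and error_message are made up for illustration. DictReader skips blank lines, so discarding two rows through it consumes the title line and the real header; the second DictReader then takes the first data line as its header.

```python
import csv
import io
from itertools import islice

# Assumed comma-separated version of the sample data (an assumption, not the real file).
SAMPLE = (
    "ASX listed companies as at Mon May 16 17:01:04 EST 2016\n"
    "\n"
    "Company name,ASX code,GICS industry group\n"
    "1-PAGE LIMITED,1PG,Software & Services\n"
    "1300 SMILES LIMITED,ONT,Health Care Equipment & Services\n"
)

inf = io.StringIO(SAMPLE, newline='')
outf = io.StringIO()

reader = csv.DictReader(inf, fieldnames=('ASXCode', 'CompanyName', 'GICS'))
writer = csv.DictWriter(
    outf, fieldnames=['ASX code', 'Company name', 'GICS industry group'])

# Discards two rows. DictReader skips the blank line on its own, so the rows
# consumed here are the title line and the "Company name,..." header line.
next(islice(reader, 2, 2), None)

writer.writeheader()
error_message = None
try:
    # This second DictReader reads the first data line as its header, so every
    # row dict is keyed by '1-PAGE LIMITED', '1PG', 'Software & Services'.
    for row in csv.DictReader(inf):
        writer.writerow(row)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

With csv.reader instead, the blank line counts as a row, so the same two-row skip stops just before the real header, which would explain why that variant works.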
    

2 answers:

Answer 0 (score: 1)

IIUC, here is a solution using pandas and its read_csv function:

import pandas as pd
#Considering that you have your data in a file called 'stock.txt' 
#and it is tab separated, by default the blank lines are not read by read_csv, 
#hence set the header=1
df = pd.read_csv('stock.txt', sep='\t',header=1)
#Rename the columns as required
df.columns= ['CompanyName', 'ASXCode', 'GICS']
#Reorder the columns as required
df = df[['ASXCode','CompanyName','GICS']]

This is how it looks when run in IPython, with the output as follows:

[Screenshot: IPython session showing the output]

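Answer 1 (score: 1)

Following your tips I got it working in the end. I hadn't used pandas before, so I had to read up on it a bit first.

I eventually worked out that pandas uses a DataFrame, so I had to do things a bit differently with the to_csv function, and I ended up adding the index=False argument to to_csv to drop the df index.

Everything works great now.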

import csv
import os
import urllib.request
import pandas as pd

local_filename = "C:\\myfile.csv"

url = ('http://mysite/afile.csv')

temp_filename, headers = urllib.request.urlretrieve(url)

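# Using a pandas DataFrame
# header=1: blank lines are skipped by default, so the second non-blank line
# (the real header) becomes the header and the title line above it is dropped
df = pd.read_csv(temp_filename, sep=',', header=1)
df.columns = ['CompanyName', 'ASXCode', 'GICS']  # rename columns (source order)
df = df[['ASXCode', 'CompanyName', 'GICS']]  # reorder columns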

df.to_csv(local_filename, sep=',', index=False)
os.remove(temp_filename)  # clean up
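
For comparison, here is a minimal sketch of the same three steps using only the csv module, as the question originally asked. It is not taken from either answer. It assumes a comma-separated file and uses in-memory data in place of the download, and the names SAMPLE, SOURCE_NAMES, OUTPUT_ORDER and convert are hypothetical. The idea is to give DictReader the new names in source column order, discard the title and original header rows, and let DictWriter's fieldnames decide the output order.

```python
import csv
import io

# Assumed comma-separated version of the sample data (an assumption, not the real file).
SAMPLE = (
    "ASX listed companies as at Mon May 16 17:01:04 EST 2016\n"
    "\n"
    "Company name,ASX code,GICS industry group\n"
    "1-PAGE LIMITED,1PG,Software & Services\n"
    "1300 SMILES LIMITED,ONT,Health Care Equipment & Services\n"
    "1ST AVAILABLE LTD,1ST,Health Care Equipment & Services\n"
)

SOURCE_NAMES = ['CompanyName', 'ASXCode', 'GICS']   # new names, in source column order
OUTPUT_ORDER = ['ASXCode', 'CompanyName', 'GICS']   # desired column order on output


def convert(inf, outf):
    """Drop the title and original header, rename the columns, and reorder them."""
    # Passing fieldnames means DictReader treats no line as a header.
    reader = csv.DictReader(inf, fieldnames=SOURCE_NAMES)
    # DictReader skips blank lines itself, so the first two rows it yields are
    # the title line and the original header line; discard both.
    next(reader, None)
    next(reader, None)
    # lineterminator='\n' keeps the in-memory output easy to inspect.
    writer = csv.DictWriter(outf, fieldnames=OUTPUT_ORDER, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # keys match OUTPUT_ORDER, only the order differs


inf = io.StringIO(SAMPLE, newline='')
outf = io.StringIO()
convert(inf, outf)
result = outf.getvalue()
print(result)
```

With real files, the same function could be called with the two handles opened with newline='' as in the question's code.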