How do I parse certain rows of a .csv file with Python? (sample file included)

Time: 2016-02-22 19:31:34

Tags: python csv dictionary

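I have the basics of parsing a .csv file and putting certain rows into lists and/or dictionaries, but I can't crack this one.

The file starts with 9 lines of general information, such as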

  • Client name
  • Invoice number
  • Invoice date
  • ...etc.

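After that comes a detailed list of products and prices. What I want to do is:

  1. Get 'Invoice number', 'Issue date', 'Due Date' and 'Amount due' from the first 9 lines
  2. Get 'Description' and 'Amount' from the remaining lines
  3. Put them into a dictionary. I will then write that data to a MySQL database. Can anyone suggest how to start adding items to the dictionary after this "header" (line 9)?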

    Thanks.

    ExampleCSV

    Bill to Client                          
    Billing ID  xxxx-xxxx-xxxx                          
    Invoice number  3359680287                          
    Issue date  1/31/2016                           
    Due Date    3/1/2016                            
    Currency    EUR                         
    Invoice subtotal    2,762,358.40                            
    VAT (0%)    0                           
    Amount due  2,762,358.40                            
    
    Account ID  Account Order   Purchase Order  Product Description Quantity    Units   Amount
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  Belgium_GDN_january_(FR)    1   Impressions 0.04
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  UK_GDN_january  392 Impressions 2.92
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  Poland_GDN_january  12  Impressions 0.05    
    
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google AdWords  Switzerland Family vacation 251 Clicks  4,718.91
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google              
    xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google AdWords  Invalid activity            -16.46
    

    When I try this code:

    import csv
    
    with open('test.csv') as csvfile:
        readCSV = csv.reader(csvfile, delimiter=",")
        for row in readCSV:
            print(row[0])
    

    I get this in the terminal:

      

    Bill to
    Billing ID
    Invoice number
    Issue date
    Due Date
    Currency
    Invoice subtotal
    VAT (0%)
    Amount due
    Traceback (most recent call last):
      File "xlwings_test.py", line 7, in <module>
        print(row[0])
    IndexError: list index out of range

3 Answers:

Answer 0: (score: 1)

You can use the csv module and enumerate the reader object.

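import csv

dict1 = {}  # header fields: Invoice number, Issue date, Due date, Amount due
dict2 = {}  # description -> amount

# Python 3: open in text mode with newline="" as the csv module recommends.
with open("test.csv", newline="") as f:
    # Tab-delimited here; use delimiter="," if the file is comma-separated.
    reader = csv.reader(f, delimiter="\t")
    # Count from 1 so the numbers below match the line numbers in the file.
    for i, line in enumerate(reader, start=1):
        if not line:
            continue  # blank lines come back as empty lists
        if i in [3, 4, 5, 9]:
            prop_name = line[0]
            prop_val = line[1]
            dict1[prop_name] = prop_val
        elif i > 11 and len(line) > 5:
            # Lines 10 and 11 are the blank separator and the column header.
            print("Description: " + line[5])
            print("Amount: " + line[-1])
            dict2[line[5]] = line[-1]

print(dict1)
print(dict2)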
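
The line numbers hard-coded above only work while the export keeps exactly this layout. As a rough alternative, here is a minimal sketch that keys on the labels instead. It assumes a comma-separated file with quoted amounts and assumes that the product table begins with a row whose first cell is "Account ID"; `parse_invoice`, `WANTED` and the sample text are made up for illustration and are not part of the answer.

```python
import csv
import io

# Labels to keep from the header block (spelled as in the sample file).
WANTED = {"Invoice number", "Issue date", "Due Date", "Amount due"}


def parse_invoice(text, delimiter=","):
    """Return (header_fields, items) from the text of an invoice export."""
    header, items = {}, []
    columns = None  # stays None until the column-header row is reached
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not any(cell.strip() for cell in row):
            continue  # blank line or a row of empty cells
        if columns is None:
            if row[0] == "Account ID":
                columns = row  # product rows start after this one
            elif row[0] in WANTED and len(row) > 1:
                header[row[0]] = row[1]
        else:
            record = dict(zip(columns, row))
            if record.get("Description"):
                items.append((record["Description"], record.get("Amount", "")))
    return header, items


# Hypothetical, shortened stand-in for test.csv.
sample = (
    'Bill to,Client\n'
    'Invoice number,3359680287\n'
    'Issue date,1/31/2016\n'
    'Due Date,3/1/2016\n'
    'Amount due,"2,762,358.40"\n'
    '\n'
    'Account ID,Account,Order,Purchase Order,Product,Description,Quantity,Units,Amount\n'
    'xxx-xxx-xxxx,Client,Client,,Google AdWords,UK_GDN_january,392,Impressions,2.92\n'
    '\n'
    'xxx-xxx-xxxx,Client,Client,,Google AdWords,Invalid activity,,,-16.46\n'
)

header, items = parse_invoice(sample)
print(header)
print(items)
```

For a real file, the text could be obtained with `open("test.csv", newline="").read()`; pass `delimiter="\t"` if the export turns out to be tab-separated.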

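Answer 1: (score: 1)

The simplest solution is to split specific lines on commas and read the amount and description from the end of the resulting list. You are probably getting the error because the file contains blank lines, which cannot be split into columns. Here is the code: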

general_info = dict()
rest_of_file_list = []

row_counter = 0
# Text mode so each row is a str. Note that a plain split(',') does not
# understand quoted fields: a value such as "2,762,358.40" is cut at its
# first comma.
with open('test.csv') as file:
    for row in file:
        if row_counter == 2:
            # invoice row
            general_info['Invoice number'] = row.split(',')[1].rstrip()
        elif row_counter == 3:
            # issue date row
            general_info['Issue date'] = row.split(',')[1].rstrip()
        elif row_counter == 4:
            # due date row
            general_info['Due date'] = row.split(',')[1].rstrip()
        elif row_counter == 8:
            # amount due row
            general_info['Amount due'] = row.split(',')[1].rstrip()
        elif row_counter > 10:
            # the last item and the 4th item from the end are amount and description
            if row and not row.isspace():
                item = dict()
                lista = row.split(',')

                item['Description'] = lista[-4].rstrip()
                item['Amount'] = lista[-1].rstrip()
                rest_of_file_list.append(item)
        row_counter += 1

print(general_info)
print(rest_of_file_list)
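
The blank-line explanation can be checked in isolation. The sketch below uses a made-up three-line sample (not the real file) to show that `csv.reader` returns an empty list for a blank line, which is why `row[0]` raises `IndexError`, and that filtering out empty rows avoids it.

```python
import csv
import io

# Hypothetical sample: two data lines with a blank line between them.
sample = "Invoice number,3359680287\n\nAmount due,100.00\n"

rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # the middle entry is an empty list

# Skipping empty rows before indexing avoids the IndexError.
first_columns = [row[0] for row in rows if row]
print(first_columns)
```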

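Answer 2: (score: 0)

I suggest reading the general information separately and then parsing the remaining lines, as a string, with the csv module. For the first part I build a header_attributes dictionary; the rest is read with a csv.DictReader instance.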

import csv
from io import StringIO  # Python 3; Python 2 used "from StringIO import StringIO"

CLIENT_PROPERTY_LINE_COUNT = 10  # 9 property lines plus the blank separator line

header_attributes = {}

with open("test.csv", newline="") as f:
    # The header lines are comma separated in the format: Property, Value.
    # Splitting only at the first comma keeps values that contain commas together.
    # The if inside the for loop ignores blank lines or lines with only one attribute.
    for i in range(CLIENT_PROPERTY_LINE_COUNT):
        splitted_line = f.readline().split(",", 1)

        if len(splitted_line) == 2:
            property_name, property_value = splitted_line
            stripped_property_name = property_name.strip()
            stripped_property_value = property_value.strip()
            header_attributes[stripped_property_name] = stripped_property_value

    # Everything after the header block is a regular CSV table.
    account_data = f.read()

print(header_attributes)

account_data_memory_file = StringIO(account_data)
account_reader = csv.DictReader(account_data_memory_file)

for account in account_reader:
    print(account['Units'], account['Amount'])
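
The StringIO copy is not strictly necessary: once the header lines have been consumed, the same file object can be handed to `csv.DictReader`. The following is only a sketch of that variation, assuming the 10-line header block described above; `read_items`, `HEADER_LINES` and the trimmed sample are hypothetical names and data chosen for illustration.

```python
import csv
import io
from itertools import islice

HEADER_LINES = 10  # assumed: 9 property lines plus the blank separator


def read_items(f):
    """Skip the header block of an open file, then read the product table."""
    for _ in islice(f, HEADER_LINES):
        pass  # discard; the next line read is the column-header row
    return [(row["Description"], row["Amount"]) for row in csv.DictReader(f)]


# Hypothetical, trimmed stand-in for test.csv (only three table columns).
sample = io.StringIO("\n".join([
    "Bill to,Client",
    "Billing ID,xxxx-xxxx-xxxx",
    "Invoice number,3359680287",
    "Issue date,1/31/2016",
    "Due Date,3/1/2016",
    "Currency,EUR",
    'Invoice subtotal,"2,762,358.40"',
    "VAT (0%),0",
    'Amount due,"2,762,358.40"',
    "",
    "Description,Units,Amount",
    "UK_GDN_january,Impressions,2.92",
    "",
    'Switzerland Family vacation,Clicks,"4,718.91"',
]))

items = read_items(sample)
print(items)
```

With a real file this would be called as `read_items(open("test.csv", newline=""))`, ideally inside a `with` block. `csv.DictReader` skips completely blank lines in the table on its own.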