Python-将字节/ unicode制表符分隔的数据转换为csv文件

时间:2018-06-28 18:23:02

标签: python python-3.x csv byte ascii

我正在从API中提取以下数据。数据以b前缀开头,该前缀根据Python 3.3 documentation表示我们正在处理“字节文字”,其中转义序列\t\n代表ASCII水平标签(TAB)和ASCII换行符(LF)。

b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

当我使用.decode("utf-8")将此数据转换为字符串时,我得到了相应的制表符分隔数据:

settlement-id   settlement-start-date   settlement-end-date deposit-date    total-amount    currency    transaction-type    order-id    merchant-order-id   adjustment-id   shipment-id marketplace-name    amount-type amount-description  amount  fulfillment-id  posted-date posted-date-time    order-item-code merchant-order-item-id  merchant-adjustment-item-id sku quantity-purchased
7293436482  03.05.2018 09:10:07 UTC 04.05.2018 20:30:23 UTC 06.05.2018 20:30:23 UTC 53,44   EUR                                                                 
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemPrice   Principal   179,99  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemFees    Commission  -32,40  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemPrice   Principal   -109,99 AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    Commission  19,80   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    RefundCommission    -3,96   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   

但是,我似乎无法将这些数据保存到制表符分隔的csv文件中。我尝试了几种方法将此数据保存到csv文件中,但都失败了,包括以下内容:

with open("folder_GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE_V2_/" + grl_id + ".csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    for row in csv_file:
        print(row)

这给了我以下错误:

    for row in csv_file:
io.UnsupportedOperation: not readable

更新: 因此,事实证明问题出在其他地方。在进行各种测试时,我实际上设法生成了与您相同的文件,以为输出看起来不正确,因此无法正常工作。在excel中打开文件时,数据分为两列。

enter image description here

我现在发现原因是使用欧洲表示小数的方式表示一些数字,这是逗号179,99。因此,Excel会将其解释为定界符,而如果我在记事本中打开文件,它将正确读取。

1 个答案:

答案 0 :(得分:1)

好吧,您得到此错误是因为您希望将数据写入csv文件,但是在for循环中,您正尝试从该文件读取数据。如果我理解正确,则希望接收bytes对象,并将其很好地写到制表符分隔的csv文件中。下面的代码可以做到这一点:

import csv, re

orig = b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

# Split the long string into a list of lines
data = orig.decode('utf-8').splitlines()

# Open the file for writing
with open("tmp.csv", "w") as csv_file:
    # Create the writer object with tab delimiter
    writer = csv.writer(csv_file, delimiter = '\t')
    for line in data:
        # Writerow() needs a list of data to be written, so split at all empty spaces in the line 
        writer.writerow(re.split('\s+',line))