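pandas: How to write data to the same worksheet of an existing xlsx file without overwriting the old data

Date: 2017-08-01 19:07:39

Tags: python pandas csv readfile

I have a large csv file (18 GB) that I want to read in chunks and then process.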

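I have two questions:

  1. How can I check whether the last chunk contains NaN? The total length of the csv file is not an exact multiple of chunksize, so the last chunk is incomplete.

  2. How can I write new data to this existing xlsx file without overwriting the old data?

Here is the code: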

    import math as mt

    import numpy as np
    import pandas as pd
    from openpyxl import load_workbook

    chunkSize=6666800
    periode=333340
    for chunk in pd.read_csv('/Users/gaoyingqiang/Desktop/D970_Leistung.csv',delimiter=';',encoding='gbk',iterator=True,chunksize=chunkSize):
        U1=chunk['Kanal 1-1 [V]']
        I1=chunk['Kanal 1-2 [V]']
        c=[]
        # Here I tried to check whether the last chunk contains NaN or 0 (by checking
        # the last elements of U1) to avoid the ZeroDivisionError, but it fails with
        # AttributeError: 'function' object has no attribute 'values'
        if chunk.isnull.values.any():
            break
        for num in range(0,chunkSize, periode):
            lu = sum(U1[num:num + periode] * U1[num:num + periode]) / periode
            li = sum(I1[num:num + periode] * I1[num:num + periode]) / periode
            lui = sum(I1[num:num + periode] * U1[num:num + periode]) / periode
            c.append(180 * mt.acos(2 * lui / mt.sqrt(4 * lu * li)) / np.pi)
            lu = 0
            li = 0
            lui = 0

    book=load_workbook('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx')
    writer=pd.ExcelWriter('/Users/gaoyingqiang/Desktop/Phaseverschiebung_1.xlsx',engine='openpyxl')
    writer.book=book
    writer.sheets=dict((ws.title,ws) for ws in book.worksheets)

    phase = pd.DataFrame(c)
    phase.to_excel(writer,'Main')
    writer.save()  # I found that it keeps overwriting the old data.
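The per-period calculation in the loop above can be sketched as a standalone NumPy function. This is a minimal sketch and not the asker's code. The function name `phase_angles` is hypothetical, and it assumes `u` and `i` are equal-length sample sequences, such as the two columns of one chunk. It only evaluates complete windows of `period` samples. A last chunk shorter than `chunkSize` therefore never produces the empty slices that most likely cause the ZeroDivisionError.

```python
import numpy as np


def phase_angles(u, i, period):
    """Phase shift in degrees for every complete window of `period` samples.

    Same formula as the question: 180 * acos(2*lui / sqrt(4*lu*li)) / pi,
    which simplifies to degrees(arccos(lui / sqrt(lu * li))).
    """
    u = np.asarray(u, dtype=float)
    i = np.asarray(i, dtype=float)
    angles = []
    # Iterate only over windows that fit completely into the available data;
    # an incomplete trailing window is skipped.
    for start in range(0, len(u) - period + 1, period):
        uw = u[start:start + period]
        iw = i[start:start + period]
        lu = np.mean(uw * uw)    # mean square of the voltage
        li = np.mean(iw * iw)    # mean square of the current
        lui = np.mean(uw * iw)   # mean of the product u * i
        ratio = lui / np.sqrt(lu * li)
        # Clip floating-point noise so arccos stays inside [-1, 1].
        angles.append(float(np.degrees(np.arccos(np.clip(ratio, -1.0, 1.0)))))
    return angles


# Usage on synthetic data: one sine period per 100 samples, 30 degree shift,
# 250 samples in total, so the last 50 samples do not form a full window.
t = 2 * np.pi * np.arange(250) / 100
print(phase_angles(np.sin(t), np.sin(t - np.radians(30)), period=100))
```

In the question's loop this would be called as `phase_angles(chunk['Kanal 1-1 [V]'], chunk['Kanal 1-2 [V]'], periode)`. It converts each chunk to plain arrays, so slicing is positional no matter which row labels the chunk carries.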
    

Here is the structure of the data: [screenshot]

This line

    if chunk.isnull.values.any():

raises an error: [screenshot]

If I leave out this NaN check, I get this instead: [screenshot]

So where is the mistake?
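Here is a small sketch of the NaN check. It uses a hypothetical in-memory CSV in place of the 18 GB file and reads it with the same `chunksize` mechanism. The key difference from the question's line is the parentheses. `isnull` is a method, so it has to be called as `chunk.isnull()` before `.values` is available. The sketch also records `len(chunk)`, because pandas does not pad the last chunk with NaN. That chunk simply has fewer rows than `chunksize`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the big file: 7 data rows separated by ';',
# with a missing value in the last row.
csv_text = "u;i\n1;2\n3;4\n5;6\n7;8\n9;10\n11;12\n13;\n"

chunk_lengths = []
chunks_with_nan = []
for n, chunk in enumerate(pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=3)):
    # The last chunk is not padded; it simply has fewer rows than chunksize.
    chunk_lengths.append(len(chunk))
    # isnull must be called: chunk.isnull() returns a boolean DataFrame,
    # whereas chunk.isnull (no parentheses) is the method object itself.
    if chunk.isnull().values.any():
        chunks_with_nan.append(n)

print(chunk_lengths)    # [3, 3, 1]
print(chunks_with_nan)  # [2]
```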

1 answer:

Answer 0 (score: 0)

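If you want to write all the chunks to the same sheet, try the following.

Define a variable rowLength that is zero for the first chunk. After each chunk has been written, increase it by the number of rows that chunk produced. That is one row per period, i.e. len(phase), not chunksize:

    rowLength = 0                          # for the 1st chunk
    rowLength = rowLength + len(phase)     # after writing each chunk

Then write each chunk to Excel by specifying startrow. header=False keeps each block exactly len(phase) rows tall:

    phase = pd.DataFrame(c)
    phase.to_excel(writer, sheet_name='Main', startrow=rowLength, index=False, header=False)

Both the write and the rowLength update go inside the chunk loop. Create writer once before the loop and save it once after the loop.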
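Here is the answer's idea as one runnable loop. This is a minimal sketch that assumes pandas with the openpyxl engine is installed. The per-chunk results are hypothetical placeholder numbers rather than real phase values, and the workbook goes to a temporary directory. The writer is opened once, and every chunk is written to the same sheet at the current `rowLength`. The offset then grows by `len(phase)`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical per-chunk results standing in for the list `c` of each chunk.
results_per_chunk = [[10.5, 11.5], [12.5, 13.5, 14.5], [15.5]]

path = os.path.join(tempfile.mkdtemp(), "phase_demo.xlsx")

rowLength = 0  # start row for the 1st chunk
# Open the writer once; the with-block saves and closes the file at the end.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    for c in results_per_chunk:
        phase = pd.DataFrame(c)
        # header=False keeps each block exactly len(phase) rows tall.
        phase.to_excel(writer, sheet_name="Main", startrow=rowLength,
                       index=False, header=False)
        rowLength = rowLength + len(phase)  # the next chunk goes right below

# Read the sheet back to check that no block overwrote another.
combined = pd.read_excel(path, sheet_name="Main", header=None)
print(combined[0].tolist())
```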

See the pandas to_excel documentation for reference.
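The startrow approach covers a single run of the script. The second question asks about adding rows to a workbook that already exists on disk. A minimal sketch for that case, assuming pandas 1.4 or later with openpyxl installed, opens the file in append mode with `if_sheet_exists='overlay'` and starts below the sheet's last used row. The helper name `append_rows` and the file names are hypothetical. In pandas 2.0 and later, the `writer.book = book` assignment and the `writer.save()` call from the question are no longer supported. The with-block takes care of saving instead.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, df, sheet_name="Main"):
    """Append df below the last used row of an existing sheet (pandas >= 1.4)."""
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # Assumes the sheet already holds data: max_row is then the 1-based
        # number of the last used row, i.e. the 0-based startrow of the next one.
        startrow = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=False)


# Hypothetical existing workbook with two rows of old data.
path = os.path.join(tempfile.mkdtemp(), "phase_existing_demo.xlsx")
pd.DataFrame([1.5, 2.5]).to_excel(path, sheet_name="Main", index=False, header=False)

# A later run appends new values; the old rows are kept.
append_rows(path, pd.DataFrame([3.5, 4.5]))
print(pd.read_excel(path, sheet_name="Main", header=None)[0].tolist())
```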