Python file not closing

Time: 2013-04-20 13:47:40

Tags: python file file-io python-2.7

I wrote a program that processes a large amount of data. It involves three steps:

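  1. Read the data
  2. Process it
  3. Write the data to a file

When I run the code, the first two steps complete successfully. (Because the data is very large for my laptop's configuration, I am using swap space on Linux to get this done.)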

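Now the third step: the data is written to the file successfully, but my code gets stuck at the feat.close() line (feat is the file pointer). When I open the file while the process is still running, the complete data has been written, yet the file does not get closed.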

Code:

    #!/usr/bin/env python
    
    from __future__ import print_function
    import pickle
    import sys
    import numpy as np
    import posTagsToTriGramFrequency as pt
    import itertools
    import gc
    
    if sys.argv[1] == '-h':
        print("Usage: ./featureSelection <n(authorLimit)> <k(SD weight)> <nGramSize> <folder>")
        sys.exit()
    
    n = int(sys.argv[1])
    k = int(sys.argv[2])
    nGramSize = int(sys.argv[3])
    folder = sys.argv[4]
    
    print('reading features')
    feat = open('../../../../dataDump/'+ folder +'/features.dump','r')
    features = {}
    
    tagsnGram = [tuple(x) for x in itertools.product(pt.getTags(), repeat=nGramSize)]
    
    gramdict = {}
    for gram in tagsnGram:
        gramdict[gram] = []
    
    flag = 1
    author = ''
    for line in feat:
        if(line == '\n'):
            flag = 1
        elif flag == 1:
            author = line.split('/')[-1][:-1]
            print(line, end='')
            features[author] = gramdict.copy()
            flag = 0
        else:
            tagsFreq = iter(line.split())
            for tag in tagsnGram:
                features[author][tag].append(float(tagsFreq.next()))
    
    feat.close()
    
    print('Calculating waht to delete')
    
    nflag = 0
    kflag = 0
    toDel = []
    for tagGram in tagsnGram:
        nflag = 0
        for author in features:
            kflag = 0
            for doc in features[author][tagGram]:
                if doc == 0 : kflag += 1
                if kflag >= k:
                    nflag += 1
                    break
            if nflag >= n:
                toDel.append(tagGram)
                break
    
    for item in toDel:
        for author in features:
            del features[author][item]
    
    f = open('../../../../dataDump/'+ folder +'/tagsInfo.dump','w')
    f.write('k:' + str(k) + ',\t n:' + str(n) + '\n')
    f.write('Deleted tags:\n')
    for item in toDel:
        f.write(str(item) + ' ')
    f.write('\n\nSelected Tags:\n')
    for tagGram in features.itervalues().next():
            f.write(str(tagGram) + ' ')
    f.close()
    
    print("Writing Back Features")
    feat = open('../../../../dataDump/'+ folder +'/selectedFeatures.dump','w')
    for author in features.keys():
        feat.write(author + '\n')
        print(author)
        for tag in features[author]:
            for doc in features[author][tag]:
                feat.write(str(doc) + ' ')
            feat.write('\n')
        feat.write('\n')
        del features[author]
        #gc.collect()
    print('Closing File')
    feat.close()
    

Look at the last line. My console prints "Closing File", but after that my code hangs.

My console output:

    abhi@abhi-me~/Projects/workspace/irProject/completepythonbased/authAttrib (irProject)>>./featureSelection.py 35 125 3 3GramFreq
    reading features
    Ajit_Popat
    Mukund_Mehta
    Parajit_Patel
    Priyadarshi
    Kumarpad_Desai
    Bhaven_Kacchi
    Shantibhai_Agrawat
    Pravin_Darji
    Ankit_Trivedi
    Sharad_Rawal
    Tushar_Shukla
    Chandrakant_Mehta
    Jay_Vasavda
    Dolat_Bhatt
    Munindra
    Mrugesh_Vaishnav
    Kulinchandra_Yagnik
    Zaverilal_Mehta
    Priti_Shah
    Vasant_Mistri
    Vatsal_Vasani
    Dinesh_Mistri
    Devesh_Mehta
    Dhaval_Mehta
    Urvish_Kothari
    Madhusudan_Parekh
    Vihari_Chaya
    Virendra_Kapoor
    Mukul_Choksi
    Joravarsinh_Jadav
    Ashok_Dave
    Nasir_Ismaeli
    Joban_Pandit
    Priyakant_Parikh
    Sudarshan_Upadhyay
    Gajendra_Shah
    Altaf_Patel
    Bhalchandra_Jani
    Shashin
    Hansal_Bhachech
    Calculating waht to delete
    Writing Back Features
    Pravin_Darji
    Ajit_Popat
    Kulinchandra_Yagnik
    Sharad_Rawal
    Madhusudan_Parekh
    Shantibhai_Agrawat
    Gajendra_Shah
    Hansal_Bhachech
    Vihari_Chaya
    Virendra_Kapoor
    Sudarshan_Upadhyay
    Priyadarshi
    Tushar_Shukla
    Dolat_Bhatt
    Urvish_Kothari
    Vasant_Mistri
    Mukund_Mehta
    Zaverilal_Mehta
    Kumarpad_Desai
    Vatsal_Vasani
    Bhaven_Kacchi
    Mrugesh_Vaishnav
    Bhalchandra_Jani
    Priyakant_Parikh
    Chandrakant_Mehta
    Mukul_Choksi
    Joravarsinh_Jadav
    Munindra
    Joban_Pandit
    Devesh_Mehta
    Priti_Shah
    Ankit_Trivedi
    Dinesh_Mistri
    Dhaval_Mehta
    Ashok_Dave
    Nasir_Ismaeli
    Parajit_Patel
    Jay_Vasavda
    Altaf_Patel
    Shashin
    Closing File
    ^C
    [1]+  Killed                  ./featureSelection.py 35 125 3 3GramFreq
    

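Why is this happening?

You can find the strace here.

Edit: I tried printing something after the close, and it does get printed, which means the problem is in exiting the program. The program is using about 3 GB of RAM and 3 GB of swap space, @Justing. Filling that memory takes 10-20 minutes, but I waited about 2 hours for it to be released, so I think something is wrong. I have uploaded the strace, please take a look.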
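The check described in this edit (is the time spent inside close(), or in whatever happens after it?) can be sketched roughly as follows. timed_close is a hypothetical helper that is not part of the original script, and the demo file name is made up; the sketch only assumes that flushing and syncing before the close is acceptable here.

```python
import os
import tempfile
import time


def timed_close(fileobj):
    """Flush, sync and close fileobj; return the elapsed seconds."""
    start = time.time()
    fileobj.flush()              # push Python's buffer to the OS
    os.fsync(fileobj.fileno())   # ask the OS to write the data to disk
    fileobj.close()
    return time.time() - start


if __name__ == '__main__':
    # Hypothetical demo file; the real script writes selectedFeatures.dump.
    path = os.path.join(tempfile.mkdtemp(), 'demo.dump')
    feat = open(path, 'w')
    feat.write('0.5 0.0 \n')
    print('close took %.3f s' % timed_close(feat))
    print('closed: %s' % feat.closed)
```

If the reported time is small and the lines after it are printed, the file itself was closed normally and the remaining delay lies somewhere after the close.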

1 answer:

Answer 0: (score: 2)

Try using with. It closes the file as soon as the block ends, which flushes any buffered writes once you are done writing:

    with open('file.txt', 'r') as f:
        data = f.read()
    print(f.closed)  # True: the file is closed once the with block exits

See the Python documentation. It should solve most problems when writing and reading files.
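As a minimal sketch, the write-back step from the question could be wrapped in with like this. write_features is a hypothetical function name, and the sample author and tag in the demo are invented; the output layout (author line, one line of values per tag, blank line) follows the loop in the question.

```python
import os
import tempfile


def write_features(path, features):
    """Write {author: {tag: [values]}} in the question's text layout."""
    with open(path, 'w') as out:
        for author in features:
            out.write(author + '\n')
            for tag in features[author]:
                for value in features[author][tag]:
                    out.write(str(value) + ' ')
                out.write('\n')
            out.write('\n')
    # Leaving the with block has already closed the file.
    return out.closed


if __name__ == '__main__':
    # Invented sample data, only to exercise the function.
    demo = {'Some_Author': {('NN', 'VB', 'NN'): [0.5, 0.0]}}
    path = os.path.join(tempfile.mkdtemp(), 'selectedFeatures.dump')
    print(write_features(path, demo))
```

The with block guarantees the close even if a write raises an exception, so no explicit feat.close() call is needed.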