My goal is to read a 6,000-record CSV file into an array, clean and normalize it, and then load it into a gensim corpora.Dictionary() so I can use doc2bow and run a SparseMatrixSimilarity query. Reading the CSV file worked at first: the code below printed an array I call "definitions", one entry for each of the 6,000 sub-category records.
import csv

f = open('test.csv')
csv_f = csv.reader(f)
definitions = []
for row in csv_f:
    # The definition text is in the third column.
    definitions.append(row[2])
print(definitions)
But then I hit a wall with UTF-8 and ASCII errors; gensim expects strict UTF-8 input.
After several hours on Stack Overflow and in the Python csv documentation, trying a few of the "UTF-8" encoder recipes there, I read that since Python 2.7's csv module doesn't handle Unicode out of the box, I could use the codecs package instead.
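For reference, this is roughly the pattern I understood from the docs: read the raw bytes with csv and decode each cell afterward on Python 2, or open the file as text with an explicit encoding on Python 3. The helper name `read_definitions` is mine, just for illustration:

```python
import csv
import io
import sys

def read_definitions(path):
    """Return the third column of each CSV row as unicode text."""
    if sys.version_info[0] < 3:
        # Python 2: the csv module wants bytes, so read binary
        # and decode each cell after parsing.
        with open(path, 'rb') as f:
            return [row[2].decode('utf-8') for row in csv.reader(f)]
    # Python 3: open as text with an explicit encoding; csv handles the rest.
    with io.open(path, 'r', encoding='utf-8', newline='') as f:
        return [row[2] for row in csv.reader(f)]
```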
I figured that instead of going back over every line of my original 6,000-line "definitions" array and decoding it, I could decode right off the bat using codecs. However, the code below fails to write anything to my definitions array. Being a newbie, I imagine I may be using codecs the wrong way, and/or closing the file the wrong way.
import codecs
import csv

with codecs.open('test.csv', 'rb', encoding='utf-8') as f:
    csv_f = csv.reader(f)
    definitions = []
    for row in csv_f:
        definitions.append(row[2])
    # No explicit f.close() needed: the with-block closes the file.
print(definitions)
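Once the rows are in, the "clean and normalize" step I mentioned at the top could look something like the sketch below: Unicode-normalize, lowercase, and collapse whitespace. The function name and the exact rules are just illustrative assumptions; my real cleaning may need more:

```python
import re
import unicodedata

def normalize_definition(text):
    # NFKC folds compatibility characters (e.g. circled digits, non-breaking
    # spaces) into their plain equivalents before further cleaning.
    text = unicodedata.normalize('NFKC', text)
    text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r'\s+', u' ', text).strip()
    return text

print(normalize_definition(u'  Caf\u00e9  \u2460 Definition '))
```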
I am a total newbie, so apologies for any errors in my description. I'm learning as I go and really appreciate any feedback and help. Perhaps I'm going about this the wrong way entirely, and I welcome any pointers. Thank you again.