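I'm new to Python! For various reasons I can't use pandas in my environment, so I wrote my own replacement for pandas read_table(). Basically, I'm converting some code of mine that uses pandas.read_table. The pandas-based code that has to be replaced is as follows: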
import pandas as pd
import numpy as np
from scipy import sparse  # lil_matrix lives in scipy.sparse
data_file = pd.read_table(r'records.csv', sep=';', header=None)
id = np.unique(data_file[0])
tags = np.unique(data_file[1])
number_of_rows = len(id)
number_of_columns = len(tags)
words_indices, letter_indices = {}, {}
for i in range(len(tags)):
    words_indices[tags[i]] = i
for i in range(len(id)):
    letter_indices[id[i]] = i
# scipy sparse matrix
Vector = sparse.lil_matrix((number_of_rows, number_of_columns))
# adds data into the sparse matrix
for line in data_file.values:
    u, i, r = map(str, line)
    Vector[letter_indices[u], words_indices[i]] = r
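As a side note, the two index-building loops can also be written with enumerate. This is only a small sketch: sorted(set(...)) stands in for np.unique (which also returns sorted unique values), and the names ids_column and tags_column and their contents are made up for illustration.

```python
# Made-up first and second columns standing in for data_file[0] and data_file[1].
ids_column = ["REC2", "REC1", "REC2"]
tags_column = ["tag b", "tag a", "tag b"]

# sorted(set(...)) plays the role of np.unique here: sorted, no duplicates.
unique_ids = sorted(set(ids_column))
unique_tags = sorted(set(tags_column))

# Map each value to its row or column position in the matrix.
letter_indices = {value: pos for pos, value in enumerate(unique_ids)}
words_indices = {value: pos for pos, value in enumerate(unique_tags)}

print(letter_indices)
print(words_indices)
```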
The csv file contains about 100 records in this format:
REC000034232657,CRC FIX OE Resubmit,0.0073410, 45
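For reference, a record like this can be split without pandas using the standard csv module. This is only a sketch. The sample line is comma-separated with four fields while the pandas call uses sep=';', so the comma delimiter here is an assumption, and the names sample and fields are made up for illustration.

```python
import csv
import io

# The sample record from above. The comma delimiter is an assumption based on
# how that line looks, not on the sep=';' used in the pandas call.
sample = "REC000034232657,CRC FIX OE Resubmit,0.0073410, 45\n"

# skipinitialspace drops the blank that follows the last comma.
reader = csv.reader(io.StringIO(sample), delimiter=",", skipinitialspace=True)
fields = next(reader)

print(len(fields))
print(fields)
```

If the real records do have four fields, a three-name unpacking such as u, i, r = ... would raise a ValueError, so the loop would need a fourth name or a slice of the row.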
Now I have replaced pandas.read_table by reading directly from the database instead of from the .csv file:
import pyodbc
# conn is assumed to be an already-open pyodbc connection
def fetch_table(**kwargs):
    qry = kwargs['qrystr']
    try:
        cursor = conn.cursor()
        cursor.execute(qry)
        all_tuples = cursor.fetchall()
        return all_tuples
    except pyodbc.ProgrammingError as e:
        print("Exception occurred as :", type(e), e)
# pandas alternate code
total_col = 0
count = 0
dict_csv = {}
stmt = "select * from tickets;"
fetched_rows = fetch_table(qrystr=stmt)
# take the column count from the first row
for row in fetched_rows:
    total_col = len(row)
    break
# one empty list per column
for i in range(0, total_col):
    dict_csv[i] = []
# append every row's values to the matching column list
for row in fetched_rows:
    for i in range(0, total_col):
        dict_csv[i].append(row[i])
# End of pandas alternate code
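The column-building loops above amount to transposing the list of rows. Here is a minimal sketch with made-up rows standing in for the result of fetch_table. The names sample_rows and columns are illustrative and not from the original code.

```python
# Made-up rows standing in for cursor.fetchall(); real rows would come from pyodbc.
sample_rows = [
    ("REC1", "tag a", 0.5),
    ("REC2", "tag b", 0.25),
]

# zip(*rows) groups the first items together, the second items together, and so on.
columns = {i: list(col) for i, col in enumerate(zip(*sample_rows))}

print(columns[0])
```

Note that a dict built this way is keyed by column index, so each value is a whole column rather than one record.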
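The rest of the code stays the same as the earlier block, except that instead of data_file (returned by pd.read_table()) I am now using dict_csv, so the for loop that adds data to the sparse matrix has changed to: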
for line in dict_csv.values:
    u, i, r = map(str, line)
    Vector[letter_indices[u], words_indices[i]] = r
However, I get the TypeError below:
Traceback (most recent call last):
  File "C:\Python32\my_scripts\ds.py", line 132, in <module>
    for line in dict_csv.values:
TypeError: 'builtin_function_or_method' object is not iterable
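I understand that dict_csv.values is not returning an iterable list. Can anyone point out the mistake I am making? Also, the integer 45 comes back as Decimal(45); how can I get rid of that?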
Many thanks
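A possible direction for both points, as a sketch under stated assumptions rather than a definitive fix: on a plain dict, values is a method, so dict_csv.values without parentheses is the method object itself, which matches the 'builtin_function_or_method' message. Calling dict_csv.values() does give something iterable, but because dict_csv is keyed by column index it yields whole columns, not records. The rows below are made up, and the helper plain is a hypothetical name introduced here.

```python
from decimal import Decimal

# Made-up rows standing in for fetched_rows; the Decimal mimics what the driver returned.
fetched_rows = [
    ("REC1", "tag a", 0.5, Decimal(45)),
    ("REC2", "tag b", 0.25, Decimal(7)),
]

# Same column-oriented dict as in the question.
dict_csv = {i: list(col) for i, col in enumerate(zip(*fetched_rows))}

# dict_csv.values is the method itself; dict_csv.values() calls it.
# The columns are zipped back together here to recover one tuple per record.
rows = list(zip(*(dict_csv[i] for i in sorted(dict_csv))))

def plain(value):
    """Turn a Decimal into int when it has no fractional part, else float."""
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    return value

clean_rows = [tuple(plain(v) for v in row) for row in rows]

for u, i, r, n in clean_rows:
    print(u, i, r, n)
```

Since the rows are already available in fetched_rows, looping over that list directly would avoid the round trip through dict_csv altogether.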