I wrote a function that takes a text file containing data like this:
2012-01-01 09:00 Angel Men's Clothing 214.05 Amex
2012-01-01 09:00 Ben Women's Clothing 153.57 Visa
2012-01-01 09:00 Charlie Music 66.08 Cash
and converts it into a list of tuples:
Code : myList = [tuple(j.split("\t")) for j in stringX.split("\n")]
Result:
[('2012-01-01', '09:00', 'Angel', "Men's Clothing", '214.05', 'Amex'),
('2012-01-01', '09:00', 'Ben', "Women's Clothing", '153.57', 'Visa'),
('2012-01-01', '09:00', 'Charlie', 'Music', '66.08', 'Cash')]
That list is then converted further into:
Code: nameList = [(float(item[4]),item[2])for item in myList]
Result: [(214.05, 'Angel'), (153.57, 'Ben'), (66.08, 'Charlie')]
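This works fine with the small text file. However, I have to convert a large text file of more than 200 MB, with over 1 million lines. It manages to convert the file into the list of tuples, but it fails to convert that list into the smaller list of tuples shown above.
When I run the program on the big file, it gives this error: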
File "C:\Users\Charlie\Desktop\PYC\PYTHON ASSIGNMENT\test3.py", line 34, in <listcomp>
nameList = [(float(item[4]),item[2])for item in myList]
IndexError: tuple index out of range
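A likely trigger, offered as a guess since the large file itself is not shown: if the text ends with a newline or contains a blank line, `split("\n")` yields an empty string, and splitting that on tabs gives the one-element tuple `('',)`, which has no index 4. The sketch below reproduces this with a made-up two-record sample and lists the rows that are too short. The names `sample` and `short_rows` are illustrative and not part of the original code.

```python
# Hypothetical sample: two tab-separated records plus a trailing newline.
sample = (
    "2012-01-01\t09:00\tAngel\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tBen\tWomen's Clothing\t153.57\tVisa\n"
)

# Same conversion as in the question.
myList = [tuple(j.split("\t")) for j in sample.split("\n")]

# Collect (line number, row) for every row too short to have an index 4.
short_rows = [(n, row) for n, row in enumerate(myList, start=1) if len(row) < 5]

print(short_rows)  # [(3, ('',))]
```

Running the same check on the list built from the big file should show which line numbers are blank or truncated.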
Answer 0 (score: 0)
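Your list of tuples has an empty entry, which is why you get "IndexError: tuple index out of range".
You can add an if condition to check that each tuple actually contains values.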
Example:
myList = [('2012-01-01', '09:00', 'Angel', "Men's Clothing", '214.05', 'Amex'),
('2012-01-01', '09:00', 'Ben', "Women's Clothing", '153.57', 'Visa'),
(),
('2012-01-01', '09:00', 'Charlie', 'Music', '66.08', 'Cash')]
nameList = [(float(item[4]),item[2])for item in myList if item]
print(nameList)
[(214.05, 'Angel'), (153.57, 'Ben'), (66.08, 'Charlie')]
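One caveat: a blank line split on tabs gives `('',)`, which is a non-empty tuple and therefore passes the `if item` test. Checking the number of fields is a somewhat safer filter. The following is a minimal sketch, not the asker's actual code: `parse_amounts` is a hypothetical helper name, and it assumes the amount is always the fifth tab-separated field and the name the third. It accepts any iterable of lines, so an open file can be passed in without first loading 200 MB into a single string.

```python
def parse_amounts(lines):
    """Return (amount, name) pairs, skipping rows with fewer than five fields."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # blank or truncated line: no index 4 to read
        result.append((float(fields[4]), fields[2]))
    return result


# Made-up input with a blank line in the middle.
lines = [
    "2012-01-01\t09:00\tAngel\tMen's Clothing\t214.05\tAmex\n",
    "\n",
    "2012-01-01\t09:00\tCharlie\tMusic\t66.08\tCash\n",
]
print(parse_amounts(lines))  # [(214.05, 'Angel'), (66.08, 'Charlie')]
```

With a real file this could be called as `with open(path) as f: nameList = parse_amounts(f)`, where `path` stands in for the file's location.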