In Python, I want to read a large file and aggregate it:
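```python
import fileinput

def aggregate(file_input):
    reviews = []
    # Output goes next to the input, e.g. reviews.txt -> reviews_aggregated.txt
    with open(file_input.replace(".txt", "_aggregated.txt"), "w") as outp:
        currComp = ""
        outp.write("Business;Stars_In_Sequence")
        for line in fileinput.input(file_input):
            # MyReview parses one semicolon-separated line (class not shown)
            reviews.append(MyReview(line))
            if currComp != reviews[-1].getCompany():
                # New business: start a new output line with its ID and first rating
                currComp = reviews[-1].getCompany()
                outp.write("\n" + currComp + ";" + reviews[-1].getStars())
                outp.flush()
            else:
                # Same business as the previous line: append its rating
                outp.write(reviews[-1].getStars())
                outp.flush()
```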
The file looks like this:
```
Business;User;Review_Stars;Date;Length;Votes_Cool;Votes_Funny;Votes_Useful;
0DI8Dt2PJp07XkVvIElIcQ;jkrzTC5P5QGJRoKECzcleQ;5;2014-03-11;421;0;1;0
0DI8Dt2PJp07XkVvIElIcQ;cK78PTjb65kdmRL9BnEdoQ;5;2014-03-29;190;0;1;0
```
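For comparison, here is a minimal sketch of the intended aggregation using only the standard library. It assumes the rows are already sorted (grouped) by business and that `Review_Stars` is the third field. It uses `csv.reader` in place of the `MyReview` class, which is not shown. `aggregate_stars` is a hypothetical name and not part of the original code. Unlike the loop above, it skips the header row rather than treating it as a review.

```python
import csv
import io
from itertools import groupby

def aggregate_stars(lines):
    """Concatenate star ratings of consecutive rows that share a business ID.

    `lines` is any iterable of text lines in the semicolon-separated format
    shown above (hypothetical helper; assumes rows are grouped by business).
    """
    # Drop blank rows so row[0] below always exists
    reader = (row for row in csv.reader(lines, delimiter=";") if row)
    next(reader, None)  # skip the header row
    result = []
    # groupby merges only *consecutive* rows with the same key, like the original loop
    for business, rows in groupby(reader, key=lambda row: row[0]):
        stars = "".join(row[2] for row in rows)  # Review_Stars is the third column
        result.append(f"{business};{stars}")
    return result

sample = """Business;User;Review_Stars;Date;Length;Votes_Cool;Votes_Funny;Votes_Useful;
0DI8Dt2PJp07XkVvIElIcQ;jkrzTC5P5QGJRoKECzcleQ;5;2014-03-11;421;0;1;0
0DI8Dt2PJp07XkVvIElIcQ;cK78PTjb65kdmRL9BnEdoQ;5;2014-03-29;190;0;1;0
"""
print(aggregate_stars(io.StringIO(sample)))
```

With a real file, the same function can be passed an open file object, e.g. `open(path, newline="")`, together with whatever `encoding=` the file actually uses.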
If I use only a small part of the file, it works and returns the correct output:
```
Business;Stars_In_Sequence
Business;R
0DI8Dt2PJp07XkVvIElIcQ;55555455555555515
LTlCaCGZE14GuaUXUGbamg;555555555
EDqCEAGXVGCH4FJXgqtjqg;3324133
```
But if I use the original file, it returns this instead, and I can't figure out why:
```
Business;Stars_In_Sequence
ÿþB u s i n e s s ;
0 D I 8 D t 2 P J p 0 7 X k V v I E l I c Q ;
L T l C a C G Z E 1 4 G u a U X U G b a m g ;
E D q C E A G X V G C H 4 F J X g q t j q g ;
```
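The leading `ÿþ` is what the bytes `FF FE` look like when decoded with a single-byte encoding such as Latin-1 or cp1252. Those two bytes are the UTF-16 little-endian byte order mark (BOM). The apparent spaces between characters are consistent with the zero high bytes of UTF-16 text. The following is a minimal sketch, assuming the difference between the two files is their encoding. It inspects the first bytes of a file and picks a matching codec. `detect_bom_encoding` is a hypothetical helper, and the demo file is generated on the spot, not taken from the original data.

```python
import codecs
import os
import tempfile

def detect_bom_encoding(path, default="utf-8"):
    """Guess a file's encoding from its byte order mark (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so test UTF-32 first
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM while decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM and strips it
    return default  # no BOM: fall back to an assumed default

# Demo: Python's "utf-16" codec writes a BOM before the two-byte characters
path = os.path.join(tempfile.mkdtemp(), "reviews.txt")
with open(path, "w", encoding="utf-16") as f:
    f.write("Business;User\n")

with open(path, "rb") as f:
    print(f.read(6))  # raw bytes: BOM followed by UTF-16 code units

encoding = detect_bom_encoding(path)
with open(path, encoding=encoding) as f:
    print(encoding, repr(f.read()))
```

If the file does turn out to be UTF-16, the loop in the question can request that encoding directly. On Python 3.10+ this is `fileinput.input(file_input, encoding="utf-16")`. On older versions, use `fileinput.input(file_input, openhook=fileinput.hook_encoded("utf-16"))`.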