I have a text file of tweets that looks like this:
1 1 Sweet United Nations video. Just in time for Christmas. #imagine #NoReligion
2 1 @mrdahl87 We are rumored to have talked to Erv..that's hardly nothing ;)
3 1 Hey there! Nice to see you Winter Weather
4 0 3 episodes left I'm dying over here
And I have this code:
import csv
with open('./data/train.txt', encoding="utf8") as inf:
    reader = csv.reader(inf, delimiter='\t')
    col1 = list(zip(*reader))[0]
c = 0
for x in col1:
    c = c + 1
    print(x, " ", c)
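When I print the length of my list it shows 3817, but the file actually contains 3834 items! I added a counter "c" to check the count while iterating, and it also gives 3817!
I checked the file manually by printing the rows: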
file_lines counter_c
1643 1643
1644 1644
1645 1645
1649 1646 <-----
1650 1647
I found that the reader skips some lines, such as 1646, 1647 and 1648!
These are the lines in question:
1645 0 "@SchmidtSTL: Thanks to @automaticg. I think I backed into the playoffs! Playoff matchup: TacoCorp v Gronkey Punch
1646 0 oh yeah, its official #im #crazy htt.co/bgcLDJQIR6
1647 1 Oh well, looks like we are back to square 1. Batie and Bridge. This is going to go so well #BoldandBeautiful
1648 1 "@antoineraps: @edifyin how are you up now? Ho!"
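What is the problem?!
Edit (added tweet 1645)
I found that tweet 1645 is the one causing the problem! What is wrong with it? And how can I work around it when reading the file?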
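
A likely explanation, sketched below: tweet 1645 begins with a double quote that is never closed on its own line. By default csv.reader treats " as a quote character, so it seems to keep reading across line breaks until it meets the next " (in tweet 1648) and returns the four lines as a single row. The sketch assumes the columns are tab-separated, as the delimiter in the code above suggests, and rebuilds only rows 1645 to 1648 in memory. The helper name count_rows is made up for illustration. Passing quoting=csv.QUOTE_NONE tells the reader to treat quotes as ordinary characters.

```python
import csv
import io

# Rows 1645-1648 from the question, assuming tab-separated columns.
sample = (
    '1645\t0\t"@SchmidtSTL: Thanks to @automaticg. I think I backed into '
    'the playoffs! Playoff matchup: TacoCorp v Gronkey Punch\n'
    '1646\t0\toh yeah, its official #im #crazy htt.co/bgcLDJQIR6\n'
    '1647\t1\tOh well, looks like we are back to square 1. Batie and Bridge. '
    'This is going to go so well #BoldandBeautiful\n'
    '1648\t1\t"@antoineraps: @edifyin how are you up now? Ho!"\n'
)


def count_rows(text, **reader_options):
    """Hypothetical helper: count the rows csv.reader yields for `text`."""
    reader = csv.reader(io.StringIO(text), delimiter='\t', **reader_options)
    return len(list(reader))


# Default dialect: the unclosed quote in row 1645 swallows the next lines.
default_rows = count_rows(sample)

# QUOTE_NONE: double quotes are plain text, so every line is its own row.
fixed_rows = count_rows(sample, quoting=csv.QUOTE_NONE)

# First column when quoting is disabled.
ids = [row[0] for row in csv.reader(io.StringIO(sample), delimiter='\t',
                                    quoting=csv.QUOTE_NONE)]

print(default_rows, fixed_rows, ids)
```

In the original script the same idea would be csv.reader(inf, delimiter='\t', quoting=csv.QUOTE_NONE). With that setting the quote characters stay in the tweet text, which is probably what you want for raw tweets.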