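I am trying to read a large data file, file.txt, and strip out all the commas, periods, and similar punctuation, so I read the file with this Python code: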
import re
from nltk.corpus import stopwords

file= open("file.txt","r")
importantWords =[]
for i in file.readlines():
    line = i[:-1].split(" ")
    for word in line:
        for j in word:
            word = re.sub('[\!@#$%^&*-/,.;:]','',word)
        word.lower()
        if word not in stopwords.words('spanish'):
            importantWords.append(word)
print importantWords
and it prints:
['\xef\xbb\xbfdataText1', 'dataText2' .. 'dataTextn']
How can I get rid of the \xef\xbb\xbf at the start of the first word? I am using Python 2.7.
Answer 0 (score: 4)
>>> import codecs
>>> codecs.BOM_UTF8
'\xef\xbb\xbf'
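Those three bytes are the UTF-8 byte order mark (BOM), which is what codecs.BOM_UTF8 holds. You can use codecs.open with encoding='utf-8-sig' to skip the BOM sequence: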
with codecs.open("file.txt", "r", encoding="utf-8-sig") as f:
    for line in f:
        ...
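To see what 'utf-8-sig' changes, here is a minimal sketch that compares it with plain 'utf-8' on a small file. The temporary file and its contents are made up for illustration. It uses io.open rather than codecs.open, only because io.open takes the same encoding argument and behaves the same way on both Python 2.7 and Python 3.

```python
import codecs
import io
import os
import tempfile

# Hypothetical sample data: a UTF-8 file that starts with a BOM.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"dataText1 dataText2\n".encode("utf-8"))

# Plain "utf-8" keeps the BOM as the character U+FEFF at the start of the text.
with io.open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()

# "utf-8-sig" drops a leading BOM if there is one.
with io.open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()

print(repr(with_bom[0]))
print(without_bom.split())
```

Decoding with 'utf-8-sig' is also safe for files that have no BOM, so it should not need a special case.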
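SIDENOTE: Instead of using file.readlines, iterate over the file object itself. If all you want is to loop over the lines, file.readlines creates an unnecessary temporary list.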
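Putting the two points together, the loop from the question could be sketched roughly as follows. This is only an illustration under a few assumptions: important_words is a hypothetical helper name, the stop words are passed in as a plain set instead of coming from a stop-word library, the hyphen in the character class is assumed to be meant literally, and the result of lower() is assigned so that it takes effect. The sample file is made up.

```python
import codecs
import io
import os
import re
import tempfile

# Escaping "-" keeps it literal; unescaped, "*-/" is read as a character range.
PUNCTUATION = re.compile(r"[!@#$%^&*\-/,.;:]")

def important_words(path, stop_words):
    """Collect lower-cased, punctuation-free words that are not stop words."""
    words = []
    with io.open(path, "r", encoding="utf-8-sig") as f:  # skips a leading BOM
        for line in f:  # one line at a time, no temporary list of all lines
            for word in line.split():
                word = PUNCTUATION.sub("", word).lower()
                if word and word not in stop_words:
                    words.append(word)
    return words

# Hypothetical sample data: a small UTF-8 file that starts with a BOM.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"Hola, mundo.\nEl archivo; de datos!\n".encode("utf-8"))

result = important_words(path, stop_words={u"el", u"de"})
print(result)
```

Building the stop-word collection once and passing it in as a set also avoids rebuilding it for every word, which matters for a large file.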