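I have a Python project that crawls some *.txt files and copies certain words from each *.txt file into a new file.
The project works fine on my first device and produces a .txt file with the correct content. On my second device it runs without errors, but it creates an empty .txt file.
The Python version is the same on both devices, and both run Windows 10.
The code is as follows: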
import re
#pattern to find
pattern_name_start=r'id="p-name">'
pattern_name_end=r'</div>'
crawlfile=open("product-name.txt","w")
for j in range(10):
    #creating file locations and assigning it to $address
    address="pages/{0}.txt".format(j)
    #opening webpage file which is saved in .txt format and reading its content
    pagesfile=open(address,"r")
    pagetext=pagesfile.read()
    #establishing first character location of the iran-code and generating gs1 code and writing it in the file
    pn=""
    product_name=""
    matchname=re.search(pattern_name_start,pagetext)
    if matchname:
        strtchar=matchname.start()
        #49 is the number of id="p-name characters + number of spaces
        for i in range (49,350):
            pn=pn+pagetext[strtchar+i]
        matchnameend=re.search(pattern_name_end,pn)
        if matchnameend:
            endchar=matchnameend.start()
            #32 is the number of spaces
            for i in range(endchar-33):
                product_name=product_name+pn[i]
        crawlfile.write(product_name+ '\n')
    pagesfile.close()
crawlfile.close()
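
For comparison, here is a minimal sketch of the same extraction without the hard-coded offsets (49 and 33). It assumes the product name is the only text between `id="p-name">` and the next `</div>`, surrounded by whitespace, and that the pages are saved as UTF-8; both are assumptions, not something confirmed above. The names `extract_product_name` and `crawl` are made up for illustration. In the code above, `range(endchar-33)` is empty whenever `</div>` is found fewer than 34 characters into the slice, which gives an empty name without any error. The sketch reports such pages instead of skipping them silently.

```python
import re
from pathlib import Path

# Capture whatever sits between the opening marker and the next </div>,
# trimming surrounding whitespace, so no fixed character offsets are needed.
NAME_PATTERN = re.compile(r'id="p-name">\s*(.*?)\s*</div>', re.DOTALL)


def extract_product_name(page_text):
    """Return the product name found in page_text, or None if absent."""
    match = NAME_PATTERN.search(page_text)
    return match.group(1) if match else None


def crawl(pages_dir="pages", out_path="product-name.txt", count=10):
    """Write one product name per line for pages 0..count-1."""
    # encoding="utf-8" is an assumption about how the pages were saved;
    # passing it explicitly avoids depending on each machine's default.
    with open(out_path, "w", encoding="utf-8") as out:
        for j in range(count):
            page = Path(pages_dir) / f"{j}.txt"
            if not page.exists():
                print(f"missing: {page}")
                continue
            name = extract_product_name(page.read_text(encoding="utf-8"))
            if name is None:
                # Make a failed match visible instead of writing nothing.
                print(f"no product name found in {page}")
            else:
                out.write(name + "\n")
```

Running `crawl()` from the project directory on both devices should show whether the pages are missing, the marker is not found, or the names are extracted as expected.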