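I have a bunch of URLs and I want to download their text for further analysis. I am a Python newbie. I have two problems: (1) I get a really weird TypeError; (2) the results are not written to the DataFrame. My code is as follows: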
smallURL= ['http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127','http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst','http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/','http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility']
import pandas
import datetime
f = open('myfile', 'w')
#lista= ['http://www.walesonline.co.uk/business/business-news/more-70-jobs-created-bio-12836127','http://economictimes.indiatimes.com/articleshow/61006825.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst','http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/','http://13wham.com/news/local/urmc-opens-newest-urgent-care-facility']
df = pandas.DataFrame(columns=('d', 'datetime', 'title', 'text','keywords', 'url'))
from newspaper import Article
for index in range(len(smallURL)):
    #url = "https://www.bloomberg.com/news/articles/2017-11-10/microsoft-and-google-turn-to-ai-to-catch-amazon-in-the-cloud"
    article = Article(smallURL[index])
    #1 . Download the article
    #try:
    article.download()
    #f.write('article.title+\n')
    #except:
    #pass
    #2. Parse the article
    try:
        article.parse()
        f.write('article.title+\n')
    except:
        pass
    #Print article title
    #print(article.title)
    article.title
    #3. Fetch Author Name(s)
    print(article.authors)
    #4. Fetch Publication Date
    if article.publish_date is None:
        d = datetime.datetime.now().date()
    else:
        d = article.publish_date
    #5. Print article text
    print(article.text)
    #6. Natural Language Processing on Article to fetch Keywords
    #article.nlp()
    #Print Keywords
    print(article.keywords)
    #7. Generate Summary of the article
    #print(article.url)
    print(article.url)
    df.loc[index] = [d, datetime.datetime.now().date(), article.title, article.text,article.keywords,article.url]
My output includes:
[]
http://100seguro.com.ar/telefonica-pone-en-venta-su-aseguradora-antares-vida/
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/theiman/Desktop/untitled7.py', wdir='C:/Users/theiman/Desktop')
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/theiman/Desktop/untitled7.py", line 57, in <module>
    df.loc[index] = [d, datetime.datetime.now().date(), article.title, article.text,article.keywords,article.url]
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 179, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 425, in _setitem_with_indexer
    self.obj._data = self.obj.append(value)._data
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 4533, in append
    other = other._convert(datetime=True, timedelta=True)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3472, in _convert
    copy=copy)).__finalize__(self)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3227, in convert
    return self.apply('convert', **kwargs)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 3091, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 1892, in convert
    values = fn(values.ravel(), **fn_kwargs)
  File "C:\Users\theiman\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 740, in soft_convert_objects
    values = lib.maybe_convert_objects(values, convert_datetime=datetime)
  File "pandas/_libs/src\inference.pyx", line 1204, in pandas._libs.lib.maybe_convert_objects
TypeError: unhashable type: 'tzutc'
Any ideas on what went wrong and how to fix it? Thanks!!
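
One possible direction, offered as a guess rather than a confirmed diagnosis: the traceback ends inside pandas' datetime inference with 'tzutc', which suggests that article.publish_date for the failing URL may be a timezone-aware datetime, while the fallback value is a plain date. If that is the cause, converting every publish date to a plain datetime.date before it goes into the row might sidestep the error. The helper name normalize_publish_date and the sample values below are hypothetical and stand in for what newspaper would return; this is a minimal sketch, not a tested fix for this exact pandas version.

```python
import datetime


def normalize_publish_date(publish_date, fallback=None):
    """Return a plain datetime.date with no timezone attached (hypothetical helper)."""
    if publish_date is None:
        # Same idea as the fallback in the loop above: use today's date.
        return fallback or datetime.date.today()
    if isinstance(publish_date, datetime.datetime):
        # .date() drops both the time of day and the tzinfo object.
        return publish_date.date()
    # Already a datetime.date, so return it unchanged.
    return publish_date


# Hypothetical stand-ins for article.publish_date values.
aware = datetime.datetime(2017, 10, 9, 12, 30, tzinfo=datetime.timezone.utc)
naive = datetime.datetime(2017, 4, 3, 8, 0)

# Collect one list per article instead of assigning into df.loc inside the loop.
rows = [
    [normalize_publish_date(aware), "title A"],
    [normalize_publish_date(naive), "title B"],
    [normalize_publish_date(None, fallback=datetime.date(2017, 11, 20)), "title C"],
]
```

With rows collected like this, the frame could be built once after the loop, for example with pandas.DataFrame(rows, columns=[...]). That may also explain problem (2): the exception on one row stops the script, so no later rows are ever written.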