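I'm trying to use a Python script to download pronunciations (mp3 files) from forvo.com.

I put a list of words in a .txt document. The script then uses an API key I purchased to query Forvo and fetch the mp3 file associated with each word in the .txt document.

The script works fine when the .txt document contains a single item. When the document contains more than one item, I get the following error (I think the line breaks are producing malformed JSON):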
```
  File "C:\Python27\Lib\site-packages\ForvoDownloader\test.py", line 105, in <module>
    Main('es',1)
  File "C:\Python27\Lib\site-packages\ForvoDownloader\test.py", line 75, in Main
    r = ForvoRequest(i,lang,APIKEY)
  File "C:\Python27\Lib\site-packages\ForvoDownloader\test.py", line 39, in ForvoRequest
    data = r.json()
  File "C:\Python27\Lib\site-packages\ForvoDownloader\requests\models.py", line 805, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
```
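The last frames show the failure happens inside the JSON decoder while parsing the server's reply (`r.json()`), which raises this `ValueError` whenever the text it is given is not JSON at all; the .txt file itself is never parsed as JSON. A minimal reproduction, assuming an HTML error page as the response body (the `html_body` string is a made-up stand-in):

```python
import json

# Hypothetical stand-in for an HTML error page returned instead of JSON.
html_body = "<!doctype html><html><head><title>Page not found</title></head></html>"

def decodes_as_json(text):
    """Return True if text parses as JSON, False if the decoder rejects it."""
    try:
        json.loads(text)
    except ValueError:  # Python 3's json.JSONDecodeError is a subclass of ValueError
        return False
    return True

print(decodes_as_json('{"items": []}'))  # True
print(decodes_as_json(html_body))        # False
```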
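Here is the script's code. Sorry about the length: I'm new to this and don't know what you need in order to help, so I'm erring on the side of caution.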
```python
import forvo
import os
from Tkinter import Tk
from tkFileDialog import askopenfilename
import requests
import urllib

def ForvoRequest(QUERY, LANG, apikey, ACT='word-pronunciations', FORMAT='mp3', free=True):
    # action, default is 'word-pronunciations', query, language, apikey, TRUE if free api(default), FALSE if commercial
    # Return a list of links to mp3 pronunciations for the word QUERY in LANG language.
    # FORMAT='ogg' will return a list of links to ogg pronunciations
    if free:  # default
        base_url = 'http://apifree.forvo.com/'
    else:
        # TODO: add non free base url
        base_url = 'htttp://api.forvo.com/'  # is it correct?
    query_u8 = QUERY
    query_u8.decode('utf-8')
    key = [
        ('action', ACT),
        ('format', 'json'),
        ('word', urllib.quote(QUERY)),
        ('language', LANG),
        ('key', apikey)
        ]
    url = base_url + '/'.join(['%s/%s' % a for a in key if a[1]]) + '/'
    try:
        r = requests.get(url)
    except:
        raise
        return None
    data = r.json()
    if data[u'items']:
        # we retrieved a non empty JSON.
        # the JSON is structured like this:
        # a dictionary with 2 items, their keys are:
        # -u'attributes' (linked to info about the request we made)
        # -u'items' (linked to a list of dictionaries)
        # in the list there is a dictionary for every pronunciation, we will search for the "pathmp3" key
        paths = []
        for i in data[u'items']:
            audioFormat = u'path' + FORMAT
            paths.append(i[audioFormat])
        return paths
    else:
        # The JSON's u'items' list is empty
        return None

def Main(lang, limit):
    # APIKEY is stored separately in another file called apikey.txt
    with open('apikey.txt') as a:
        APIKEY = a.read()
    myfile = fileChoose()
    with open(myfile) as words:
        # We will create a directory to store downloaded mp3, it will be named /home/user/forvo/...
        home = os.path.expanduser('~/forvo')
        lang_dir = os.path.join(home, lang)
        for i in words:
            r = ForvoRequest(i, lang, APIKEY)
            if r:
                DownloadMp3(r, limit, i, lang_dir)
            else:
                file_name = os.path.join(lang_dir, 'word_not_found.txt')
                with open(file_name, 'a') as out:
                    out.write(i)

def fileChoose():
    # show a file choose dialog box
    Tk().withdraw()  # we don't want a full GUI, so keep the root window from appearing
    filename = askopenfilename()  # show an "Open" dialog box and return the path to the selected file
    return filename

def DownloadMp3(urlList, limit, word, folder):
    # download a mp3 file, rename it and write it in a custom folder
    for i in range(0, limit):
        mp3 = requests.get(urlList[i])
        file_name = word.replace('\n', '') + '.{0}'.format(i) + '.mp3'
        file_path = os.path.join(folder, file_name)
        if not os.path.exists(folder):
            os.makedirs(folder)
        else:
            with open(file_path, "wb") as out:
                # we open a new mp3 file and we name it after the word we're downloading.
                # The file is opened in write-binary mode
                out.write(mp3.content)

Main('es', 2)
```
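It seems I need to find a way to pass the .txt file to JSON in a well-formed way. I tried searching Stack Overflow for examples, but couldn't find anything that solved all of my problems.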
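The words from the .txt file never pass through a JSON encoder: each one is URL-quoted and placed in the request path, and only the server's reply is decoded as JSON. The sketch below copies the URL-building lines of `ForvoRequest` into a hypothetical `build_url` helper (with a placeholder key, `YOUR_KEY`) to show what a line read from a multi-line file turns into while it still carries its newline. This would also plausibly explain why a one-word file works, since its only line likely has no trailing newline.

```python
try:
    from urllib import quote  # Python 2, as used in the question
except ImportError:
    from urllib.parse import quote  # Python 3

def build_url(word, lang, apikey, act='word-pronunciations',
              base_url='http://apifree.forvo.com/'):
    """Hypothetical helper reproducing the URL scheme used in ForvoRequest."""
    key = [
        ('action', act),
        ('format', 'json'),
        ('word', quote(word)),
        ('language', lang),
        ('key', apikey),
    ]
    return base_url + '/'.join('%s/%s' % pair for pair in key if pair[1]) + '/'

# A line read from a multi-line file ends with '\n', which quote() encodes as %0A.
print(build_url('hola\n', 'es', 'YOUR_KEY'))
# http://apifree.forvo.com/action/word-pronunciations/format/json/word/hola%0A/language/es/key/YOUR_KEY/
print(build_url('hola\n'.strip(), 'es', 'YOUR_KEY'))
# http://apifree.forvo.com/action/word-pronunciations/format/json/word/hola/language/es/key/YOUR_KEY/
```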
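Edit: I think what is happening is that the script is querying the API incorrectly: instead of looping over each element in the .txt document, it is searching for everything at once. Here is the part of the code that made me think this:

```python
with open(myfile) as words:
    # We will create a directory to store downloaded mp3, it will be named /home/user/forvo/...
    home = os.path.expanduser('~/forvo')
    lang_dir = os.path.join(home, lang)
    for i in words:
        r = ForvoRequest(i, lang, APIKEY)
        if r:
            DownloadMp3(r, limit, i, lang_dir)
```

Printing `r.text` when the text file contains more than one item outputs: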
```html
<!doctype html>
<html class="no-js" lang="en" dir="ltr">
<head>
<base href="http://forvo.com/" />
<meta charset="utf-8">
<title>Forvo - Page not found</title>
...
</head>
<body class="ltr" >
...
<h1>404 - Page not found</h1>
<p class="info">The page you are looking doesn't exist.</p>
...
</body>
</html>
```
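The suspicion about the loop can be checked without calling the API: iterating over a file object yields one line per iteration, so the loop does not send everything at once, but each line keeps its line terminator. A minimal check, simulating the word file with `io.StringIO` (the two sample words are made up):

```python
import io

# io.StringIO stands in for open(myfile); the two words are made-up sample data.
words = io.StringIO(u"hola\nadios\n")

raw_lines = [line for line in words]        # exactly what `for i in words:` yields
clean_lines = [line.strip() for line in raw_lines]

print(len(raw_lines))                                # 2: one iteration per line
print(all(l.endswith(u"\n") for l in raw_lines))     # True: each still ends in a newline
print(clean_lines == [u"hola", u"adios"])            # True after strip()
```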
Answer (score: 0)
In your response text, note this line:

```html
<title>Forvo - Page not found</title>
```
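The script fails because the site is not returning JSON; it is returning an error page (a 404), because it doesn't recognize what you are asking for. You need to check the API documentation and find out whether you can do what you are trying to do.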
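One way to make this kind of failure self-explanatory is to catch the decoding error and report what the server actually sent. A sketch, assuming a `requests`-style response object exposing `status_code`, `text` and `json()`; the helper `parse_json_response` and the `FakeResponse` stub are hypothetical names for illustration:

```python
import json

def parse_json_response(response):
    """Decode a JSON body, or raise an error showing the status and start of the body."""
    try:
        return response.json()
    except ValueError:  # requests' JSON decode errors subclass ValueError
        raise RuntimeError(
            "Expected JSON but got HTTP %s; body starts with %r"
            % (response.status_code, response.text[:40])
        )

class FakeResponse(object):
    """Minimal stand-in for requests.Response, for trying the helper offline."""
    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)

print(parse_json_response(FakeResponse(200, '{"items": []}')))
```

In `ForvoRequest`, `data = r.json()` would become `data = parse_json_response(r)`, so a 404 page shows up as a readable message instead of "No JSON object could be decoded".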
Update based on your edit:

You are doing:

```python
with open(myfile) as words:
    for i in words:
        # do something with i, for example:
        r = ForvoRequest(i, lang, APIKEY)
```
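When you iterate over a file, it reads one line at a time, and each line keeps its trailing newline (which `urllib.quote` encodes as `%0A` in the request URL). My guess is that your file may be space-separated (all the words on one line, separated by spaces), in which case you need to split `words` on whitespace; `split()` with no arguments also drops the newlines: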
```python
with open(myfile) as words:
    for i in words.read().split():
        # do something with i, for example:
        r = ForvoRequest(i, lang, APIKEY)
```
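If some entries in the list are multi-word phrases stored one per line, splitting on whitespace would break them apart. A variant that strips each line and skips blank ones may fit better; a sketch, with the hypothetical helper name `read_words` and made-up sample data:

```python
import io

def read_words(fileobj, one_per_line=True):
    """Return the entries in fileobj with surrounding whitespace and newlines removed.

    one_per_line=True keeps multi-word phrases together (one entry per line);
    one_per_line=False splits on any whitespace, like words.read().split().
    """
    if one_per_line:
        return [line.strip() for line in fileobj if line.strip()]
    return fileobj.read().split()

sample = u"hola\nbuenos dias\n\nadios\n"
print(read_words(io.StringIO(sample)))
print(read_words(io.StringIO(sample), one_per_line=False))
```

In `Main`, the loop would then read `for i in read_words(words):`, and each `i` reaches `ForvoRequest` without a trailing newline.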