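Aside from a bit of Ruby, my coding background is very limited, so if there's a better way to do this, please let me know!
Basically, I have a .txt file full of words. I want to import the .txt file and turn it into a list. Then I want to take the first item in the list, assign it to a variable, and use that variable in an external request that fetches the word's definition. The definition comes back, and I want to save it to another .txt file. Once that's done, I want the code to grab the next item in the list and do it all again until the list runs out.
Below is the code I have so far, so you can see where I'm at. I'm still trying to figure out how to iterate over the list properly, and I'm having a hard time making sense of the documentation.
Apologies in advance if this has already been asked! I searched but couldn't find anything that specifically answered my question.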
from __future__ import print_function
import requests
import urllib2, urllib
from bs4 import BeautifulSoup

lines = []
with open('words.txt') as f:
    lines = f.readlines()

for each in lines:
    wordlist = open('test.txt', 'a')
    word = ##figure out how to get items from list and assign them here
    url = 'http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query=%s' % word
    # print url and make sure it's correct
    html = urllib.urlopen(url).read()
    # print html (deprecated)
    soup = BeautifulSoup(html)
    visible_text = soup.find('pre')(text=True)[0]
    print(visible_text, file=wordlist)
Answer 0 (score: 1)
Keep everything inside the loop, like this:
with open('test.txt', 'a') as wordlist:
    for word in lines:
        url = 'http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query=%s' % word
        print(url)
        # print url and make sure it's correct
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        visible_text = soup.find('pre')(text=True)[0]
        wordlist.write("{0}\n".format(visible_text))
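A Python 3 sketch of the same loop, reading the words with splitlines() and passing the lookup in as a function so the loop can be tried without the dictionary service. define_all and lookup are hypothetical names; in real use lookup would fetch the page and pull out the <pre> text as above, returning None when there is no definition.

```python
def define_all(words_path, out_path, lookup):
    """Look up every word in words_path and append each definition to out_path.

    lookup is a hypothetical callable: it takes a word and returns the
    definition text, or None when nothing was found.
    """
    with open(words_path) as infile:
        words = infile.read().splitlines()  # list of words, newlines removed
    with open(out_path, "a") as outfile:    # opened once, closed automatically
        for word in words:                  # one pass per item in the list
            word = word.strip()
            if not word:
                continue  # skip blank lines
            definition = lookup(word)
            if definition is None:
                continue  # no definition; move on to the next word
            outfile.write("{0}\n".format(definition))
```

Opening the output file once, outside the loop, avoids reopening test.txt for every word (and never closing it) as the question's code does.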
A few more suggestions:
f.readlines() does not strip the trailing \n from each line, so I would use f.read().splitlines() instead:
lines = f.read().splitlines()
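A quick way to see the difference, using io.StringIO in place of words.txt (Python 3; the sample words are invented for illustration):

```python
import io

# io.StringIO stands in for an open words.txt so the example runs anywhere
sample = "cat\ndog\ncar\n"

with_newlines = io.StringIO(sample).readlines()
without_newlines = io.StringIO(sample).read().splitlines()

print(with_newlines)     # ['cat\n', 'dog\n', 'car\n']
print(without_newlines)  # ['cat', 'dog', 'car']
```

With readlines(), the "\n" would be pasted into the query URL along with the word.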
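You don't need to initialize lines with an empty list [], because the list is built in one step and assigned to lines. You only need to initialize a list when you intend to append() to it, so the following line is unnecessary: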
lines = []
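A small illustration of when the empty list is and isn't needed (Python 3; io.StringIO stands in for words.txt, and the blank-line filter is just an example of why you might build a list one item at a time):

```python
import io

# io.StringIO stands in for open('words.txt') so the example is self-contained
words_file = io.StringIO("cat\n\ndog\n")

# Built and assigned in one step: no "lines = []" needed beforehand
lines = words_file.read().splitlines()
print(lines)  # ['cat', '', 'dog']

# Built item by item with append(): here the empty list must exist first
words_file.seek(0)
words = []
for line in words_file:
    if line.strip():  # skip blank lines
        words.append(line.strip())
print(words)  # ['cat', 'dog']
```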
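You can also handle the case where no definition comes back. If the page has no <pre> tag, soup.find('pre') returns None and calling it raises TypeError; if the tag contains no text, indexing [0] raises IndexError:
try:
    value = soup.find('pre')(text=True)[0]
except (TypeError, IndexError):
    # No <pre> block, or an empty one: treat it as "no definition"
    value = None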
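For comparison, a standard-library sketch of the same "return None when there is no definition" idea, using Python 3's html.parser instead of BeautifulSoup. PreTextParser and first_pre_text are made-up names, and the sketch assumes, like the code above, that the definition sits in the first <pre> element of the page.

```python
from html.parser import HTMLParser


class PreTextParser(HTMLParser):
    """Collect the text inside the first <pre> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_pre = False  # currently inside the first <pre>
        self.done = False    # first <pre> has already been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            self.in_pre = False
            self.done = True

    def handle_data(self, data):
        if self.in_pre:
            self.chunks.append(data)


def first_pre_text(html):
    """Return the text of the first <pre> block, or None if it is missing or empty."""
    parser = PreTextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return text or None
```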
Answer 1 (score: 0)
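I also show how to use the Python requests library to retrieve the raw HTML page, which makes it easy to check the status code to see whether the retrieval succeeded. If you prefer, you can use urllib for those lines instead; the script below actually calls the urllib version by default, and you can switch to requests by uncommenting the lines in get_definitions.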
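Both answers paste the word straight into the query string, so a word containing a space or & would produce a broken URL. With requests you can pass the query parameters as a dict (requests.get(url, params={...})) and let it do the encoding; below is a standard-library Python 3 sketch of the same idea. build_query_url is a hypothetical helper, and the parameter names are copied from the URL used above.

```python
from urllib.parse import urlencode

BASE_URL = "http://services.aonaware.com/DictService/Default.aspx"


def build_query_url(word):
    """Build the lookup URL with the word percent-encoded."""
    params = {"action": "define", "dict": "wn", "query": word}
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("ice cream"))
# http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query=ice+cream
```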
You can install requests with pip:
pip install requests
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import re
import requests
import urllib
from bs4 import BeautifulSoup

def get_html_with_urllib(word):
    url = "http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query={word}".format(word=word)
    html = urllib.urlopen(url).read()
    return html

def get_html(word):
    url = "http://services.aonaware.com/DictService/Default.aspx?action=define&dict=wn&query={word}".format(word=word)
    response = requests.get(url)
    # Something bad happened
    if response.status_code != 200:
        return ""
    # Did not get back html
    if not response.headers.get("Content-Type", "").startswith("text/html"):
        return ""
    html = response.content
    return html

def format_definitions(raw_definitions_text):
    # Get individual lines in definitions text
    parts = raw_definitions_text.split('\n')
    # Convert to str
    # Remove extra spaces on the left.
    # Add one space at the end for later joining with next line
    parts = map(lambda x: str(x).lstrip() + ' ', parts)
    result = []
    current = ""
    for p in parts:
        if re.search(r"\w*[0-9]+:", p):
            # Start of new line. Contains some char followed by <number>:
            # Save previous lines
            result.append(current.replace('\n', ' '))
            # Set start of current line
            current = p
        else:
            # Continue line
            current += p
    result.append(current)
    return '\n'.join(result)

def get_definitions(word):
    # Uncomment this to use requests
    # html = get_html(word)
    # Could not get definition
    # if not html:
    #     return None
    html = get_html_with_urllib(word)
    soup = BeautifulSoup(html, "html.parser")
    # Get block containing definition
    pre = soup.find("pre")
    if pre is None:  # No definition block on the page
        return None
    definitions = format_definitions(pre.get_text())
    return definitions

def batch_query(input_filepath):
    with open(input_filepath) as infile:
        for word in infile:
            word = word.strip()  # Remove whitespace (including the newline) from both ends
            definitions = get_definitions(word)
            if not definitions:
                print("Could not retrieve definitions for {word}".format(word=word))
                continue
            print("Definition for {word} is: ".format(word=word))
            print(definitions)

def main():
    input_filepath = sys.argv[1]  # Alternatively, change this to file containing words
    batch_query(input_filepath)

if __name__ == "__main__":
    main()
Output:
Definition for cat is:
cat
n 1: feline mammal usually having thick soft fur and being unable to roar; domestic cats; wildcats [syn: true cat]
2: an informal term for a youth or man; "a nice guy"; "the guy's only doing it for some doll" [syn: guy, hombre, bozo]
3: a spiteful woman gossip; "what a cat she is!"
4: the leaves of the shrub Catha edulis which are chewed like tobacco or used to make tea; has the effect of a euphoric stimulant; "in Yemen kat is used daily by 85% of adults" [syn: kat, khat, qat, quat, Arabian tea, African tea]
5: a whip with nine knotted cords; "British sailors feared the cat" [syn: cat-o'-nine-tails]
6: a large vehicle that is driven by caterpillar tracks; frequently used for moving earth in construction and farm work [syn: Caterpillar]
7: any of several large cats typically able to roar and living in the wild [syn: big cat]
8: a method of examining body organs by scanning them with X rays and using a computer to construct a series of cross-sectional scans along a single axis [syn: computerized tomography, computed tomography, CT, computerized axial tomography, computed axial tomography]
v 1: beat with a cat-o'-nine-tails
2: eject the contents of the stomach through the mouth; "After drinking too much, the students vomited"; "He purged continuously"; "The patient regurgitated the food we gave him last night" [syn: vomit, vomit up, purge, cast, sick, be sick, disgorge, regorge, retch, puke, barf, spew, spue, chuck, upchuck, honk, regurgitate, throw up] [ant: keep down] [also: catting, catted]
Definition for dog is:
dog
n 1: a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; "the dog barked all night" [syn: domestic dog, Canis familiaris]
2: a dull unattractive unpleasant girl or woman; "she got a reputation as a frump"; "she's a real dog" [syn: frump]
3: informal term for a man; "you lucky dog"
4: someone who is morally reprehensible; "you dirty dog" [syn: cad, bounder, blackguard, hound, heel]
5: a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll [syn: frank, frankfurter, hotdog, hot dog, wiener, wienerwurst, weenie]
6: a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward [syn: pawl, detent, click]
7: metal supports for logs in a fireplace; "the andirons were too hot to touch" [syn: andiron, firedog, dog-iron] v : go after with the intent to catch; "The policeman chased the mugger down the alley"; "the dog chased the rabbit" [syn: chase, chase after, trail, tail, tag, give chase, go after, track] [also: dogging, dogged]
Definition for car is:
car
n 1: 4-wheeled motor vehicle; usually propelled by an internal combustion engine; "he needs a car to get to work" [syn: auto, automobile, machine, motorcar]
2: a wheeled vehicle adapted to the rails of railroad; "three cars had jumped the rails" [syn: railcar, railway car, railroad car]
3: a conveyance for passengers or freight on a cable railway; "they took a cable car to the top of the mountain" [syn: cable car]
4: car suspended from an airship and carrying personnel and cargo and power plant [syn: gondola]
5: where passengers ride up and down; "the car was on the top floor" [syn: elevator car]
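The layout of the output above comes from the re-joining step in format_definitions. Here is a Python 3 sketch of that step on its own, run on a short made-up fragment; join_wrapped_lines is a hypothetical name, and unlike the original it also drops empty entries and trims surrounding spaces. Like the original, it treats any line containing digits followed by a colon as the start of a new sense, which is presumably why the verb sense of "dog" stays glued to sense 7 above: "v :" contains no number.

```python
import re

# A new sense starts where a number is followed by a colon, e.g. "n 1:" or "2:"
SENSE_START = re.compile(r"\w*[0-9]+:")


def join_wrapped_lines(raw_text):
    """Re-join definition lines that the page wrapped across several lines."""
    result = []
    current = ""
    for line in raw_text.split("\n"):
        part = line.lstrip() + " "  # drop indentation, keep a space for joining
        if SENSE_START.search(part):
            if current.strip():
                result.append(current.strip())  # finish the previous sense
            current = part
        else:
            current += part  # continuation of the current sense
    if current.strip():
        result.append(current.strip())
    return "\n".join(result)


raw = "cat\n  n 1: feline mammal\n       with fur\n  2: a guy"
print(join_wrapped_lines(raw))
# cat
# n 1: feline mammal with fur
# 2: a guy
```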