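I'm writing a simple script that checks whether a website appears in Google's results (first page) after searching for a given keyword.
This is the function that parses a URL and returns the hostname: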
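from urllib.parse import urlparse  # Python 3; on Python 2 use: from urlparse import urlparse

def parse_url(url):
    # netloc is the "host[:port]" part of the URL
    url = urlparse(url)
    hostname = url.netloc
    return hostname

Starting from a list of tags selected by:
linkElems = soup.select('.r a')  # on Google's first results page, the result links sit inside elements with class "r"
I wrote this:
for link in linkElems:
    l = link.get("href")[7:]  # drop the leading '/url?q='
    url = parse_url(l)
    if "www.example.com" == url:
        pass  # do stuff (e.g. store in a list, etc.)
In the last snippet, on the second line, I have to slice from index 7 because every href value starts with /url?q=.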
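I'm learning Python, so I'd like to know whether there is a better way to do this, or simply an alternative (perhaps using a regular expression, the replace method, or something from the urlparse library).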
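One possible alternative to slicing at a fixed index is to let the standard library's urllib.parse read the q parameter out of the redirect link. The following is only a sketch: the helper name extract_target and the sample href are hypothetical, and it assumes the hrefs have the form /url?q=<target>&... as described above.

```python
from urllib.parse import urlparse, parse_qs

def extract_target(href):
    """Return the destination wrapped in a '/url?q=...' link, or href unchanged."""
    parsed = urlparse(href)
    if parsed.path == "/url":
        # parse_qs splits the query string and percent-decodes each value
        targets = parse_qs(parsed.query).get("q")
        if targets:
            return targets[0]
    return href

def parse_url(url):
    return urlparse(url).netloc

# Hypothetical href in the shape described in the question
href = "/url?q=http://www.example.com/page%3Fid%3D1&sa=U&ved=abc"
target = extract_target(href)
print(target)
print(parse_url(target) == "www.example.com")
```

Compared with [7:], this does not depend on the prefix length, and it also drops the trailing tracking parameters and decodes percent-escapes in the target URL.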
Answer 0 (score: 0)
You can use the Python lxml module, which can be up to an order of magnitude faster than BeautifulSoup.
It can be done like this:
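import requests
from lxml import html

blah_url = "https://www.google.co.in/search?q=blah&oq=blah&aqs=chrome..69i57j0l5.1677j0j4&sourceid=chrome&ie=UTF-8"
# Download the results page and parse the HTML into an element tree
r = requests.get(blah_url).content
root = html.fromstring(r)
# Result links are <a> elements inside <h3 class="r">; strip the '/url?q=' redirect prefix
# First result only:
print(root.xpath('//h3[@class="r"]/a/@href')[0].replace('/url?q=', ''))
# All results on the page:
print([url.replace('/url?q=', '') for url in root.xpath('//h3[@class="r"]/a/@href')])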
This produces:
http://www.urbandictionary.com/define.php%3Fterm%3Dblah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggTMAA&usg=AFQjCNFge5GFNmjpan7S_UCNjos1RP5vBA
['http://www.urbandictionary.com/define.php%3Fterm%3Dblah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggTMAA&usg=AFQjCNFge5GFNmjpan7S_UCNjos1RP5vBA', 'http://www.dictionary.com/browse/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggZMAE&usg=AFQjCNE1UVR3krIQHfEuIzHOeL0ZvB5TFQ', 'http://www.dictionary.com/browse/blah-blah-blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggeMAI&usg=AFQjCNFw8eiSqTzOm65PQGIFEoAz0yMUOA', 'https://en.wikipedia.org/wiki/Blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggjMAM&usg=AFQjCNFxEB8mEjEy6H3YFOaF4ZR1n3iusg', 'https://www.merriam-webster.com/dictionary/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggpMAQ&usg=AFQjCNHYXX53LmMF-DOzo67S-XPzlg5eCQ', 'https://en.oxforddictionaries.com/definition/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFgguMAU&usg=AFQjCNGlgcUx-BpZe0Hb-39XvmNua2n8UA', 'https://en.wiktionary.org/wiki/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggzMAY&usg=AFQjCNGc9VmmyQls_rOBOR_lMUnt1j3Flg', 'http://dictionary.cambridge.org/dictionary/english/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFgg5MAc&usg=AFQjCNHJgZR1c6VY_WgFa6Rm-XNbdFJGmA', 'http://www.thesaurus.com/browse/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFgg-MAg&usg=AFQjCNEtnpmKxVJqUR7P1ss4VHnt34f4Kg', 'https://www.youtube.com/watch%3Fv%3D3taEuL4EHAg&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQtwIIRTAJ&usg=AFQjCNFnKlMFxHoYAIkl1MCrc_OXjgiClg']
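The URLs printed above still carry the trailing &sa=...&ved=...&usg=... parameters, but because they contain no "?", urlparse treats that tail as part of the path and the hostname is unaffected. A minimal sketch of the membership check the question is after, using three entries copied from the output above; the helper names hostnames and site_in_results are hypothetical:

```python
from urllib.parse import urlparse

# Three entries copied verbatim from the output above
results = [
    'http://www.urbandictionary.com/define.php%3Fterm%3Dblah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggTMAA&usg=AFQjCNFge5GFNmjpan7S_UCNjos1RP5vBA',
    'http://www.dictionary.com/browse/blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggZMAE&usg=AFQjCNE1UVR3krIQHfEuIzHOeL0ZvB5TFQ',
    'https://en.wikipedia.org/wiki/Blah&sa=U&ved=0ahUKEwiyscHQ5_LSAhWFvI8KHctAC0IQFggjMAM&usg=AFQjCNFxEB8mEjEy6H3YFOaF4ZR1n3iusg',
]

def hostnames(urls):
    """Collect the lower-cased hostname of every URL."""
    return {urlparse(u).netloc.lower() for u in urls}

def site_in_results(hostname, urls):
    """True if any URL in urls is served from hostname (exact match)."""
    return hostname.lower() in hostnames(urls)

print(site_in_results("en.wikipedia.org", results))  # True
print(site_in_results("www.example.com", results))   # False
```

The match is exact, so "wikipedia.org" would not match "en.wikipedia.org"; whether subdomains should count is a choice left to the caller.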