Question

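I have a folder full of Windows .URL files. I want to turn them into a list of MLA citations for my paper.

Is this a good use case for Python? How do I get the page titles? I'm on Windows XP with Python 3.1.1.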

Answer 1

This is a great job for Python! A .URL file has the following syntax:

[InternetShortcut]
URL=http://www.example.com/
OtherStuff=irrelevant
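A minimal sketch of reading that format with the standard library, assuming a modern Python 3 (3.2 or later, not 3.1, because `read_string` and the `interpolation` argument arrived in 3.2). The helper name `url_from_shortcut` is hypothetical. Interpolation is switched off because URLs often contain `%` escapes, which the default ConfigParser would try to interpolate and reject.

```python
import configparser


def url_from_shortcut(text):
    """Return the URL stored in the [InternetShortcut] section of a .URL file's text."""
    # interpolation=None: treat '%' literally, since URLs often contain %-escapes
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Option names are case-insensitive, so "URL" also matches "url" or "Url"
    return parser.get("InternetShortcut", "URL")


sample = """[InternetShortcut]
URL=http://www.example.com/search?q=a%20b
OtherStuff=irrelevant
"""

print(url_from_shortcut(sample))  # → http://www.example.com/search?q=a%20b
```

For a file on disk, `parser.read(path)` does the same job as `read_string`.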

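To parse your .URL files, start with the configparser module (ConfigParser on Python 2): read each file and look up its InternetShortcut section, from which you can read the URL. Once you have a list of URLs, you can load each page with urllib.request (urllib2 on Python 2) and grab the page title with a dumb regular expression (or with BeautifulSoup, as Alex suggested).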

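Once you have that, you have a list of URLs and page titles... not enough for a complete MLA citation, but it should be enough to get you started, right?

Something like this (very rough, typed straight into the SO answer box):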
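Turning each (url, title) pair into a Works Cited line could look roughly like this. It is a hypothetical helper (`mla_entry`, `sort_key` and `works_cited` are not from the answer), assuming an MLA 9-style layout with only the fields a script can recover: the title in quotation marks, the URL, and an access date. Author, site name, and publication date still have to be filled in by hand, so check the result against your style guide.

```python
import datetime

# MLA-style month abbreviations (months of four letters or fewer are spelled out)
MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
          "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]


def mla_entry(title, url, accessed):
    """Format one partial MLA-style entry: "Title." URL. Accessed D Mon. YYYY."""
    # Keep a title's own ? or ! instead of adding a period after it
    if not title.endswith((".", "?", "!")):
        title += "."
    date = "%d %s %d" % (accessed.day, MONTHS[accessed.month - 1], accessed.year)
    return '"%s" %s. Accessed %s.' % (title, url, date)


def sort_key(title):
    """Alphabetize by title, ignoring case and a leading A, An, or The."""
    words = title.lower().split()
    if len(words) > 1 and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def works_cited(pairs, accessed):
    """Turn (url, title) pairs into entry strings sorted by title."""
    ordered = sorted(pairs, key=lambda pair: sort_key(pair[1]))
    return [mla_entry(title, url, accessed) for url, title in ordered]


results = [("http://www.python.org/", "The Python Programming Language"),
           ("http://www.example.com/", "Example Domain")]
for line in works_cited(results, datetime.date(2010, 1, 5)):
    print(line)
# → "Example Domain." http://www.example.com/. Accessed 5 Jan. 2010.
# → "The Python Programming Language." http://www.python.org/. Accessed 5 Jan. 2010.
```

Sorting by title fits here because these entries have no author, and MLA alphabetizes such entries by their first element.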

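import re
from glob import glob
from urllib.request import urlopen          # urllib2 on Python 2
from configparser import RawConfigParser    # ConfigParser on Python 2

# I use a regex here; you might consider BeautifulSoup because regexes can be stupid
TITLE = re.compile(r"<title>([^<]+)</title>", re.IGNORECASE)

result = []
for path in glob("*.url"):
    # RawConfigParser skips %-interpolation, so percent-encoded URLs are read verbatim
    config = RawConfigParser()
    config.read(path)
    url = config.get("InternetShortcut", "URL")

    # Download the page and decode it (assumes UTF-8; bad bytes are replaced)
    page = urlopen(url).read().decode("utf-8", "replace")

    # Get the title, or a placeholder if the regex finds none
    match = TITLE.search(page)
    title = match.group(1).strip() if match else "Couldn't find title"

    result.append((url, title))

for url, title in result:
    print("'%s' <%s>" % (title, url))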

Answer 2

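If the files contain HTML pages, you can parse them to extract their titles, and BeautifulSoup is the recommended third-party library for the job. Get the version of BeautifulSoup that is compatible with Python 3.1 here, install it, and then: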

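Parse each file's contents into a soup object, e.g. with:

from BeautifulSoup import BeautifulSoup   # with today's bs4 package: from bs4 import BeautifulSoup
with open('thefile.html', 'r') as f:
    html = f.read()
soup = BeautifulSoup(html)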
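Get the title tag (if any), and print its string contents (if any):

title = soup.find('title')
if title is None:
    print('No title!')
elif title.string is None:
    print('Title tag has no string contents!')
else:
    print('Title: ' + title.string)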
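If installing a third-party package is inconvenient, the standard library's `html.parser` module can do the same job. This swaps in a different technique (it is not BeautifulSoup), the names `TitleFinder` and `find_title` are hypothetical, and it assumes a modern Python 3. It collects the text of the first `<title>` element and returns None when the page has none.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> also arrives as "title"
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore any later <title> elements

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect them all
        if self.in_title:
            self.parts.append(data)


def find_title(html):
    """Return the whitespace-normalized title text, or None if there is no <title>."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    if not finder.parts and not finder.done:
        return None
    return " ".join("".join(finder.parts).split())


print(find_title("<html><head><title>\n  Example Domain\n</title></head></html>"))
# → Example Domain
```

Collapsing whitespace matters because real titles often span several lines in the HTML source.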

How do I create a script that makes MLA citations?