Question

I have the following draft of a scraper:

from lxml import html
import requests
import sys

requestedURL = sys.argv[1]
page = requests.get(requestedURL)
tree = html.fromstring(page.content)

passage = ''
for tr in tree.cssselect("div [class='passage-content passage-class-0']"):
    for each in tr:
        for e in each:
            for x in e:
                if x.text_content() == 'Footnotes:' or x.text_content() == 'Cross references:': 
                    passage += '\n'
                    passage = passage.lstrip('\n')
                    sys.stdout.write(passage)
                    sys.exit(0)
                if not x.text_content()[0].isdigit():
                    passage += '\n\n'+x.text_content()+'\n\n'
                else:
                    passage += x.text_content()
            passage = passage.replace('\n\n\n', '\n\n')

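When I run it, I do get the output I want, but I also get two unwanted behaviors:

It prints the args before executing
It does not terminate until I press Enter

Example: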

python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV
[1] 48648

John 3:1

New International Version (NIV)

Jesus Teaches Nicodemus

3 Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council.

// this line doesn't show up until I hit enter
[1]+  Done  python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1

It's worth noting that this only started happening once I switched requestedURL from a static string in the code to a value read from sys.argv.

Answer 1

It's probably the "&" in the command-line argument. Try putting the argument in double quotes:

python bg_scrape.py "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"
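To see what the quotes change, here is a minimal sketch using Python's standard shlex module. The url value is the one from the question. Note that shlex.quote wraps the string in single quotes rather than double quotes; either form keeps the shell from acting on the "&".

```python
import shlex

# The URL from the question; the "&" is the character the shell treats specially.
url = "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"

# shlex.quote wraps the string in single quotes when it contains
# characters that are special to a POSIX shell.
quoted = shlex.quote(url)
command = "python bg_scrape.py " + quoted
print(command)

# Splitting the quoted command line with POSIX rules yields the URL
# as one intact argument, which is what sys.argv[1] would receive.
args = shlex.split(command)
print(args[2] == url)
```

If the command line is built from another program, quoting the argument this way avoids having to remember which characters need protecting.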

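Basically, an unquoted & ends a command and runs it in the background, so your shell is actually running two things:

python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1 as a background process (that is where the [1] 48648 line comes from)
then version=ESV, which just assigns a shell variable

When you press Enter, the shell simply reports on any background processes that have finished (in this case, the one you just started).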
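The split can be approximated in Python. This is only a sketch: shlex with punctuation_chars=True mimics how a POSIX shell separates words from control operators such as "&", but it is not the shell's actual parser, and combining it with whitespace_split assumes Python 3.8 or later. The helper name shell_tokens is made up for this illustration.

```python
import shlex

def shell_tokens(command_line):
    """Approximate how a POSIX shell splits a line into words and operators."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # only whitespace and operators end a word
    return list(lexer)

url = "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"

unquoted = shell_tokens("python bg_scrape.py " + url)
quoted = shell_tokens('python bg_scrape.py "' + url + '"')

print(unquoted)  # the "&" appears as its own token, splitting the line in two
print(quoted)    # the whole URL stays a single argument
```

In the unquoted case the script only ever sees the part of the URL before the "&", which would also explain why the sample output shows the NIV text even though ESV was requested.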
