find_between function with indexed output?

Time: 2019-05-22 22:16:44

Tags: python

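I want to use a find_between function to retrieve indexable values from a specific web server.

I am using the requests module to collect the page source from a specific website, as shown on line 18 of my script: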

response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")

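and I want to call the find_between function to retrieve every value matched by the given find_between arguments (all items on the page, with each item identified by an incrementing value of 'n'):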

x = find_between(response.content,'/></a><a href="/host/','">---')

Does anyone know how to achieve this?

import sys
import requests
from time import sleep

# Find between page tags on page.
def find_between( s, tag1, tag2 ):
    try:
        start = s.index( tag1 ) + len( tag1 )
        end = s.index( tag2, start )
        return s[start:end]
    except ValueError:
        return ""

def main():
    # Default value for 'n' index value (item on page) is 0
    n = 0

    # Enter the command 'go' to start
    cmd = raw_input("Enter Command: ")
    if cmd == "go":
        print "go!"

        # Go to this page for page item gathering.
        response = requests.get("https://www.shodan.io/search?query=Server%3A+SQ-WEBCAM")

        # Initial source output...
        print response.content

        # Find between value of 'x' sources between two tags
        x = find_between(response.content,'/></a><a href="/host/','">---')
        while(True):

            # Wait one second before continuing...
            sleep(1)
            n = n + 1

            # Display find_between data in 'x'
            print "\nindex: %s\n\n%s\n" % (n, x)

    # Enter 'exit' to exit script
    if cmd == "exit":
        sys.exit()

# Call main() repeatedly (a loop, not recursion) until 'exit' is entered
while(True):
    main()

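1 Answer:

Answer 0 (score: 1)

A few things in the code look like they need fixing:

  1. The value of x is set outside (before) the while loop, so the loop increments the index n but prints the same text over and over, because x never changes.
  2. find_between() returns only one match, whereas you want all matches.
  3. Your while loop never ends.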
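
To make point 2 concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 script in the question). The `sample` string and the simplified tags are made up for illustration and are not real Shodan output; the function body is the one from the question.

```python
def find_between(s, tag1, tag2):
    """Return the text between the first tag1 and the next tag2, or "" if absent."""
    try:
        start = s.index(tag1) + len(tag1)
        end = s.index(tag2, start)
        return s[start:end]
    except ValueError:
        return ""

# Hypothetical sample markup with two items (an assumption for this demo).
sample = '<a href="/host/10.0.0.1">---</a><a href="/host/10.0.0.2">---</a>'

first = find_between(sample, '<a href="/host/', '">---')
again = find_between(sample, '<a href="/host/', '">---')

# Both calls search from the start of the string, so both return the first item.
print(first, again)
```

The second item is never reached, because every call starts searching at the beginning of the same text.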

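Suggestions:

  1. Move the call to find_between() inside the while loop.
  2. On each successive call to find_between(), pass in only the part of the text that follows the previous match.
  3. Exit the while loop when find_between() finds no match.

Something like this: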

text_to_search = response.content
while(True):
    # Find between value of 'x' sources between two tags
    x = find_between(text_to_search, '/></a><a href="/host/', '">---')
    if not x:
        break

    # Wait one second before continuing...
    sleep(1)

    # Increment 'n' for index value of item on page
    n = n + 1

    # Display find_between data in 'x'
    print "\nindex: %s\n\n%s\n" % (n, x)

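    # Remove text already searched: skip past the opening tag and the match itself.
    # (Searching for x alone could hit an earlier, unrelated occurrence of the same text.)
    tag1 = '/></a><a href="/host/'
    found_text_pos = text_to_search.index(tag1) + len(tag1) + len(x)
    text_to_search = text_to_search[found_text_pos:]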
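
As an alternative sketch of the same idea, the matches can be collected into a list first and indexed afterwards. The helper name `find_all_between`, the `sample` string, and the simplified tags below are assumptions for illustration, and the code targets Python 3 rather than the Python 2 used in the question.

```python
def find_all_between(s, tag1, tag2):
    """Return every substring of s that sits between tag1 and the next tag2."""
    matches = []
    pos = 0
    while True:
        start = s.find(tag1, pos)      # next opening tag at or after pos
        if start == -1:
            break                      # no more opening tags
        start += len(tag1)
        end = s.find(tag2, start)      # closing tag that follows it
        if end == -1:
            break                      # unterminated item, stop here
        matches.append(s[start:end])
        pos = end + len(tag2)          # continue after this match
    return matches

# Hypothetical sample markup, not real Shodan output.
sample = '<a href="/host/10.0.0.1">---</a><a href="/host/10.0.0.2">---</a>'
hosts = find_all_between(sample, '<a href="/host/', '">---')

# enumerate() supplies the running index 'n' for each item.
for n, host in enumerate(hosts, start=1):
    print("index: %s  host: %s" % (n, host))
```

Tracking a search offset avoids re-slicing the text on every pass, and the loop ends on its own once no further match exists. If this is run under Python 3 against a real page, note that `response.content` is bytes while `response.text` is str, so the text and the tags need to be the same type.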