How do I extract data from URL parameters?

Date: 2018-02-11 09:44:29

Tags: python linux bash urlparse

I have over a thousand links, all of which carry GET parameters.

https://us.webuy.com/search/index.php/"><script>prompt(/XSS/)</script>
https://www.densuke.biz/help
http://www.ntrcars.co.uk/email.php?subject=%22%3E%3Csvg/onload=alert(/XSS/)%3E
http://www.americanexpress.com/thailand/en/leave_country.shtml?url=javascript:alert`XSS`
https://share.trin.cam.ac.uk/sites/public/Pages/PageNotFoundError.aspx?FollowSite=0&SiteName='-confirm(/XSS/)-'
http://www.rockwellautomation.com/global/news/the-journal/detail.page?docid=dfb8c8ba15e7cf2c599fc321b8e2b98e&G11N/Locale=en&geography=%22%3E%3Cimg%20src=x%20onerror=prompt%28/XSS/%29%3E&content_type=magazine&pagetitle=\n
https://www.ifishillinois.org/profiles/display_lake.php?waternum=1/*-/*`/*\`/*'/*"/**/--></script><svg/onload=;prompt(/XSS/);>00116
http://tools.xaa.su/htaccess/
http://www.wa.lk/realstate/product_display.php?id=%22%22;%3C%2Fscript%3E%3Cscript%3Eprompt(%2FXSS%2F)%3C%2Fscript%3E%3C%22

I need to extract all the data containing the string 'XSS' and put it into a list.

"><script>prompt(/XSS/)</script>
%22%3E%3Csvg/onload=alert(/XSS/)%3E
javascript:alert`XSS`
'-confirm(/XSS/)-'

And so on...

I tried using urlparse, but I could not find anything in it that does this.

#from urllib.parse import urlparse
#
#url = 'http://user:pwd@NetLoc:80/path;param?query=arg#frag'
#parsed = urlparse(url)
#print('scheme  :', parsed.scheme)
#print('netloc  :', parsed.netloc)
#print('path    :', parsed.path)
#print('params  :', parsed.params)
#print('query   :', parsed.query)
#print('fragment:', parsed.fragment)
#print('username:', parsed.username)
#print('password:', parsed.password)
#print('hostname:', parsed.hostname)
#print('port    :', parsed.port)

To be clear: every URL carries a JavaScript payload that I want to extract.

1 Answer:

Answer 0 (score: 2)

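For almost all of the URLs you posted (all except the first one), the JS payload sits in the query component, which you can get by parsing the URL as follows: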

from urllib.parse import urlparse

# file.txt contains the URLs, one per line
with open('file.txt', 'r') as f:
    urls = f.read().splitlines()

for url in urls:
    parsed = urlparse(url)
    # Print the raw query string of every URL that has one
    if parsed.query != '':
        print(parsed.query)

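For the first one, the payload is contained in the path component rather than the query.

Another way to extract the GET parameters is to use the furl module, as follows: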
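A minimal sketch of how the path case could be handled. The character set in SUSPICIOUS and the helper name payload_from_path are assumptions made up for illustration, not part of urllib; the heuristic simply returns everything from the first path segment that contains a character unlikely to appear in an ordinary path.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristic: characters (raw or percent-encoded) that rarely
# appear in an ordinary path segment. Adjust to your own data.
SUSPICIOUS = re.compile(r'''["'<>`()]|%22|%3C|%3E''', re.IGNORECASE)

def payload_from_path(url):
    """Return the path from the first suspicious segment onward, or None."""
    segments = urlsplit(url).path.split('/')
    for index, segment in enumerate(segments):
        if SUSPICIOUS.search(segment):
            # Re-join with '/' because the payload itself may contain slashes
            return '/'.join(segments[index:])
    return None

print(payload_from_path(
    'https://us.webuy.com/search/index.php/"><script>prompt(/XSS/)</script>'
))
```

For a URL with an ordinary path, such as https://www.densuke.biz/help, the function returns None.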

from furl import furl

# urls is the list of URLs read from file.txt above
for url in urls:
    i = furl(url)
    if len(i.args):
        # Print the value of every GET parameter
        for k, v in i.args.items():
            print(v)

Update: If all of your payloads contain the word 'xss', then the following may help:

from urllib.parse import urlparse

# file.txt contains the URLs, one per line
with open('file.txt', 'r') as f:
    urls = f.read().splitlines()

for url in urls:
    parsed = urlparse(url)
    if parsed.query != '':
        # Split the query on '=' and keep only the pieces that mention 'xss'
        print(''.join(part for part in parsed.query.split('=') if 'xss' in part.lower()))

Output:

alert(/XSS/)%3E
javascript:alert`XSS`
'-confirm(/XSS/)-'
prompt%28/XSS/%29%3E&content_type
;prompt(/XSS/);>00116
%22%22;%3C%2Fscript%3E%3Cscript%3Eprompt(%2FXSS%2F)%3C%2Fscript%3E%3C%22
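Note that splitting on every '=' cuts off payloads that themselves contain '=': the first output line is missing its %22%3E%3Csvg/onload= prefix, and the fourth carries a stray &content_type. A possible alternative, sketched below, splits the query on '&' first and then on the first '=' only, collecting the raw values into a list. The helper name xss_values and its needle parameter are hypothetical, and the sketch assumes the payloads never contain a raw '&' or '#'.

```python
from urllib.parse import urlsplit

def xss_values(url, needle='xss'):
    """Return the raw query values that contain `needle` (case-insensitive)."""
    query = urlsplit(url).query
    values = []
    for pair in query.split('&'):
        # partition() splits on the first '=' only, so any '=' inside
        # the payload stays part of the value
        _, _, value = pair.partition('=')
        if needle in value.lower():
            values.append(value)
    return values

urls = [
    'http://www.ntrcars.co.uk/email.php?subject=%22%3E%3Csvg/onload=alert(/XSS/)%3E',
    'http://www.americanexpress.com/thailand/en/leave_country.shtml?url=javascript:alert`XSS`',
    'https://www.densuke.biz/help',
]

# Flatten the per-URL results into a single list of payloads
payloads = [value for url in urls for value in xss_values(url)]
print(payloads)
```

The values are left percent-encoded, as in the question's expected list; urllib.parse.unquote can be applied to each one if the decoded form is wanted.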