Python help with isolating URLs

Time: 2018-07-16 17:50:57

Tags: python python-3.x

If I have a certain type of URL and want to separate all URLs of that type from a list of URLs I already have, how can I do that in Python?

1 answer:

Answer 0 (score: 0)

Take a look at urlparse (from urllib.parse):

>>> from urllib.parse import urlparse
>>> url = "http://www.test.com:8080/cat1/cat2"
>>> parsed = urlparse(url)
>>> parsed
ParseResult(scheme='http', netloc='www.test.com:8080', path='/cat1/cat2', params='', query='', fragment='')
>>> parsed.path
'/cat1/cat2'
>>> parsed.path.split("/")
['', 'cat1', 'cat2']

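As you can see above, parsing the URL splits it into its components, so you can ignore the parts you don't need and the remaining string handling becomes much easier. In the example, urlparse cleanly separates the scheme (protocol) from the network location (netloc, which holds both the host and the port) and hands you the path to work with. If the URL has query parameters, they are split out into the query attribute as well.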
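If you need the host and port individually, or the query parameters as key/value pairs, the standard library already covers that. A minimal sketch, assuming a made-up variant of the example URL that carries a query string and a fragment:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL: the answer's example plus a query string and a fragment
url = "http://www.test.com:8080/cat1/cat2?page=2&sort=asc#top"
parsed = urlparse(url)

# netloc keeps host and port together; hostname and port split them apart
print(parsed.netloc)    # www.test.com:8080
print(parsed.hostname)  # www.test.com
print(parsed.port)      # 8080 (an int, not a string)

# The raw query string, and the same string parsed into a dict of lists
print(parsed.query)            # page=2&sort=asc
print(parse_qs(parsed.query))  # {'page': ['2'], 'sort': ['asc']}
print(parsed.fragment)         # top
```

parse_qs returns lists because a key may legitimately appear more than once in a query string.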

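Once you have the path string, you can process it like any other string. Because a non-empty path after a host always starts with "/", the first element of the split is an empty string, so you can skip it with a slice: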

>>> parsed.path.split("/")[1:]
['cat1', 'cat2']
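The [1:] slice only removes the leading empty string; a trailing slash (e.g. /cat1/cat2/) would still leave an empty string at the end. A slightly more robust variant, sketched here as a hypothetical helper named path_segments that assumes empty segments are never meaningful to you, simply filters them all out:

```python
from urllib.parse import urlparse

def path_segments(url):
    """Return the non-empty path segments of a URL (hypothetical helper)."""
    path = urlparse(url).path
    # Filtering out empty strings drops the leading "" as well as any
    # produced by a trailing slash or a doubled "//"; a URL with no path
    # yields an empty list.
    return [segment for segment in path.split("/") if segment]

print(path_segments("http://www.test.com:8080/cat1/cat2"))   # ['cat1', 'cat2']
print(path_segments("http://www.test.com:8080/cat1/cat2/"))  # ['cat1', 'cat2']
print(path_segments("http://www.test.com"))                  # []
```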

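Note that this won't give you anything useful if your URL has no path: parsed.path is then an empty string, the slice returns an empty list, and any code that indexes into the result (for example [0]) will raise an IndexError. If you need more detail, please state the end result you want in your question.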
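To tie this back to the question of separating "such URLs" from an existing list: the question doesn't say what makes a URL match, so the criterion below (same host and same first path segment) is purely an assumption for illustration, as are the function names matches and partition_urls. Swap in whatever test actually defines your URLs:

```python
from urllib.parse import urlparse

def matches(url, host, first_segment):
    """Hypothetical criterion: same host and same first path segment."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Guard against URLs with no path before indexing segments[0]
    return parsed.hostname == host and bool(segments) and segments[0] == first_segment

def partition_urls(urls, host, first_segment):
    """Split urls into (matching, others), preserving the original order."""
    matching, others = [], []
    for url in urls:
        if matches(url, host, first_segment):
            matching.append(url)
        else:
            others.append(url)
    return matching, others

urls = [
    "http://www.test.com:8080/cat1/cat2",
    "http://www.test.com/cat1/other?x=1",
    "http://www.test.com/cat3",
    "https://example.org/cat1/cat2",
]
matching, others = partition_urls(urls, "www.test.com", "cat1")
print(matching)  # ['http://www.test.com:8080/cat1/cat2', 'http://www.test.com/cat1/other?x=1']
print(others)    # ['http://www.test.com/cat3', 'https://example.org/cat1/cat2']
```

Comparing parsed.hostname rather than netloc means the port is ignored when matching; compare netloc instead if the port should matter.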