If I have a URL of a particular form, and want to separate all such URLs from a list of URLs I already have, how can I do this in Python?
Answer 0: (score: 0)
Take a look at urlparse
>>> from urllib.parse import urlparse
>>> url = "http://www.test.com:8080/cat1/cat2"
>>> parsed = urlparse(url)
>>> parsed
ParseResult(scheme='http', netloc='www.test.com:8080', path='/cat1/cat2', params='', query='', fragment='')
>>> parsed.path
'/cat1/cat2'
>>> parsed.path.split("/")
['', 'cat1', 'cat2']
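As you can see above, parsing the URL splits off the parts you don't need, which makes the string handling much easier. In the example above, it cleanly separates the scheme from the network location (host and port) and gives you the path to work with. If the URL has any query parameters, those are separated out as well.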
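As a small sketch of that last point: urlparse leaves the query as one string in `query`, and `urllib.parse.parse_qs` can turn it into a dictionary. The URL below, including its `page` and `sort` parameters, is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, used only for illustration.
url = "http://www.test.com:8080/cat1/cat2?page=2&sort=asc#top"
parsed = urlparse(url)

# netloc holds host and port together; these attributes split them apart.
host = parsed.hostname
port = parsed.port

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(parsed.query)
```

Each value in `params` is a list because a parameter name may appear more than once in a query string.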
Once you have the "path" string, you can process it like any other string. Since your path always starts with "/", you can drop the empty first element when you split it
>>> parsed.path.split("/")[1:]
['cat1', 'cat2']
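Note: if your URL has no path, the split yields an empty list, so any code that indexes into the result may fail. If you need more detail, you should add the end result you want to your question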
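Since the question does not say what the matching URLs look like, the following is only a sketch under an assumption: that "such URLs" means URLs whose first path segment equals a given value. The helper names `path_segments` and `split_by_first_segment` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Return the non-empty path segments of a URL (empty list if no path)."""
    return [seg for seg in urlparse(url).path.split("/") if seg]


def split_by_first_segment(urls, wanted):
    """Partition urls into (matching, others) by their first path segment."""
    matching, others = [], []
    for url in urls:
        segments = path_segments(url)
        # A URL without a path has no segments, so it can never match.
        if segments and segments[0] == wanted:
            matching.append(url)
        else:
            others.append(url)
    return matching, others


# Hypothetical input list, used only for illustration.
urls = [
    "http://www.test.com:8080/cat1/cat2",
    "http://www.test.com/other/page",
    "http://www.test.com",
]
matching, others = split_by_first_segment(urls, "cat1")
```

Filtering out empty segments rather than slicing with `[1:]` also handles trailing slashes and URLs without a path, at the cost of treating `//` in a path the same as `/`.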