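I'm scraping numbers from HTML. Some are percentages, some have 4 digits and some 7 (37.89%, 3.464, 2,193.813). I want to strip the thousands separator (".") from the plain numbers but leave the percentages as they are.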
import csv

# `table` is a BeautifulSoup element parsed from the page (not shown here)
list_of_rows = []
for row in table.findAll('div', attrs={'class': 'quadrado'}):
    list_of_cells = []
    for cell in row.findAll('span', attrs={'class': 'circulo'}):
        text = cell.text
        # print(text)
    for cell_index in row.findAll('span', attrs={'class': 'triangulo'}):
        text_index = cell_index.text
        list_of_cells_index = [text, text_index]
        list_of_cells_index_clean = ','.join(list_of_cells_index)  # remove brackets and ''
        # print(list_of_cells_index_clean)
        list_of_cells.append(list_of_cells_index_clean)
    list_of_rows.append(list_of_cells)

# `with` closes the file when done; newline="" is what the csv docs recommend
with open("./list.csv", "a", newline="") as outfile:
    writer = csv.writer(outfile, lineterminator='\n')
    writer.writerows(list_of_rows)
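Because the target format writes `2193,813` with a comma as the decimal mark, that value collides with the CSV delimiter. Here is a minimal sketch with the standard `csv` module. It uses hard-coded, already-cleaned values in place of the scraped cells and an `io.StringIO` buffer in place of `./list.csv`; `row`, `buffer` and `buffer2` are illustrative names only.

```python
import csv
import io

# Already-cleaned values standing in for one scraped row (hard-coded here;
# in the question they come from BeautifulSoup)
row = ['37.89%', '3464', '2193,813']

# Default delimiter ',': QUOTE_MINIMAL quotes any field containing a comma
buffer = io.StringIO()  # stands in for open("./list.csv", "a", newline="")
csv.writer(buffer, lineterminator='\n').writerow(row)
print(buffer.getvalue(), end='')

# Semicolon delimiter: the decimal comma no longer needs quoting
buffer2 = io.StringIO()
csv.writer(buffer2, delimiter=';', lineterminator='\n').writerow(row)
print(buffer2.getvalue(), end='')
```

With the default quoting, the first row comes out as `37.89%,3464,"2193,813"`. Switching to `delimiter=';'` is one common way to avoid the quotes when the data itself uses decimal commas.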
I'd like to get:
37.89%, 3464, 2193,813.
How can I do this?
Answer (score: 1)
I don't know all of your inputs, but this works for the ones you provided.
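import re

s = ('37.89%', '3.464', '2,193.813')
for item in s:
    # Drop the comma thousands separator first
    remove_comma = item.replace(',', '')
    # Percentages such as 37.89% are printed unchanged
    keep_percentage = re.findall(r'\d{1,4}\.\d{1,4}%', remove_comma)
    if keep_percentage:
        keep_percentage = ''.join(keep_percentage)
        print(keep_percentage)
    # A 5-character value like 3.464: the dot is a thousands separator
    elif len(remove_comma) == 5:
        print(remove_comma.replace('.', ''))
    # Anything else: turn the decimal dot into a comma
    else:
        print(remove_comma.replace('.', ','))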
**OUTPUTS**
37.89%
3464
2193,813
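The `len(remove_comma) == 5` test above tells the plain-number shapes apart only by string length, so an input such as `12.345` would fall into the last branch and print `12,345`. Below is a minimal sketch of a pattern-based alternative. It assumes, based only on the three sample values and the desired output, that a value ending in `%` is kept as-is, that dots between groups of three digits are thousands separators, and that a comma-grouped value with a dot decimal should come out with a comma as the decimal mark. The helper `normalize` and the pattern names are hypothetical.

```python
import re

# Hypothetical rules inferred from the question's samples, not from the site itself
PERCENT = re.compile(r'\d+(?:\.\d+)?%')              # e.g. 37.89%
DOT_THOUSANDS = re.compile(r'\d{1,3}(?:\.\d{3})+')   # e.g. 3.464, 12.345
COMMA_THOUSANDS = re.compile(r'\d{1,3}(?:,\d{3})+\.\d+')  # e.g. 2,193.813


def normalize(value: str) -> str:
    """Return the value in the question's target format (assumed rules)."""
    value = value.strip()
    if PERCENT.fullmatch(value):
        return value  # leave percentages untouched
    if DOT_THOUSANDS.fullmatch(value):
        return value.replace('.', '')  # 3.464 -> 3464
    if COMMA_THOUSANDS.fullmatch(value):
        # drop the thousands comma, then use a comma as the decimal mark
        return value.replace(',', '').replace('.', ',')
    return value  # unrecognized shape: pass through unchanged


for item in ('37.89%', '3.464', '2,193.813'):
    print(normalize(item))
```

A value like `1.234` is still ambiguous: these rules read it as one thousand two hundred thirty-four. If the site ever uses the dot as a decimal mark in short numbers, the scraped context would have to settle it.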