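Hey guys, I'm very frustrated and tired from trying to fix this Unicode code that shows up on my web page. I have tried everything I could think of. Here is what my page looks like; it scrapes article data from news.google.com and displays each article with its time submitted (the time submitted is where \u200e pops up everywhere): http://i.imgur.com/lrqmvWG.jpg
I will provide my views.py, my articles.html (the page that is set up to display everything in the picture), and header.html (for whatever reason, but it is the parent template of articles.html for the CSS inheritance). Also, I researched and know that \u200e is a left-to-right mark, and when I inspect the source on news.google.com, it shows up in the time-submitted elements as
&lrm;
like this:
<span class="al-attribution-timestamp">&lrm;51 minutes ago&lrm;</span>
I have tried editing views.py to encode the text with .encode(encoding='ascii', 'ignore'), or utf-8, or iso-8859-8, plus a couple of other lines of code I found while digging through Google, but the mark still shows up everywhere. I put the call in many different parts of my views.py, even around the for loop (before it, after it, and where the data is stored in the variable "b"), and it just isn't going away. What do I need to do?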
Views.py
from django.shortcuts import render


def articles(request):
    """Grabs the most recent articles from the main news page."""
    import bs4, requests
    list = []   # headlines (note: this name shadows the built-in list)
    list2 = []  # timestamps
    url = 'https://news.google.com/'
    r = requests.get(url)
    sta = "\u200e"  # the left-to-right mark (U+200E); not used below
    try:
        # raise_for_status() raises requests.exceptions.HTTPError on 4xx/5xx
        r.raise_for_status()
    except requests.exceptions.HTTPError:
        print('Something went wrong.')
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    for listarticles in soup.find_all('h2', 'esc-lead-article-title'):
        a = listarticles.text
        list.append(a)
    for articles_times in soup.find_all('span', 'al-attribution-timestamp'):
        b = articles_times.text
        list2.append(b)
    list = zip(list, list2)  # pairs each headline with its timestamp
    context = {'list': list, 'list2': list2}
    return render(request, 'newz/articles.html', context)
articles.html
{% extends "newz/header.html" %}
{% block content %}
<style>
  .firstfont {
    font-family: serif;
  }
</style>
<div class="row">
  <h3 class="btn-primary">These articles are scraped from <strong>news.google.com</strong></h3><br>
  <ul class="list-group">
    {% for thefinallist in list %}
    <div class="col-md-15">
      <li class="list-group-item">{{ thefinallist }}
      </li>
    </div>
    {% endfor %}
  </ul>
</div>
{{ list }}
{% endblock %}
header.html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Sacred Page</title>
  <meta charset="utf-8" />
  {% load staticfiles %}
  <meta name="viewport" content = "width=device-width, initial-scale=1.0">
  <link rel="stylesheet" href="{% static 'newz/css/bootstrap.min.css' %}" type = "text/css"/>
  <style type="text/css">
    html,
    body {
      height:100%
    }
  </style>
</head>
<body class="body" style="background-color:#EEEDFA">
  <div class="container-fluid" style="min-height:95%; ">
    <div class="row">
      <div class="col-sm-2">
        <br>
        <center>
          <img src="{% static 'newz/img/profile.jpg' %}" class="responsive-img" style='max-height:100px;' alt="face">
        </center>
      </div>
      <div class="col-sm-10">
        <br>
        <center>
          <h3><font color="007385">The sacred database</font></h3>
        </center>
      </div>
    </div><hr>
    <div class="row">
      <div class="col-sm-2">
        <br>
        <br>
        <!-- Great, til you resize. -->
        <!--<div class="well bs-sidebar affix" id="sidebar" style="background-color:#E77200">-->
        <div class="well bs-sidebar" id="sidebar" style="background-color:#E1DCF5">
          <ul class="nav nav-pills nav-stacked">
            <li><a href='/'>Home</a></li>
            <li><a href='/newz/'>News database</a></li>
            <li><a href='/blog/'>Blog</a></li>
            <li><a href='/contact/'>Contact</a></li>
          </ul>
        </div> <!--well bs-sidebar affix-->
      </div> <!--col-sm-2-->
      <div class="col-sm-10">
        <div class='container-fluid'>
          <br><br>
          <font color="#2E2C2B">
            {% block content %}
            {% endblock %}
            {% block fool %}
            {% endblock fool %}
          </font>
        </div>
      </div>
    </div>
  </div>
  <footer>
    <div class="container-fluid" style='margin-left:15px'>
      <p><a href="#" target="blank">Contact</a> | <a href="#" target="blank">LinkedIn</a> | <a href="#" target="blank">Twitter</a> | <a href="#" target="blank">Google+</a></p>
    </div>
  </footer>
</body>
</html>
Answer 0 (score: 1)
If you want, you can remove the character from the string with replace().
b = articles_times.text.replace('\u200E', '')
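The same cleanup can be tried outside Django. This is a minimal sketch assuming Python 3. The helper name `strip_direction_marks` is hypothetical and is not part of the original views.py. Including the right-to-left mark (U+200F) is also an assumption, since only U+200E appears in the question.

```python
# Hypothetical helper (not in the original views.py): deletes invisible
# directional marks from scraped text.
# U+200E is the left-to-right mark from the question; U+200F (right-to-left
# mark) is added on the assumption that it could show up as well.
_MARKS = {ord("\u200e"): None, ord("\u200f"): None}


def strip_direction_marks(text):
    """Return text with the directional marks removed."""
    return text.translate(_MARKS)


# Sample value shaped like the timestamp text scraped in the question.
raw = "\u200e51 minutes ago\u200e"

# Single-character removal, equivalent to the replace() call in the answer.
replaced = raw.replace("\u200e", "")

# Table-based removal, which handles both marks in one pass.
cleaned = strip_direction_marks(raw)

print(repr(replaced))
print(repr(cleaned))
```

Printing with `repr()` makes any leftover mark visible as an escape sequence, which is easier to check than the raw string.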
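The reason you are seeing \u200E in the rendered HTML instead of &lrm; is that you are including the tuple {{ thefinallist }} in your template. Rendering the tuple converts it to a string, and a tuple's string form uses repr() on each element, so the mark is printed as the escape \u200E. It also means you see the parentheses and quotes, e.g. ('headline', '\u200e1 hour ago').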
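The difference is easy to reproduce in a plain interpreter. This sketch assumes Python 3, and the sample pair is made up to mimic one item produced by `zip(list, list2)` in the view.

```python
# Made-up sample mimicking one (headline, timestamp) pair from the view.
pair = ("headline", "\u200e1 hour ago")

# str() of a tuple applies repr() to each element, so the invisible mark is
# written out as an escape sequence, together with parentheses and quotes.
as_tuple = str(pair)

# Formatting the elements one by one keeps the raw, invisible character.
headline, timeago = pair
as_text = "{} {}".format(headline, timeago)

print(as_tuple)
print(as_text)
```

The first line printed contains the literal backslash escape, while the second contains the real mark, which a browser or terminal renders as nothing.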
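If you display the elements of the tuple separately, you will get the actual &lrm; character (which is invisible) in the rendered template instead. For example, you could do this: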
{% for headline, timeago in list %}
<div class="col-md-15">
  <li class="list-group-item">{{ headline }} {{ timeago }}
  </li>
</div>
{% endfor %}