I want to count the number of characters in the following list, but the result is wrong. It doesn't work for this particular list.
```python
words = [b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\'']
sum1 = sum(len(i) for i in words)
print(sum1)
```
The output is 37, but the correct answer should be 68. What am I doing wrong?
Answer (score: 1)
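You have a `bytes` object, probably a UTF-8 encoded string, and there are several different things you could count. `len(word)` simply gives the number of bytes in the object.
If instead you count the characters needed to write those bytes out, each printable ASCII byte takes one character, an escaped byte such as `\n` takes two, and a byte in hexadecimal notation such as `\xe3` takes four. It also looks like a UTF-8 encoded string, where a single letter can take more than one byte. So tell me which of these you actually want to count.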
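To make the "one letter is more than one byte" point concrete, here is a minimal sketch. It assumes the bytes are valid UTF-8 (as they appear to be), decodes the example, and prints how many bytes each multi-byte character occupies. The names `text`, `ch` and `n` are only illustrative.

```python
word = b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\''

# Assumption: the data is UTF-8; decode() raises UnicodeDecodeError otherwise.
text = word.decode("utf-8")

# Report every character that needs more than one byte in UTF-8.
for ch in text:
    n = len(ch.encode("utf-8"))
    if n > 1:
        print(repr(ch), hex(ord(ch)), n, "bytes")

# Bytes versus decoded text characters.
print(len(word), "bytes ->", len(text), "characters")
```

Each `\xe3\x83\xbb` sequence decodes to the single character `・` (U+30FB), so three bytes collapse into one character. That is why the byte count is larger than the character count.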
The following script prints each of these counts for the example bytes:
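```python
word = b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\''
index = 0      # number of raw bytes
codechars = 0  # characters needed to write each byte in its repr form
for number in word:  # iterating over bytes yields ints (0-255)
    index += 1
    b = number.to_bytes(1, byteorder='big')  # the single byte as a bytes object
    bs = len(str(b)[2:-1])  # strip the b'' / b"" wrapper, count what remains
    codechars += bs
    print("%2.0f" % index, repr(b).ljust(10 - len(b)), len(b), bs, hex(number), number)
print("Byte count", index)
print(word)
print("code count", codechars)
print(word.decode("utf-8"))
print("utf-8 count", len(word.decode("utf-8")))
# The repr form without its b"..." wrapper has exactly codechars characters.
assert codechars == len(repr(word)[2:-1])
assert len(word) == index
```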
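Applied to the question's list rather than to a single bytes object, the three candidate counts can be wrapped in small helpers and summed the same way as `sum(len(i) for i in words)`. This is a sketch. The helper names `byte_count`, `repr_char_count` and `decoded_char_count` are hypothetical. `repr_char_count` assumes every element is a `bytes` object, so its `repr` always carries exactly three wrapper characters (`b` plus two quotes). `decoded_char_count` assumes the data is valid in the given encoding.

```python
def byte_count(words):
    """Raw bytes: this is what sum(len(i) for i in words) measures."""
    return sum(len(w) for w in words)


def repr_char_count(words):
    """Characters needed to write each bytes literal, minus the b'' wrapper."""
    return sum(len(repr(w)) - 3 for w in words)


def decoded_char_count(words, encoding="utf-8"):
    """Text characters after decoding (assumes the bytes are valid in `encoding`)."""
    return sum(len(w.decode(encoding)) for w in words)


words = [b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\'']
print(byte_count(words), repr_char_count(words), decoded_char_count(words))
```

Which count is "correct" depends on the purpose: storage size (bytes), the length of the printed literal, or the number of visible text characters.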