Counting characters in a Python list gives the wrong answer

Date: 2018-05-17 10:05:51

Tags: python-3.x list

I want to count the characters in the following list, but the answer comes out wrong. It does not work for this particular list.

words = [b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\'']
sum1=sum(len(i) for i in words)
print(sum1)

The output is 37,

but the correct answer is 68.

What am I doing wrong?

1 answer:

Answer 0 (score: 1)

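You have a bytes object, most likely a UTF-8-encoded string, and there are several different things you could count. len(word) simply gives the number of bytes in it.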
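A quick way to see the difference with the question's own list. This is a minimal sketch that assumes the bytes really are UTF-8: len() on the bytes counts bytes, while decoding first counts characters, and each three-byte sequence \xe3\x83\xbb becomes the single character U+30FB (KATAKANA MIDDLE DOT). The variable names are illustrative only.

```python
words = [b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\'']

# len() of a bytes object counts raw bytes
byte_count = sum(len(w) for w in words)

# Decoding first (assumes UTF-8) counts characters instead:
# each b'\xe3\x83\xbb' collapses into the one character U+30FB
char_count = sum(len(w.decode("utf-8")) for w in words)

print("bytes:", byte_count)
print("characters:", char_count)
```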

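But writing those bytes out takes a different number of symbols per byte: one character for a printable ASCII byte, two for an escape such as \n, and four for a byte shown in hex notation (\xNN). The data looks like a UTF-8-encoded string, where one letter can take more than one byte, so tell me which of these you actually want to count.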
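The per-byte rule above can be written out explicitly. This is a sketch that assumes "characters" means the text Python uses to display the bytes; escaped_width is a hypothetical helper, and it ignores quote escaping. One guess about the 68, not confirmed by the asker: it matches the length of the literal exactly as typed in the question, with the b prefix, backslash escapes and quotes included.

```python
import ast


def escaped_width(byte: int) -> int:
    """Characters repr() uses for one byte (hypothetical helper; ignores quote escaping)."""
    if byte in b"\t\n\r\\":
        return 2  # short escapes: \t \n \r \\
    if 0x20 <= byte < 0x7F:
        return 1  # printable ASCII is written as itself
    return 4      # everything else is written as \xNN


word = b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\''

escaped = sum(escaped_width(b) for b in word)

# The literal as typed in the question, kept as raw text: the b'...' wrapper,
# plus \' for the final quote because the literal is delimited by single quotes
source_literal = r"b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\''"
assert ast.literal_eval(source_literal) == word  # same value as word

print("escaped characters:", escaped)
print("length of literal as typed:", len(source_literal))
```

Here the escaped text is 64 characters; the remaining 4 come from the b, the two delimiting quotes, and the backslash written before the final '.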

Here is a script that computes each of these counts:

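word = b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\''
index = 0      # number of bytes seen
codechars = 0  # characters needed to write the bytes as escape codes
for number in word:  # iterating over bytes yields ints 0..255
    index += 1
    b = number.to_bytes(1, byteorder='big')  # the single byte as a bytes object
    width = len(str(b)[2:-1])  # strip b'' wrapper: 1 (printable ASCII), 2 (\n), 4 (\xNN)
    codechars += width
    print("%2d" % index, repr(b).ljust(9), len(b), width, hex(number), number)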

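print("Byte count", index)

print(word)
print("code count", codechars)

# Decoding as UTF-8 turns each 3-byte sequence \xe3\x83\xbb into one character
print(word.decode("utf-8"))
print("utf-8 count", len(word.decode("utf-8")))

# Slice the repr, not the bytes: dropping the b prefix and both quotes
# from repr(word) leaves exactly the escaped text counted above
assert codechars == len(repr(word)[2:-1])
assert len(word) == index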
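Whichever measure is meant, it drops straight into the sum(...) over the list from the question. A sketch, assuming every element is valid UTF-8; count_all is a hypothetical helper name.

```python
def count_all(words):
    """Return (bytes, escaped characters, decoded characters) summed over a list of bytes."""
    n_bytes = sum(len(w) for w in words)
    # repr(w) is b'...' or b"..."; [2:-1] drops the b prefix and both quotes
    n_escaped = sum(len(repr(w)[2:-1]) for w in words)
    # assumes every element is valid UTF-8
    n_chars = sum(len(w.decode("utf-8")) for w in words)
    return n_bytes, n_escaped, n_chars


words = [b'#Repost@motivated.mindset\n\xe3\x83\xbb\xe3\x83\xbb\xe3\x83\xbb\'']
print(count_all(words))
```

If an element contains both ' and ", repr() escapes the single quote, so the middle count treats it as two characters.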