Question

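Is it possible to create an invalid UTF8 string using JavaScript?

Every solution I've found relies on String.fromCharCode, which generates undefined rather than an invalid string. I've seen mention of errors being generated by ill-formed UTF8 strings (e.g. https://developer.mozilla.org/en-US/docs/Web/API/WebSocket#send()), but I can't figure out how you would actually create one.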

Answer 1

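Strings in JavaScript are counted sequences of UTF-16 code units. There is an implicit contract that the code units represent Unicode code points. Even so, any sequence of UTF-16 code units can be represented, including unpaired surrogates.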
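To make "unpaired surrogate" concrete, here is a minimal sketch of a checker that walks the code units of a string. The helper name hasLoneSurrogate is hypothetical (it is not a built-in), and the sample strings are only illustrations.

```javascript
// Hypothetical helper: returns true if the string contains a surrogate
// code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN when i + 1 is past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a well-formed pair
      } else {
        return true;
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate that was not preceded by a high surrogate.
      return true;
    }
  }
  return false;
}

console.log(hasLoneSurrogate("\uD83D\uDEB2")); // false (well-formed pair)
console.log(hasLoneSurrogate("\uD83D")); // true (high surrogate alone)
console.log(hasLoneSurrogate("\uD83D \uDEB2")); // true (pair split by a space)
console.log(hasLoneSurrogate("\uD83D \uDEB2".replace(" ", ""))); // false (pair re-formed)
```

Recent engines also ship String.prototype.isWellFormed(), which answers the same question; the manual loop is shown only because it works on older runtimes too.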

I find that String.fromCharCode(0xd801) yields what displays as the replacement character, which seems quite reasonable (rather than undefined). Any text function might do that, but for efficiency reasons I'm sure that many text operations just pass invalid sequences through unless the operation needs to interpret them as code points.
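A quick check of what String.fromCharCode(0xd801) actually produces, assuming a standard engine such as Node.js or a browser console. As far as I can tell, the result is an ordinary one-unit string that still holds the surrogate; the replacement character tends to show up only when the string is rendered or encoded.

```javascript
// Build a string from a single high-surrogate code unit.
const lone = String.fromCharCode(0xd801);

console.log(typeof lone); // "string", not undefined
console.log(lone.length); // 1
console.log(lone.charCodeAt(0).toString(16)); // "d801"
```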

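The easiest way to create such a string is with a string literal. For example, "\uD83D \uDEB2" or "\uD83D" or "\uDEB2" instead of the valid "\uD83D\uDEB2".

"\uD83D \uDEB2".replace(" ","") actually does return "\uD83D\uDEB2" (the bicycle emoji, U+1F6B2), but I don't think you should expect anything good to come from a string that isn't a valid UTF-16 encoding of Unicode code points.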

Answer 2

One way to generate an invalid UTF-8 string with JavaScript is to take an emoji and remove the last byte.

For example, this will be an invalid UTF-8 string:

const invalidUtf8 = '???'.substr(0,5);
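A hedged sketch of what the truncation does in practice. The sample emoji (U+1F6B2, borrowed from Answer 1) is an assumption for illustration, and slice is used in place of the older substr. Note that both methods count UTF-16 code units rather than bytes, so the cut leaves a lone surrogate in the string; the standard TextEncoder then substitutes U+FFFD instead of emitting ill-formed UTF-8. Dropping a byte from the encoded form is what produces bytes that a strict decoder rejects.

```javascript
// Three emoji, two UTF-16 code units each (assumed sample data).
const emoji = "\u{1F6B2}".repeat(3);
const truncated = emoji.slice(0, 5); // ends with a lone high surrogate
console.log(truncated.charCodeAt(4).toString(16)); // "d83d"

// Encoding replaces the lone surrogate with U+FFFD (bytes EF BF BD).
const encoder = new TextEncoder();
const encoded = encoder.encode(truncated);
const tail = Array.from(encoded.slice(-3));
console.log(tail.map((b) => b.toString(16))); // ef, bf, bd

// Byte-level truncation: drop the last byte of one encoded emoji.
const cutBytes = encoder.encode("\u{1F6B2}").slice(0, 3);
let rejected = false;
try {
  // fatal: true makes the decoder throw on ill-formed input.
  new TextDecoder("utf-8", { fatal: true }).decode(cutBytes);
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```

Without the fatal option, TextDecoder quietly substitutes U+FFFD for the incomplete sequence rather than throwing.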

Answer 3

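According to this answer,

UTF-8 is an encoding for Unicode, which can represent every character and glyph that has ever existed in recorded human history, so there is no such thing as an "invalid" UTF-8 character.

So no, it is not possible to create an invalid UTF-8 character. Every character is valid UTF-8.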
