Question

This program builds an array of verbs from a text file.

('a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]', ':', 17.6044921875)
('A\\xc4\\x9fr\\xc4\\xb1[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]', ':', 11.5615234375)

However, the text itself contains hexadecimal escape sequences, for example the \\xc4\\x9f and \\xc4\\xb1 in the output above.

I am trying to get rid of those. How can I do that?

I have looked almost everywhere, and decode() returns an error (even after importing codecs).

Answer 1

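You can use parse, a Python module that lets you search a string for components in a regular format. From each component it returns, you can extract the corresponding integer and substitute it back into the original string.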

For example (warning: untested!):

import parse

# Find every "\x.." item; {:w} captures the word characters after "\x"
list_of_findings = parse.findall("\\x{:w}", your_string)

# For each item
for hex_item in list_of_findings:

    # {:w} may also capture letters that follow the escape,
    # so keep only the two hex digits
    hex_digits = hex_item[0][:2]

    # Replace the whole escape ("\x" plus digits) in the string
    your_string = your_string.replace(
        "\\x" + hex_digits,
        # Parse the digits as a base-16 integer, then convert to string
        str(int(hex_digits, 16))
    )

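Note: instead of wrapping the result in str, you can use chr to convert the hex-like value that was found into a character. chr needs an integer, so parse the two hex digits first:

chr(int(hex_item[0][:2], 16))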
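
If the goal is to recover the original characters, or to drop the escapes entirely, here is a minimal sketch that uses the standard library's re module instead of parse. It assumes each literal \xNN in the text stands for one byte of UTF-8 encoded text, which fits the sample above but is not confirmed for the whole file. The names decode_hex_escapes and strip_hex_escapes are made up for illustration.

```python
import re

# Matches a literal backslash, "x", and exactly two hex digits, e.g. \xc4
HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text):
    """Turn literal \\xNN escapes into the UTF-8 characters they encode.

    Assumption: the escaped bytes form valid UTF-8; otherwise the final
    decode raises UnicodeDecodeError.
    """
    raw = HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]),  # two hex digits -> one byte
        text.encode("utf-8"),
    )
    return raw.decode("utf-8")


def strip_hex_escapes(text):
    """Delete literal \\xNN escapes outright."""
    return re.sub(r"\\x[0-9a-fA-F]{2}", "", text)


sample = "a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]"
print(decode_hex_escapes(sample))  # ağrı[Verb]+[Pos]+[Imp]+[A2sg]
print(strip_hex_escapes(sample))   # ar[Verb]+[Pos]+[Imp]+[A2sg]
```

Decoding is probably the better choice here, since stripping the escapes leaves "ar" where the Turkish word "ağrı" used to be.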

Cannot get rid of hex characters
