Question

I have a text file containing entries similar to the following example:

# 8 rows of header
---------------------------------------------
123 ABC12345 A some more variable length text
456 DEF12345 A some more variable length text
789 GHI12345 B some more variable length text
987 JKL12345 A some more variable length text
654 MNO12345 B some more variable length text
321 PQR12345 B some more variable length text
etc...

What I want to achieve is:

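Convert the As to 1 and the Bs to 0 to get a binary number. For the example above, this would be 110100 (i.e. AABABB).
Convert this binary number to a decimal number. For the example above, this would be 52.
Map this decimal number to a text string (i.e. 52 = "Case 1" or 53 = "Case 2", etc.), and
print it on stdout.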
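As a worked check of the first two steps: AABABB becomes the bits 110100, and reading those bits as powers of two gives 32 + 16 + 4 = 52. A small sketch of that arithmetic (the helper name `bits_to_int` is hypothetical, for illustration only):

```python
def bits_to_int(bits):
    """Sum bit * 2**position, with the rightmost bit at position 0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

pattern = "AABABB"
# Step 1: A -> "1", B -> "0"
bits = "".join("1" if c == "A" else "0" for c in pattern)
print(bits)               # 110100
# Step 2: positional sum, same result as int(bits, 2)
print(bits_to_int(bits))  # 52
```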

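I have a little Python experience, but the problem above is beyond my abilities. So any help from the community would be greatly appreciated. Thanks in advance, Hib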

Answer 1

Some pointers (assuming Python 2):

Translating the string:

>>> import string
>>> table = string.maketrans("AB","10")
>>> translated = "AABABB".translate(table)
>>> translated
'110100'
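In Python 3, `string.maketrans` no longer exists; the equivalent is the static method `str.maketrans`. A sketch of the same translation step under Python 3:

```python
# Python 3: str.maketrans replaces Python 2's string.maketrans
table = str.maketrans("AB", "10")
translated = "AABABB".translate(table)
print(translated)  # 110100
```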

Converting to base 10:

>>> int(translated, 2)
52

Not sure how to map it to those arbitrary strings; I need more information.
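One plausible reading of the mapping step is a plain dictionary lookup. A sketch assuming a hypothetical table `case_names` (only the entries for 52 and 53 come from the question); `dict.get` supplies a fallback label for numbers that are not in the table instead of raising `KeyError`:

```python
case_names = {52: "Case 1", 53: "Case 2"}  # hypothetical table; extend as needed

def describe(value):
    # Fall back to a generic label when the number has no entry
    return case_names.get(value, "Unknown case %d" % value)

print(describe(52))  # Case 1
print(describe(7))   # Unknown case 7
```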

Printing to stdout: really? Which part are you stuck on?

Answer 2

Something like this should work (untested):

from itertools import islice

binary_map = dict(zip("AB", "10"))  # Equivalent to {"A": "1", "B": "0"}
string_map = {52: "Case 1", 53: "Case 2"}

with open("my_text_file") as f:
    binary_str = "".join(binary_map[x.split()[2]] for x in islice(f, 9, None))

binary_value = int(binary_str, 2)
print(string_map[binary_value])

I'll break down the indented line of code for you and explain it.

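The join method of the empty string concatenates the strings given in its argument, so "".join(["A", "B", "C"]) equals "ABC".
We pass this method a so-called generator expression, X for Y in Z. It has the same syntax as a list comprehension, but without the square brackets.
The islice function returns an iterator that silently skips the first 9 lines of the file object f, so it yields the lines starting from line 10.
The str method split, called with no arguments, splits on any run of whitespace characters (spaces, tabs ("\t"), newlines ("\n") and carriage returns ("\r")) and returns a list. For example, " a \t b\n\t c\n".split() equals ['a', 'b', 'c']. We are interested in the third column, x.split()[2], which is "A" or "B".
Looking this value up in the binary_map dictionary turns it into "1" or "0".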
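The pieces described above can be tried on a small in-memory sample. This Python 3 sketch uses `io.StringIO` as a stand-in for the real file; the 2 header lines it skips are an assumption that fits this tiny sample only (the answer skips 9 for the real file):

```python
import io
from itertools import islice

# Hypothetical in-memory stand-in for the real file: 2 header lines, 6 data lines
sample = io.StringIO(
    "# header\n"
    "---------------------------------------------\n"
    "123 ABC12345 A some more variable length text\n"
    "456 DEF12345 A some more variable length text\n"
    "789 GHI12345 B some more variable length text\n"
    "987 JKL12345 A some more variable length text\n"
    "654 MNO12345 B some more variable length text\n"
    "321 PQR12345 B some more variable length text\n"
)

binary_map = {"A": "1", "B": "0"}

# islice skips the 2 header lines; split() with no argument splits on any whitespace
letters = [line.split()[2] for line in islice(sample, 2, None)]
print(letters)  # ['A', 'A', 'B', 'A', 'B', 'B']

# Generator expression passed to "".join, as in the answer
binary_str = "".join(binary_map[x] for x in letters)
print(binary_str)          # 110100
print(int(binary_str, 2))  # 52
```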

Answer 3

a.txt:

# 8 rows of header







123 ABC12345 A some more variable length text
456 DEF12345 A some more variable length text
789 GHI12345 B some more variable length text
987 JKL12345 A some more variable length text
654 MNO12345 B some more variable length text
321 PQR12345 B some more variable length text

You can try this:

>>> int(''.join([line.split(' ')[2] for line in open('a.txt', 'r').readlines()[8:]]).replace('A', '1').replace('B', '0'), 2)
52
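The one-liner above is dense and never closes the file it opens. A sketch of the same logic split into steps, wrapped in a hypothetical function `decode_lines` that accepts any iterable of lines, so it can be fed an open file inside a `with` block or, as here, a plain list:

```python
def decode_lines(lines, header_rows=8):
    """Skip header_rows lines, take the third space-separated field, map A->1, B->0."""
    data = list(lines)[header_rows:]
    # split(' ') splits on single spaces, exactly as in the one-liner
    letters = "".join(line.split(' ')[2] for line in data)
    return int(letters.replace('A', '1').replace('B', '0'), 2)

# Usage with the real file (closed automatically when the block ends):
# with open('a.txt') as f:
#     print(decode_lines(f))

sample = ["# 8 rows of header\n"] + ["\n"] * 7 + [
    "123 ABC12345 A some more variable length text\n",
    "456 DEF12345 A some more variable length text\n",
    "789 GHI12345 B some more variable length text\n",
    "987 JKL12345 A some more variable length text\n",
    "654 MNO12345 B some more variable length text\n",
    "321 PQR12345 B some more variable length text\n",
]
print(decode_lines(sample))  # 52
```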

As for mapping the int to a string, I'm not sure what you mean.

>>> value = {int(''.join([line.split(' ')[2] for line in open('a.txt', 'r').readlines()[8:]]).replace('A', '1').replace('B', '0'), 2): 'case 52'}  
>>> value[52]
'case 52'
>>>

Answer 4

I use the re module to check the format of the lines to accept:

>>> import re
>>> def map_file_to_string(string):
    values = []
    for line in string.split('\n'):
        if re.match(r'\d{3} \w{3}\d{5} [AB] .*', line):
            values.append(1 if line[13] == 'A' else 0)
    return dict_map[int(''.join(map(str, values)), 2)]

>>> dict_map = {52: 'Case 1', 53: 'Case 2'}
>>> s1 = """# 8 rows of header
---------------------------------------------
123 ABC12345 A some more variable length text
456 DEF12345 A some more variable length text
789 GHI12345 B some more variable length text
987 JKL12345 A some more variable length text
654 MNO12345 B some more variable length text
321 PQR12345 B some more variable length text
etc.."""
>>> map_file_to_string(s1)
'Case 1'
>>>
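The answer above reads the letter from the fixed column `line[13]`. A variant sketch that captures the letter in a regex group instead keeps its position tied to the pattern itself, so the pattern could later be loosened (for example to `\w+\d+`) without updating a separate column index. This swaps in `re.match(...).group(1)` for the index lookup; the names `LINE_RE` and `letters_from_text` are hypothetical:

```python
import re

# Same line format as the answer above, but the A/B letter is captured in group 1
LINE_RE = re.compile(r'\d{3} \w{3}\d{5} ([AB]) ')

def letters_from_text(text):
    bits = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines that do not match the format (header, dashes, "etc..") are skipped
            bits.append('1' if m.group(1) == 'A' else '0')
    return ''.join(bits)

text = """# 8 rows of header
---------------------------------------------
123 ABC12345 A some more variable length text
456 DEF12345 A some more variable length text
789 GHI12345 B some more variable length text
987 JKL12345 A some more variable length text
654 MNO12345 B some more variable length text
321 PQR12345 B some more variable length text
etc.."""
bits = letters_from_text(text)
print(bits)          # 110100
print(int(bits, 2))  # 52
```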

Python: Create a decimal number from entries in a text file
