Question

I have a text file of about 50 lines in the following format:

 immediate     ADC #oper     69    2     2
 absolute      ADC oper      6D    3     4
 etc..

What I want to do is create 6 different lists and add each word of every line to its own list, so that the output looks like this:

addressing: ['immediate', 'absolute']
symbol: ['ADC', 'ADC']
symbol2: ['#oper', 'oper']
opcode: ['69', '6D']
bytes: ['2', '3']
cycles: ['2', '4']

I'm trying to do this in Python, but my current code doesn't work: it adds every word to every list:

addressing: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
symbol2: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
opcode: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
bytes: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']
cycles: ['immediate', 'ADC', '#oper', '69', '2', '2', 'absolute', 'ADC', 'oper', '6D', '3', '4']

How can I change the code below so that it produces the output I want?

addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []

index = 1;

for line in f:
    for word in line.split():
        if index == 1:
            addressing.append(word)
            index += 1
            print(index)

        if index == 2:
            symbol.append(word)
            index += 1
            print(index)

        if index == 3:
            symbol2.append(word)
            index += 1
            print(index)

        if index == 4:
            opcode.append(word)
            index += 1
            print(index)

        if index == 5:
            bytes.append(word)
            index += 1
            print(index)

        if index == 6:
            cycles.append(word)
            index += 1
            print(index)
        index = 1

Answer 1

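There are two ways to solve this:

A static approach, which assumes the format never changes and every line has the same number of values.
A dynamic approach, which tolerates a varying number of items per line, as long as the order of the items stays the same.

I'll describe both below.

Static approach: split the line and append by index.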

addressing = []
symbol = []
symbol2 = []
opcode = []
bytes = []
cycles = []
for line in f:
    splitted = line.split()
    addressing.append(splitted[0])
    symbol.append(splitted[1])
    symbol2.append(splitted[2])
    opcode.append(splitted[3])
    bytes.append(splitted[4])
    cycles.append(splitted[5])
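
Because the static approach assumes exactly six values per line, a malformed line would otherwise fail with an IndexError or be misread silently. The following is a minimal sketch that makes the assumption explicit. The in-memory sample_lines, the name byte_counts (used so the built-in bytes is not shadowed), and the choice to raise ValueError are illustrative assumptions, not part of the original answer.

```python
# Hypothetical in-memory stand-in for the file; the last entry is a blank line.
sample_lines = [
    " immediate     ADC #oper     69    2     2",
    " absolute      ADC oper      6D    3     4",
    "",
]

addressing, symbol, symbol2 = [], [], []
opcode, byte_counts, cycles = [], [], []

for line_number, line in enumerate(sample_lines, start=1):
    words = line.split()
    if not words:
        continue  # skip blank lines
    if len(words) != 6:
        # Fail loudly instead of mis-assigning columns.
        raise ValueError(f"line {line_number}: expected 6 fields, got {len(words)}")
    # Tuple unpacking assigns one word per column in a single step.
    addr, sym, sym2, op, nbytes, ncycles = words
    addressing.append(addr)
    symbol.append(sym)
    symbol2.append(sym2)
    opcode.append(op)
    byte_counts.append(nbytes)
    cycles.append(ncycles)

print(addressing)
print(cycles)
```

Whether to raise or to skip a bad line depends on how much you trust the input file.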

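Dynamic approach: create a dictionary and iterate over its keys (this relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward).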

information = {}
information['addressing'] = []
information['symbol'] = []
information['symbol2'] = []
information['opcode'] = []
information['bytes'] = []
information['cycles'] = []
key_list = list(information.keys())
for line in f:
    splitted = line.split()
    for i in range(0,len(splitted)):
        information[key_list[i]].append(splitted[i])
print(information)
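
To see how the dynamic approach copes with a line that has fewer items, here is a small sketch on in-memory data. The sample_lines list and its shortened second line are made up for illustration. Pairing keys with words through zip() is a slight variation on the index loop above: zip() stops at the shorter sequence, so a short line cannot raise an IndexError.

```python
# Hypothetical sample data; the second line deliberately lacks the last field.
sample_lines = [
    "immediate     ADC #oper     69    2     2",
    "absolute      ADC oper      6D    3",
]

keys = ["addressing", "symbol", "symbol2", "opcode", "bytes", "cycles"]
information = {key: [] for key in keys}

for line in sample_lines:
    # zip() pairs each key with the word in the same position and
    # stops at the shorter sequence, so missing trailing fields are skipped.
    for key, word in zip(keys, line.split()):
        information[key].append(word)

for key in keys:
    print(key, information[key])
```

Note that after a short line the lists no longer have equal lengths, so positions across lists stop lining up.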

Answer 2

As already mentioned, you should remove all the index += 1 statements and leave only one at the end of the inner for loop. Alternatively, use elif instead of if.
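
The fix described above could look roughly like this. It is a sketch on in-memory sample data rather than the asker's file, and it combines both suggestions (elif plus a single increment). It also resets index once per line instead of after every word, which the question's code would need as well. The name byte_counts is an assumption, chosen to avoid shadowing the built-in bytes.

```python
# Hypothetical in-memory stand-in for the file.
sample_lines = [
    "immediate     ADC #oper     69    2     2",
    "absolute      ADC oper      6D    3     4",
]

addressing, symbol, symbol2 = [], [], []
opcode, byte_counts, cycles = [], [], []

for line in sample_lines:
    index = 1  # restart the column counter for every line
    for word in line.split():
        if index == 1:
            addressing.append(word)
        elif index == 2:
            symbol.append(word)
        elif index == 3:
            symbol2.append(word)
        elif index == 4:
            opcode.append(word)
        elif index == 5:
            byte_counts.append(word)
        elif index == 6:
            cycles.append(word)
        index += 1  # the only increment, at the end of the inner loop

print(addressing)
print(symbol2)
```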

Also, consider using enumerate(). There is no need to update the index variable manually:

# Example use of enumerate()
for line in f:
    for index, word in enumerate(line.split()):
        print(index, word)
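
Building on that, the index from enumerate() can select the target list directly, which removes the if chain altogether. This is only a sketch: the names and columns variables and the in-memory sample_lines are assumptions for illustration, and it presumes no line has more than six words.

```python
# Hypothetical in-memory stand-in for the file.
sample_lines = [
    "immediate     ADC #oper     69    2     2",
    "absolute      ADC oper      6D    3     4",
]

names = ["addressing", "symbol", "symbol2", "opcode", "bytes", "cycles"]
columns = [[] for _ in names]  # one independent list per column

for line in sample_lines:
    # enumerate() counts from 0, which matches list indexing.
    for index, word in enumerate(line.split()):
        columns[index].append(word)

for name, column in zip(names, columns):
    print(f"{name}: {column}")
```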

Answer 3

You can use a regular expression to split each line on the longest runs of whitespace (\s):

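import re

# strip() removes the newline and any leading spaces, so re.split() never yields an empty first field
with open('filename.txt') as fh:
    rows = [re.split(r'\s+', line.strip()) for line in fh]
fields = ['addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles']
final_data = [{name: list(column)} for name, column in zip(fields, zip(*rows))]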

Output:

[{'addressing': ['immediate', 'absolute']}, {'symbol': ['ADC', 'ADC']}, {'symbol2': ['#oper', 'oper']}, {'opcode': ['69', '6D']}, {'bytes': ['2', '3']}, {'cycles': ['2', '4']}]

Answer 4

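You can use the built-in zip function to transpose the rows of data into columns. The code below puts the data into a dictionary of tuples, with the field names as keys. For this demo I've embedded the data in the script, since that's simpler than reading from a file, but the code is easy to adapt to read from a file.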

file_data = '''\
immediate     ADC #oper     69    2     2
absolute      ADC oper      6D    3     4
'''.splitlines()

fields = 'addressing', 'symbol', 'symbol2', 'opcode', 'bytes', 'cycles'

values = zip(*[row.split() for row in file_data])
data = dict(zip(fields, values))
for k in fields:
    print(k, data[k])

Output:

addressing ('immediate', 'absolute')
symbol ('ADC', 'ADC')
symbol2 ('#oper', 'oper')
opcode ('69', '6D')
bytes ('2', '3')
cycles ('2', '4')
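
For completeness, here is one way the same transposition might be adapted to read from a file. This is a sketch under assumptions: it writes the two sample lines to a temporary file first so that it is runnable as-is, it skips blank lines, and it converts each column to a list in case lists rather than tuples are wanted. With a real file you would only need the open() part.

```python
import os
import tempfile

fields = ("addressing", "symbol", "symbol2", "opcode", "bytes", "cycles")

# Create a throwaway file holding the sample data (illustration only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(" immediate     ADC #oper     69    2     2\n")
    tmp.write(" absolute      ADC oper      6D    3     4\n")
    path = tmp.name

try:
    with open(path) as f:
        # Split every non-blank line into its whitespace-separated words.
        rows = [line.split() for line in f if line.strip()]
finally:
    os.remove(path)

# zip(*rows) transposes rows into columns; zip(fields, ...) names each column.
data = {name: list(column) for name, column in zip(fields, zip(*rows))}

for name in fields:
    print(name, data[name])
```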

If you really want separate named variables, that's even easier, although as you can see it is more awkward to work with.

file_data = '''\
immediate     ADC #oper     69    2     2
absolute      ADC oper      6D    3     4
'''.splitlines()

(addressing, symbol, symbol2, 
opcode, bytecode, cycles) = zip(*[row.split() for row in file_data])

print(addressing)
print(symbol)
print(symbol2)
print(opcode)
print(bytecode)
print(cycles)

Output:

('immediate', 'absolute')
('ADC', 'ADC')
('#oper', 'oper')
('69', '6D')
('2', '3')
('2', '4')

Answer 5

The problem is that you increment index inside every if block. So at the end of this block:

if index == 1:
     addressing.append(word)
     index += 1
     print(index)

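index has the value 2. Then, when execution reaches if index == 2:, that condition evaluates to True, so the word is appended to the second list, index is incremented again, and so on.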
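
The fall-through can be reproduced in isolation. The two helper functions below are hypothetical and exist only to record which branches run; they are cut down to three branches for brevity.

```python
def branches_hit_with_if(index):
    """Record which branches run when separate if statements are used."""
    hit = []
    if index == 1:
        hit.append(1)
        index += 1
    if index == 2:  # checked again, even if the first branch already ran
        hit.append(2)
        index += 1
    if index == 3:
        hit.append(3)
        index += 1
    return hit


def branches_hit_with_elif(index):
    """Record which branches run when the checks are chained with elif."""
    hit = []
    if index == 1:
        hit.append(1)
        index += 1
    elif index == 2:  # skipped once an earlier branch has matched
        hit.append(2)
        index += 1
    elif index == 3:
        hit.append(3)
        index += 1
    return hit


print(branches_hit_with_if(1))    # [1, 2, 3]
print(branches_hit_with_elif(1))  # [1]
```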

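You could fix this by changing the inner for loop to for index in range(1, 7): (one value per column), using index to pick the word out of the split line, and no longer incrementing index manually. But if you know every line has 6 words, it is probably better to drop the inner for loop entirely and assign the words to the lists by position.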

for line in f:
     words = line.split() 
     addressing.append(words[0])
     symbol.append(words[1])
     ...etc

Iterating through a txt file and adding words to separate lists in Python
