Python: grabbing an entire string as a single element

Time: 2015-02-11 02:30:14

Tags: python

My input looks like this:

 MSG1          .STRINGZ   "This is my sample string : "
 MEMORYSPACE   .BLKW      9
 NEWLINE       .FILL      #10
 NEG48         .FILl      #-48

      .END

Right now I have code that splits each line of the input file into words:

['MSG1', '.STRINGZ', '"This', 'is', 'my', 'sample', 'string', ':', '"']
['MEMORYSPACE', '.BLKW', '9']
['NEWLINE', '.FILL', '#10']
['NEG48', '.FILl', '#-48']
[]
['.END']

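The first line of my input file contains a string, and I want the whole string to be treated as a single element so that I can compute its length in my code. Is there a way to do this? Here is my code: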

f = open ('testLC31.txt', 'r')
line_count = 0

to_ignore = ["AND", "ADD", "LEA", "PUTS", "JSR", "LD", "JSRR" , "NOT", "LDI" ,
            "LDR", "ST", "STI", "STR", "BR" , "JMP", "TRAP" , "JMP", "RTI" ,
            "BR", "ST", "STI" , "STR" , "BRz", "BRn" , "HALT"]

label = []
instructions = []

for line in f:
    elem = line.split() if line.split() else ['']
    if len(elem) > 1 and elem[0] not in to_ignore:
        label.append(elem[0])
        instructions.append(elem[1])
        line_count += 1
    elif elem[0] in to_ignore:
        line_count += 1

5 answers:

Answer 0 (score: 1)

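The str.split method takes an optional maxsplit argument, which limits the number of splits performed, so the resulting list has at most maxsplit + 1 elements: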

>>> 'MSG1          .STRINGZ   "This is my sample string : "'.split(None, 2)
['MSG1', '.STRINGZ', '"This is my sample string : "']
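
A minimal sketch of how maxsplit could give the string length the question asks for. The helper name stringz_length is made up for illustration, and the code assumes that every .STRINGZ line has a label and a double-quoted operand. Whether the length should also count a terminator is left to the caller:

```python
def stringz_length(line):
    """Return the length of a .STRINGZ operand, or None for any other line.

    Assumes the layout `LABEL .STRINGZ "text"` (a label is always present).
    """
    parts = line.split(None, 2)      # at most two splits, so at most three elements
    if len(parts) < 3 or parts[1] != '.STRINGZ':
        return None
    operand = parts[2].strip()       # drop the trailing newline, if any
    if len(operand) >= 2 and operand[0] == operand[-1] == '"':
        operand = operand[1:-1]      # drop the surrounding double quotes
    return len(operand)


print(stringz_length('MSG1          .STRINGZ   "This is my sample string : "\n'))
print(stringz_length('MEMORYSPACE   .BLKW      9\n'))
```

Because the split stops after the second field, spaces inside the quoted text are left untouched, including the one just before the closing quote.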

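If you want something more sophisticated than taking the first two words and keeping the rest, shlex.split may suit you. It splits the string using a shell-like syntax, so a quoted string is treated as a single element. You can tune the format precisely by creating a shlex instance and changing its attributes. See the documentation for details.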

>>> import shlex
>>> shlex.split('MSG1          .STRINGZ   "This is my sample string : "')
['MSG1', '.STRINGZ', 'This is my sample string : ']
>>> shlex.split('MSG1          .STRINGZ   "This is my sample string : "', posix=False)
['MSG1', '.STRINGZ', '"This is my sample string : "']
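
As a rough sketch, shlex.split could be dropped into a per-line loop like the one in the question. The SOURCE text and the string_lengths dictionary are illustrative names, not part of the original code, and the sketch assumes that .STRINGZ lines always have exactly three tokens:

```python
import shlex

SOURCE = '''\
MSG1          .STRINGZ   "This is my sample string : "
MEMORYSPACE   .BLKW      9
NEWLINE       .FILL      #10

      .END
'''

string_lengths = {}
for line in SOURCE.splitlines():
    # In the default POSIX mode the quotes are removed, so the third token
    # is the bare string and len() gives the length of its contents.
    tokens = shlex.split(line)
    if len(tokens) == 3 and tokens[1] == '.STRINGZ':
        string_lengths[tokens[0]] = len(tokens[2])

print(string_lengths)
```

One thing to keep in mind: in POSIX mode shlex also treats backslashes and single quotes specially, which may or may not match the assembler's string syntax.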

If that is still not enough, the remaining option is to write a full parser for your format, for example with the pyparsing library.

Answer 1 (score: 1)

You can try this crude approach of joining the strings back together by hand, like so:

tags = ['MSG1', '.STRINGZ', '"This', 'is', 'a' , 'sample' , 'string','"']
FirstOccurance = 0
longtag = ""
for tag in tags:
    if FirstOccurance == 1:
        if tag == "\"":
            longtag += tag
        else:
            longtag += " "+tag
    if ("\"" in tag)  and (FirstOccurance == 0):
        longtag += tag
        FirstOccurance = 1
    elif ("\"" in tag) and (FirstOccurance == 1):
        FirstOccurance = 0

print(longtag)

Hope this helps.
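
The same idea can be wrapped in a reusable function that returns the whole token list with the quoted part merged. This is only a sketch: join_quoted is a made-up name, and because the tokens are rejoined with single spaces, any run of several spaces inside the original string is lost:

```python
def join_quoted(tokens):
    """Merge the tokens between an opening and a closing double quote."""
    merged = []
    buffer = None                      # tokens of the string being rebuilt, or None
    for token in tokens:
        if buffer is None:
            if token.count('"') == 1:  # opening quote without its partner
                buffer = [token]
            else:
                merged.append(token)
        else:
            buffer.append(token)
            if '"' in token:           # closing quote reached
                merged.append(' '.join(buffer))
                buffer = None
    if buffer is not None:             # unterminated string: keep the tokens as they were
        merged.extend(buffer)
    return merged


tags = ['MSG1', '.STRINGZ', '"This', 'is', 'my', 'sample', 'string', ':', '"']
print(join_quoted(tags))
```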

Answer 2 (score: 0)

This can be done by assuming that a .STRINGZ directive and the string it declares always sit on a single line.

Result

   "This is my sample string : "
len(stringz_): 32

text_ = """
MSG1          .STRINGZ   "This is my sample string : "
MEMORYSPACE   .BLKW      9
NEWLINE       .FILL      #10
NEG48         .FILl      #-48

      .END
"""

STRINGZ_ = '.STRINGZ'
line_count_ = 0

lines_ = text_.split('\n')

to_ignore = ["AND", "ADD", "LEA", "PUTS", "JSR", "LD", "JSRR" , "NOT", "LDI" ,
            "LDR", "ST", "STI", "STR", "BR" , "JMP", "TRAP" , "JMP", "RTI" ,
            "BR", "ST", "STI" , "STR" , "BRz", "BRn" , "HALT"]

label = []
instructions = []

for line in lines_:
    if STRINGZ_ in line:
        stringz_ = line.split(STRINGZ_)[1]
        print(stringz_)
        print('len(stringz_): ' + str(len(stringz_)))
    elem = line.split() if line.split() else ['']
    if len(elem) > 1 and elem[0] not in to_ignore:
        label.append(elem[0])
        instructions.append(elem[1])
        line_count_ += 1
    elif elem[0] in to_ignore:
        line_count_ += 1

Answer 3 (score: 0)

with open("filename") as f:
    rd = f.readlines()
    print (rd[0].split("\n")[0].split())

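This splits on \n and then on whitespace, and prints the words of the first line. readlines() returns a list of lines, which is easier to work with. Using with open() is also the better approach, since the file is closed automatically.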
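
A sketch of how the with open() pattern might be extended from the first line to every line of the file. The function name read_fields and the temporary demo file are assumptions made for illustration. The maxsplit value of 2 is an addition to this answer's plain split(), used so that the operand stays in one piece:

```python
import os
import tempfile


def read_fields(path):
    """Return the fields of every non-blank line in the file at `path`."""
    rows = []
    with open(path) as f:                 # the file is closed when the block exits
        for line in f:                    # iterate line by line
            fields = line.split(None, 2)  # label, directive, rest of the line
            if fields:
                fields[-1] = fields[-1].rstrip()  # drop the trailing newline
                rows.append(fields)
    return rows


# Demo on a throwaway file so the sketch runs without the real input file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('MSG1   .STRINGZ   "This is my sample string : "\n\n      .END\n')
rows = read_fields(tmp.name)
os.remove(tmp.name)
print(rows)
```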

Answer 4 (score: 0)

A simple assembler? Here is a rough pass at it using pyparsing:

code = """
 MSG1          .STRINGZ   "This is my sample string : "
 MEMORYSPACE   .BLKW      9
 NEWLINE       .FILL      #10
 NEG48         .FILL      #-48

      .END"""

from pyparsing import Word, alphas, alphanums, Regex, Combine, quotedString, Optional

identifier = Word(alphas, alphanums+'_')
command = Word('.', alphanums)

integer = Regex(r'[+-]?\d+')
byte_literal = Combine('#' + integer)
command_arg = quotedString | integer | byte_literal
codeline = Optional(identifier)("label") + command("instruction") + Optional(command_arg("arg"))

for line in code.splitlines():
    line = line.strip()
    if not line:
        continue

    print(line)
    assemline = codeline.parseString(line)
    print(assemline.dump())
    print()

prints

MSG1          .STRINGZ   "This is my sample string : "
['MSG1', '.STRINGZ', '"This is my sample string : "']
- arg: "This is my sample string : "
- instruction: .STRINGZ
- label: MSG1

MEMORYSPACE   .BLKW      9
['MEMORYSPACE', '.BLKW', '9']
- arg: 9
- instruction: .BLKW
- label: MEMORYSPACE

NEWLINE       .FILL      #10
['NEWLINE', '.FILL', '#10']
- arg: #10
- instruction: .FILL
- label: NEWLINE

NEG48         .FILL      #-48
['NEG48', '.FILL', '#-48']
- arg: #-48
- instruction: .FILL
- label: NEG48

.END
['.END']
- instruction: .END