Question

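I have a file with a sequence on its second line, and a variable named tokenizer that gives me an old position value. I'm trying to find the new position. For example, for the line below tokenizer gives me position 12, which is E if you count only the letters up to 12. So I need to find the new position by counting the dashes as well.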

---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------

This is what I have so far, and it still doesn't work.

with open(filename) as f:
    countletter = 0
    countdash = 0
    for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
        tokenizer=line.split()[4]
        print tokenizer

        for i,character in enumerate(line2):

            for countletter <= tokenizer:

                if character != '-': 
                    countletter += 1
                if character == '-':
                    countdash +=1

The new position for this example should be 32.

Answer 1

First answer, edited by Chad D to make it 1-indexed (but incorrect):

def get_new_index(string, char_index):
    chars = 0
    for i, char in enumerate(string):
        if char != '-':
            chars += 1
        if char_index == chars:
            return i+1

Rewritten version:

import re

def get(st, char_index):
    chars = -1
    for i, char in enumerate(st):
        if char != '-':
            chars += 1
        if char_index == chars:
            return i

def test():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        print i, char, st[get(st, i)]

def get_1_indexed(st, char_index):
    return 1 + get(st, char_index - 1)

def test_1_indexed():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        print i+1, char, st[get_1_indexed(st, i + 1) - 1]
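
If many positions have to be converted for the same sequence, one alternative is to record the index of every letter once and then look positions up in that list. This is only a sketch: the helper name letter_positions is hypothetical, and the sequence is rebuilt from the example in the question assuming 15 leading dashes.

```python
def letter_positions(st):
    """Return the 0-based index in st of every non-dash character, in order."""
    return [i for i, char in enumerate(st) if char != '-']

# Example sequence from the question (assumed: 15 leading dashes).
st = '-' * 15 + 'LL---NE--HVKTHTEEK---PF-ICTVCR-KS' + '-' * 10
positions = letter_positions(st)

# 1-indexed lookup: the 12th letter is entry 11 of the list,
# and adding 1 turns its 0-based index into a 1-based position.
old_position = 12
new_position = positions[old_position - 1] + 1
print(new_position)          # 32
print(st[new_position - 1])  # E
```

Building the list costs one pass over the sequence; each lookup after that is a plain list index.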

Answer 2

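My original text looks like this, and the position I'm interested in is 12, which is 'E'

Actually, it's K, assuming you are using zero-indexed strings. Python uses zero indexing, so unless you jump through hoops to get one-indexing (which you aren't), it will give you K. If you're running into problems, try addressing this first.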
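
A quick way to see the difference, as a sketch on the example sequence (rebuilt here assuming 15 leading dashes):

```python
# Example sequence from the question (assumed: 15 leading dashes).
seq = '-' * 15 + 'LL---NE--HVKTHTEEK---PF-ICTVCR-KS' + '-' * 10

# Drop the dashes so that indices count letters only.
letters = seq.replace('-', '')

print(letters[12])      # K: index 12 is the 13th letter
print(letters[12 - 1])  # E: the 12th letter when counting from 1
```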

Here is some code that does what you need (with 0-indexing rather than 1-indexing):

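def get_new_index(oldindex, s):
    # newindex is the position in s with dashes included
    newindex = 0

    for c in s:
        if c != '-':
            # oldindex counts down the letters still to be skipped
            if oldindex == 0:
                return newindex
            oldindex -= 1
        newindex += 1

    # ran out of characters before reaching the requested letter
    raise ValueError('old index is out of range')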

Answer 3

This is a silly way to get the second line. Using islice, or next(f), would be clearer:

for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
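
As a rough sketch of the next(f) idea: the generator below pairs each line with the one that follows it. The name line_pairs is hypothetical and the sample lines are made up; with a real file you would pass the open file object instead of the list.

```python
import itertools

def line_pairs(lines):
    """Yield (line, following_line) pairs from any iterable of lines."""
    it = iter(lines)
    for line in it:
        # The default '' plays the role of fillvalue when a line has no partner.
        yield line, next(it, '')

# Made-up sample data standing in for the open file.
sample = ['id a b c 12\n', '--AB-C\n', 'id a b c 2\n', 'A--B\n']
pairs = list(line_pairs(sample))

# islice can pick out just the second line (index 1) of an iterable.
second_line = next(itertools.islice(sample, 1, 2))
```
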

Here countletter appears to be an int while tokenizer is a str, which is probably not what you expect:

    for countletter <= tokenizer:

That is also a syntax error, so I don't think this is the code you are actually running.

Maybe you should use

tokenizer = int(line.split()[4])

to turn tokenizer into an int.

print tokenizer can be misleading, because an int and a str look the same when printed, so you see what you expect to see. Try print repr(tokenizer) when debugging.
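
A small illustration of why repr helps. It is written with print() calls so it also runs on Python 3, and the value '12' is just an assumed example of what line.split()[4] might return:

```python
tokenizer = '12'                 # assumed example: split() always returns strings

print(tokenizer)                 # 12
print(repr(tokenizer))           # '12'  (the quotes reveal it is a str)
print(repr(int(tokenizer)))      # 12    (no quotes once it is an int)
```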

Once you are sure tokenizer is an int, you can change this line to

    for i,character in enumerate(line2[:tokenizer]):
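
Note that the slice limits the scan to the first tokenizer characters of line2, dashes included. If the goal is the position of the tokenizer-th letter, the loop can instead run until the letter count reaches tokenizer. A minimal sketch under that assumption follows; the function name new_position and the contents of the header line are hypothetical, and the sequence is rebuilt from the question's example assuming 15 leading dashes.

```python
def new_position(header, sequence):
    """1-based position in sequence of the letter numbered by field 5 of header."""
    tokenizer = int(header.split()[4])  # convert before comparing with a counter
    countletter = 0
    for i, character in enumerate(sequence):
        if character != '-':
            countletter += 1
            if countletter == tokenizer:
                return i + 1            # enumerate is 0-based
    raise ValueError('letter %d not found in sequence' % tokenizer)

# Hypothetical header line; only its fifth whitespace-separated field is used.
header = 'id a b c 12'
sequence = '-' * 15 + 'LL---NE--HVKTHTEEK---PF-ICTVCR-KS' + '-' * 10
print(new_position(header, sequence))   # 32
```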
