Question

我对Python中的文件r / w几乎没有任何经验，并且想知道针对我的特定情况的最佳解决方案是什么。

我有一个带有以下结构的制表符分隔文件，其中每个句子用空行分隔：

Roundup NN
:   :
Muslim  NNP
Brotherhood NNP
vows    VBZ
daily   JJ
protests    NNS
in  IN
Egypt   NNP

Families    NNS
with    IN
no  DT
information NN
on  IN
the DT
whereabouts NN
of  IN
loved   VBN
ones    NNS
are VBP
grief   JJ
-   :
stricken    JJ
.   .

The DT
provincial  JJ
departments NNS
of  IN
supervision NN
and CC
environmental   JJ
protection  NN
jointly RB
announced   VBN
on  IN
May NNP
9   CD
that    IN
the DT
supervisory JJ
department  NN
will    MD
question    VB
and CC
criticize   VB
mayors  NNS
who WP
fail    VBP
to  TO
curb    VB
pollution   NN
.   .

(...)

我想追加到这个文件的非空行，首先是一个制表符，然后是一个给定的字符串。

对于每一行，要追加的字符串将取决于下面代码中lab_pred_tags中存储的值。对于for循环的每次迭代，lab_pred_tags与文本文件中对应句子的行数长度相同。即，在该示例中，3 lab_pred_tags循环迭代的for长度为9,15和12。

对于第一次for循环迭代，lab_pred_tags包含list：['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O', 'B-GPE']

# (...) code to calculate lab_pred
for lab, lab_pred, length in zip(labels, labels_pred, sequence_lengths):
    lab = lab[:length]
    lab_pred = lab_pred[:length]
    # Convert lab_pred from a sequence of numbers to a sequence of strings
    lab_pred_tags = d_u.label_idxs_to_tags(lab_pred, tags)
    # Now what is the best solution to append each element of `lab_pred_tags` to each line in the file?
    # Keep in mind that I will need to skip a line everytime a new for loop iteration is started

例如，所需的输出文件是：

Roundup NN  O
:   :   O
Muslim  NNP B-ORG
Brotherhood NNP I-ORG
vows    VBZ O
daily   JJ  O
protests    NNS O
in  IN  O
Egypt   NNP B-GPE

Families    NNS O
with    IN  O
no  DT  O
information NN  O
on  IN  O
the DT  O
whereabouts NN  O
of  IN  O
loved   VBN O
ones    NNS O
are VBP O
grief   JJ  O
-   :   O
stricken    JJ  O
.   .   O

The DT  O
provincial  JJ  O
departments NNS O
of  IN  O
supervision NN  O
and CC  O
environmental   JJ  O
protection  NN  O
jointly RB  O
announced   VBN O
on  IN  O
May NNP O
9   CD  O
that    IN  O
the DT  O
supervisory JJ  O
department  NN  O
will    MD  O
question    VB  O
and CC  O
criticize   VB  O
mayors  NNS O
who WP  O
fail    VBP O
to  TO  O
curb    VB  O
pollution   NN  O
.   .   O

最佳解决方案是什么？

Answer 1

出于测试目的，我修改了lab_pred_tags列表。这是我的解决方案：

    lab_pred_tags = ['O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O', 'O',
                     'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O', 'O',
                     'O', 'B-GPE', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'O',
                     'O', 'O', 'B-GPE', 'O']
    index = 0

    with open("PATH_TO_YOUR_FILE", "r") as lab_file, \
            open("PATH_TO_NEW_FILE", "w") as lab_file_2:
        lab_file_list = lab_file.readlines()

        for lab_file_list_element in lab_file_list:
            if lab_file_list_element != "\n":
                new_line_element = lab_file_list_element.replace(
                    "\n", ' ' + lab_pred_tags[index] + "\n"
                )
                index += 1
                lab_file_2.write(new_line_element)
            if lab_file_list_element == "\n":
                index = 0
                lab_file_2.write("\n")

使用Python将变量字符串附加到文件中的每一行

1 个答案: