Question

Hi, I have 3 dataframes. I've included dummy versions of them below.

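INFO: has 4 columns. WF is the file the information comes from, wn is unimportant, wno is the word number that the boundaries and splits point to, and w is the word itself.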

WF  wn  wno w
a   unimportant 0   hi
a   unimportant 1   i
a   unimportant 2   like 
a   unimportant 3   chicken
a   unimportant 4   the
a   unimportant 5   world
a   unimportant 7   is 
a   unimportant 9   round
b   unimportant 0   chicken 
b   unimportant 1   is
b   unimportant 2   flat
b   unimportant 3   earthers 
b   unimportant 4   incomprehensible
b   unimportant 6    best
b   unimportant 7   insandwiches
c   unimportant 0   beef
c   unimportant 1   also
c   unimportant 2   good
c   unimportant 3   how
c   unimportant 4   to
c   unimportant 5   explain
c   unimportant 6   night
c   unimportant 7   and
c   unimportant 8   day
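
A minimal pandas sketch of the INFO frame, for anyone reproducing this (the names `info` and `words` are illustrative, only files `a` and `b` are included, and `wn` is dropped because it is unimportant). It shows the detail that matters for any lookup: `wno` restarts at 0 in every file, so a word is only uniquely identified by the pair (`WF`, `wno`), which a sorted two-level index captures.

```python
import pandas as pd

# INFO rows for files "a" and "b" from the question (wn omitted).
info = pd.DataFrame({
    "WF": ["a"] * 8 + ["b"] * 7,
    "wno": [0, 1, 2, 3, 4, 5, 7, 9, 0, 1, 2, 3, 4, 6, 7],
    "w": ["hi", "i", "like", "chicken", "the", "world", "is", "round",
          "chicken", "is", "flat", "earthers", "incomprehensible", "best", "insandwiches"],
})

# wno restarts at 0 in every file, so on its own it is not a unique key...
print(info["wno"].is_unique)        # False

# ...but (WF, wno) is; sorting the index makes label slicing reliable.
words = info.set_index(["WF", "wno"])["w"].sort_index()
print(words.index.is_unique)        # True

# Words 4..9 of file "a" (label slices with .loc are inclusive).
print(" ".join(words.loc["a"].loc[4:9]))  # the world is round
```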

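BOUNDARIES: fn is the file the information goes to (the output file name), f points to WF in INFO, start/stop point to wno in INFO, tag is unimportant, and ID (the topic ID) is used as the boundary.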

    fn  f   start   stop    tag     ID
A.a a   0   3   d   1
A.a b   0   1   d   1
A.a nan nan nan c   1
A.a c   0   2   d   2
A.a nan nan nan c   2
B.b a   4   9   d   1
B.b b   2   4   d   1
B.b nan nan nan c   1
B.b c   3   8   d   2
B.b nan nan nan c   2
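
The BOUNDARIES frame as a pandas sketch (the names `bounds`, `ranges`, and `segments` are illustrative). Assuming the tag `c` rows only act as topic separators, they carry no `f`/`start`/`stop` and can be dropped; grouping the remaining rows by `fn` and `ID` then gives, per output file and topic, the word ranges to pull from INFO.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Dummy BOUNDARIES frame from the question.
bounds = pd.DataFrame(
    [["A.a", "a", 0, 3, "d", 1],
     ["A.a", "b", 0, 1, "d", 1],
     ["A.a", nan, nan, nan, "c", 1],
     ["A.a", "c", 0, 2, "d", 2],
     ["A.a", nan, nan, nan, "c", 2],
     ["B.b", "a", 4, 9, "d", 1],
     ["B.b", "b", 2, 4, "d", 1],
     ["B.b", nan, nan, nan, "c", 1],
     ["B.b", "c", 3, 8, "d", 2],
     ["B.b", nan, nan, nan, "c", 2]],
    columns=["fn", "f", "start", "stop", "tag", "ID"],
)

# The tag-"c" rows have NaN f/start/stop; drop them and restore integer bounds.
ranges = (bounds.dropna(subset=["f", "start", "stop"])
                .astype({"start": int, "stop": int}))

# (fn, ID) -> list of (file, start, stop) word ranges, in original row order.
segments = {
    key: list(zip(grp["f"], grp["start"], grp["stop"]))
    for key, grp in ranges.groupby(["fn", "ID"], sort=False)
}
for (fn, topic), segs in segments.items():
    print(fn, topic, segs)
```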

SPLITS: same layout as BOUNDARIES; used to define the boundaries within the INFO files, and will introduce a \n line break in the output file.

   fn   f   start   stop    tag ID
A.A.a   a   0   3   d   nan
A.A.a   a   4   9   d   nan
B.B.b   b   0   1   d   nan
B.B.b   b   2   4   d   nan
B.B.b   b   5   7   d   nan
C.C.c   c   0   2   d   nan
C.C.c   c   3   8   d   nan
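
If SPLITS is meant to mark where the `\n` breaks go, one way to apply it is an interval join: pair each word with every split row of the same file and keep the pairs where `start <= wno <= stop`. This sketch assumes each word falls into at most one split range and uses only file `b`; `info_b`, `splits`, `inside`, and `text` are illustrative names.

```python
import pandas as pd

# Small slice of the question's data: INFO rows of file "b" and its SPLITS rows.
info_b = pd.DataFrame({
    "WF": ["b"] * 7,
    "wno": [0, 1, 2, 3, 4, 6, 7],
    "w": ["chicken", "is", "flat", "earthers", "incomprehensible", "best", "insandwiches"],
})
splits = pd.DataFrame({
    "fn": ["B.B.b"] * 3,
    "f": ["b"] * 3,
    "start": [0, 2, 5],
    "stop": [1, 4, 7],
})

# Interval join: pair every word with every split of the same file, then keep
# only the pairs where start <= wno <= stop (between() is inclusive by default).
pairs = info_b.merge(splits, left_on="WF", right_on="f")
inside = pairs[pairs["wno"].between(pairs["start"], pairs["stop"])]

# One line of text per split segment, segments separated by "\n".
lines = inside.groupby(["f", "start"], sort=True)["w"].agg(" ".join)
text = "\n".join(lines)
print(text)
```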

I want to combine these into multiple text files, named after whatever the file name (fn) in the BOUNDARIES frame is, i.e. A.a.txt.

This example should produce 2 files. Here is what I'm trying to achieve:

A.a.txt output and format:

hi i like chicken
chicken is
****
beef also good

(and so on)

B.b.txt

the world is round
flat earthers incomprehensible
****
how to explain night and day

This needs to scale to hundreds of files. I'm absolutely stumped and would love any suggestions you might have.

I tried this:

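import pandas as pd

# TW is the INFO frame, topics is the BOUNDARIES frame
v = TW.set_index('wno')['w']

# join the words from start to stop (inclusive) for every boundary row
sent = [
    ' '.join(v.loc[i:j]) for i, j in zip(topics['start'], topics['stop'])
]

sent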

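However, I get a NaN error, and I suspect this would only give me one long list anyway. I need to split it up according to the boundaries: A.a would be one file and B.b would be a separate file.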
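
A possible end-to-end sketch, assuming `start`/`stop` are inclusive `wno` bounds within file `f`, each BOUNDARIES row becomes one line, and `****` goes between topic IDs (which matches the desired output above). The attempt probably fails for two reasons: the tag `c` rows have NaN `start`/`stop`, and `wno` repeats across files, so `TW.set_index('wno')` has duplicate, unsorted labels that `.loc[i:j]` cannot slice. The helpers `build_texts` and `write_texts` are hypothetical names, not part of pandas.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def build_texts(info, bounds, sep="****"):
    """Return {fn: text}: one line per BOUNDARIES row, `sep` between topic IDs."""
    # Strip stray spaces in the words, then index them by (file, word number),
    # because wno alone repeats across files.
    words = (info.assign(w=info["w"].str.strip())
                 .set_index(["WF", "wno"])["w"].sort_index())
    # Drop the tag-"c" rows: their f/start/stop are NaN.
    ranges = bounds.dropna(subset=["f", "start", "stop"])

    texts = {}
    for fn, per_file in ranges.groupby("fn", sort=False):
        blocks = []
        for _, per_topic in per_file.groupby("ID", sort=False):
            lines = [
                # Label slicing is inclusive: words start..stop of file f.
                " ".join(words.loc[f].loc[int(start):int(stop)])
                for f, start, stop in zip(per_topic["f"], per_topic["start"], per_topic["stop"])
            ]
            blocks.append("\n".join(lines))
        texts[fn] = f"\n{sep}\n".join(blocks) + "\n"
    return texts


def write_texts(texts, out_dir="."):
    """Write each text to <out_dir>/<fn>.txt."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for fn, text in texts.items():
        Path(out_dir, f"{fn}.txt").write_text(text, encoding="utf-8")


# Dummy data from the question (wn omitted).
info = pd.DataFrame({
    "WF": ["a"] * 8 + ["b"] * 7 + ["c"] * 9,
    "wno": [0, 1, 2, 3, 4, 5, 7, 9, 0, 1, 2, 3, 4, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 8],
    "w": ["hi", "i", "like ", "chicken", "the", "world", "is ", "round",
          "chicken ", "is", "flat", "earthers ", "incomprehensible", " best", "insandwiches",
          "beef", "also", "good", "how", "to", "explain", "night", "and", "day"],
})
nan = np.nan
bounds = pd.DataFrame(
    [["A.a", "a", 0, 3, "d", 1], ["A.a", "b", 0, 1, "d", 1], ["A.a", nan, nan, nan, "c", 1],
     ["A.a", "c", 0, 2, "d", 2], ["A.a", nan, nan, nan, "c", 2],
     ["B.b", "a", 4, 9, "d", 1], ["B.b", "b", 2, 4, "d", 1], ["B.b", nan, nan, nan, "c", 1],
     ["B.b", "c", 3, 8, "d", 2], ["B.b", nan, nan, nan, "c", 2]],
    columns=["fn", "f", "start", "stop", "tag", "ID"],
)

texts = build_texts(info, bounds)
print(texts["A.a"])
# write_texts(texts, "output")  # would create output/A.a.txt and output/B.b.txt
```

Grouping by `fn` means a single pass produces every output file, so the same code handles hundreds of files without listing them by hand.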

Combining data from multiple pandas DataFrames and outputting it to multiple text files
