Hello, I have 3 dataframes. I have included dummy versions of them below.
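INFO: has 4 columns. WF is the file the info came from, wn is unimportant, wno is the number that the boundaries and splits refer to, and w is the word itself.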
WF wn wno w
a unimportant 0 hi
a unimportant 1 i
a unimportant 2 like
a unimportant 3 chicken
a unimportant 4 the
a unimportant 5 world
a unimportant 7 is
a unimportant 9 round
b unimportant 0 chicken
b unimportant 1 is
b unimportant 2 flat
b unimportant 3 earthers
b unimportant 4 incomprehensible
b unimportant 6 best
b unimportant 7 insandwiches
c unimportant 0 beef
c unimportant 1 also
c unimportant 2 good
c unimportant 3 how
c unimportant 4 to
c unimportant 5 explain
c unimportant 6 night
c unimportant 7 and
c unimportant 8 day
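BOUNDARIES: fn is the file the info came from, f refers to WF in info, start/stop refer to wno in info, tag is unimportant, and the topic ID (the ID column) is used as the boundary.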
fn f start stop tag ID
A.a a 0 3 d 1
A.a b 0 1 d 1
A.a nan nan nan c 1
A.a c 0 2 d 2
A.a nan nan nan c 2
B.b a 4 9 d 1
B.b b 2 4 d 1
B.b nan nan nan c 1
B.b c 3 8 d 2
B.b nan nan nan c 2
SPLITS: same layout as the boundaries; used to define boundaries within the info files, each of which introduces a \n newline in the output file.
fn f start stop tag ID
A.A.a a 0 3 d nan
A.A.a a 4 9 d nan
B.B.b b 0 1 d nan
B.B.b b 2 4 d nan
B.B.b b 5 7 d nan
C.C.c c 0 2 d nan
C.C.c c 3 8 d nan
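I want to combine these into multiple text files, each named after the fn value in the boundaries file, e.g. A.a.txt.
From this example I would get 2 files.
The output and format I am trying to achieve for A.a.txt: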
hi i like chicken
chicken is
****
beef also good
(and so on)
B.b.txt
the world is round
flat earthers incomprehensible
****
how to explain night and day
This needs to scale to hundreds of files. I am absolutely stumped and would appreciate any suggestions you might have.
I tried this:
v = TW.set_index('wno')['w']
sent = [
' '.join(v.loc[i:j]) for i, j in zip(topics['start'], topics['stop'])
]
sent
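However, I got a NaN error, and I suspect this would only give me one long list anyway. I need to break it up by boundary file: A.a would be one file and B.b would be a separate file.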
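For reference, here is a minimal sketch of the result I am after, rebuilt from the dummy data above. It rests on several assumptions: the names info, boundaries, build_texts and write_texts are made up for illustration (they are not my real variable names), the NaN rows are simply dropped, words are looked up by both WF and wno (wno restarts in every file), and "****" is written between consecutive topic IDs. The SPLITS frame is not used here, because in this example every boundary row already lines up with a split.

```python
import os

import numpy as np
import pandas as pd

# Dummy "info" frame rebuilt from the tables above (the unimportant wn column is omitted).
words = {
    "a": {0: "hi", 1: "i", 2: "like", 3: "chicken", 4: "the", 5: "world",
          7: "is", 9: "round"},
    "b": {0: "chicken", 1: "is", 2: "flat", 3: "earthers",
          4: "incomprehensible", 6: "best", 7: "insandwiches"},
    "c": {0: "beef", 1: "also", 2: "good", 3: "how", 4: "to", 5: "explain",
          6: "night", 7: "and", 8: "day"},
}
info = pd.DataFrame(
    [(wf, wno, w) for wf, d in words.items() for wno, w in d.items()],
    columns=["WF", "wno", "w"],
)

nan = np.nan
boundaries = pd.DataFrame(
    [
        ("A.a", "a", 0, 3, "d", 1),
        ("A.a", "b", 0, 1, "d", 1),
        ("A.a", nan, nan, nan, "c", 1),
        ("A.a", "c", 0, 2, "d", 2),
        ("A.a", nan, nan, nan, "c", 2),
        ("B.b", "a", 4, 9, "d", 1),
        ("B.b", "b", 2, 4, "d", 1),
        ("B.b", nan, nan, nan, "c", 1),
        ("B.b", "c", 3, 8, "d", 2),
        ("B.b", nan, nan, nan, "c", 2),
    ],
    columns=["fn", "f", "start", "stop", "tag", "ID"],
)


def build_texts(info, boundaries, sep="****"):
    """Return {fn: text}: one line per boundary row, topics separated by sep."""
    # Assumption: rows without f/start/stop carry no words and can be dropped.
    rows = boundaries.dropna(subset=["f", "start", "stop"])
    texts = {}
    for fn, file_rows in rows.groupby("fn", sort=False):
        topics = []
        for _, topic_rows in file_rows.groupby("ID", sort=False):
            lines = []
            for r in topic_rows.itertuples(index=False):
                # Match on the source file AND the inclusive wno range.
                mask = (info["WF"] == r.f) & info["wno"].between(r.start, r.stop)
                lines.append(" ".join(info.loc[mask].sort_values("wno")["w"]))
            topics.append("\n".join(lines))
        texts[fn] = f"\n{sep}\n".join(topics)
    return texts


def write_texts(texts, out_dir="."):
    """Write each entry of texts to <out_dir>/<fn>.txt."""
    for fn, text in texts.items():
        with open(os.path.join(out_dir, f"{fn}.txt"), "w", encoding="utf-8") as fh:
            fh.write(text + "\n")


texts = build_texts(info, boundaries)
print(texts["A.a"])
```

Calling write_texts(texts) would then create A.a.txt and B.b.txt in the current directory. The row-by-row mask is easy to follow but rescans info for every boundary row, so for hundreds of files something indexed on (WF, wno) would probably be faster.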