I have a text file like this:
sfdfd
kgfkhgjk
fsdfs
sgsgggggfsdf
Node: RBS6301 CXP102051/26_R30F L17A.4-6 (C17.0_LSV198_PA24)
=================================
col1 clo2 clo3
=================================
1 avb wer21g2
---------------------------------
=================================
empcode Emnname Date DESC
12d sf 2018-02-06 dghsjf hfhgf jfjh
asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk
dsf21 sdf2 2016-02-06 sdgfsgf
sdgg dsds dkfd-sffddfdf aaaa
dfd gfg dfsdffd aaaa
df dfdf efefkhgvkjgjk kgkjjk
4fr freff klhlkkl
-----------------------------------
hfjh
vkgjlbljkbkjbk/n/l jhfjhfhj kutiugjm iugiuk
hfhj
fggggggggggggggggggggggg
From the above, I extracted the following section using:
import re

findStr = 'empcode Emnname'
EndStr = '-----------------------------------'
tmp = []
with open('test123.txt') as f:  # the with block closes the file, no f.close() needed
    out = []
    for line in f:
        if line.startswith(findStr):
            # header row -> list of column names
            tmp.append(re.findall(r'\w+', line.strip()))
            # keep reading data rows until the closing dashed line
            for line in f:
                if line.rstrip() == EndStr:
                    out.append(tmp)
                    break
                tmp.append(re.sub(r'\s', ' ', line.strip()))
Output (tmp):
[['empcode', 'Emnname', 'Date', 'DESC'],
'12d sf 2018-02-06 dghsjf hfhgf jfjh',
'asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk',
'dsf21 sdf2 2016-02-06 sdgfsgf',
'sdgg dsds dkfd-sffddfdf aaaa',
'dfd gfg dfsdffd aaaa',
'df dfdf efefkhgvkjgjk kgkjjk',
'4fr freff klhlkkl']
However, I want NA in the blank fields, e.g. below gfg or after 4fr. Can anyone help? It should look like:
[['empcode', 'Emnname', 'Date', 'DESC'],
'12d sf 2018-02-06 dghsjf hfhgf jfjh',
'asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk',
'dsf21 sdf2 2016-02-06 sdgfsgf',
'sdgg dsds dkfd-sffddfdf aaaa',
'dfd gfg dfsdffd aaaa',
'df NA dfdf efefkhgvkjgjk kgkjjk',
'4fr NA NA freff klhlkkl']
Answer (score: 0)
Use re to extract the section you are looking for, then use the pandas fixed-width reader, read_fwf.
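By default read_fwf guesses the column boundaries from the data itself (colspecs='infer'). If that guess is unreliable, for example because DESC contains internal spaces, the boundaries can be passed explicitly. The following is a minimal sketch under two assumptions: the real file is aligned in fixed-width columns, and each header word starts exactly where its column starts. The helper name read_section and the sample rows are hypothetical.

```python
import io
import re

import pandas as pd


def read_section(text):
    """Parse a fixed-width block whose first line is the header (hypothetical helper).

    Assumes every header word starts at its column's first character and
    that the last column runs to the end of the line.
    """
    header, body = text.split('\n', 1)
    starts = [m.start() for m in re.finditer(r'\S+', header)]
    # Half-open [start, next_start) spans; None lets the last field run to end of line.
    colspecs = list(zip(starts, starts[1:] + [None]))
    return pd.read_fwf(io.StringIO(body), colspecs=colspecs,
                       header=None, names=header.split())


# Hypothetical, correctly aligned sample (column widths 8, 8, 11, rest of line).
lines = [
    f"{'empcode':<8}{'Emnname':<8}{'Date':<11}DESC",
    f"{'12d':<8}{'sf':<8}{'2018-02-06':<11}dghsjf hfhgf jfjh",
    f"{'df':<8}{'':<8}{'dfdf':<11}efefkhgvkjgjk kgkjjk",
    f"{'4fr':<8}{'':<8}{'':<11}freff klhlkkl",
]
df = read_section('\n'.join(lines))
print(df)
```

Blank slices are stripped to empty strings, which read_fwf reports as NaN, so the missing Emnname and Date values stay in their own columns instead of shifting the rest of the row left.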
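If for some reason the OP actually wants list output instead of a DataFrame, the DataFrame built below can be converted back into lists afterwards.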
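import io
import re
import pandas as pd

# From the header line through the last data row, stopping before the closing dashed line.
pat = r'(empcode Emnname.*?)\n-----------------------------------'
with open('test123.txt') as f:
    txt = re.search(pat, f.read(), re.DOTALL).group(1)
h, b = txt.split('\n', 1)  # h: header line, b: data rows
df = pd.read_fwf(io.StringIO(b), header=None, names=h.split())
df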
empcode Emnname Date DESC
0 12d sf 2018-02-06 dghsjf hfhgf jfjh
1 asf2 asdfw2 2018-02-16 fsfsfg jhjhhjghk
2 dsf21 sdf2 2016-02-06 sdgfsgf
3 sdgg dsds dkfd-sffddfdf aaaa
4 dfd gfg dfsdffd aaaa
5 df NaN dfdf efefkhgvkjgjk kgkjjk
6 4fr NaN NaN freff klhlkkl
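To get back the exact list shape shown in the question (header as a list, then one space-joined string per row with NA in the gaps), the DataFrame can be flattened. This is a minimal sketch; the helper name to_op_list and the sample rows, which mirror the last three rows of the output above, are hypothetical.

```python
import pandas as pd


def to_op_list(df, na='NA'):
    """Hypothetical helper: header as a list, then one space-joined string
    per row, with `na` in place of missing values."""
    # astype(object) lets a string replace NaN even in float columns
    filled = df.astype(object).where(df.notna(), na)
    rows = [' '.join(str(v) for v in row) for row in filled.itertuples(index=False)]
    return [list(df.columns)] + rows


# Hypothetical rows mirroring the last three rows of the DataFrame above.
df = pd.DataFrame({
    'empcode': ['dfd', 'df', '4fr'],
    'Emnname': ['gfg', None, None],
    'Date': ['dfsdffd', 'dfdf', None],
    'DESC': ['aaaa', 'efefkhgvkjgjk kgkjjk', 'freff klhlkkl'],
})
print(to_op_list(df))
```

Joining with single spaces reproduces the question's format, but note that DESC values containing spaces can no longer be told apart from the other fields once joined; keeping the DataFrame (or a list of lists) avoids that.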