I want to use Python to iterate through a file and extract the data from two specific columns. Sample data is shown below.
----------------------------------
Local Cell ID Cell Name Physical cell ID Additional spectrum emission Cell active state Cell admin state Cell middle block timer(min) Cell FDD TDD indication Subframe assignment Special subframe patterns
11 12345678912345678912345678912 427 1 Active Unblock NULL TDD SA2 SSP6
12 12345678912345678912345678912 130 1 Active Unblock NULL TDD SA2 SSP6
14 12345678912345678912345678912 94 1 Active Unblock NULL TDD SA2 SSP6
15 12345678912345678912345678912 37 1 Active Unblock NULL TDD SA2 SSP6
21 12345678912345678912345678912 188 1 Active Unblock NULL TDD SA2 SSP6
22 12345678912345678912345678912 203 1 Active Unblock NULL TDD SA2 SSP6
24 12345678912345678912345678912 209 1 Active Unblock NULL TDD SA2 SSP6
25 12345678912345678912345678912 230 1 Active Unblock NULL TDD SA2 SSP6
(Number of results = 8)
--- END
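I have used the following script to pull every line that contains a specific value, but I would like to know whether it is possible to pull only the data under "Cell Name" and "Physical cell ID", for example 12345678912345678912345678912 and 427 from line 4.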
signal = open('signal.txt', 'r')
newFile = open('results2.txt', 'w')
for line in signal:
    if 'False' in line:
        # Copy every line that contains the search value
        print('.', end="")
        newFile.write(line)
    else:
        print(" ", end="")
newFile.close()
signal.close()
print('Done')
Answer 0 (score: 0)
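@J.Byrne, another approach is to load the data into a pandas DataFrame with read_csv (skipping the leading lines and the footer, and supplying the column names yourself), then select the columns you are interested in.
The following code does the extraction: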
import pandas as pd

# skiprows/skipfooter must match the number of non-data lines at the top
# and bottom of your file; skipfooter requires engine='python'.
df = pd.read_csv('signal.txt', skiprows=2, skipfooter=4, sep=r'\s+',
                 names=[
                     'Local Cell ID',
                     'Cell Name',
                     'Physical cell ID',
                     'Additional spectrum emission',
                     'Cell active state',
                     'Cell admin state',
                     'Cell middle block timer(min)',
                     'Cell FDD TDD indication',
                     'Subframe assignment',
                     'Special subframe patterns'],
                 engine='python')
df
Here is the result:
Local Cell ID Cell Name Physical cell ID Additional spectrum emission Cell active state Cell admin state Cell middle block timer(min) Cell FDD TDD indication Subframe assignment Special subframe patterns
0 11 12345678912345678912345678912 427 1 Active Unblock NaN TDD SA2 SSP6
1 12 12345678912345678912345678912 130 1 Active Unblock NaN TDD SA2 SSP6
2 14 12345678912345678912345678912 94 1 Active Unblock NaN TDD SA2 SSP6
3 15 12345678912345678912345678912 37 1 Active Unblock NaN TDD SA2 SSP6
4 21 12345678912345678912345678912 188 1 Active Unblock NaN TDD SA2 SSP6
5 22 12345678912345678912345678912 203 1 Active Unblock NaN TDD SA2 SSP6
6 24 12345678912345678912345678912 209 1 Active Unblock NaN TDD SA2 SSP6
7 25 12345678912345678912345678912 230 1 Active Unblock NaN TDD SA2 SSP6
Then select the two columns with this filter:
df[["Cell Name","Physical cell ID"]]
Here is the result:
Cell Name Physical cell ID
0 12345678912345678912345678912 427
1 12345678912345678912345678912 130
2 12345678912345678912345678912 94
3 12345678912345678912345678912 37
4 12345678912345678912345678912 188
5 12345678912345678912345678912 203
6 12345678912345678912345678912 209
7 12345678912345678912345678912 230
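As a variation on the read_csv approach above, the sketch below avoids hard-coding how many lines to skip by first keeping only the lines whose first field is numeric, and then reads just the two wanted columns with usecols. The helper name read_cells and the SAMPLE string are assumptions made for illustration; in practice you would pass the contents of signal.txt. Reading "Cell Name" as a string is also an assumption, made so the 29-digit value is not treated as a number.

```python
import io
import pandas as pd

# Sample text copied from the question; stands in for the contents of signal.txt.
SAMPLE = """\
----------------------------------
Local Cell ID Cell Name Physical cell ID Additional spectrum emission Cell active state Cell admin state Cell middle block timer(min) Cell FDD TDD indication Subframe assignment Special subframe patterns
11 12345678912345678912345678912 427 1 Active Unblock NULL TDD SA2 SSP6
12 12345678912345678912345678912 130 1 Active Unblock NULL TDD SA2 SSP6
14 12345678912345678912345678912 94 1 Active Unblock NULL TDD SA2 SSP6
15 12345678912345678912345678912 37 1 Active Unblock NULL TDD SA2 SSP6
21 12345678912345678912345678912 188 1 Active Unblock NULL TDD SA2 SSP6
22 12345678912345678912345678912 203 1 Active Unblock NULL TDD SA2 SSP6
24 12345678912345678912345678912 209 1 Active Unblock NULL TDD SA2 SSP6
25 12345678912345678912345678912 230 1 Active Unblock NULL TDD SA2 SSP6
(Number of results = 8)
--- END
"""

COLUMNS = [
    'Local Cell ID', 'Cell Name', 'Physical cell ID',
    'Additional spectrum emission', 'Cell active state', 'Cell admin state',
    'Cell middle block timer(min)', 'Cell FDD TDD indication',
    'Subframe assignment', 'Special subframe patterns',
]


def read_cells(text):
    """Hypothetical helper: return Cell Name and Physical cell ID as a DataFrame."""
    # Keep only data rows: lines whose first whitespace-separated field is numeric.
    data_lines = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            data_lines.append(line)
    return pd.read_csv(
        io.StringIO("\n".join(data_lines)),
        sep=r'\s+',
        header=None,                   # the header row was filtered out above
        names=COLUMNS,
        usecols=['Cell Name', 'Physical cell ID'],
        dtype={'Cell Name': str},      # keep the long identifier as text
    )


cells = read_cells(SAMPLE)
print(cells)
```

To run it against the real file, something like read_cells(open('signal.txt').read()) should work, provided every data row starts with a numeric Local Cell ID as in the sample.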
Answer 1 (score: 0)
Here is another approach. You can iterate over the lines of the text file signal.txt and call a search function on each line to get the CellName or the PhysicalCellID.
import re
import pandas as pd

mydicts = []

def FindCellName(line):  # return the 29-digit cell name on this line, if any
    CellName = None  # default when nothing matches
    j = re.findall(r'\d{29}', line)  # find runs of 29 digits
    if len(j) > 0:
        CellName = j[0]  # use the first match
    return CellName

def FindPhysicalCellID(line):  # return the value that follows the cell name
    PhysicalCellID = None  # default when nothing matches
    # capture the text between the 29-digit name and the following " 1"
    res = re.search(r'\d{29}(.*) 1', line)
    if res:
        PhysicalCellID = res.group(1).strip()  # drop the surrounding spaces
    return PhysicalCellID

with open('signal.txt') as topo_file:
    for line in topo_file:
        if FindCellName(line):  # only lines that contain a cell name
            # append a (CellName, PhysicalCellID) tuple to the list
            mydicts.append((FindCellName(line), FindPhysicalCellID(line)))

df = pd.DataFrame(mydicts, columns=('CellName', 'PhysicalCellID'))
df
The result is as follows:
CellName PhysicalCellID
0 12345678912345678912345678912 427
1 12345678912345678912345678912 130
2 12345678912345678912345678912 94
3 12345678912345678912345678912 37
4 12345678912345678912345678912 188
5 12345678912345678912345678912 203
6 12345678912345678912345678912 209
7 12345678912345678912345678912 230
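If neither pandas nor a regular expression is wanted, the same two columns can be picked out with str.split alone, since in the sample the data rows are whitespace-separated and none of the values contains a space. This is only a minimal sketch under that assumption; the function name extract_cells, the expected_fields parameter, and the shortened sample_lines list are made up for illustration.

```python
def extract_cells(lines, expected_fields=10):
    """Return (cell_name, physical_cell_id) string pairs from the data rows."""
    pairs = []
    for line in lines:
        fields = line.split()
        # A data row has exactly ten fields and starts with a numeric Local Cell ID;
        # this skips the dashed line, the header, and the footer lines.
        if len(fields) == expected_fields and fields[0].isdigit():
            pairs.append((fields[1], fields[2]))
    return pairs


# A few lines taken from the question's sample; a file object opened on
# signal.txt could be passed instead, since it also yields lines.
sample_lines = [
    "----------------------------------",
    "11 12345678912345678912345678912 427 1 Active Unblock NULL TDD SA2 SSP6",
    "12 12345678912345678912345678912 130 1 Active Unblock NULL TDD SA2 SSP6",
    "(Number of results = 8)",
    "--- END",
]

pairs = extract_cells(sample_lines)
for cell_name, physical_cell_id in pairs:
    print(cell_name, physical_cell_id)
```

Unlike the regular expression above, this does not depend on the cell name being exactly 29 digits or on the next column being 1, but it would break if a column value ever contained whitespace.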