Question

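I have a very large text file that supposedly contains latitude measurements from 2 GPS antennas. The file is full of junk data, and I need to extract the latitude measurements from it. They only show up occasionally, in between lines of other text. A line that contains them looks like this: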

12:34:56.789    78:90:12.123123123  BLAH_BLAH   blahblah    :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  blah_BlHaBKBjFkjsa.c

The numbers I need are the ones in "LAT #1 MEAS=-80[deg]" and "LAT #2 MEAS=-110[deg]". So, basically, -80 and -110.

The rest of the text doesn't matter to me.

Here is some sample text from the input file:

08:59:07.603    08:59:05.798816 PAL_PARR_INTF   TraceModule GET int@HISR :82    drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    525 
08:59:07.603    08:59:05.798816 PAL_PARR_INTF   TraceModule xdma is not running drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    316 
08:59:07.603    08:59:05.798847 PAL_PARR_INTF   TraceModule DMA is activated    drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src)    461 
08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1596    
08:59:11.440    08:59:10.876819 UHAL_COMMON TraceWarning    cellRtgSlot=0 cellRtgChip=1500 CELLK_ACTIVE=1 boundary RSN 232482 current RSN 232482 boundarySFN 508 currentSFN 508 uhal_Hmcp.c (../../../HEDGE/UL1/UHAL_3XX/platform/Code/Src) 2224    
08:59:11.440    08:59:10.877277 UHAL_SRCH   TraceWarning    uhal_HmcpSearcherS1LISR: status_reg(0xf0100000) uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1497    
08:59:11.440    08:59:10.877307 UHAL_COMMON TraceWarning    uhal_HmcpSearcherSCDLISR is called. uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1512    
08:59:11.440    08:59:10.877338 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src)   1596

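Right now I'm using the code below to open the file and get those values, but it doesn't work. I'm new to programming, so I can't tell where I'm going wrong.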

import re    # Importing 're' for using regular expressions

file_dir=raw_input('Enter the complete Directory of the file (eg c:\\abc.txt):')    # Providing the user with a choice to open their file in .txt format
with open(file_dir, 'r') as f:
    lat_lines= f.read()                                                            # storing the data in a variable

# Declaring the two lists to hold the numbers
raw_lat1 = []
raw_lat2 = []

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

start_2 = 'LAT #2 MEAS='
end_2 = '[de'

x = re.findall(r'start_1(.*?)end_1',lat_lines,re.DOTALL)
raw_lat1.append(x)

y = re.findall(r'start_2(.*?)end_2',lat_lines,re.DOTALL)
raw_lat2.append(y)

Answer 1

This should do it (it doesn't use regular expressions, but it still gets the job done):

answer = []
with open('file.txt') as infile:
    for line in infile:
        if "LAT #1 MEAS=" not in line: continue
        if "LAT #2 MEAS=" not in line: continue
        splits = line.split('=')
        temp = [0,0]
        for i,part in enumerate(splits):
            if part.endswith("LAT #1 MEAS"): temp[0] = int(splits[i+1].split(None,1)[0].split('[',1)[0])
            elif part.endswith("LAT #2 MEAS"): temp[1] = int(splits[i+1].split(None,1)[0].split('[',1)[0])
        answer.append(temp)
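
To make the chained split calls above easier to follow, here is the same extraction traced step by step on one sample line. This is only an illustrative sketch: the intermediate names (after_lat1, first_token, number_text) are made up for the walkthrough, and the line is shortened from the sample in the question.

```python
# One matching line, shortened from the sample log in the question.
line = ("08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      "
        "LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c")

# Cut the line at every '='; the piece before the first '=' ends with the label.
splits = line.split('=')

# The piece right after "LAT #1 MEAS" starts with the number we want.
after_lat1 = splits[1]

# Keep only the text up to the first run of whitespace.
first_token = after_lat1.split(None, 1)[0]

# Drop everything from the opening bracket onward, leaving the digits and sign.
number_text = first_token.split('[', 1)[0]

lat1 = int(number_text)
print(lat1)
```

The second measurement is obtained the same way from splits[2].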

Answer 2

I can see a few problems with the regular expressions here. In the re.findall call, you're using start_1 and end_1 as if they were variables, but the regex will actually just see them as the raw characters "start_1" and "end_1", and so on. To use variables in a regex string, you have to use a format string. For example:

r'%s(.*?)%s' % (start_1, end_1)

Also, a greedy .* matches any character, so .*end_1 would match everything up to the last occurrence of end_1 on the line. The LAT #1 and LAT #2 measurements both end the same way, so if everything else were correct, this would actually match "-80[deg], LAT #2 MEAS=-110[de". The non-greedy .*? in your pattern stops at the first occurrence instead.
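
The three behaviours described here (names left as literal text, non-greedy matching, greedy matching) can be compared on one sample line. This is a rough sketch under a few assumptions: it is written for Python 3, the line is shortened from the question's sample, and it wraps the variables in re.escape (a standard-library function) so that the "[" in end_1 is treated literally.

```python
import re

line = ("08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      "
        "LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c")

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

# Names inside the string literal are not substituted: this pattern searches
# for the literal text "start_1" and "end_1", which the log never contains.
literal_hits = re.findall(r'start_1(.*?)end_1', line)

# Interpolate the variables; re.escape makes special characters match literally.
lazy_pattern = r'%s(.*?)%s' % (re.escape(start_1), re.escape(end_1))
greedy_pattern = r'%s(.*)%s' % (re.escape(start_1), re.escape(end_1))

lazy_hits = re.findall(lazy_pattern, line)      # stops at the first "[de"
greedy_hits = re.findall(greedy_pattern, line)  # runs to the last "[de"

print(literal_hits)
print(lazy_hits)
print(greedy_hits)
```

On this line the literal pattern finds nothing, the non-greedy pattern captures only -80, and the greedy one swallows the second measurement as well.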

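Additionally, square brackets that should match literally in a regex have to be escaped. Unescaped brackets are used to specify a character set.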
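
A small sketch of the bracket issue, assuming Python 3 and a shortened sample string. It shows that a pattern ending in an unescaped "[de" fails to compile, and two ways to match the bracket literally: escaping by hand, or letting re.escape do it.

```python
import re

text = 'LAT #1 MEAS=-80[deg]'

# An unescaped '[' opens a character set; with no closing ']' the pattern
# cannot be compiled at all.
try:
    re.compile(r'MEAS=(.*?)[de')
    compiled_ok = True
except re.error:
    compiled_ok = False

# Escaped by hand: '\[' and '\]' match literal brackets.
by_hand = re.findall(r'MEAS=(.*?)\[deg\]', text)

# re.escape builds the escaped form of an arbitrary string.
auto = re.findall(r'MEAS=(.*?)' + re.escape('[deg]'), text)

print(compiled_ok)
print(by_hand)
print(auto)
```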

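Here is an example in which I assume the variable line contains your sample string. You will probably need to adapt the snippet to work over the whole file.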
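
What follows is a minimal sketch of such an example, not the answerer's original code. It assumes Python 3, that both measurements always appear together on one line in the format shown in the question, and that the values are integers. The names LAT_RE and extract_lats are hypothetical, chosen for illustration.

```python
import re

# Escaped brackets match literally; -?\d+ captures an optional sign and digits.
LAT_RE = re.compile(r'LAT #1 MEAS=(-?\d+)\[deg\],\s*LAT #2 MEAS=(-?\d+)\[deg\]')

def extract_lats(lines):
    """Return (lat1_values, lat2_values) found in an iterable of text lines."""
    raw_lat1, raw_lat2 = [], []
    for line in lines:
        match = LAT_RE.search(line)
        if match:  # lines without both measurements are skipped
            raw_lat1.append(int(match.group(1)))
            raw_lat2.append(int(match.group(2)))
    return raw_lat1, raw_lat2

# Three lines shortened from the sample log in the question.
sample = [
    "08:59:07.603    08:59:05.798847 PAL_PARR_INTF   TraceModule DMA is activated    drv_Shm.c",
    "08:59:10.847    08:59:09.588001 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
    "08:59:11.440    08:59:10.877338 UHAL_SRCH   TraceFlow   :      LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg]  uhal_CHmcpPschMultiPath.c",
]

raw_lat1, raw_lat2 = extract_lats(sample)
print(raw_lat1)
print(raw_lat2)
```

For the real file, an open file object can be passed in place of the list, since iterating over it yields one line at a time; this also avoids reading a very large file into memory at once.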

How to extract two specific numbers from multiple lines of a text file in Python
