import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escape sequences
rawDF = pd.read_csv(r'D:\Project\python\Grade\GradeDataRaw.csv', names=['GradeCol'])
filteredDF = rawDF[rawDF['GradeCol'].str.contains('EVCS:|BVCS:|LOW POINT STA')]
print(filteredDF)
filename = 'GradeOut.csv'
filteredDF.to_csv(filename,index=False, encoding='utf-8')
The output in the CSV file is:
GradeCol
EVCS: 210+080.907
BVCS: 210+080.907
LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.315\PPVI ELEV = 66.762\PA.D = 1.413%\PK
LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.000\PPVI ELEV = 66.762\PA.D = 1.413%\PK
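In the dataframe rows where this string is present, I want to keep only "PPVI STA = 209+126.315". The other rows, which contain EVCS and BVCS, should stay intact. The numeric part can vary from row to row. Using the extract method produces NaN values in the rows that do not match, which is not what I intend.
Answer 0 (score: 1)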
IIUC:
Sample DF:
In [99]: df
Out[99]:
                                                 txt
0         info \GPK HEK = 209+126.315\info ends here
1  blah-blah-blah GPK HEK = 1 + 2.33333end of string
Solution:
In [100]: df['txt'].str.extract(r'(GPK HEK\s*=\s*\d+\s*\+\s*\d+\.\d+)', expand=False)
Out[100]:
0    GPK HEK = 209+126.315
1    GPK HEK = 1 + 2.33333
Name: txt, dtype: object
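The question also asks that rows without a match stay intact instead of becoming NaN. One way to get that behavior is to fall back to the original text when the pattern is not found. The sketch below uses only the standard `re` module; the pattern is the one above adapted to the "PPVI STA" label from the question, and `keep_pvi_sta` is a hypothetical helper name, not part of pandas.

```python
import re

# Same shape as the pattern above, with the label from the question's data.
# Assumption: the wanted fragment always starts with "PPVI STA".
PVI_STA = re.compile(r'PPVI STA\s*=\s*\d+\s*\+\s*\d+\.\d+')

def keep_pvi_sta(text):
    """Return the PPVI STA fragment if present, otherwise the text unchanged."""
    match = PVI_STA.search(text)
    return match.group(0) if match else text

rows = [
    'EVCS: 210+080.907',
    'BVCS: 210+080.907',
    r'LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.315\PPVI ELEV = 66.762\PA.D = 1.413%\PK',
]
for row in rows:
    print(keep_pvi_sta(row))
```

On a dataframe column this could be applied with something like `filteredDF['GradeCol'].map(keep_pvi_sta)`, so the EVCS and BVCS rows pass through unchanged.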
Answer 1 (score: 0)
This should do the job.
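def parse(string):
    # Start just past the first backslash (find returns -1 if absent, so start is 0)
    start = string.find('\\') + 1
    # First decimal point after the start position
    end = string.find('.', start)
    if end == -1:
        return string[start:]
    # Advance to the next backslash, or to the end of the string
    while end < len(string) and string[end] != '\\':
        end += 1
    return string[start:end]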
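As a variation on the function above, the string can be split on the backslash and the wanted field picked by its label, which avoids depending on where the first decimal point falls. This is only a sketch: it assumes the fields are separated by a single backslash, as in the question's sample rows, and `pick_field` is a hypothetical helper name.

```python
def pick_field(text, label, sep='\\'):
    """Return the first sep-delimited field containing label, else the text unchanged."""
    for field in text.split(sep):
        if label in field:
            return field
    return text

line = r'LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.315\PPVI ELEV = 66.762\PA.D = 1.413%\PK'
print(pick_field(line, 'PVI STA'))                 # PPVI STA = 209+126.315
print(pick_field('EVCS: 210+080.907', 'PVI STA'))  # EVCS: 210+080.907
```

Rows without the label, such as the EVCS and BVCS rows, come back unchanged.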