字符串排序csv行

时间:2017-07-15 15:42:40

标签: python string csv pandas

import pandas as pd

rawDF = pd.read_csv('D:\Project\python\Grade\GradeDataRaw.csv',names=['GradeCol'])

filteredDF = rawDF[rawDF['GradeCol'].str.contains('EVCS:|BVCS:|LOW POINT STA')]
print(filteredDF)

filename = 'GradeOut.csv'

filteredDF.to_csv(filename,index=False, encoding='utf-8')

CSV文件中的输出为

GradeCol

EVCS: 210+080.907

BVCS: 210+080.907

LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.315\PPVI ELEV = 66.762\PA.D = 1.413%\PK

LOW POINT STA =208+108.133\PLOW POINT ELEV = 66.849\PPVI STA = 209+126.000\PPVI ELEV = 66.762\PA.D = 1.413%\PK

想要在此字符串可用的数据框行中只有“PPVI STA = 209 + 126.315”,其他行包含EVCS& BVCS保持完整,数字部分可以在每一行中变化。 使用extract方法在不匹配的行中获取NaN值,这不是意图。

2 个答案:

答案 0 :(得分:1)

IIUC:

样本DF:

In [99]: df
Out[99]:
                                                 txt
0         info \GPK HEK = 209+126.315\info ends here
1  blah-blah-blah GPK HEK = 1 + 2.33333end of string

解决方案:

In [100]: df['txt'].str.extract(r'(GPK HEK\s*=\s*\d+\s*\+\s*\d+\.\d+)', expand=False)
Out[100]:
0    GPK HEK = 209+126.315
1    GPK HEK = 1 + 2.33333
Name: txt, dtype: object

答案 1 :(得分:0)

这应该可以胜任。

def parse(string):
    start = string.find('\\') + 1
    end   = string.find('.')

    while string[end] != '\\':
        end += 1

    return string[start : end]