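I previously asked a question about how to read this .txt file with pandas. I am trying to use pandas.read_csv.
I found that I cannot read this file with read_csv unless I delete the header data (everything up to the "#" lines).
The problem is that I need to extract values from the header, such as Well Name, Well KB, Well Type ... Is there a way to do this with pandas, or do I need to read the file some other way?
My original question is here:
Pandas.read_csv error tokenizing data
Raw text file: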
# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104
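Answer 0 (score: 1)
You can parse the file using the comment character as the delimiter, then pull the key/value pairs out of the header lines with pandas str.extract: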
from io import StringIO
import pandas as pd
txt = """# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104"""
header_parse = pd.read_csv(StringIO(txt), sep='#', skipinitialspace=True, header=None)
hd = header_parse.iloc[:, 1].dropna()
# Raw string so that \s is passed to the regex engine unchanged
hd.str.extract(r'\s*(?P<key>[^:]+)\s*:\s*(?P<value>.+)', expand=True).dropna()
                      key                 value
1               WELL NAME               ZZ-0113
2  WELL HEAD X-COORDINATE  9999999.00000000 (m)
3  WELL HEAD Y-COORDINATE  9999999.00000000 (m)
4                 WELL KB     159.00000000 (ft)
5               WELL TYPE                   OIL
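Once the key/value pairs are extracted, individual header fields can be looked up by name. The following is a minimal sketch on a shortened sample header, assuming the same layout as the file above; the names `sample`, `kv`, `meta` and `kb_ft` are illustrative and not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Shortened sample (assumption: same "# KEY: value" layout as the real file)
sample = """# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
#=====
MD X
#=====
0.0 1.0
"""

# Split each line on '#': header text lands in column 1, data lines in column 0
header_parse = pd.read_csv(StringIO(sample), sep='#', skipinitialspace=True, header=None)
hd = header_parse.iloc[:, 1].dropna()
kv = hd.str.extract(r'\s*(?P<key>[^:]+)\s*:\s*(?P<value>.+)', expand=True).dropna()

# Index by key so fields can be looked up by name
meta = kv.set_index('key')['value']
well_name = meta['WELL NAME']
# The KB value carries a unit suffix such as "(ft)"; keep only the number
kb_ft = float(meta['WELL KB'].split()[0])
print(well_name, kb_ft, meta['WELL TYPE'])
```

Values are kept as strings, so numeric fields such as WELL KB need an explicit conversion like the one shown.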
To get the rest of the data:
# sep=r'\s+' is the equivalent of delim_whitespace=True, which is deprecated in recent pandas releases
df = pd.read_csv(StringIO(txt), comment='#', sep=r'\s+')
df
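Reading with `comment='#'` makes pandas skip every line that starts with "#", so the first remaining line (MD X Y ...) becomes the column header. The sketch below checks this on a shortened sample; the sample text and its four columns are assumptions made for brevity, not the full file.

```python
from io import StringIO
import pandas as pd

# Shortened sample (assumption: only four of the ten columns are kept)
sample = """# WELL NAME: ZZ-0113
# WELL KB: 159.00000000 (ft)
#==========
MD X Y Z
#==========
0.0 999999.0 9999999.0 159.0
132.0 999999.08032 9999999.9116 27.000774702
"""

# Fully commented lines are skipped; whitespace separates the fields
df = pd.read_csv(StringIO(sample), comment='#', sep=r'\s+')
print(df.columns.tolist())
print(df.shape)
```

All columns should come back as floats, since the header and separator lines never reach the parser.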
Answer 1 (score: 0)
Nice. This is how I would do it, but I would be glad to know whether everything can be done in pandas.
# filePath: path to the well trace .txt file (must be defined beforehand)
with open(filePath) as fh:
    for line in fh:
        # split() breaks on whitespace, so the value is the 4th token,
        # e.g. ['#', 'WELL', 'NAME:', 'ZZ-0113']
        if line.startswith('# WELL NAME:'):
            wellName = line.split()[3]
            print(wellName)
        elif line.startswith('# WELL KB'):
            kb = line.split()[3]
            print(kb)
        elif line.startswith('# WELL TYPE'):
            wellType = line.split()[3]
            print(wellType)
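The same line-by-line idea can be generalized so that every "# KEY: value" header line is collected in one pass instead of testing for each key separately. This is only a sketch: the function name `read_well_header` and the regex are illustrative assumptions, and it stops at the first non-comment line.

```python
import re
from io import StringIO

# Matches "# KEY: value" and strips surrounding whitespace from both parts
HEADER_RE = re.compile(r'^#\s*([^:]+?)\s*:\s*(.+?)\s*$')

def read_well_header(lines):
    """Collect '# KEY: value' comment lines into a dict (hypothetical helper)."""
    meta = {}
    for line in lines:
        if not line.startswith('#'):
            break  # the header ends at the first non-comment line
        match = HEADER_RE.match(line)
        if match:
            meta[match.group(1)] = match.group(2)
    return meta

# Shortened sample (assumption: same layout as the real file)
sample = StringIO("""# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
#=====
MD X
#=====
0.0 1.0
""")

meta = read_well_header(sample)
print(meta)
```

With a real file, the function can be passed an open file handle, since it only needs an iterable of lines.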