I have a file (with a .tbl extension) that contains a table. Its contents look like this:
Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
How can I read this into a pandas DataFrame? I want to merge it with another file. I would like the output to look something like:
         height  weight
messi       170      72
ronaldo     187      84
I looked into pandas read_table, but to no avail.
Any help is appreciated.
Answer 0: (score: 2)
I wrote a function that generalizes this:
import pandas as pd
import re


def read_custom_table(filename,
                      rec_st_lim='{',
                      rec_end_lim='}',
                      k_v_sep=':',
                      item_sep=',',
                      index_col=None):
    """
    Read a text file, extract the records it contains,
    and return them as a pandas DataFrame.

    Inputs
    ---------------
    filename: string containing the file name
    rec_st_lim: string (1+ characters) marking the start of
        a single record; it is inserted into a regular expression,
        so special characters must be escaped
    rec_end_lim: string (1+ characters) marking the end of
        a single record; also inserted into the regular expression
    k_v_sep: key-value separator within a record
    item_sep: item separator, separates key/value pairs
    index_col: the name of the column to use as index, default = None,
        i.e. the index is a numerical range
    ----------------
    Output: df, a DataFrame whose columns are the keys of a record
        and whose index is index_col when index_col is not None
    """
    pattern = r"{}(.*?){}".format(rec_st_lim, rec_end_lim)
    # The with block closes the file, so no explicit close() is needed.
    with open(filename) as f:
        text = f.read()
    # One dict per record: split the record into items, then split each
    # item into a stripped key/value pair.
    records = [
        dict([el.strip() for el in r.split(k_v_sep)]
             for r in rec.split(item_sep) if len(r) > 1)
        for rec in re.findall(pattern, text, re.DOTALL)
    ]
    df = pd.DataFrame(records)
    if index_col:
        df.set_index(index_col, inplace=True)
    return df
The function can be used on the data from the OP's example as follows:
# Raw strings keep the backslashes that escape the braces in the regex.
df = read_custom_table('debug.txt',
                       rec_st_lim=r'\{Group\}',
                       rec_end_lim=r'\{End Group\}',
                       k_v_sep='=',
                       item_sep='\n',
                       index_col='Name')
print(df)
Output:
           Height Weight
Name
'Messi'    170 cm  72 kg
'Ronaldo'  187 cm  84 kg
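The output above still carries the quotes around the names and the units in the values, whereas the question asks for bare lowercase names and plain numbers. A minimal sketch of one possible cleanup step follows; the helper name `tidy` is hypothetical, and the DataFrame is rebuilt by hand from the printed output rather than read from a file.

```python
import pandas as pd

# Stand-in for the DataFrame shown above (values copied from the
# printed output; not read from the original file).
df = pd.DataFrame(
    {"Height": ["170 cm", "187 cm"], "Weight": ["72 kg", "84 kg"]},
    index=pd.Index(["'Messi'", "'Ronaldo'"], name="Name"),
)


def tidy(frame):
    """Hypothetical helper: bare lowercase names, integer measurements."""
    out = frame.copy()
    # Drop the surrounding single quotes and lowercase the names.
    out.index = out.index.str.strip("'").str.lower()
    out.index.name = None
    out.columns = out.columns.str.lower()
    for col in out.columns:
        # Keep only the first run of digits, e.g. "170 cm" -> 170.
        out[col] = out[col].str.extract(r"(\d+)", expand=False).astype(int)
    return out


clean = tidy(df)
print(clean)
```

This assumes every value starts with a whole number followed by a unit; decimal values or missing fields would need a different pattern and dtype.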
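Answer 1: (score: 0)
One way to achieve this is to use string manipulation to turn the data into a list of dictionaries, and then convert that list into a DataFrame.
Example: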
import pandas as pd

stringVal = ''
# The input file contains the data posted in the question.
with open("Path to inputfile", "r") as infile:
    for i in infile.readlines():
        # Join the three fields of a record into one "|"-separated line.
        if i.startswith("Name"):
            stringVal += (i + "|").replace("\n", "").replace("'", "")
        if i.startswith("Height"):
            stringVal += (i + "|").replace("\n", "")
        if i.startswith("Weight"):
            stringVal += i + "\n"

res = []
for i in stringVal.strip().split("\n"):
    if i:
        d = {}
        for j in i.split("|"):
            val = j.split("=")
            d[val[0].strip()] = val[1].strip()
        res.append(d)

df = pd.DataFrame(res)
df = df.set_index('Name')
print(df)
Output:
         Height Weight
Name
Messi    170 cm  72 kg
Ronaldo  187 cm  84 kg
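For comparison, the same records can be pulled out with a single regular expression instead of building an intermediate string. This is only a sketch under the assumption that every record sits between `{Group}` and `{End Group}` and that numeric values start with digits; the function name `parse_groups` and the `SAMPLE` constant are made up for illustration.

```python
import re

# Sample text copied from the question.
SAMPLE = """Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
"""


def parse_groups(text):
    """Return one dict per {Group} ... {End Group} block (hypothetical helper)."""
    records = []
    for body in re.findall(r"\{Group\}(.*?)\{End Group\}", text, re.DOTALL):
        record = {}
        for line in body.splitlines():
            if "=" not in line:
                continue  # skip blank lines inside the block
            key, value = (part.strip() for part in line.split("=", 1))
            number = re.match(r"\d+", value)
            if number:
                # Numeric field: keep the integer, drop the unit.
                record[key.lower()] = int(number.group())
            else:
                # Text field: drop the quotes and lowercase it.
                record[key.lower()] = value.strip("'").lower()
        records.append(record)
    return records


records = parse_groups(SAMPLE)
print(records)
```

Passing the resulting list to `pd.DataFrame(records).set_index("name")` should give a table close to the one requested in the question, with numeric columns that are ready to merge with another file.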