Read a table into a pandas DataFrame

Posted: 2018-02-05 06:23:15

Tags: python pandas dataframe

I have a file (with a .tbl extension) that contains a table. Its contents look like this:

Gibberish Gibberish Gibberish 
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}

How can I read this into a pandas DataFrame? I want to merge it with another file. I'd like the output to look something like:

      height   weight
messi   170      72
ronaldo 187      84

I looked into pandas read_table, but to no avail.

Any help is appreciated.
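For the merge step the question mentions: once the data is in a DataFrame indexed by name, like the desired output above, `DataFrame.join` can attach the other file's columns by matching on that index. This is a minimal sketch; the desired table is typed in by hand, and the "other file" is a hypothetical stand-in whose `club` column and values are invented for illustration. The names must match exactly, so quotes or case differences need to be cleaned up before joining.

```python
import pandas as pd

# The desired result from the question, typed in by hand so the sketch
# runs on its own: one row per player, indexed by name.
players = pd.DataFrame(
    {"height": [170, 187], "weight": [72, 84]},
    index=pd.Index(["messi", "ronaldo"], name="name"),
)

# Hypothetical stand-in for the "other file"; its columns and values are
# invented for illustration only.
other = pd.DataFrame({"name": ["messi", "ronaldo"], "club": ["Club A", "Club B"]})

# Index the other table by name too, then join row by row on that index.
# how="left" keeps every player even if the other file lacks a match.
merged = players.join(other.set_index("name"), how="left")
print(merged)
```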

2 answers:

Answer 0 (score: 2)

I wrote a function that generalizes this:

import re

import pandas as pd


def read_custom_table(filename,
                      rec_st_lim='{',
                      rec_end_lim='}',
                      k_v_sep=':',
                      item_sep=',',
                      index_col=None):
    """
    Read a text file, extract the records it contains,
    and return them as a pandas DataFrame.

    Inputs
    ---------------
    filename: string containing the path of the file to read

    rec_st_lim: regex (1+ characters) marking the start of a single record

    rec_end_lim: regex (1+ characters) marking the end of a single record

    k_v_sep: key/value separator within a record

    item_sep: item separator; separates the key/value pairs of a record

    index_col: name of the column to use as the index; default None,
    i.e. the index is a numerical range
    ----------------
    Output: df, a DataFrame whose columns are the keys found in the
    records, indexed by index_col when index_col is not None
    """
    # Non-greedy match of everything between a start and an end delimiter;
    # re.DOTALL lets '.' span the newlines inside a multi-line record.
    pattern = r"{}(.*?){}".format(rec_st_lim, rec_end_lim)

    # The with-block closes the file automatically; no f.close() needed.
    with open(filename) as f:
        records = re.findall(pattern, f.read(), re.DOTALL)

    rows = []
    for rec in records:
        row = {}
        for item in rec.split(item_sep):
            if not item.strip():  # skip blank items, e.g. leading/trailing newlines
                continue
            # Split on the first separator only, so values may contain it.
            key, value = item.split(k_v_sep, 1)
            row[key.strip()] = value.strip()
        rows.append(row)

    df = pd.DataFrame(rows)
    if index_col:
        df = df.set_index(index_col)
    return df

The function can be used on the data from the OP's example as follows:

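# Raw strings keep the regex escapes (\{ and \}) from being treated
# as (invalid) Python string escapes.
df = read_custom_table('debug.txt',
                       rec_st_lim=r'\{Group\}',
                       rec_end_lim=r'\{End Group\}',
                       k_v_sep='=',
                       item_sep='\n',
                       index_col='Name')
print(df)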

Output:

           Height Weight
Name                    
'Messi'    170 cm  72 kg
'Ronaldo'  187 cm  84 kg
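This result still holds strings: the names keep their quotes and the measurements keep their units. To reach the numeric layout the question asks for, a post-processing step along these lines might work. It is a sketch that rebuilds the DataFrame above by hand so it runs on its own, and it assumes every value is an integer followed by a unit, as in the sample data.

```python
import pandas as pd

# Rebuild the DataFrame printed above (string values with units,
# names still wrapped in single quotes) so this sketch is self-contained.
df = pd.DataFrame(
    {"Height": ["170 cm", "187 cm"], "Weight": ["72 kg", "84 kg"]},
    index=pd.Index(["'Messi'", "'Ronaldo'"], name="Name"),
)

# Strip the quotes and lowercase the names to match the desired output.
df.index = df.index.str.strip("'").str.lower()

# Keep only the leading digits of each value and convert them to integers;
# assumes every value looks like "<integer> <unit>".
for col in ["Height", "Weight"]:
    df[col] = df[col].str.extract(r"(\d+)", expand=False).astype(int)

# Lowercase the column names as in the desired output.
df.columns = df.columns.str.lower()
print(df)
```

If some values could be decimals, a pattern such as `r"(\d+(?:\.\d+)?)"` with `.astype(float)` would be the safer choice.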

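Answer 1 (score: 0)

One way to achieve this is to use string manipulation to turn the data into a list of dictionaries, and then convert that list into a DataFrame.

Example: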

import pandas as pd

string_val = ''
# The input file holds the data posted in the question.
with open("Path to inputfile", "r") as infile:
    for line in infile:
        line = line.rstrip("\n")
        if line.startswith("Name"):
            # Drop the quotes around the name and end the field with '|'.
            string_val += line.replace("'", "") + "|"
        elif line.startswith("Height"):
            string_val += line + "|"
        elif line.startswith("Weight"):
            # Weight is the last field of a record, so end the record here.
            string_val += line + "\n"

res = []
for record in string_val.strip().split("\n"):
    if record:
        d = {}
        for field in record.split("|"):
            key, value = field.split("=", 1)
            d[key.strip()] = value.strip()
        res.append(d)

df = pd.DataFrame(res)
df = df.set_index('Name')
print(df)

Output:

         Height Weight
Name                  
Messi    170 cm  72 kg
Ronaldo  187 cm  84 kg
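A variant of the same list-of-dictionaries approach skips the intermediate string and does not hard-code the field names: track whether the current line is inside a {Group} ... {End Group} block and collect that block's `key = value` lines into one dict. This is a sketch assuming the markers sit alone on their own lines, as in the question's sample; `parse_groups` is a hypothetical helper name, and the sample is read from a string via `io.StringIO` only to keep the example self-contained (with a real file, pass the open file object instead).

```python
import io

import pandas as pd

# Sample data from the question, held in memory so the sketch is runnable.
SAMPLE = """Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
"""


def parse_groups(lines):
    """Collect each {Group} ... {End Group} block into one dict."""
    records, current = [], None
    for line in lines:
        line = line.strip()
        if line == "{Group}":
            current = {}  # start a new record
        elif line == "{End Group}":
            if current is not None:
                records.append(current)
            current = None  # lines outside a block are ignored
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            # Strip whitespace, then the single quotes around string values.
            current[key.strip()] = value.strip().strip("'")
    return records


df = pd.DataFrame(parse_groups(io.StringIO(SAMPLE))).set_index("Name")
print(df)
```

Because every `key = value` line inside a block becomes a column, new fields in the file show up without code changes, and lines outside a block, such as the Gibberish header, are skipped.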