pandas DataFrame as LaTeX or HTML table with nbconvert

Asked: 2013-12-19 15:34:42

Tags: latex ipython ipython-notebook pdflatex

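Is it possible to get a nicely formatted table from a pandas dataframe in an IPython notebook when using nbconvert to LaTeX & PDF?

The default seems to be just a left-aligned block of numbers that looks rather shoddy.

I would like something more like the HTML display of dataframes in the notebook, or a LaTeX table. Saving and displaying a .png image of an HTML-rendered dataframe would also be fine, but exactly how to do that has proved elusive.

At a minimum, I just want a simple center-aligned table.

I have had no luck using the .to_latex() method to get LaTeX tables from pandas dataframes, either within the notebook or in the nbconvert output. I have also tried (after reading the ipython dev list discussions, and following the custom display logic notebook example) making a custom class with _repr_html_ and _repr_latex_ methods that return the results of _to_html() and _to_latex(), respectively. I think a main problem with the nb conversion is that pdflatex is not happy with either the {'s or the //'s in the dataframe to_latex() output. But I don't want to start fiddling with that before checking that I haven't missed something.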
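
For reference, the custom-class approach described above might look roughly like the sketch below. This is an illustration, not my actual code: the class name RichTable is made up, and plain strings stand in for what df.to_html() and df.to_latex() would return.

```python
class RichTable(object):
    """Hypothetical wrapper holding pre-rendered HTML and LaTeX for one table."""

    def __init__(self, html, latex):
        self._html = html
        self._latex = latex

    def _repr_html_(self):
        # Rich HTML representation, used by the live notebook display.
        return self._html

    def _repr_latex_(self):
        # LaTeX representation, available to nbconvert's LaTeX export.
        return self._latex


# Stand-in strings; with pandas these would come from df.to_html() and df.to_latex().
table = RichTable(
    "<table><tr><td>1.0</td><td>4.0</td></tr></table>",
    "\\begin{tabular}{rr}\n1.0 & 4.0 \\\\\n\\end{tabular}",
)
```

Leaving `table` as the last expression of a cell would then let IPython pick whichever representation the frontend supports.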

Thanks.

3 answers:

Answer 0: (score: 7)

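There is a simpler approach, discussed in this Github issue. Basically, you have to add a _repr_latex_ method to the DataFrame class, a procedure that pandas describes in its official documentation.

I did this in a notebook like this: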

import pandas as pd

pd.set_option('display.notebook_repr_html', True)

def _repr_latex_(self):
    # Raw string, so the backslash in \centering is not read as an escape sequence.
    return r"\centering{%s}" % self.to_latex()

pd.DataFrame._repr_latex_ = _repr_latex_  # monkey patch pandas DataFrame

The following code:

d = {'one' : [1., 2., 3., 4.],
     'two' : [4., 3., 2., 1.]}
df = pd.DataFrame(d)
df

is rendered as an HTML table when evaluated live in the notebook, and it is converted into a (centered) table in the PDF output:

$ ipython nbconvert --to latex --post PDF notebook.ipynb
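
A possible variation on the patch above: \centering is a declaration rather than a command that takes an argument, so the braces in "\centering{...}" only open a group after it. If the centering needs to be scoped explicitly, the tabular could be wrapped in a center environment instead. The helper name center_tabular is my own, and the example string is only a stand-in for real to_latex() output.

```python
def center_tabular(tabular):
    """Wrap a LaTeX tabular string in a center environment (illustrative helper)."""
    return "\\begin{center}\n%s\n\\end{center}" % tabular.strip()


# Stand-in for the string that DataFrame.to_latex() would return.
example = "\\begin{tabular}{lrr}\n0 & 1.0 & 4.0 \\\\\n\\end{tabular}\n"
wrapped = center_tabular(example)
```

With pandas, the patched _repr_latex_ could then return center_tabular(self.to_latex()) instead of the "\centering{%s}" string.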

Answer 1: (score: 5)

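I wrote my own mako-based template scheme for this. I think it is actually quite an easy workflow if you commit to slogging through it for yourself once. After that, you begin to see that templating the metadata of your desired format, so that it can be factored out of the code (and does not represent a third-party dependency), is a very nice way to solve the problem.

Here is the workflow I came up with.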

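  1. Write a .mako template that accepts your dataframe as an argument (and possibly other arguments) and converts it to the TeX format you want (example below).

  2. Make a wrapper class (I call it to_tex) that exposes the API you want (e.g. you can pass it your data objects and it handles the call to the mako render commands internally).

  3. Within the wrapper class, decide how you want the output. Print the TeX code to the screen? Use a subprocess to actually compile it to a PDF?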

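  4. In my case, I was generating preliminary results for a research paper and needed to format tables into a complex double-sorted structure with nested column names and so on. Here is an example of what one of the tables looks like:

    [Image: Example output from templated TeX tool]

    Here is the mako template for it (warning, it is rough):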

    <%page args="df, table_title, group_var, sort_var"/>
    <%
    """
    Template for country/industry two-panel double sorts TeX table.
    Inputs: 
    -------
    df: pandas DataFrame
        Must be 17 x 12 and have rows and columns that positionally
        correspond to the entries of the table.
    
    table_title: string
        String used for the title of the table.
    
    group_var: string
        String naming the grouping variable for the horizontal sorts.
        Should be 'Country' or 'Industry'.
    
    sort_var: string (raw)
        String naming the variable that is being sorted, e.g.
        "beta" or "ivol". Note that if you want the symbol to
        be rendered as a TeX symbol, then pass a raw Python
        string as the arg and include the needed TeX markup in
        the passed string. If the string isn't raw, some of the
        TeX markup might be interpreted as special characters.
    
    Returns:
    --------
    When used with mako.template.Template.render, will produce
    a raw TeX string that can be rendered into a PDF containing
    the specified data.
    
    Author:
    -------
    Ely M. Spears, 05/21/2013
    
    """
    # Python imports and helper function definitions.
    import numpy as np  
    def format_helper(x):
        return str(np.round(x,2))
    %>
    
    
    <%text>
    \documentclass[10pt]{article}
    \usepackage[top=1in, bottom=1in, left=1in, right=1in]{geometry}
    \usepackage{array}
    \newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
    \setlength{\parskip}{1em}
    \setlength{\parindent}{0in}
    \renewcommand*\arraystretch{1.5}
    \author{Ely Spears}
    
    
    \begin{document}
    \begin{table} \caption{</%text>${table_title}<%text>}
    \begin{center}
        \begin{tabular}{ | p{2.5cm}  c c c c c p{1cm} c c c c c c p{1cm} |}
        \hline
        & \multicolumn{6}{c}{CAPM $\beta$} & \multicolumn{6}{c}{CAPM $\alpha$ (\%p.a.)} & \\
        \cline{2-7} \cline{9-14}
        & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \multicolumn{6}{c}{</%text>${group_var}<%text> </%text>${sort_var}<%text> is:} & \\
        Stock </%text>${sort_var}<%text> is: & Low & 2 & 3 & 4 & High & Low - High & & Low & 2 & 3 & 4 & High & Low - High \\ 
        \hline
        \multicolumn{4}{|l}{Panel A. Point estimates} & & & & & & & & & & \\ 
        \hline
        Low            & </%text>${' & '.join(df.iloc[0].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[0].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[1].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[1].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[2].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[2].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[3].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[3].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[4].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[4].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[5].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[5].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
            & </%text>${format_helper(df.iloc[6, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[6, 11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
            & </%text>${format_helper(df.iloc[7, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[7, 11])}<%text> \\
    
    
        \multicolumn{13}{|l}{Total effect} & </%text>${format_helper(df.iloc[8, 11])}<%text>  \\
        \hline
        \multicolumn{4}{|l}{Panel B. t-statistics} & & & & & & & & & & \\
        \hline
        Low            & </%text>${' & '.join(df.iloc[9].map(format_helper).values[0:6])}<%text>  & & </%text>${' & '.join(df.iloc[9].map(format_helper).values[6:])}<%text> \\
        2              & </%text>${' & '.join(df.iloc[10].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[10].map(format_helper).values[6:])}<%text> \\
        3              & </%text>${' & '.join(df.iloc[11].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[11].map(format_helper).values[6:])}<%text> \\
        4              & </%text>${' & '.join(df.iloc[12].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[12].map(format_helper).values[6:])}<%text> \\
        High           & </%text>${' & '.join(df.iloc[13].map(format_helper).values[0:6])}<%text> & & </%text>${' & '.join(df.iloc[13].map(format_helper).values[6:])}<%text> \\
        Low - High     & </%text>${' & '.join(df.iloc[14].map(format_helper).values[0:5])}<%text> & & & </%text>${' & '.join(df.iloc[14].map(format_helper).values[6:11])}<%text> & \\
    
    
        \multicolumn{6}{|l}{</%text>${group_var}<%text> effect (average of Low - High \underline{column})}
            & </%text>${format_helper(df.iloc[15, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[15, 11])}<%text> \\
    
    
        \multicolumn{6}{|l}{Within-</%text>${group_var}<%text> effect (average of Low - High \underline{row})}
            & </%text>${format_helper(df.iloc[16, 5])}<%text> & & & & & & & </%text>${format_helper(df.iloc[16, 11])}<%text> \\
        \hline
        \end{tabular}
    \end{center}
    \end{table}
    \end{document}
    </%text>
    

    My wrapper to_tex.py looks like this (with example usage in the if __name__ == "__main__" section):

    """
    to_tex.py
    
    Class for handling strings of TeX code and producing the
    rendered PDF via PDF LaTeX. Assumes ability to call PDFLaTeX
    via the operating system.
    """
    class to_tex(object):
        """
        Publishes a TeX string to a PDF rendering with pdflatex.
        """
        def __init__(self, tex_string, tex_file, display=False):
            """
            Publish a string to a .tex file, which will be
            rendered into a .pdf file via pdflatex.
            """
            self.tex_string    = tex_string
            self.tex_file      = tex_file
            self.__to_tex_file()
            self.__to_pdf_file(display)
            print("Render status:", self.render_status)
    
        def __to_tex_file(self):
            """
            Writes a tex string to a file.
            """
            with open(self.tex_file, 'w') as t_file:
                t_file.write(self.tex_string)
    
        def __to_pdf_file(self, display=False):
            """
            Compile a tex file to a pdf file with the
            same file path and name.
            """
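            try:
                import os
                from subprocess import Popen
                # Fall back to the current directory when tex_file has no directory part.
                out_dir = os.path.dirname(self.tex_file) or "."
                proc = Popen(["pdflatex", "-output-directory", out_dir, self.tex_file])
                proc.communicate()
                # Popen does not raise when the compile fails, so check the exit code.
                if proc.returncode == 0:
                    self.render_status = "success"
                else:
                    self.render_status = "pdflatex exited with code %d" % proc.returncode
            except Exception as e:
                self.render_status = str(e)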
    
            # Launch a display of the pdf if requested.
            if (self.render_status == "success") and display:
                try:
                    proc = Popen(["evince", self.tex_file.replace(".tex", ".pdf")])
                    proc.communicate()
                except OSError:
                    # The viewer (evince) is missing or could not be started; skip the display.
                    pass
    
    if __name__ == "__main__":
        from mako.template import Template
        template_file = "path/to/template.mako"
        t = Template(filename=template_file)
        tex_str = t.render(arg1="arg1", ...)
        tex_wrapper = to_tex(tex_str, )
    

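    My choice was to pipe the TeX string directly into pdflatex and leave displaying the result as an option.

    A small snippet of code that actually uses this with a DataFrame is here: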

    # Assume calculation work is done prior to this ...
    all_beta  = pandas.concat([beta_df,  beta_tstat_df], axis=0)
    all_alpha = pandas.concat([alpha_df, alpha_tstat_df], axis=0)
    all_df = pandas.concat([all_beta, all_alpha], axis=1)
    
    # Render result in TeX
    tex_mako  = "/my_project/templates/mako/two_panel_double_sort_table.mako"
    tex_file = "/my_project/some_tex_file_name.tex"
    
    from mako.template import Template
    t = Template(filename=tex_mako)
    tex_str = t.render(all_df, table_title, group_var, tex_risk_name)
    
    import my_project.to_tex as to_tex
    tex_obj = to_tex.to_tex(tex_str, tex_file)
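
To make the shape of this workflow easier to see, here is a much smaller sketch of the same idea. It deliberately swaps mako for the standard library's string.Template so that it runs without extra dependencies, and it takes plain lists of numbers instead of a DataFrame. The names TABLE_TEMPLATE, format_cell and render_table are invented for this illustration and are not part of my project.

```python
from string import Template

# Hypothetical, much smaller stand-in for the mako template above.
TABLE_TEMPLATE = Template(r"""\begin{table} \caption{$title}
\begin{center}
\begin{tabular}{$colspec}
\hline
$body
\hline
\end{tabular}
\end{center}
\end{table}""")


def format_cell(x):
    """Format a number with two decimals (similar in spirit to format_helper)."""
    return "%.2f" % x


def render_table(rows, title):
    """Fill the template with one centered column per value in each row."""
    body = "\n".join(
        " & ".join(format_cell(v) for v in row) + r" \\" for row in rows
    )
    colspec = "c" * len(rows[0])
    return TABLE_TEMPLATE.substitute(title=title, colspec=colspec, body=body)


tex = render_table([[1.0, 2.5], [3.14159, 4.0]], title="Demo")
```

The resulting string plays the same role as tex_str above: it could be printed, or wrapped in a full document and handed to a wrapper that calls pdflatex.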
    

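Answer 2: (score: 0)

The simplest way available now is to display your dataframe as a Markdown table. You may need to install tabulate for this.

In your code cell, when displaying the dataframe, use the following: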

from IPython.display import Markdown, display
display(Markdown(df.to_markdown()))

Since it is a Markdown table, nbconvert can easily translate it into LaTeX.
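
To show the kind of string that ends up in the Markdown cell output, here is a hand-rolled sketch that builds a pipe-style table from plain lists. The function rows_to_markdown is hypothetical and only approximates the layout; the real df.to_markdown() delegates to tabulate and pads its columns differently.

```python
def rows_to_markdown(header, rows):
    """Build a pipe-style Markdown table from a header and a list of rows."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)


md = rows_to_markdown(["one", "two"], [[1.0, 4.0], [2.0, 3.0]])
```

In a notebook, md could be passed to display(Markdown(md)) in the same way as the output of df.to_markdown().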