Programmatically convert a pandas DataFrame to a Markdown table

Date: 2015-10-17 01:39:59

Tags: python pandas markdown

I have a Pandas DataFrame generated from a database, and it contains data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

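As you can see, there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database, I think). I need to get the contents of this DataFrame output as a Markdown table, but without running into encoding problems. I followed this answer (from the question Generate Markdown tables?) and got the following: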

import sys, sqlite3
import pandas as pd

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence', 
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

# table() is the helper function defined in the linked answer
table(sys.stdout, rows, fields, headings, align)

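However, this produces the error UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128). How can I output the DataFrame as a Markdown table? That is, I want to store the table in a file for use in writing a Markdown document. I need the output to look like this: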

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
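For reference, a minimal Python 3 sketch (not part of the original question) that produces exactly this padded layout with plain pandas and writes it to a UTF-8 file. The helper name `df_to_markdown_padded`, the three-column sample and the temporary file path are assumptions for illustration. In Python 3 strings are Unicode, and passing `encoding="utf-8"` when opening the file keeps the ASCII codec from the traceback out of the picture.

```python
import os
import tempfile

import pandas as pd


def df_to_markdown_padded(df: pd.DataFrame) -> str:
    """Hypothetical helper: render df as a pipe table, padding each column to its widest cell."""
    header = [str(c) for c in df.columns]
    body = [[str(v) for v in row] for row in df.itertuples(index=False)]
    # Column width = longest of the header and every body cell in that column.
    widths = [max([len(h)] + [len(r[i]) for r in body]) for i, h in enumerate(header)]

    def fmt(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    # Separator dashes span the cell plus its two padding spaces.
    sep = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([fmt(header), sep] + [fmt(r) for r in body])


df = pd.DataFrame({
    "ID": [0, 31],
    "path": ["data/Eng/Sagitarius.txt", "data/Nor/Høylandet.txt"],
    "language": ["Eng", "Nor"],
})
table = df_to_markdown_padded(df)

# Python 3: choose the encoding when opening the file.
out_path = os.path.join(tempfile.gettempdir(), "table.md")
with open(out_path, "w", encoding="utf-8") as f:
    f.write(table + "\n")
print(table)
```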

12 Answers:

Answer 0 (score: 25)

A further improvement on the answer, for use in an IPython Notebook:

import pandas as pd

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    # One row of '---' cells becomes the Markdown header separator
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)  # infodf is your DataFrame
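The function above only displays the table inside a notebook. Below is a minimal sketch of the same `'---'`-row trick that returns the Markdown text instead, so it can be written to a file; the name `df_to_pipe_text` is hypothetical. The rows have no leading or trailing pipes, which GitHub-flavored Markdown accepts, but `to_csv` wraps any cell containing `|` in double quotes, which would break the table.

```python
import pandas as pd


def df_to_pipe_text(df: pd.DataFrame) -> str:
    """Hypothetical helper: return df as pipe-separated Markdown text."""
    # A single row of '---' cells becomes the header/body separator line.
    sep_row = pd.DataFrame([["---"] * len(df.columns)], columns=df.columns)
    # Stack it on top of the data and emit pipe-separated text without the index.
    return pd.concat([sep_row, df]).to_csv(sep="|", index=False)


df = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]})
markdown = df_to_pipe_text(df)
print(markdown)
```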

Or use tabulate:

pip install tabulate

Usage examples are in the documentation.

Answer 1 (score: 23)

Pandas 1.0 was released on January 29, 2020 and supports Markdown conversion, so you can now do this directly!

Example taken from the docs:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())

Output:

|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

Or without the index:

print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1

Answer 2 (score: 15)

I recommend using the python-tabulate library to generate ASCII tables. The library also supports pandas.DataFrame.

Here is how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |

Answer 3 (score: 8)

Give this a try. I got it working.

See the screenshot at the end of this answer of the Markdown file converted to HTML.

import pandas as pd

# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df = df.drop(df.columns[-1], axis=1)  # drop() returns a new DataFrame

# Get column names
cols = df.columns

# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])

#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)

[Screenshot: my output, the Markdown file converted to HTML]

Answer 4 (score: 5)

I tried several of the solutions in this post and found that this approach works best.

To convert a pandas DataFrame to a Markdown table, I suggest using pytablewriter. Using the data provided in this post:

import pandas as pd
import pytablewriter
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus 
0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
31 , data/Nor/Høylandet.txt  , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
""")
df = pd.read_csv(c,sep=',',index_col=['ID'])

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "example_table"
writer.headers = list(df.columns.values)  # older pytablewriter releases: writer.header_list
writer.value_matrix = df.values.tolist()
writer.write_table()

This results in:

# example_table
ID |           path           |language|    date    |                longest_sentence                |                   shortest_sentence                  | number_words | readability_consensus 
--:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
  0| data/Eng/Sagitarius.txt  | Eng    | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...|           306| 11th and 12th grade   
 31| data/Nor/Høylandet.txt  | Nor    | 2015-07-22 | Høgskolen i Østfold er et eksempel...        | Som skuespiller har jeg både...                      |           253| 15th and 16th grade   

Here is a screenshot of the rendered Markdown.


Answer 5 (score: 4)

Exporting a DataFrame to Markdown

I created the following function for exporting a pandas.DataFrame to Markdown in Python:

def df_to_markdown(df, float_format='%.2g'):
    """
    Export a pandas.DataFrame to markdown-formatted text.
    DataFrame should not contain any `|` characters.
    """
    from os import linesep
    return linesep.join([
        '|'.join(map(str, df.columns)),  # str() so non-string column names also work
        '|'.join(4 * '-' for i in df.columns),
        df.to_csv(sep='|', index=False, header=False, float_format=float_format)
    ]).replace('|', ' | ')

This function may not automatically fix the OP's encoding issues, but that is a separate problem from converting pandas to Markdown.
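The docstring's restriction on `|` characters can be lifted: GitHub-flavored Markdown treats a pipe written as `\|` inside a cell as literal text. Below is a sketch of a variant (the name `df_to_markdown_escaped` is hypothetical) that escapes each cell itself instead of doing a global `replace('|', ' | ')`, which would also split the escaped pipes. For brevity it drops the `float_format` option.

```python
import pandas as pd


def df_to_markdown_escaped(df: pd.DataFrame) -> str:
    """Hypothetical variant of df_to_markdown that writes any '|' in a cell as '\\|'."""

    def esc(value) -> str:
        # GitHub-flavored Markdown reads a backslash-escaped pipe as cell text.
        return str(value).replace("|", "\\|")

    lines = [
        " | ".join(esc(c) for c in df.columns),
        " | ".join("----" for _ in df.columns),
    ]
    # Escape cell by cell; there is no global replace that could split '\|'.
    for row in df.itertuples(index=False):
        lines.append(" | ".join(esc(v) for v in row))
    return "\n".join(lines)


df = pd.DataFrame({"cmd": ["a|b", "c"], "n": [1, 2]})
print(df_to_markdown_escaped(df))
```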

Answer 6 (score: 2)

Pandas has merged a PR that adds a df.to_markdown() method. You can find more details here; it should be available soon.

Answer 7 (score: 1)

Right, so I took a leaf from the question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:


# Enforce UTF-8 encoding (Python 2 only)
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3

# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

An important prerequisite is that the shortest_sentence and longest_sentence columns contain no unnecessary line breaks; these were removed by applying .replace('\n', ' ').replace('\r', '') to them before committing to the SQLite database. It appears the solution is not to enforce a language-specific encoding (ISO-8859-1 for Norwegian), but to use UTF-8 instead of the default ASCII.
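On Python 3 the `reload(sys)` / `sys.setdefaultencoding` hack is neither available nor needed: `sqlite3` returns `str` (Unicode) values, and the output encoding is chosen when writing. Here is a minimal sketch of the same pipeline against an in-memory database; the two-column table and its row are made-up sample data, and the line-break cleanup mirrors the prerequisite described in the answer.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# In-memory stand-in for Applications.db; the row below is made-up sample data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE applications (path TEXT, shortest_sentence TEXT)")
db.execute(
    "INSERT INTO applications VALUES (?, ?)",
    ("data/Nor/Høylandet.txt", "Som skuespiller\r\nhar jeg både..."),
)
df = pd.read_sql_query("SELECT path, shortest_sentence FROM applications", db)
db.close()

# The answer's prerequisite: no raw line breaks inside cells.
df["shortest_sentence"] = (
    df["shortest_sentence"]
    .str.replace("\n", " ", regex=False)
    .str.replace("\r", "", regex=False)
)

# Same '---' separator-row trick, written out as UTF-8 (no sys patching).
sep_row = pd.DataFrame([["---"] * len(df.columns)], columns=df.columns)
out_path = os.path.join(tempfile.gettempdir(), "table.md")
pd.concat([sep_row, df]).to_csv(out_path, sep="|", index=False, encoding="utf-8")
```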

I ran this in my IPython notebook (Python 2.7.10) and got a table like the following (shown here in fixed-width font):

[Screenshot: the resulting table]

So the Markdown table has no encoding issues.

Answer 8 (score: 1)

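Here is an example function that uses pytablewriter and some regular expressions to make the Markdown table look more like the way a DataFrame is displayed in Jupyter (with the row headers in bold).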

import io
import re
import pandas as pd
import pytablewriter

def df_to_markdown(df):
    """
    Converts Pandas DataFrame to markdown table,
    making the index bold (as in Jupyter) unless it's a
    pd.RangeIndex, in which case the index is completely dropped.
    Returns a string containing markdown table.
    """
    isRangeIndex = isinstance(df.index, pd.RangeIndex)
    if not isRangeIndex:
        df = df.reset_index()
    writer = pytablewriter.MarkdownTableWriter()
    writer.stream = io.StringIO()
    writer.headers = list(df.columns)  # older pytablewriter releases: writer.header_list
    writer.value_matrix = df.values
    writer.write_table()
    writer.stream.seek(0)
    table = writer.stream.readlines()

    if isRangeIndex:
        return ''.join(table)
    else:
        # Make the indexes bold
        new_table = table[:2]
        for line in table[2:]:
            new_table.append(re.sub(r'^(.*?)\|', r'**\1**|', line))

        return ''.join(new_table)
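One caveat: the pattern `^(.*?)\|` assumes rows do not start with a pipe, as in the older pytablewriter output shown earlier on this page. If your pytablewriter version writes a leading `|` on each row, the lazy group matches the empty string and the line gains a stray `****`. Below is a hedged sketch of a replacement helper (the name `bold_first_cell` is hypothetical) that skips an optional leading pipe and leaves empty first cells alone.

```python
import re

# Optional leading pipe and spaces, then a non-empty first cell, then the next pipe.
FIRST_CELL = re.compile(r"^(\|?\s*)([^|\s][^|]*?)(\s*\|)")


def bold_first_cell(line: str) -> str:
    """Wrap the first cell of a pipe-table row in ** **, keeping its padding."""
    return FIRST_CELL.sub(r"\1**\2**\3", line, count=1)


for row in ["idx|1|2", "| a  | 1 |", "|   | 1 |"]:
    print(bold_first_cell(row))
```

Inside `df_to_markdown`, it would stand in for the `re.sub(...)` call in the loop over `table[2:]`.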

Answer 9 (score: 1)

Using the external tool pandoc and a pipe:

def to_markdown(df):
    from subprocess import Popen, PIPE
    s = df.to_latex()
    p = Popen('pandoc -f latex -t markdown',
              stdin=PIPE, stdout=PIPE, shell=True)
    stdoutdata, _ = p.communicate(input=s.encode("utf-8"))
    return stdoutdata.decode("utf-8")

Answer 10 (score: 1)

For those looking for how to do this with tabulate, I thought I'd put this here to save you some time:

from tabulate import tabulate

print(tabulate(df, tablefmt="pipe", headers="keys", showindex=False))

Answer 11 (score: 1)

Another solution, this time via tabulatehelper, a thin wrapper around tabulate:

import numpy as np
import pandas as pd
import tabulatehelper as th

df = pd.DataFrame(np.random.random(16).reshape(4, 4), columns=('a', 'b', 'c', 'd'))
print(th.md_table(df, formats={-1: 'c'}))

Output:

|        a |        b |        c |        d |
|---------:|---------:|---------:|:--------:|
| 0.413284 | 0.932373 | 0.277797 | 0.646333 |
| 0.552731 | 0.381826 | 0.141727 | 0.2483   |
| 0.779889 | 0.012458 | 0.308352 | 0.650859 |
| 0.301109 | 0.982111 | 0.994024 | 0.43551  |