在从现有docx读取和写入新docx时保留表格式
but I am not getting the output in same format Need help to fix this so that I can copy this table in the same format to my new docx
ITEM
NEEDED
Books
1
Pens
3
Pencils
2
Highlighter
2 colors
Scissors
1 pair
我正在使用的代码如下。
import docx
doc = docx.Document('demo.docx')
doc = docx.Document('demo.docx')
for table in doc.tables:
for row in table.rows:
for cell in row.cells:
for para in cell.paragraphs:
print para.text
我正在经历Parsing of table from .docx file,但同样,我需要在新docx中创建表,不确定如何执行此操作。
答案 0 :(得分:0)
我认为我有一个笨拙的方法,即先将原始docx表转换为pandas DataFrame,然后再将数据帧添加回新文档中。
从我收集到的信息来看,文档文件(* .docx,*。doc,*。txt)以字符串形式读取,因此我们必须将数据视为字符串。这意味着您将需要知道表的列数和行数。
假设原始文档文件名为“ Stationery.docx”,则可以解决问题。
import docx
import pandas as pd
import numpy as np
doc = docx.Document("Stationery.docx")
df = pd.DataFrame()
tables = doc.tables[0]
##Getting the original data from the document to a list
ls =[]
for row in tables.rows:
for cell in row.cells:
for paragraph in cell.paragraphs:
ls.append(paragraph.text)
def Doctable(ls, row, column):
df = pd.DataFrame(np.array(ls).reshape(row,column)) #reshape to the table shape
new = docx.Document()
word_table =new.add_table(rows = row, cols = column)
for x in range(0,row,1):
for y in range(0,column,1):
cell = word_table.cell(x,y)
cell.text = df.iloc[x,y]
return new, df