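I have a data frame of survey comments. One column holds each respondent's group number. The remaining columns hold the question text in the header row and the responses in the rows below. Not everyone answered every question, so some cells are blank.
I want to write the comments to a Word file using the docx package. Each question's text should appear as a heading, with the group number as a subheading beneath it (responses grouped by group number) and the responses in a bulleted list beneath that, then move on to the next question and repeat. I also don't want to output blank cells.
The code below gives an idea of what I'm trying to do.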
import docx
import pandas as pd
from docx import Document
import numpy as np
from docx.shared import Inches
from docx.enum.section import WD_SECTION
from docx.enum.section import WD_ORIENT
# initialize list of lists
data = [['Group 1', 'Comment A', 'Comment B', 'Comment C'], ['Group 2', 'Comment D', '', ''], ['Group 2', 'Comment E', '', 'Comment F'], ['Group 1', '', 'Comment G', 'Comment H'], ]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Group', 'Question 1', 'Question 2', 'Question 3'])
print(df)
# create file
doc = Document()
sections = doc.sections
section = sections[0]
# Convert to landscape orientation
new_width, new_height = section.page_height, section.page_width
section.orientation = WD_ORIENT.LANDSCAPE
section.page_width = new_width
section.page_height = new_height
# Document Title
doc.add_heading('Document Title', level=0)
# Opening text
doc.add_paragraph('Some text...')
# Do I need to sort by 'Group' before doing the loops?
# loop through the questions - this isn't working
for column in df[2:]:
    # create a heading for each question
    doc.add_heading(column, level=1)
    for g in df.Group:
        # create a subheading for each group
        doc.add_heading(g, level=3)
        for c in df[g]:
            doc.add_paragraph(c, style='List Bullet')
# save the doc
doc.save('./test.docx')
The desired output is:
Document Title
Some text...
Question 1
Group 1
- Comment A
Group 2
- Comment D
- Comment E
Question 2
Group 1
- Comment B
- Comment G
Question 3
Group 1
- Comment C
- Comment H
Group 2
- Comment F
Answer 0 (score: 0)
This works for the loop:
# loop through the questions
for column in df.columns[1:]:
    # create a heading for each question
    doc.add_heading(column, level=3)
    # make a new dataframe with only Group and the column of interest
    new_df = df[['Group', column]]
    # make a list of all units (groups), in order of first appearance
    unit_list = list(new_df['Group'].unique())
    # make a list of the comments in each unit for this column
    for unit in unit_list:
        comments = [row[2] for row in new_df.itertuples() if row[1] == unit]
        # drop blank cells (empty strings)
        comments = [i for i in comments if len(i) > 0]
        # if there were any comments in this unit, add the unit as a subheading
        if len(comments) > 0:
            doc.add_heading(unit, level=4)
            # bullet list of comments
            for c in comments:
                doc.add_paragraph(c, style='List Bullet')
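As a possible variation on the loop above, the grouping can be done with pandas `groupby` before anything is written to the document. This is only a sketch under a few assumptions: the helper names `comments_by_question` and `write_comments` are made up for illustration, blank cells may be either empty strings or NaN, and groups are listed in sorted order. The `doc` argument is expected to be the `Document` object created in the question.

```python
import pandas as pd


def comments_by_question(df, group_col="Group"):
    """Return {question: {group: [non-blank comments]}} (hypothetical helper)."""
    result = {}
    for question in df.columns.drop(group_col):
        # Treat NaN and whitespace-only strings as blank cells.
        answers = df[question].fillna("").astype(str).str.strip()
        mask = answers != ""
        grouped = {}
        # Group the non-blank answers by the matching group labels.
        # Groups with no comments for this question never appear.
        for group, values in answers[mask].groupby(df.loc[mask, group_col]):
            grouped[group] = values.tolist()
        result[question] = grouped
    return result


def write_comments(doc, df, group_col="Group"):
    """Write question headings, group subheadings and bullet lists to doc."""
    for question, groups in comments_by_question(df, group_col).items():
        doc.add_heading(question, level=1)
        for group, comments in groups.items():
            doc.add_heading(group, level=3)
            for comment in comments:
                doc.add_paragraph(comment, style="List Bullet")
```

Calling `write_comments(doc, df)` just before `doc.save('./test.docx')` should give the layout asked for in the question. Note that a question with no comments at all still gets its heading in this sketch; add a check on `groups` if that is not wanted.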