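I have several thousand rows in the block structure shown below. In each block, the first row (Response Comments), the second row (Customer Name) and the last row (Recommended) are always present. The other fields/rows are optional.
I'm trying to write code that finds the row where Column Name == 'Response Comments' and sets Key to the Column Values entry of the next row (Customer Name). That key should apply to every row from Response Comments through Recommended; the loop should then break and pick up a new key for the next block.
The data comes from an Excel file: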
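from pandas import DataFrame
import pandas as pd
import os
import numpy as np
# 'Filepath' is a placeholder for the path to the workbook
xl = pd.ExcelFile('Filepath')
# read the sheet named 'Reviews_Structured' into a DataFrame
df = xl.parse('Reviews_Structured')
print(type(df))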
RowNum  Column Name        Column Values                Key
1       Response Comments  they have been unresponsive
2       Customer Name      Brian
.
.
.
.
13      Recommended        no
Any help with the code for this loop would be greatly appreciated.
Answer 0 (score: 0)
One way to implement this logic is to use collections.defaultdict with a nested dictionary structure. Here is an example:
from collections import defaultdict
import numpy as np
import pandas as pd
# input data
df = pd.DataFrame([[1, 'Response Comments', 'they have been unresponsive'],
                   [2, 'Customer Name', 'Brian'],
                   .....
                   [9, 'Recommended', 'yes']],
                  columns=['RowNum', 'Column Name', 'Column Values'])
# fill the Key column: take the next row's value (the customer name),
# keep it only on 'Response Comments' rows, then forward-fill through the block
df['Key'] = df['Column Values'].shift(-1)
df.loc[df['Column Name'] != 'Response Comments', 'Key'] = np.nan
df['Key'] = df['Key'].ffill()
# create defaultdict of dict
d = defaultdict(dict)
# iterate over the dataframe; each tuple is (Index, RowNum, Column Name, Column Values, Key)
for row in df.itertuples():
    d[row[4]].update({row[2]: row[3]})
# defaultdict(dict,
# {'April': {'Customer Name': 'April',
# 'Recommended': 'yes',
# 'Response Comments': 'they have been responsive'},
# 'Brian': {'Customer Name': 'Brian',
# 'Recommended': 'no',
# 'Response Comments': 'they have been unresponsive'},
# 'John': {'Customer Name': 'John',
# 'Recommended': 'yes',
# 'Response Comments': 'they have been very responsive'}})
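For comparison, the loop described in the question can also be written directly, without building the Key column first. This is only a sketch in plain Python: group_blocks and the sample pairs (including the 'Service Rating' row) are made-up names for illustration, and it assumes every block really does contain Customer Name and ends with Recommended. With a DataFrame, the pairs could come from zip(df['Column Name'], df['Column Values']).

```python
def group_blocks(pairs, start="Response Comments",
                 key_field="Customer Name", end="Recommended"):
    """Group (column name, column value) pairs into one dict per customer."""
    records = {}
    block = None                      # dict for the block currently being read
    for name, value in pairs:
        if name == start:
            block = {}                # a new block begins here
        if block is None:
            continue                  # ignore rows outside any block
        block[name] = value
        if name == end:
            # block is complete: file it under the customer name
            # (assumes key_field was present in the block)
            records[block[key_field]] = block
            block = None
    return records


# Hypothetical sample data; 'Service Rating' stands in for an optional row.
pairs = [
    ("Response Comments", "they have been unresponsive"),
    ("Customer Name", "Brian"),
    ("Service Rating", "2"),
    ("Recommended", "no"),
    ("Response Comments", "they have been very responsive"),
    ("Customer Name", "John"),
    ("Recommended", "yes"),
]

records = group_blocks(pairs)
```

Note that, as with the defaultdict version, two blocks with the same customer name end up under one key, and the later block overwrites the earlier one.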
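Answer 1 (score: 0)
Do I understand correctly that you want a new DataFrame with
columns = ['Response Comments', 'Customer Name', ...]
reshaped from the data in the parsed Excel file?
Create an empty DataFrame from the known required column names, e.g.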
df_new = pd.DataFrame(columns=['Response Comments', 'Customer Name', ...])
index = 0
Iterate over the parsed Excel file row by row and assign the values:
for k, row in df.iterrows():
    if row['Column Name'] in df_new:
        df_new.at[index, row['Column Name']] = row['Column Values']
    if row['Column Name'] == 'Recommended':
        # 'Recommended' closes a block, so move on to the next output row
        index += 1
It's not pretty, but I'm not sure exactly what you're trying to achieve :)
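If a wide table with one row per block is indeed the goal, a possible alternative to the row-by-row loop is to number the blocks with cumsum and then pivot. This is a minimal sketch, assuming every block starts with 'Response Comments'; the sample rows, the 'Service Rating' field and the names block_id and wide are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: two blocks, the second without the optional row.
rows = [
    ("Response Comments", "they have been unresponsive"),
    ("Customer Name", "Brian"),
    ("Service Rating", "2"),
    ("Recommended", "no"),
    ("Response Comments", "they have been very responsive"),
    ("Customer Name", "John"),
    ("Recommended", "yes"),
]
df = pd.DataFrame(rows, columns=["Column Name", "Column Values"])

# Number the blocks: the counter goes up by one at each 'Response Comments' row.
block_id = df["Column Name"].eq("Response Comments").cumsum()

# One output row per block, one output column per distinct 'Column Name'.
wide = df.assign(block=block_id).pivot(
    index="block", columns="Column Name", values="Column Values"
)
```

Optional fields that are missing from a block simply come out as NaN in that row. pivot raises an error if the same column name occurs twice within one block, which may or may not be what you want for your data.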