I have an xlsx file that looks like this:
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
I am trying to use pandas to print this data as the following table:
+--------+----------+----------+----------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+--------+----------+----------+----------+
| In     | Jack     | Tom      | Tom      |
|        |          | Lisa     | Jack     |
+--------+----------+----------+----------+
| Out    | Lisa     | Jack     | Lisa     |
|        | Tom      |          |          |
+--------+----------+----------+----------+
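I am struggling to work out how to do this with pandas. Is there a simple way to iterate over the date columns, match each cell against its row, and get the cell values?
For example, starting with the first column, 01/09/16, how can I use pandas to walk down that column, find the cell value 'In', match it to the row name 'Jack', and then add it to a nested dictionary like this: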
dictionary = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}}
If I can do that, I can use something like PrettyTable to lay it out as shown in the second table above.
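Answer 0 (score: 3)
Consider a dictionary comprehension that runs over every column (Series) of the dataframe. But first, make sure Name is set as the dataframe's index: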
from io import StringIO
import pandas as pd
data = '''
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
'''
# Raw string so that "\s" is not treated as an (invalid) string escape
df = pd.read_table(StringIO(data), sep=r"\s+", index_col=0)
print(df)
# 01/09/16 02/09/16 03/09/16
# Name
# Jack In Out In
# Lisa Out In Out
# Tom Out In In
# BUILD DICTIONARY
dfdict = {col: (df[col][df[col] == 'In'].index.values,
                df[col][df[col] == 'Out'].index.values) for col in df.columns}
dfdict['Status'] = ['In', 'Out']
# CAST TO DATAFRAME
finaldf = pd.DataFrame(dfdict)
finaldf = finaldf[['Status'] + [col for col in df.columns]] # RE-ORDER COLS
print(finaldf)
# Status 01/09/16 02/09/16 03/09/16
# 0 In [Jack] [Lisa, Tom] [Jack, Tom]
# 1 Out [Lisa, Tom] [Jack] [Lisa]
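If what you want is the nested dictionary from the question rather than a dataframe, the same idea can be adapted. The following is only a minimal sketch, assuming the same sample data with Name as the index; the variable name nested is mine, and grouping each column by its own values is one possible approach, not the only one.

```python
from io import StringIO
import pandas as pd

data = '''
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
'''

# Same sample frame as in this answer, with Name as the index
df = pd.read_csv(StringIO(data), sep=r"\s+", index_col=0)

# For each date column, group the index labels (the names) by the cell value.
# "nested" is an illustrative name, not something from the original post.
nested = {
    col: {status: names.index.tolist()
          for status, names in df[col].groupby(df[col])}
    for col in df.columns
}

print(nested['01/09/16'])
# {'In': ['Jack'], 'Out': ['Lisa', 'Tom']}
```

A status that never occurs on a given date simply has no key for that date, so use .get(status, []) when reading the result if that matters for your table.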
Answer 1 (score: 2)
IIUC (this assumes Name is still a regular column of df, not the index):
pd.melt(
    df, id_vars=['Name'], value_vars=df.columns[1:].tolist(),
    value_name='Status', var_name='Date'
).set_index(['Status', 'Date']).groupby(level=[0, 1]).Name.apply(list).unstack()
Or, with less code:
df.set_index('Name').unstack().reset_index().groupby(['level_0', 0]) \
    .Name.apply(list).rename_axis([None, None]).unstack(0)
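For anyone who wants to run this end to end, here is a self-contained sketch of the melt-then-group idea, assuming the sample data from the question is read with Name left as an ordinary column. The names long_df and table are mine, and this is a slightly simplified variant of the snippet above rather than an exact copy.

```python
from io import StringIO
import pandas as pd

data = '''
Name 01/09/16 02/09/16 03/09/16
Jack In Out In
Lisa Out In Out
Tom Out In In
'''

# Name stays a regular column here (no index_col)
df = pd.read_csv(StringIO(data), sep=r"\s+")

# Wide -> long: one row per (Name, Date, Status)
long_df = df.melt(id_vars='Name', var_name='Date', value_name='Status')

# Collect the names for every (Status, Date) pair, then pivot dates to columns
table = long_df.groupby(['Status', 'Date'])['Name'].apply(list).unstack('Date')

print(table.loc['In', '02/09/16'])
# ['Lisa', 'Tom']
```

Note that the date labels are plain strings here, so the columns come out in string order; that happens to match date order for this sample but may not for dates spanning several months.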