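I am trying to transpose rows and columns, but I don't know how to do it.
Here is a sheet with the original data and the desired output: https://drive.google.com/file/d/1HLBSCdziga3gJtCkNEpx-paFO9eHHYJy/view?usp=sharing
Is this something I can do with Python (pandas)? I would appreciate any help with this.
Thank you very much! Ty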
Answer 0 (score: 0)
You can do it like this:
import pandas as pd

df = pd.DataFrame({"Date": ["10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"],
                   "Website Action": ["Scroll", "Scroll", "Click", "Swipe"],
                   "Source:Email": [1, 1, 3, 2],
                   "Source:Social": [4, 2, 10, 6],
                   "Source:Display": [5, 3, 3, 9]})
This just sets up an example frame (the column order is slightly off):
        Date  Source:Display  Source:Email  Source:Social  Website Action
0  10/1/2018               5             1              4          Scroll
1  10/2/2018               3             1              2          Scroll
2  10/2/2018               3             3             10           Click
3  10/3/2018               9             2              6           Swipe
You can now combine "melt" and "pivot_table" to get what you need:
df.melt(id_vars=["Date","Website Action"]).pivot_table(index = ["Date","variable"], columns = "Website Action").fillna(0)
This yields:
                         value
Website Action           Click Scroll Swipe
Date      variable
10/1/2018 Source:Display   0.0    5.0   0.0
          Source:Email     0.0    1.0   0.0
          Source:Social    0.0    4.0   0.0
10/2/2018 Source:Display   3.0    3.0   0.0
          Source:Email     3.0    1.0   0.0
          Source:Social   10.0    2.0   0.0
10/3/2018 Source:Display   0.0    0.0   9.0
          Source:Email     0.0    0.0   2.0
          Source:Social    0.0    0.0   6.0
Reordering and renaming are up to you :-)
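For the renaming step, one possible sketch is shown here. It rebuilds the same sample frame so that it runs on its own. The names `long_df`, `tidy`, `Source`, and `Value` are illustrative choices, not part of the original answer, and `aggfunc="sum"` is an assumption about how duplicate Date/Source/action combinations should be combined (the sample data has none).

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    "Date": ["10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"],
    "Website Action": ["Scroll", "Scroll", "Click", "Swipe"],
    "Source:Email": [1, 1, 3, 2],
    "Source:Social": [4, 2, 10, 6],
    "Source:Display": [5, 3, 3, 9],
})

# Wide -> long: one row per (Date, Website Action, Source) combination.
# "Source" and "Value" are hypothetical names chosen for readability.
long_df = df.melt(id_vars=["Date", "Website Action"],
                  var_name="Source", value_name="Value")

# Long -> wide: one column per website action, missing combinations become 0.
tidy = long_df.pivot_table(index=["Date", "Source"],
                           columns="Website Action",
                           values="Value",
                           aggfunc="sum",
                           fill_value=0).reset_index()

# Drop the leftover "Website Action" label on the column axis
tidy.columns.name = None

print(tidy)
```

Passing `values="Value"` keeps the resulting columns flat, so no extra `value` level appears above `Click`, `Scroll`, and `Swipe`.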
Answer 1 (score: 0)
My solution may be a bit cumbersome:
import pandas as pd
df = pd.read_table('A/tab/separated/file/with/your/data.tsv')
# Stack the source columns into an extra index level, then unstack
# 'Website Action' (index level 1) into columns
reshaped_df = df.set_index(['Date', 'Website Action']).stack().unstack(level=1).fillna(0).reset_index()
# Rename the columns, then calculate total engagements per row
reshaped_df.columns = ['Date', 'Source', 'Click', 'Scroll', 'Swipe']
reshaped_df['Total Engagements'] = reshaped_df[['Click', 'Scroll', 'Swipe']].sum(axis=1)
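Long story short: yes, pandas can do this, and the above is one example. I recommend running everything in an interactive shell (check what happens when you apply set_index, and what happens when you stack or unstack).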
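To follow that suggestion without a data file, the steps can be run one at a time on a small frame. This is only a sketch: the sample values are taken from the other answer in this thread, and the intermediate names `indexed`, `stacked`, and `unstacked` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the real data file (values borrowed from the other answer)
df = pd.DataFrame({
    "Date": ["10/1/2018", "10/2/2018", "10/2/2018", "10/3/2018"],
    "Website Action": ["Scroll", "Scroll", "Click", "Swipe"],
    "Source:Email": [1, 1, 3, 2],
    "Source:Social": [4, 2, 10, 6],
    "Source:Display": [5, 3, 3, 9],
})

# Step 1: move Date and Website Action into the row index
indexed = df.set_index(["Date", "Website Action"])
print(indexed, "\n")

# Step 2: stack the remaining columns into a third index level (gives a Series)
stacked = indexed.stack()
print(stacked, "\n")

# Step 3: pivot index level 1 ('Website Action') out into columns;
# combinations that never occurred show up as NaN until fillna(0) is applied
unstacked = stacked.unstack(level=1)
print(unstacked)
```

Printing each intermediate object makes it easier to see which index level ends up as columns, and why `fillna(0)` is needed afterwards.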