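I have a year of Google Analytics Multi-Channel Funnels API data. A sample is below. The source/medium paths vary in length, and I'm looking for a way to split each path on the ">" delimiter so that every channel ends up in its own new column.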
20160101 google / organic
20160101 bing / organic
20160101 google / organic > google / organic
20160101 google / organic > google / organic
20160101 (direct) / (none) > (direct) / (none)
20160101 (direct) / (none) > online.fliphtml5.com / referral
20160101 google / organic > google / organic > (direct) / (none)
20160101 google / organic > (direct) / (none) > google / organic
20160101 google / organic > online.fliphtml5.com / referral > (direct) / (none)
20160101 (direct) / (none) > (direct) / (none) > (direct) / (none)
20160101 pinterest.com / referral > (direct) / (none) > (direct) / (none)
20160101 google / organic > (direct) / (none) > (direct) / (none) > google / organic
20160101 bing / organic > (direct) / (none) > (direct) / (none) > (direct) / (none)
20160101 google / organic > (direct) / (none) > (direct) / (none) > (direct) / (none)
Below is an example of the format I want the data in. How can I do this in Python?
Source_Med_Path_1 Source_Med_Path_2....Source_Med_Path_72
google / cpc direct google / organic
Answer 0 (score: 0)
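You can do this with Pandas and the apply() function.
http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.Series.apply.html
My code reads the source/medium paths from a CSV file, but it can easily be adapted to work on API results.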
import pandas as pd

def main():
    # Read the original data from CSV
    data = pd.read_csv('source.csv')
    # Split each path on the ">" separator and strip the surrounding spaces
    splitdata = data['source'].apply(lambda x: pd.Series([part.strip() for part in x.split('>')]))
    # Join the split columns onto the transaction column (both share the same index)
    data = pd.concat([data['transaction'], splitdata], axis=1)
    # Rename the path columns
    cols = ['transaction']
    for i in range(len(data.columns) - 1):
        cols.append('Source_Med_Path_' + str(i + 1))
    data.columns = cols
    # Output the data
    print(data)
    data.to_csv('output.csv')

if __name__ == '__main__':
    main()
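
As an alternative to apply(), the same split can be sketched with Series.str.split(..., expand=True), which spreads the pieces across columns in one call. This is only a minimal sketch: the `rows` list is a hypothetical stand-in for API results (it is not the actual API response shape), and the `split_paths` helper and the `date` column name are names made up for illustration.

```python
import pandas as pd

# Hypothetical in-memory rows standing in for API results: (date, path) pairs
# taken from the sample data in the question.
rows = [
    ("20160101", "google / organic"),
    ("20160101", "(direct) / (none) > online.fliphtml5.com / referral"),
    ("20160101", "google / organic > (direct) / (none) > (direct) / (none) > google / organic"),
]

def split_paths(rows, sep=">"):
    """Return a DataFrame with one column per step of each source/medium path."""
    df = pd.DataFrame(rows, columns=["date", "source"])
    # expand=True puts each piece in its own column; shorter paths are
    # padded with missing values on the right.
    parts = df["source"].str.split(sep, expand=True)
    # Remove the spaces left around the separator.
    parts = parts.apply(lambda col: col.str.strip())
    parts.columns = [f"Source_Med_Path_{i + 1}" for i in range(parts.shape[1])]
    return pd.concat([df[["date"]], parts], axis=1)

result = split_paths(rows)
print(result)
```

The number of Source_Med_Path_ columns follows the longest path in the data, so paths with fewer steps simply have missing values in the trailing columns.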