I have a list of activities, popular_activities, as shown below:
['google.co.uk', 'whatsapp', 'sharelatex.com', 'Financial Times', 'other',
 'en.wikipedia.org', 'Instagram for Android', 'YouTube for Android',
 'arxiv-sanity.com', 'quora.com', 'microsoft word', 'Inbox by Gmail',
 'Google Chrome for Android', 'youtube.com', 'mendeley desktop',
 'web.whatsapp.com', 'Preview', 'texshop', 'Google Now',
 'mobile - com.compassnews.app', 'netflix.com',
 'WhatsApp Messenger Android', 'Facebook for Android', 'arxiv.org']
I also have a DataFrame like this:
                                             Activity  Time Spent (seconds)
Date
2017-03-25T00:05:00  [netflix.com, other, Google Now]            [30, 6, 2]
2017-03-25T00:10:00                     [netflix.com]                 [300]
2017-03-25T00:15:00                     [netflix.com]                 [102]
2017-03-25T00:30:00                     [netflix.com]                 [232]
2017-03-25T00:35:00                     [netflix.com]                 [279]
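I want to create a new column 'Activity_vector' in this DataFrame, such that each element of the column is a vector whose length equals that of popular_activities, and the position corresponding to each activity (its index in the popular_activities array) holds the time spent on that activity.
So, for example, for the row with Date 2017-03-25T00:05:00, the corresponding element in the new column 'Activity_vector' would be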
[0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 30, 0, 0, 0]
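That is, the activity netflix.com, which has a 'Time Spent (seconds)' value of 30 and sits at (zero-based) index 20 in popular_activities, gives an array with index 20 set to 30; the activity other has its index (4) set to 6; and likewise for Google Now (index 18, value 2).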
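To make the intended mapping concrete, here is a minimal sketch in plain Python (no pandas) that reproduces the vector above for the first row. The names `index_of` and `expected` are only illustrative and do not appear in my actual code; positions are zero-based.

```python
popular_activities = [
    'google.co.uk', 'whatsapp', 'sharelatex.com', 'Financial Times', 'other',
    'en.wikipedia.org', 'Instagram for Android', 'YouTube for Android',
    'arxiv-sanity.com', 'quora.com', 'microsoft word', 'Inbox by Gmail',
    'Google Chrome for Android', 'youtube.com', 'mendeley desktop',
    'web.whatsapp.com', 'Preview', 'texshop', 'Google Now',
    'mobile - com.compassnews.app', 'netflix.com',
    'WhatsApp Messenger Android', 'Facebook for Android', 'arxiv.org',
]

# Zero-based position of each activity in popular_activities
# (index_of is an illustrative name, not part of the original code).
index_of = {name: i for i, name in enumerate(popular_activities)}

# First row of the example DataFrame.
activities = ['netflix.com', 'other', 'Google Now']
seconds = [30, 6, 2]

# One slot per popular activity, filled with the time spent on it.
expected = [0] * len(popular_activities)
for name, spent in zip(activities, seconds):
    expected[index_of[name]] = spent

print([index_of[name] for name in activities])  # positions of the three activities
print(expected)
```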
The snippet I have is as follows, where self.clean_df is the DataFrame in question:
class Clean_DF(object):
    ....
    def clean_data(self, time_percentage):
        ....
        self.clean_df['Activity_vector'] = self.clean_df.apply(lambda x: self.activity_to_vector(x),axis=1)

    def activity_to_vector(self, row):
        vect = np.zeros(len(self.popular_apps))
        for x,y in zip(row['Activity'], row['Time Spent (seconds)']):
            vect[self.popular_apps_dict[x]] += vect[self.popular_apps_dict[x]] + y
        return vect
However, when I run this, I get the following error:
ValueError: Shape of passed values is (3862, 24), indices imply (3862, 2)
How can I resolve this error, or otherwise write a function that solves my problem?
Answer 0 (score: 0)
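num = len(popular_activities)

def make_array(s):
    # Start from a zero vector with one slot per popular activity.
    z = [0] * num
    time = s['Time Spent (seconds)']
    for i, val in enumerate(s['Activity']):
        # Position of this activity in popular_activities.
        idx = popular_activities.index(val)
        z[idx] = time[i]
    return z

# Each row yields a plain list; assign the result to store the new column.
df['Activity_vector'] = df.apply(make_array, axis=1)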
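
A possible variant, offered as a sketch rather than a definitive fix: `list.index` scans the list for every activity, so a precomputed name-to-index dict (similar in spirit to `self.popular_apps_dict` in the question) avoids the repeated search. The function also returns a plain list instead of a NumPy array; returning an array from `apply(..., axis=1)` is a likely cause of the shape error, since pandas versions of that period could try to expand the 24 values into columns. The names `make_activity_vector` and `index_of`, and the shortened activity list, are assumptions made for illustration.

```python
def make_activity_vector(row, index_of):
    """Return a plain list holding the time spent per popular activity."""
    vector = [0] * len(index_of)
    for name, spent in zip(row['Activity'], row['Time Spent (seconds)']):
        # Accumulate, so an activity listed twice in one row is summed.
        vector[index_of[name]] += spent
    return vector


# Shortened activity list, for illustration only.
popular_activities = ['other', 'Google Now', 'netflix.com']
index_of = {name: i for i, name in enumerate(popular_activities)}

# A dict stands in for one DataFrame row here.
row = {
    'Activity': ['netflix.com', 'other', 'Google Now'],
    'Time Spent (seconds)': [30, 6, 2],
}
print(make_activity_vector(row, index_of))  # [6, 2, 30]
```

With the DataFrame from the question, this could presumably be applied as `df['Activity_vector'] = df.apply(make_activity_vector, axis=1, args=(index_of,))`, with `index_of` built from the full popular_activities list.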