Question

I have the following Python pandas DataFrame:

          |   Number of visits per year  |
user id   |  2013  | 2014 | 2015 | 2016  |
   A           4       3     6      0     
   B           3       0     7      3
   C          10       6     3      0

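I want to calculate the percentage of returning users based on their number of visits. Sorry, I don't have any code yet; I don't know how to start on this.

This is the final result I am looking for: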

         |       Number of visits in the year     |
 Year    | 1  | 2 | 3  | 4  | 5 | 6 | 7  | 8  | 9 | 10 |  
 2014      7%   3%  4%   15%  6%  7%  18%  17% 3%   2%   
 2015      3% ....
 2016

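Based on the above, I could say that 15% of the customers who visited the store 4 times in 2013 came back to the store in 2014.

Thank you very much.

Update: this is what I have done so far. Maybe there is a better way using a loop?

For each year, I have a CSV like this: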

user_id |    NR_V
   A           4      
   B           3       
   C          10

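NR_V stands for the number of visits.

So I loaded each CSV as its own df, and I have df_2009, df_2010, ... up to df_2016.

For each file, I added a 0/1 column indicating whether the user shopped in the following year.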

 df_2009['shopped2010'] = np.where(df_2009['user_ID'].isin(df_2010['user_ID']), 1, 0)

Then I pivoted each data frame.

 pivot_2009 = pd.pivot_table(df_2009,index=["NR_V"],aggfunc={"NR_V":len, "shopped2010":np.sum})

Next, for each data frame, I created a new data frame with a column holding the percentage of returning users per number of visits.

p_2009 = pd.DataFrame()
p_2009['%returned2010'] = (pivot_2009['shopped2010']/pivot_2009['NR_V'])*100

Finally, I concatenated all of these data frames into one.

dfs = [p_2009, p_2010, p_2011, p_2012, p_2013, p_2014, p_2015 ]
final = pd.concat(dfs, axis=1)
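The steps above can be folded into one loop over consecutive years. The following is only a sketch under stated assumptions: the `yearly` dictionary and the `return_rates` helper are hypothetical names, the sample values are made up, and the column is assumed to be named `user_id` as in the CSV header shown above.

```python
import pandas as pd

# Hypothetical stand-ins for the yearly CSV files (columns user_id, NR_V).
# The values are made up purely for illustration.
yearly = {
    2013: pd.DataFrame({"user_id": ["A", "B", "C", "D"], "NR_V": [4, 3, 10, 4]}),
    2014: pd.DataFrame({"user_id": ["A", "C"], "NR_V": [3, 6]}),
    2015: pd.DataFrame({"user_id": ["A", "B", "C"], "NR_V": [6, 7, 3]}),
}


def return_rates(yearly):
    """Percent of users, per visit count, who appear again in the next year."""
    years = sorted(yearly)
    columns = {}
    for year, next_year in zip(years, years[1:]):
        current = yearly[year]
        # True where the user also appears in next year's file
        returned = current["user_id"].isin(yearly[next_year]["user_id"])
        # The mean of a boolean Series per group is the fraction that returned
        columns[next_year] = returned.groupby(current["NR_V"]).mean() * 100
    # Rows: year of return; columns: number of visits in the previous year
    return pd.DataFrame(columns).T


rates = return_rates(yearly)
print(rates)
```

Visit counts that nobody had in a given year show up as NaN rather than 0, which keeps "no such customers" distinct from "none of them returned".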

Answer 1

Consider the sample visits data frame df

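import numpy as np
import pandas as pd

# 100 users x 5 years of random visit counts between 1 and 9
df = pd.DataFrame(
    np.random.randint(1, 10, (100, 5)),
    pd.Index(['user_{}'.format(i) for i in range(1, 101)], name='user id'),
    [
        ['Number of visits per year'] * 5,
        [2012, 2013, 2014, 2015, 2016]
    ]
)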

df.head()

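You can apply value_counts with the parameter normalize=True. Also, since an entry of 8 represents 8 separate visits, it should be counted 8 times. I use repeat before calling value_counts to accomplish this.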

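def count_visits(col):
    v = col.values
    # repeat each value v times, then take normalized frequencies;
    # Series.value_counts is used because the top-level pd.value_counts is deprecated
    return pd.Series(v.repeat(v)).value_counts(normalize=True)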

df.apply(count_visits).stack().unstack(0)
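To make the effect of repeat concrete, here is a small sketch with four hypothetical users (the `visits` array is made up). It compares counting each user once with counting each visit once.

```python
import numpy as np
import pandas as pd

# Four hypothetical users with 1, 2, 2 and 3 visits
visits = np.array([1, 2, 2, 3])

# Unweighted: every user counts once
per_user = pd.Series(visits).value_counts(normalize=True)

# Weighted: each value v is repeated v times, so every visit counts once
# visits.repeat(visits) -> [1, 2, 2, 2, 2, 3, 3, 3]
per_visit = pd.Series(visits.repeat(visits)).value_counts(normalize=True)

print(per_user)
print(per_visit)
```

With this weighting the single user with 3 visits accounts for 3 of the 8 visits (0.375), instead of 1 of the 4 users (0.25).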

Answer 2

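I used each visitor's index value and checked whether the value at the same index (that is, the same visitor ID) was greater than 0 in the following year. The result is added to a dictionary as True or False, which you can use for a bar chart. I also built two lists (times_returned and returned_at_all) for further data manipulation.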
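A minimal sketch of the approach described above, using the wide table from the question. The loop structure and the `(visitor, year)` dictionary keys are assumptions. Only the names times_returned and returned_at_all come from the description.

```python
import pandas as pd

# The wide table from the question: one row per user, one column per year
df = pd.DataFrame(
    {2013: [4, 3, 10], 2014: [3, 0, 6], 2015: [6, 7, 3], 2016: [0, 3, 0]},
    index=pd.Index(["A", "B", "C"], name="user id"),
)

returned = {}        # (visitor, year) -> True/False; key layout is an assumption
times_returned = []  # per visitor: number of years in which they came back

for visitor in df.index:
    count = 0
    for year, next_year in zip(df.columns, df.columns[1:]):
        # Only a user who visited in `year` can "return" in `next_year`
        if df.loc[visitor, year] > 0:
            came_back = bool(df.loc[visitor, next_year] > 0)
            returned[(visitor, next_year)] = came_back
            count += came_back
    times_returned.append(count)

# Per visitor: did they come back in at least one year?
returned_at_all = [n > 0 for n in times_returned]

print(times_returned)
print(returned_at_all)
```

The True/False values in `returned` can then be grouped by year, or by the previous year's visit count, before plotting.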


Answer 3

Please find my solution below. As a note, I am pretty sure this can be improved.

import pandas as pd

# step 0: create data frame
df = pd.DataFrame({'2013':[4, 3, 10], '2014':[3, 0, 6], '2015':[6, 7, 3], '2016':[0, 3, 0]}, index=['A', 'B', 'C'])

# container list of frequency tables to be concatenated
frames = []

# iterate through the dataframe one column at a time and determine its value_counts (frequency table)
# (df.iteritems() was removed in pandas 2.0; df.items() is the replacement)
for name, series in df.items():
    # rename so each frequency table keeps its year as a label
    frames.append(series.value_counts().rename(name))

# Merge the frequency tables for all columns into a dataframe (one row per year)
temp_df = pd.concat(frames, axis=1).transpose().fillna(0)

# Find the key range for the new dataframe (i.e. range of visit counts), and add missing ones
cols = temp_df.columns
lowest = cols.min()
highest = cols.max()
for i in range(lowest, highest + 1):
    if i not in cols:
        temp_df[i] = 0
temp_df = temp_df.sort_index(axis=1)

# Calculate percentage
final_df = temp_df.div(temp_df.sum(axis=1), axis=0)

Python Pandas: calculate the percentage of returning users per category
