Question

I have a dataset with the features Date, Age, and Customer_ID. Some rows of Age contain missing values (NAs), and I would like to impute them.

Here is some sample data:

Date <- c("201101", "201102", "201101", "201102", "201103")
Age <- c("12-17", "12-17", "30-35", NA, NA)
Customer_ID <- c("1234", "1234", "5678", "5678", "5678")
df <- data.frame(Date, Age, Customer_ID)

Date      Age      Customer_ID
201101    12-17    1234
201102    12-17    1234
201101    30-35    5678
201102    NA       5678
201103    NA       5678

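I would like to replace the NAs in Age with 30-35.

So for every NA, it needs to check whether another row with the same Customer_ID exists, and replace the NA with the Age value given in that other row.

Any ideas on how to do this? Thanks.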

Answer 1

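You can use the fill function from tidyr. It implements last observation carried forward, i.e. it fills NA values with the previous non-NA value. To make this work here, you can use arrange to sort by the second column (Age), which places the NA values after the non-NA values; then you can group by Customer_ID and fill the Age column: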

library(dplyr)
library(tidyr)
df %>% arrange(Age) %>% group_by(Customer_ID) %>% fill(Age)

# Source: local data frame [5 x 3]
# Groups: Customer_ID [2]

#      Date    Age Customer_ID
#    <fctr>  <fctr>  <fctr>
# 1  201101   12-17    1234
# 2  201102   12-17    1234
# 3  201101   30-35    5678
# 4  201102   30-35    5678
# 5  201103   30-35    5678
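
As a variation on the approach above, the sorting step can be skipped by letting fill look in both directions within each group. This is only a sketch: it assumes a tidyr version that supports the .direction argument (1.0.0 or later), and it builds the sample data with stringsAsFactors = FALSE so that Age is a character column. The name filled is just an illustrative variable.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, kept as character columns
df <- data.frame(
  Date = c("201101", "201102", "201101", "201102", "201103"),
  Age = c("12-17", "12-17", "30-35", NA, NA),
  Customer_ID = c("1234", "1234", "5678", "5678", "5678"),
  stringsAsFactors = FALSE
)

# Within each customer, fill NAs downward first, then upward,
# so the original row order is left unchanged
filled <- df %>%
  group_by(Customer_ID) %>%
  fill(Age, .direction = "downup") %>%
  ungroup()
```

Because nothing is reordered, this also covers the case where a customer's NA rows come before the row that holds the known Age.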

Answer 2

In base R:

def calc_p(group):
    global df_old_len, df_new_len, clicks_old, clicks_new
    clicks_old += len(group[(group.landing_page == 'old_page') & (group.converted == 1)])
    clicks_new += len(group[(group.landing_page == 'new_page') & (group.converted == 1)])
    df_old_len += len(group[group.landing_page == 'old_page'])
    df_new_len += len(group[group.landing_page == 'new_page'])
    ctr_old = float(clicks_old)/df_old_len
    ctr_new = float(clicks_new)/df_new_len
    z_score, p_val, null = z_test.z_test(ctr_old, ctr_new, df_old_len, df_new_len, effect_size=0.001)
    return p_val

# Initialize global values to 0 for cumulative calc_p
df_old_len = 0
df_new_len = 0
clicks_old = 0
clicks_new = 0

grouped = df.groupby(by='time').agg(calc_p)
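
For the task in the question (filling each NA in Age from another row with the same Customer_ID), one possible base R sketch uses match() as a lookup. It is an illustration, not necessarily what the answerer originally posted; the names known, lookup, and na_rows are made up for this example, and the data is built with stringsAsFactors = FALSE so that Age is a character column.

```r
# Sample data from the question, kept as character columns
df <- data.frame(
  Date = c("201101", "201102", "201101", "201102", "201103"),
  Age = c("12-17", "12-17", "30-35", NA, NA),
  Customer_ID = c("1234", "1234", "5678", "5678", "5678"),
  stringsAsFactors = FALSE
)

# Rows where Age is known serve as the lookup table
known <- df[!is.na(df$Age), ]

# For every row, find the first known Age with the same Customer_ID
# (NA if that customer has no known Age at all)
lookup <- known$Age[match(df$Customer_ID, known$Customer_ID)]

# Overwrite only the rows where Age was missing
na_rows <- is.na(df$Age)
df$Age[na_rows] <- lookup[na_rows]
```

If a customer has several different known Age values, this picks the first one in row order; whether that is the right choice depends on the data.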

Replace NA with a copy from a neighboring row in R
