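I have a data frame with customers in the rows and periods (months) in the columns. I use this format for clustering. I want to scale the values within each row, i.e. scale each customer across its periods. I can do this with the code below, but there are some problems.

Here is my sample data and code: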
```
> mydata
cust P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20
1 A 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0 1 1.0
2 B 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0 5 5.0
3 C 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0 9 9.0
4 D 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0 0 1.0 2 1.0
5 E 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0 4 5.0 6 5.0
6 F 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0 8 9.0 10 9.0
7 G 2 1.5 1 0.5 0 0.5 1 1.5 2 1.5 1 0.5 0 0.5 1 1.5 2 1.5 1 0.5
8 H 6 5.5 5 4.5 4 4.5 5 5.5 6 5.5 5 4.5 4 4.5 5 5.5 6 5.5 5 4.5
9 I 10 9.5 9 8.5 8 8.5 9 9.5 10 9.5 9 8.5 8 8.5 9 9.5 10 9.5 9 8.5
```
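To make the example reproducible, the sample data can be rebuilt from the patterns visible in the printout: A, B and C are constant, D, E and F repeat a 4-period cycle shifted by 4 and 8, and G, H and I repeat an 8-period cycle shifted the same way. This is a sketch reconstructed from the printed values, not the asker's own code; the helper names `d_pat`, `g_pat` and `vals` are made up for illustration.

```r
# Rebuild the sample data frame from the repeating patterns in the printout.
# d_pat, g_pat and vals are illustrative helper names, not from the question.
d_pat <- rep(c(0, 1, 2, 1), length.out = 20)                      # cycle for D, E, F
g_pat <- rep(c(2, 1.5, 1, 0.5, 0, 0.5, 1, 1.5), length.out = 20)  # cycle for G, H, I

vals <- rbind(rep(1, 20), rep(5, 20), rep(9, 20),   # A, B, C: constant
              d_pat, d_pat + 4, d_pat + 8,          # D, E, F
              g_pat, g_pat + 4, g_pat + 8)          # G, H, I
rownames(vals) <- NULL
colnames(vals) <- paste0("P", 1:20)

mydata <- data.frame(cust = LETTERS[1:9], vals, stringsAsFactors = FALSE)
```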
The code I am using:
```r
library(dplyr)
library(tidyr)

# first transpose the data: one row per period, one column per customer
g_mydata = mydata %>% gather(period, value, -cust)
spr_mydata = g_mydata %>% spread(cust, value)

# then scale the values for each customer (each column of spr_mydata);
# as.numeric() drops the one-column matrix that scale() returns
sc_mydata = spr_mydata %>%
  mutate(across(-period, ~ as.numeric(scale(.x))))

# then transpose again back to the original format
g_scdata = sc_mydata %>% gather(cust, value, -period)
scaled_data = g_scdata %>% spread(period, value)
```
Thanks for any help or suggestions.
Answer 0 (score: 3)
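You can always try apply():

```r
# scale each customer (each row of the original data); apply() returns one
# column per row, so t() turns the result back into customers x periods.
# Constant customers (A, B, C) come out as NaN because their sd is 0.
sc_mydata = t(apply(mydata[, -1], 1, scale))
```

If the NaN values are messing things up, you can work with the transposed data in spr_mydata and run scale() on it directly:

```r
# scale() works column by column, and after dropping the period column
# each column of spr_mydata is one customer
scale(spr_mydata[, -1])
```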
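If the only problem is the NaN produced for constant customers (A, B and C here), the row-wise scaling can also be done in base R without any reshaping. A minimal sketch, assuming a constant series should map to all zeros (the same choice the dplyr answer below makes); `scale_rows` is a hypothetical helper name, not a base R function.

```r
# Scale each row (customer) to mean 0 and sd 1; rows with sd 0 become all 0
# instead of NaN. scale_rows is a hypothetical helper, not part of base R.
scale_rows <- function(df) {
  m <- as.matrix(df[, -1])   # drop the cust column
  mu <- rowMeans(m)
  s <- apply(m, 1, sd)
  s[s == 0] <- 1             # constant rows: x - mean is already 0, so divide by 1
  out <- (m - mu) / s        # mu and s have one entry per row and recycle down the rows
  data.frame(cust = df$cust, out, stringsAsFactors = FALSE)
}
```

Usage would be `scaled_data = scale_rows(mydata)`, which keeps the original customers-in-rows layout and column order.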
Answer 1 (score: 2)
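Here is a dplyr way. Customers with a constant series (standard deviation 0) are set to 0 instead of being scaled, and customers whose scaled profiles are identical get the same type_ID.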
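```r
library(dplyr)
library(tidyr)

# reshape to long format: one row per customer and period
long_data =
  mydata %>%
  gather(period, value, -cust)

# customers whose values vary; a constant series cannot be scaled (sd = 0)
to_scale =
  long_data %>%
  group_by(cust) %>%
  summarize(sd = sd(value)) %>%
  filter(sd != 0) %>%
  select(-sd)

# constant customers get a scaled value of 0 instead of NaN
flat =
  long_data %>%
  anti_join(to_scale, by = "cust") %>%
  mutate(value = 0)

# scale each remaining customer; signif() removes floating-point noise so
# that identical profiles compare equal in distinct() below
wide_scale =
  long_data %>%
  right_join(to_scale, by = "cust") %>%
  group_by(cust) %>%
  mutate(value =
           value %>%
           scale %>%
           as.numeric %>%
           signif(7)) %>%
  ungroup %>%
  bind_rows(flat) %>%
  spread(period, value)

# one row per distinct scaled profile, numbered as a type
type =
  wide_scale %>%
  select(-cust) %>%
  distinct %>%
  mutate(type_ID = 1:n())

# map each customer to its profile type
customer__type =
  type %>%
  left_join(wide_scale, by = setdiff(names(type), "type_ID")) %>%
  select(type_ID, cust)
```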
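The same grouping idea (scale each customer, round to 7 significant digits, then number the distinct profiles) can be sketched in base R for comparison. This is a swapped-in base R version, not the answerer's code; `profile_types` is a hypothetical helper name, and, as in the answer, constant customers are mapped to all zeros. For the sample data we would expect three types: A, B and C; D, E and F (the same 4-period cycle shifted by a constant); and G, H and I.

```r
# Group customers whose row-scaled profiles are identical to 7 significant
# digits. profile_types is a hypothetical helper; constant rows map to all 0.
profile_types <- function(df) {
  m <- as.matrix(df[, -1])                      # drop the cust column
  s <- apply(m, 1, sd)
  s[s == 0] <- 1                                # avoid 0/0 for constant rows
  z <- signif((m - rowMeans(m)) / s, 7)         # scaled and rounded profiles
  key <- apply(z, 1, paste, collapse = ",")     # one string per profile
  data.frame(cust = df$cust,
             type_ID = match(key, unique(key)), # number profiles by first appearance
             stringsAsFactors = FALSE)
}
```

Rounding before comparing matters here: G, H and I have means such as 1.05 that are not exactly representable, so their unrounded scaled values can differ in the last bits.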