I am trying to use abind to create a three-dimensional array from a large 2D array. The source data is structured as follows:
Firstname Lastname Country City Measure Wk1 Wk2... Wkn
foo bar UK London Height 23 34 34
foo bar UK London Weight 67 67 67
foo bar UK London Fat 6 7 9
John doe US NY Height 546 776 978
John doe US NY Weight 123 656 989
John doe US NY Fat 34 45 67
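Each Measure has 1912 rows and 25 weeks of data. I am trying to create a 3D array so that I can look at city-wise trends in the measures (height, weight, etc.).
When I use abind(split(df,df$city), along =3), it gives me the error: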
abind error - arg 'XXX' has dims=1912, 35, 1; but need dims=0, 35, X
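I have confirmed that the number of rows for each measure is 1912 and that the number of columns is also homogeneous. Any help would be appreciated.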
Answer 0 (score: 1)
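Are you sure you want to use an array to measure city trends?
Usually, the right way to analyse data like yours is to convert the week columns into long format.
I first imported your data into R...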
tc <- textConnection("Firstname Lastname Country City Measure Wk1 Wk2 Wk3
foo bar UK London Height 23 34 34
foo bar UK London Weight 67 67 67
foo bar UK London Fat 6 7 9
John doe US NY Height 546 776 978
John doe US NY Weight 123 656 989
John doe US NY Fat 34 45 67")
df <- read.table(tc, header = TRUE)
Then install and load some useful packages.
install.packages("tidyr")
install.packages("dplyr")
library(tidyr)
library(dplyr)
Now use the gather command from tidyr to unpivot the data, so that each week becomes a row rather than a column.
> long_df <- gather(df, Week, Value, -c(1:5))
> long_df
Firstname Lastname Country City Measure Week Value
1 foo bar UK London Height Wk1 23
2 foo bar UK London Weight Wk1 67
3 foo bar UK London Fat Wk1 6
4 John doe US NY Height Wk1 546
5 John doe US NY Weight Wk1 123
6 John doe US NY Fat Wk1 34
7 foo bar UK London Height Wk2 34
8 foo bar UK London Weight Wk2 67
9 foo bar UK London Fat Wk2 7
10 John doe US NY Height Wk2 776
11 John doe US NY Weight Wk2 656
12 John doe US NY Fat Wk2 45
13 foo bar UK London Height Wk3 34
14 foo bar UK London Weight Wk3 67
15 foo bar UK London Fat Wk3 9
16 John doe US NY Height Wk3 978
17 John doe US NY Weight Wk3 989
18 John doe US NY Fat Wk3 67
Now you can use dplyr to generate whatever summaries of the data you like...
> long_df %>%
+ group_by(Country, City, Measure) %>%
+ summarise(mean_val = mean(Value))
Source: local data frame [6 x 4]
Groups: Country, City
Country City Measure mean_val
1 UK London Fat 7.333333
2 UK London Height 30.333333
3 UK London Weight 67.000000
4 US NY Fat 48.666667
5 US NY Height 766.666667
6 US NY Weight 589.333333
Or a summary by country and measure...
> long_df %>%
+ group_by(Country, Measure) %>%
+ summarise(mean_val = mean(Value), med_val = median(Value), count = n())
Source: local data frame [6 x 5]
Groups: Country
Country Measure mean_val med_val count
1 UK Fat 7.333333 7 3
2 UK Height 30.333333 34 3
3 UK Weight 67.000000 67 3
4 US Fat 48.666667 45 3
5 US Height 766.666667 776 3
6 US Weight 589.333333 656 3
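If a 3D array is still wanted, one possible route is to build it from the long format rather than with abind. The sketch below uses only base R (tapply) and assumes the mean is the right way to combine several people in the same city; the names week_cols, long and arr are illustrative. It does not diagnose the abind error from the question.

```r
df <- read.table(text = "Firstname Lastname Country City Measure Wk1 Wk2 Wk3
foo bar UK London Height 23 34 34
foo bar UK London Weight 67 67 67
foo bar UK London Fat 6 7 9
John doe US NY Height 546 776 978
John doe US NY Weight 123 656 989
John doe US NY Fat 34 45 67", header = TRUE)

# Names of the week columns, in their original order
week_cols <- grep("^Wk", names(df), value = TRUE)

# Base-R long format: stack the week columns one after another
long <- data.frame(
  City    = rep(df$City, times = length(week_cols)),
  Measure = rep(df$Measure, times = length(week_cols)),
  # explicit levels keep Wk2 before Wk10 when there are many weeks
  Week    = factor(rep(week_cols, each = nrow(df)), levels = week_cols),
  Value   = unlist(df[week_cols], use.names = FALSE)
)

# City x Measure x Week array of mean values
arr <- tapply(long$Value, long[c("City", "Measure", "Week")], mean)

dim(arr)                    # 2 cities, 3 measures, 3 weeks
arr["London", "Height", ]   # weekly series for one city and measure
```

Each slice such as arr["London", "Height", ] is then the week-by-week series for one city and measure, which is the kind of trend the question asks about. Any city and measure combination with no rows shows up as NA instead of causing a dimension mismatch.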