Question

I am trying to use abind to create a three-dimensional array from a large 2D array. The source data is structured as follows:

Firstname   Lastname    Country City    Measure Wk1 Wk2... Wkn
foo            bar        UK    London  Height  23   34    34
foo            bar        UK    London  Weight  67  67     67
foo           bar         UK    London  Fat     6   7      9
John          doe         US    NY      Height  546 776   978
John          doe         US    NY      Weight  123 656   989
John          doe         US    NY      Fat     34  45    67

Each Measure has 1912 rows and 25 weeks of data. I am trying to create a 3D array so that I can examine city-wise trends in the measures (height, weight, etc.).

When I use abind(split(df,df$city), along =3), it gives me the error:

abind error - arg 'XXX' has dims=1912, 35, 1; but need dims=0, 35, X

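I have confirmed that each measure has 1912 rows and that the number of columns is consistent as well. Any help would be greatly appreciated.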

Answer 1

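Are you sure you want to use an array to measure city-level trends?

Usually, the right way to analyse data like yours is to convert the week columns to long format.

I first import your data into R ...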

tc <- textConnection("Firstname   Lastname    Country City    Measure Wk1 Wk2 Wk3
foo            bar        UK    London  Height  23   34    34
foo            bar        UK    London  Weight  67  67     67
foo           bar         UK    London  Fat     6   7      9
John          doe         US    NY      Height  546 776   978
John          doe         US    NY      Weight  123 656   989
John          doe         US    NY      Fat     34  45    67")


df <- read.table(tc, header = TRUE)

Then install and load some useful packages.

install.packages("tidyr")
install.packages("dplyr")
library(tidyr)
library(dplyr)

Now use the gather command from tidyr to unpivot the data.

> long_df <- gather(df, Week, Value, -c(1:5)) 
> long_df
   Firstname Lastname Country   City Measure Week Value
1        foo      bar      UK London  Height  Wk1    23
2        foo      bar      UK London  Weight  Wk1    67
3        foo      bar      UK London     Fat  Wk1     6
4       John      doe      US     NY  Height  Wk1   546
5       John      doe      US     NY  Weight  Wk1   123
6       John      doe      US     NY     Fat  Wk1    34
7        foo      bar      UK London  Height  Wk2    34
8        foo      bar      UK London  Weight  Wk2    67
9        foo      bar      UK London     Fat  Wk2     7
10      John      doe      US     NY  Height  Wk2   776
11      John      doe      US     NY  Weight  Wk2   656
12      John      doe      US     NY     Fat  Wk2    45
13       foo      bar      UK London  Height  Wk3    34
14       foo      bar      UK London  Weight  Wk3    67
15       foo      bar      UK London     Fat  Wk3     9
16      John      doe      US     NY  Height  Wk3   978
17      John      doe      US     NY  Weight  Wk3   989
18      John      doe      US     NY     Fat  Wk3    67
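For readers on a recent tidyr, the same reshape can be sketched with pivot_longer(), which tidyr offers as the newer interface alongside gather(). This is only an illustration: it rebuilds the toy data so it runs on its own, and the names long_df2 and WeekNum are made up here, not part of the original answer. Note that pivot_longer() orders its rows by person and measure rather than by week, so the rows come out in a different order from the gather() output above.

```r
library(tidyr)

# Same toy data as in the answer, rebuilt so this snippet is self-contained.
df <- read.table(text = "Firstname Lastname Country City Measure Wk1 Wk2 Wk3
foo  bar UK London Height 23  34  34
foo  bar UK London Weight 67  67  67
foo  bar UK London Fat    6   7   9
John doe US NY     Height 546 776 978
John doe US NY     Weight 123 656 989
John doe US NY     Fat    34  45  67", header = TRUE)

# Stack every column whose name starts with "Wk" into Week/Value pairs.
long_df2 <- pivot_longer(df,
                         cols = starts_with("Wk"),
                         names_to = "Week",
                         values_to = "Value")

# Hypothetical extra column: the week as a number, handy for ordering
# or plotting a trend over time.
long_df2$WeekNum <- as.integer(sub("Wk", "", long_df2$Week))
```

Selecting the week columns by name prefix instead of by position (-c(1:5)) is a design choice that should keep working if the identifying columns are reordered, assuming all week columns really do start with "Wk".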

Now you can use dplyr to generate whatever summaries of the data you like ...

> long_df %>% 
+   group_by(Country, City, Measure) %>% 
+   summarise(mean_val = mean(Value))
Source: local data frame [6 x 4]
Groups: Country, City

  Country   City Measure   mean_val
1      UK London     Fat   7.333333
2      UK London  Height  30.333333
3      UK London  Weight  67.000000
4      US     NY     Fat  48.666667
5      US     NY  Height 766.666667
6      US     NY  Weight 589.333333

Or a summary by country and measure ...

> long_df %>% 
+   group_by(Country,  Measure) %>% 
+   summarise(mean_val = mean(Value), med_val = median(Value), count = n())
Source: local data frame [6 x 5]
Groups: Country

  Country Measure   mean_val med_val count
1      UK     Fat   7.333333       7     3
2      UK  Height  30.333333      34     3
3      UK  Weight  67.000000      67     3
4      US     Fat  48.666667      45     3
5      US  Height 766.666667     776     3
6      US  Weight 589.333333     656     3
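If a 3D array is still what you want, one possible route is to build it from the long data with base R's tapply() rather than abind. The following is a minimal sketch under that assumption, using only base R and the toy data; the names week_cols, long and arr are invented for the example, and taking the mean per cell is just one choice of aggregate.

```r
# Toy data from the answer, rebuilt so the snippet runs on its own.
df <- read.table(text = "Firstname Lastname Country City Measure Wk1 Wk2 Wk3
foo  bar UK London Height 23  34  34
foo  bar UK London Weight 67  67  67
foo  bar UK London Fat    6   7   9
John doe US NY     Height 546 776 978
John doe US NY     Weight 123 656 989
John doe US NY     Fat    34  45  67", header = TRUE)

# Names of the week columns, found by their "Wk" prefix.
week_cols <- grep("^Wk", names(df), value = TRUE)

# Long form in base R: the week columns are stacked one after another,
# so City and Measure are repeated once per week column.
long <- data.frame(
  City    = rep(df$City, times = length(week_cols)),
  Measure = rep(df$Measure, times = length(week_cols)),
  Week    = rep(week_cols, each = nrow(df)),
  Value   = unlist(df[week_cols], use.names = FALSE)
)

# One cell per City x Measure x Week; with several people per city,
# each cell holds the mean over those people.
arr <- tapply(long$Value, long[c("City", "Measure", "Week")], mean)

dim(arr)                   # 2 3 3
arr["London", "Height", ]  # weekly Height values for London
```

Because tapply() aggregates within each cell, the cities do not need to have the same number of rows, whereas abind requires the pieces it joins to have matching dimensions.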
