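I have a data frame whose first row holds a group label for each column. For each group, I want to compute the mean of all x columns and the sum of all y columns, per year.
The result I want is shown at this link: The result expected
Here is the data.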
dt=structure(list(year = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("1980",
"1981", "1982", "1985", "group"), class = "factor"), x1 = structure(c(4L,
1L, 3L, 2L, 1L), .Label = c("1", "2", "4", "A"), class = "factor"),
y1 = structure(c(4L, 1L, 3L, 2L, 2L), .Label = c("1", "3",
"5", "A"), class = "factor"), x2 = structure(c(5L, 1L, 4L,
3L, 2L), .Label = c("2", "4", "5", "6", "A"), class = "factor"),
y2 = structure(c(4L, 1L, 3L, 3L, 2L), .Label = c("3", "5",
"7", "A"), class = "factor"), x3 = structure(c(4L, 1L, 3L,
2L, 1L), .Label = c("4", "6", "8", "B"), class = "factor"),
y3 = structure(c(4L, 1L, 3L, 2L, 1L), .Label = c("3", "5",
"6", "B"), class = "factor"), x4 = structure(c(4L, 1L, 3L,
2L, 3L), .Label = c("2", "4", "5", "C"), class = "factor"),
y4 = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("3", "4",
"5", "6", "C"), class = "factor"), x5 = structure(c(5L, 2L,
1L, 3L, 4L), .Label = c("3", "4", "6", "7", "C"), class = "factor"),
y5 = structure(c(4L, 2L, 1L, 3L, 2L), .Label = c("2", "5",
"8", "C"), class = "factor")), class = "data.frame", row.names = c(NA,
-5L))
Expected result:
result_expected <- structure(list(year = c(1980L, 1981L, 1982L, 1985L), A_x_mean = c(1.5,
5, 3.5, 2.5), A_y_sum = c(4L, 12L, 10L, 8L), B_x_mean = c(4L,
8L, 6L, 4L), B_y_sum = c(3L, 6L, 5L, 3L), C_x_mean = 3:6, C_y_sum = c(8L,
6L, 13L, 11L)), class = "data.frame", row.names = c(NA, -4L))
I searched Google and Stack Overflow by keyword but found no suitable answer. My current idea is to extract the unique groups A, B, C from the first row.
require(tidyverse)
group_variables <- dt%>%gather(key,value)%>%distinct(value)%>%arrange(value)
Then loop over group_variables in a for loop to pick out the matching columns:
for (i in group_variables) {......}
Alternatively, I could use gather and spread from tidyr, together with dplyr, to restructure the data frame, as in the code below:
dt_new%>% group_by (group)%>%
summarise(mean=mean(x,na.rm=TRUE),
sum=sum(y,na.rm=TRUE))
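Answer 0: (score: 1)
First, we take out the first row, which holds the groups, pivot the data frame to long format, reduce x1, x2, x3 to x (and likewise for y), and then attach the groups again: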
group_var = sapply(dt[1,-1],as.character)
mat <-
dt[-1,] %>% pivot_longer(-year) %>%
mutate(value=as.numeric(as.character(value))) %>%
mutate(group=as.character(group_var[as.character(name)])) %>%
mutate(name=substr(name,1,1))
mat
# A tibble: 40 x 4
   year  name  value group
   <fct> <chr> <dbl> <chr>
 1 1980  x         1 A
 2 1980  y         1 A
 3 1980  x         2 A
 4 1980  y         3 A
 5 1980  x         4 B
 6 1980  y         3 B
 7 1980  x         2 C
 8 1980  y         3 C
 9 1980  x         4 C
10 1980  y         5 C
All that remains is to group the rows by year, name and group and apply the appropriate summary function, so we define a helper:
func = function(DF,func){
DF %>%
group_by(group,name,year) %>%
summarise_all(func) %>%
mutate(label=paste(group,name,func,sep="_")) %>%
ungroup %>%
select(year,value,label) %>%
pivot_wider(values_from=value,names_from=label)
}
Then apply it to the two parts of the data:
cbind(func(mat %>% filter(name=="x"),"mean"),func(mat %>% filter(name=="y"),"sum"))
  year A_x_mean B_x_mean C_x_mean year A_y_sum B_y_sum C_y_sum
1 1980      1.5        4        3 1980       4       3       8
2 1981      5.0        8        4 1981      12       6       6
3 1982      3.5        6        5 1982      10       5      13
4 1985      2.5        4        6 1985       8       3      11
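The cbind() call above repeats the year column. As a possible variant (not part of the original answer), the same numbers can be produced in one pipeline. This is a minimal sketch that assumes each x/y pair with the same numeric suffix belongs to the same group, which holds for the sample data. The names idx, x_mean, y_sum and result are made up for illustration, and dt is rebuilt here as plain character columns with the same values as in the question.

```r
library(dplyr)
library(tidyr)

# Same values as the question's `dt`, written as character columns
# (the original stores them as factors; the code below handles both).
dt <- data.frame(
  year = c("group", "1980", "1981", "1982", "1985"),
  x1 = c("A", "1", "4", "2", "1"), y1 = c("A", "1", "5", "3", "3"),
  x2 = c("A", "2", "6", "5", "4"), y2 = c("A", "3", "7", "7", "5"),
  x3 = c("B", "4", "8", "6", "4"), y3 = c("B", "3", "6", "5", "3"),
  x4 = c("C", "2", "5", "4", "5"), y4 = c("C", "3", "4", "5", "6"),
  x5 = c("C", "4", "3", "6", "7"), y5 = c("C", "5", "2", "8", "5"),
  stringsAsFactors = FALSE
)

# Named lookup: column name -> group letter from the first row.
group_var <- sapply(dt[1, -1], as.character)

result <- dt[-1, ] %>%
  # Convert every column to numbers (via character, in case of factors).
  mutate(across(everything(), ~ as.numeric(as.character(.x)))) %>%
  # Split "x1" into the value column "x" and the suffix "1";
  # ".value" makes x and y separate columns in the long table.
  pivot_longer(-year, names_to = c(".value", "idx"),
               names_pattern = "([xy])(\\d+)") %>%
  # Assumption: x<i> and y<i> share a group, so look it up via x<i>.
  mutate(group = unname(group_var[paste0("x", idx)])) %>%
  group_by(year, group) %>%
  summarise(x_mean = mean(x), y_sum = sum(y), .groups = "drop") %>%
  # One column per group and statistic, e.g. A_x_mean, A_y_sum.
  pivot_wider(names_from = group, values_from = c(x_mean, y_sum),
              names_glue = "{group}_{.value}")

print(result)
```

Because both statistics are computed in a single summarise() call, year appears only once in the output. The columns come out grouped by statistic (all x_mean columns, then all y_sum columns) rather than by group letter.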
Answer 1: (score: 0)
One approach is to convert the factors to characters, then use the first row as the column names (and drop the first row). I then used tidyr and dplyr to reshape the data into long format by year and letter and, after taking the sums and means, converted it back to wide format.
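The code for this answer is not shown here, so the following is only a sketch of how the described steps might look, not the answerer's original code. It assumes the new column names combine the first-row letter with the old x/y prefix and suffix (for example x1 becomes A_x_1) so that they stay unique. The names dt_chr, first_row, letter, var, idx, avg, total, stat and wide are hypothetical, and dt is rebuilt as character columns with the same values as in the question.

```r
library(dplyr)
library(tidyr)

# Same values as the question's `dt`, written as character columns.
dt <- data.frame(
  year = c("group", "1980", "1981", "1982", "1985"),
  x1 = c("A", "1", "4", "2", "1"), y1 = c("A", "1", "5", "3", "3"),
  x2 = c("A", "2", "6", "5", "4"), y2 = c("A", "3", "7", "7", "5"),
  x3 = c("B", "4", "8", "6", "4"), y3 = c("B", "3", "6", "5", "3"),
  x4 = c("C", "2", "5", "4", "5"), y4 = c("C", "3", "4", "5", "6"),
  x5 = c("C", "4", "3", "6", "7"), y5 = c("C", "5", "2", "8", "5"),
  stringsAsFactors = FALSE
)

# Step 1: factors -> characters (a no-op for character columns).
dt_chr <- dt %>% mutate(across(everything(), as.character))

# Step 2: fold the first row into the column names, then drop it.
# "x1" with letter "A" becomes "A_x_1" (assumed naming scheme).
first_row <- unlist(dt_chr[1, -1])
names(dt_chr)[-1] <- paste(
  first_row,
  sub("^([xy])(\\d+)$", "\\1_\\2", names(first_row)),
  sep = "_"
)
dt_chr <- dt_chr[-1, ]

# Step 3: long format by year and letter, summarise, back to wide.
wide <- dt_chr %>%
  pivot_longer(-year, names_to = c("letter", "var", "idx"),
               names_sep = "_") %>%
  mutate(year = as.integer(year), value = as.numeric(value)) %>%
  group_by(year, letter, var) %>%
  summarise(avg = mean(value), total = sum(value), .groups = "drop") %>%
  # Keep the mean for x columns and the sum for y columns.
  mutate(stat  = ifelse(var == "x", "mean", "sum"),
         value = ifelse(var == "x", avg, total)) %>%
  pivot_wider(id_cols = year, names_from = c(letter, var, stat),
              values_from = value)

print(wide)
```

With this naming scheme the wide columns come out as A_x_mean, A_y_sum, B_x_mean and so on, the same names and order as result_expected in the question.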