I want to speed up a calculation that I need to run on specific parts of a data frame. Here is some sample data:
days <- c("01.01.2018","01.01.2018","01.01.2018",
"02.01.2018","02.01.2018","02.01.2018",
"03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","01:00:00","02:00:00",
"00:00:00","01:00:00","02:00:00",
"00:00:00","01:00:00","02:00:00")
a <- c(1,2,3,
1,2,3,
1,2,3)
b <- c(1,2,3,
5,6,7,
10,11,12)
results <- NA
df1 <- data.frame(days,time,a,results)
df2 <- data.frame(days,time,b)
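For each day, I need to add the value of df2$b at 00:00:00 to every df1$a value from that same day and store the sum in results.
Right now I do it like this: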
ndays <- unique(df1$days)
for(i in 1:length(ndays)) {
factor <- df2[(df2$days == ndays[i] & df2$time == "00:00:00"),]$b
df1[df1$days == ndays[i],]$results <- df1[df1$days == ndays[i],]$a + factor
}
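The problem is that I have huge data frames covering many days, and looping over the days one by one is slow. Is there a faster way?
Edit: this is the populated results column after the loop: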
df1
        days     time a results
1 01.01.2018 00:00:00 1       2 # results = a + df2$b @ 01.01.2018 00:00:00
2 01.01.2018 01:00:00 2       3 # results = a + df2$b @ 01.01.2018 00:00:00
3 01.01.2018 02:00:00 3       4 # results = a + df2$b @ 01.01.2018 00:00:00
4 02.01.2018 00:00:00 1       6 # results = a + df2$b @ 02.01.2018 00:00:00
5 02.01.2018 01:00:00 2       7 # results = a + df2$b @ 02.01.2018 00:00:00
6 02.01.2018 02:00:00 3       8 # results = a + df2$b @ 02.01.2018 00:00:00
7 03.01.2018 00:00:00 1      11 # results = a + df2$b @ 03.01.2018 00:00:00
8 03.01.2018 01:00:00 2      12 # results = a + df2$b @ 03.01.2018 00:00:00
9 03.01.2018 02:00:00 3      13 # results = a + df2$b @ 03.01.2018 00:00:00
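Answer 0 (score: 2)
You can do this with a merge instead of a for loop, which will be faster. In the answer below I also use data.table, a fast version of data.frame that is very useful when working with large tables.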
# install.packages("data.table") # Uncomment if necessary
library(data.table)
df1 <- data.frame(days,time,a) # You don't need to create the result column yet
df2 <- data.frame(days,time,b)
df1 <- data.table(df1)
df2 <- data.table(df2)
# Merge the two tables on the days column
df3 <- merge(df1, df2[time=="00:00:00"], by="days")
# This is your result
answer <- df3[, .(days, time=time.x, a, results=a+b)]
Output:
> answer
         days     time a results
1: 01.01.2018 00:00:00 1       2
2: 01.01.2018 01:00:00 2       3
3: 01.01.2018 02:00:00 3       4
4: 02.01.2018 00:00:00 1       6
5: 02.01.2018 01:00:00 2       7
6: 02.01.2018 02:00:00 3       8
7: 03.01.2018 00:00:00 1      11
8: 03.01.2018 01:00:00 2      12
9: 03.01.2018 02:00:00 3      13
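As a variation on the merge above, data.table can also write the result straight into the first table with an update join, which avoids building the intermediate merged table. This is only a minimal sketch on the sample data from the question; the names dt1 and dt2 are placeholders chosen here, not part of the original answer.

```r
library(data.table)

# Sample data from the question
days <- rep(c("01.01.2018", "02.01.2018", "03.01.2018"), each = 3)
time <- rep(c("00:00:00", "01:00:00", "02:00:00"), times = 3)
a <- rep(c(1, 2, 3), times = 3)
b <- c(1, 2, 3, 5, 6, 7, 10, 11, 12)

dt1 <- data.table(days, time, a)   # dt1/dt2 are illustrative names
dt2 <- data.table(days, time, b)

# Update join: for each 00:00:00 row of dt2, find the dt1 rows with the
# same day and store a + b in a new results column, modifying dt1 in place.
# i.b refers to column b of the table passed in the first argument.
dt1[dt2[time == "00:00:00"], on = "days", results := a + i.b]

print(dt1)
```

Days in dt1 that have no 00:00:00 row in dt2 would be left with NA in results, and the row order of dt1 is unchanged.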
Answer 1 (score: 2)
transform(merge(df1,aggregate(b~days,df2,function(x)x[1])),results=a+b)
        days     time a results  b
1 01.01.2018 00:00:00 1       2  1
2 01.01.2018 01:00:00 2       3  1
3 01.01.2018 02:00:00 3       4  1
4 02.01.2018 00:00:00 1       6  5
5 02.01.2018 01:00:00 2       7  5
6 02.01.2018 02:00:00 3       8  5
7 03.01.2018 00:00:00 1      11 10
8 03.01.2018 01:00:00 2      12 10
9 03.01.2018 02:00:00 3      13 10
One caveat: this assumes that the times in df2 are in chronological order and that the first value for any given day is the one at 00:00:00.
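If the ordering assumption in the caveat above might not hold, one option is to pick the 00:00:00 rows explicitly and look them up with match(). The following is a sketch in base R on the sample data from the question; midnight and idx are names introduced here for illustration.

```r
# Sample data from the question
days <- rep(c("01.01.2018", "02.01.2018", "03.01.2018"), each = 3)
time <- rep(c("00:00:00", "01:00:00", "02:00:00"), times = 3)
a <- rep(c(1, 2, 3), times = 3)
b <- c(1, 2, 3, 5, 6, 7, 10, 11, 12)

df1 <- data.frame(days, time, a, results = NA)
df2 <- data.frame(days, time, b)

# Select the 00:00:00 rows by value, so the row order of df2 does not matter
midnight <- df2[df2$time == "00:00:00", ]

# For every row of df1, find the position of its day among the midnight rows
idx <- match(df1$days, midnight$days)

# Vectorised addition over the whole column, no loop over days
df1$results <- df1$a + midnight$b[idx]

print(df1)
```

match() returns NA for days without a 00:00:00 row, so those rows end up with NA in results rather than raising an error.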
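Answer 2 (score: 2)
One solution using dplyr is shown below. The approach is:
1) filter df2 to drop every row whose time is not 00:00:00
2) then inner_join df1 and df2 on days. This brings the value of b from df2 into every row of the merged data frame with a matching day. Finally, add a and b to get results.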
df1 <- data.frame(days,time,a,results, stringsAsFactors = FALSE)
df2 <- data.frame(days,time,b, stringsAsFactors = FALSE)
library(dplyr)
df2 %>%
filter(time == "00:00:00") %>%
inner_join(df1, by="days") %>%
mutate(time = time.y, results = a+b) %>%
select( days, time, a, b, results)
#Result:
        days     time a  b results
1 01.01.2018 00:00:00 1  1       2
2 01.01.2018 01:00:00 2  1       3
3 01.01.2018 02:00:00 3  1       4
4 02.01.2018 00:00:00 1  5       6
5 02.01.2018 01:00:00 2  5       7
6 02.01.2018 02:00:00 3  5       8
7 03.01.2018 00:00:00 1 10      11
8 03.01.2018 01:00:00 2 10      12
9 03.01.2018 02:00:00 3 10      13
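A possible variant of the same idea starts from df1 and uses left_join, so that df1 keeps its row order and days without a 00:00:00 row in df2 are kept with NA instead of being dropped. This is a sketch on the sample data, not the answer's own code; midnight and out are names made up for the example.

```r
library(dplyr)

# Sample data from the question
days <- rep(c("01.01.2018", "02.01.2018", "03.01.2018"), each = 3)
time <- rep(c("00:00:00", "01:00:00", "02:00:00"), times = 3)
a <- rep(c(1, 2, 3), times = 3)
b <- c(1, 2, 3, 5, 6, 7, 10, 11, 12)

df1 <- data.frame(days, time, a, stringsAsFactors = FALSE)
df2 <- data.frame(days, time, b, stringsAsFactors = FALSE)

# Keep one row per day: the 00:00:00 value of b. Dropping the time column
# here avoids the time.x / time.y suffixes after the join.
midnight <- df2 %>%
  filter(time == "00:00:00") %>%
  select(days, b)

# Attach the midnight value to every row of df1 and add it to a
out <- df1 %>%
  left_join(midnight, by = "days") %>%
  mutate(results = a + b)

print(out)
```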