Adding different values to different parts of a data frame without a loop

Time: 2018-02-04 16:45:09

Tags: r

I want to speed up a calculation that I need to run on specific parts of a data frame. Here is some sample data:

days <- c("01.01.2018","01.01.2018","01.01.2018",
          "02.01.2018","02.01.2018","02.01.2018",
          "03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00")
a <- c(1,2,3,
       1,2,3,
       1,2,3)
b <- c(1,2,3,
       5,6,7,
       10,11,12)

results <- NA

df1 <- data.frame(days,time,a,results)
df2 <- data.frame(days,time,b)

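For each day, I need to take the df2$b value at 00:00:00 and add it to every df1$a value of that same day, storing the sum in results. Right now I do it like this: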

ndays <- unique(df1$days)

for(i in 1:length(ndays)) {
  factor <-  df2[(df2$days == ndays[i] & df2$time == "00:00:00"),]$b
  df1[df1$days == ndays[i],]$results <- df1[df1$days == ndays[i],]$a + factor

}

The problem is that I have huge data frames covering many days, and looping over the days one by one is slow. Is there a faster way?

Edit: this is the results column as filled in by the loop

df1  
        days     time a results
1 01.01.2018 00:00:00 1       2  # results = a + df2$b @ 01.01.2018 00:00:00
2 01.01.2018 01:00:00 2       3  # results = a + df2$b @ 01.01.2018 00:00:00
3 01.01.2018 02:00:00 3       4  # results = a + df2$b @ 01.01.2018 00:00:00
4 02.01.2018 00:00:00 1       6  # results = a + df2$b @ 02.01.2018 00:00:00
5 02.01.2018 01:00:00 2       7  # results = a + df2$b @ 02.01.2018 00:00:00
6 02.01.2018 02:00:00 3       8  # results = a + df2$b @ 02.01.2018 00:00:00
7 03.01.2018 00:00:00 1      11  # results = a + df2$b @ 03.01.2018 00:00:00
8 03.01.2018 01:00:00 2      12  # results = a + df2$b @ 03.01.2018 00:00:00
9 03.01.2018 02:00:00 3      13  # results = a + df2$b @ 03.01.2018 00:00:00

3 answers:

Answer 0: (score: 2)

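You can do this with a merge instead of a for loop, which will be faster. In the answer below I also use data.table, a fast alternative to data.frame that is very useful when working with large tables.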

# install.packages("data.table")  # Uncomment if necessary
library(data.table)

df1 <- data.frame(days,time,a)  # You don't need to create the result column yet
df2 <- data.frame(days,time,b)

df1 <- data.table(df1)
df2 <- data.table(df2)

# Merge the two tables on the days column
df3 <- merge(df1, df2[time=="00:00:00"], by="days")

# This is your result
answer <- df3[, .(days, time=time.x, a, results=a+b)]

Output:

> answer
         days     time a results
1: 01.01.2018 00:00:00 1       2
2: 01.01.2018 01:00:00 2       3
3: 01.01.2018 02:00:00 3       4
4: 02.01.2018 00:00:00 1       6
5: 02.01.2018 01:00:00 2       7
6: 02.01.2018 02:00:00 3       8
7: 03.01.2018 00:00:00 1      11
8: 03.01.2018 01:00:00 2      12
9: 03.01.2018 02:00:00 3      13
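
The same lookup can also be written as a data.table update join, which adds the column to the existing table by reference rather than building a merged copy. The following is only a sketch under the question's sample data; the names dt1 and dt2 are chosen here for illustration and do not appear in the original answer, and the data.table package is assumed to be installed.

```r
library(data.table)

# Sample data from the question
days <- c("01.01.2018","01.01.2018","01.01.2018",
          "02.01.2018","02.01.2018","02.01.2018",
          "03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00")
a <- c(1,2,3, 1,2,3, 1,2,3)
b <- c(1,2,3, 5,6,7, 10,11,12)

# dt1 and dt2 are illustrative names (assumption, not from the answer)
dt1 <- data.table(days, time, a)
dt2 <- data.table(days, time, b)

# Keep only the 00:00:00 rows of dt2, match them to dt1 on days,
# and assign results in place; i.b is the b column of the table in i
dt1[dt2[time == "00:00:00"], on = "days", results := a + i.b]
```

Because the assignment happens in place, dt1 keeps its original row order and no intermediate table such as df3 is needed.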

Answer 1: (score: 2)

One thing to note: this assumes that the times in df2 are in chronological order and that the first value for any given day is the one at 00:00:00.

transform(merge(df1,aggregate(b~days,df2,function(x)x[1])),results=a+b)
        days     time a results  b
1 01.01.2018 00:00:00 1       2  1
2 01.01.2018 01:00:00 2       3  1
3 01.01.2018 02:00:00 3       4  1
4 02.01.2018 00:00:00 1       6  5
5 02.01.2018 01:00:00 2       7  5
6 02.01.2018 02:00:00 3       8  5
7 03.01.2018 00:00:00 1      11 10
8 03.01.2018 01:00:00 2      12 10
9 03.01.2018 02:00:00 3      13 10
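
If the ordering assumption might not hold, one option is to select the 00:00:00 rows explicitly and look them up with match(). This is a minimal base R sketch, not part of the original answer; the name midnight is introduced here for illustration, and it assumes each day has exactly one 00:00:00 row in df2.

```r
# Sample data from the question
days <- c("01.01.2018","01.01.2018","01.01.2018",
          "02.01.2018","02.01.2018","02.01.2018",
          "03.01.2018","03.01.2018","03.01.2018")
time <- c("00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00",
          "00:00:00","01:00:00","02:00:00")
a <- c(1,2,3, 1,2,3, 1,2,3)
b <- c(1,2,3, 5,6,7, 10,11,12)

df1 <- data.frame(days, time, a, stringsAsFactors = FALSE)
df2 <- data.frame(days, time, b, stringsAsFactors = FALSE)

# Rows of df2 at 00:00:00, one per day (assumption); row order is irrelevant
midnight <- df2[df2$time == "00:00:00", ]

# match() gives, for each row of df1, the position of its day in midnight,
# so the whole column is computed in one vectorized step
df1$results <- df1$a + midnight$b[match(df1$days, midnight$days)]
```

A day in df1 with no 00:00:00 row in df2 would get NA in results, which makes missing lookups easy to spot.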

Answer 2: (score: 2)

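One solution using dplyr is shown below. The approach is: 1) filter df2 to drop every row whose time is not 00:00:00; 2) then inner_join the result with df1 on days. This brings the matching value of b from df2 onto every row of the same day in the merged data frame. Finally, add a and b to get results.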

df1 <- data.frame(days,time,a,results, stringsAsFactors = FALSE)
df2 <- data.frame(days,time,b, stringsAsFactors = FALSE)

library(dplyr)

df2 %>%
  filter(time == "00:00:00") %>%
  inner_join(df1, by="days") %>%
  mutate(time = time.y, results = a+b) %>%
  select( days, time, a, b, results)

 #Result:
        days     time a  b results
1 01.01.2018 00:00:00 1  1       2
2 01.01.2018 01:00:00 2  1       3
3 01.01.2018 02:00:00 3  1       4
4 02.01.2018 00:00:00 1  5       6
5 02.01.2018 01:00:00 2  5       7
6 02.01.2018 02:00:00 3  5       8
7 03.01.2018 00:00:00 1 10      11
8 03.01.2018 01:00:00 2 10      12
9 03.01.2018 02:00:00 3 10      13