R nested for loop iterating over rows and column names

Time: 2017-04-09 01:15:48

Tags: r for-loop

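I'm new to R, so please forgive the basic question.

Here is a Dropbox link to a .csv of my data.

I have country data from 1990 to 2010. My data is wide: each country is a row, and each year has two columns, one for each of two data sources. However, the data is incomplete for some countries. For example, a country row may have NA values in the 1990-1995 columns.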

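I want to create two columns where, for each country row, the value is the earliest non-NA year for each of the two data types.

I also want to create two more columns where, for each country row, the value is the earliest non-NA value for each of the two data types.

So the final four columns would look like this: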

1990, 12, 1990, 87
1990, 7, 1990, 132
1996, 22, 1996, 173
1994, 14, 1994, 124

Here is my rough, half-pseudocode attempt at what I imagine the nested for loop would look like:

for i in (number of rows){
  for j in names(df){
    if(is.na(df$j) == FALSE)  df$earliest_year = j
  }
}

How can I generate these four desired columns? Thanks!

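1 answer:

Answer 0 (score: 2)

You mentioned loops, so I tried writing a for loop. Later on, though, you may want to try other R functions such as apply. This code is a bit verbose, but I hope it helps: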

# read data; I'm assuming the first column holds row names and is not important
df <- read.csv("wb_wide.csv", row.names = 1)

# get the column names for the two data sources
# here I used grep to find column names matching the NY and SP patterns;
# if the columns consistently alternate, you could use a numeric sequence instead
dataSourceA <- grep(pattern = "NY", x = names(df), value = TRUE)
dataSourceB <- grep(pattern = "SP", x = names(df), value = TRUE)

# create the new columns
# if I understand correctly: the first non-NA value from source A and source B,
# plus the year of each of these non-NAs
# (initialised to NA, assuming the data columns are numeric)
df$sourceA <- NA_real_
df$yearA <- NA_character_
df$sourceB <- NA_real_
df$yearB <- NA_character_

# loop over the rows
for (i in seq_len(nrow(df))) {

  # position of the first non-NA value among the source A columns (NA if there is none)
  firstA <- which(!is.na(unlist(df[i, dataSourceA])))[1]
  if (!is.na(firstA)) {
    df$sourceA[i] <- df[i, dataSourceA[firstA]]
    # gsub strips everything up to the last "X" in the column name so only the year is left
    # you can also skip this and clean the names afterward
    df$yearA[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceA[firstA])
  }

  # same steps for source B
  firstB <- which(!is.na(unlist(df[i, dataSourceB])))[1]
  if (!is.na(firstB)) {
    df$sourceB[i] <- df[i, dataSourceB[firstB]]
    df$yearB[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceB[firstB])
  }

}

I tried to make the code specific to your example and desired output.
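For comparison, here is a rough sketch of what the apply-style version mentioned above might look like. It is only an illustration under some assumptions: the toy data frame and its column names (NY_X1990, SP_X1990, and so on) are made up rather than taken from wb_wide.csv, and first_non_na is a hypothetical helper, not an existing R function.

```r
# Toy data in the same wide layout. The column names are invented for
# illustration; the real file may name its columns differently.
df <- data.frame(
  NY_X1990 = c(12, NA, NA),
  SP_X1990 = c(87, NA, NA),
  NY_X1991 = c(13, 7, NA),
  SP_X1991 = c(90, NA, NA),
  NY_X1992 = c(15, 8, 22),
  SP_X1992 = c(95, 132, NA),
  row.names = c("A", "B", "C")
)

# Hypothetical helper: for one set of columns, return the first non-NA value
# in each row together with the year parsed from that column's name.
first_non_na <- function(data, cols) {
  m <- as.matrix(data[, cols, drop = FALSE])
  # per row: index of the first non-NA column, or NA if the whole row is NA
  idx <- unname(apply(!is.na(m), 1, function(ok) which(ok)[1]))
  data.frame(
    # assumes the year follows the last "X" in the column name
    year = sub("^.*X", "", cols[idx]),
    # matrix indexing by (row, column) pairs; a pair containing NA yields NA
    value = m[cbind(seq_len(nrow(m)), idx)],
    stringsAsFactors = FALSE
  )
}

colsA <- grep("NY", names(df), value = TRUE)
colsB <- grep("SP", names(df), value = TRUE)

resA <- first_non_na(df, colsA)
resB <- first_non_na(df, colsB)

result <- data.frame(
  yearA = resA$year, sourceA = resA$value,
  yearB = resB$year, sourceB = resB$value,
  row.names = rownames(df),
  stringsAsFactors = FALSE
)
print(result)
```

In this toy data, row C has no source B values at all, so its yearB and sourceB stay NA instead of raising an error. The years come back as character strings; wrap them in as.integer() if you need numbers.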