Extracting the second element from every item in a list, and other questions

Time: 2014-03-16 20:01:47

Tags: r

Here is what I am trying to do. Suppose you have data like the following:

Region      Open    Store
120..141       +    France
145..2115      +    Germany
3322..5643     +    Wales
5646..7451     -    Scotland
7454..8641     -    Mexico
8655..9860     -    India
9980..11413    +    Zambia
11478..1261    -    Nicaragua
12978..1318    +    Sweeden

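I want to find the difference between the second element of one row (141) and the first element of the next row (145). If that difference is within a given value and the two rows have the same sign (+ or -), the stores should be grouped together.

The desired output (when the difference is less than 40 and the rows have the same sign, both + or both -) is: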

 4 (France and Germany)
 3,14 (Scotland and Mexico and India)
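
For reference, the sample table can be entered in R roughly as follows. This is only a sketch: the name `dat` and the use of `stringsAsFactors = FALSE` are assumptions, and every value is copied verbatim from the table above.

```r
# Sample data from the question, entered as a data frame.
# `dat` is an assumed name; all values are copied verbatim.
dat <- data.frame(
  Region = c("120..141", "145..2115", "3322..5643", "5646..7451",
             "7454..8641", "8655..9860", "9980..11413", "11478..1261",
             "12978..1318"),
  Open   = c("+", "+", "+", "-", "-", "-", "+", "-", "+"),
  Store  = c("France", "Germany", "Wales", "Scotland", "Mexico",
             "India", "Zambia", "Nicaragua", "Sweeden"),
  stringsAsFactors = FALSE  # keep the columns as character vectors
)

# Inspect the structure: 9 rows, 3 character columns.
str(dat)
```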

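2 answers:

Answer 0 (score: 0)

The "vector of second elements" (a "character" vector) is sapply( strsplit(dat$Region, "\\.\\.") , "[", 2) and the "first elements" are sapply( strsplit(dat$Region, "\\.\\.") , "[", 1). Presumably the first such difference, computed here between the first elements of rows 2 and 1 (and allowing for data.frame having made the first column a factor, the default class), is: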

 as.numeric(sapply( strsplit(as.character(dat$Region), "\\.\\.") , "[", 1)[2]) - 
       as.numeric(sapply( strsplit(as.character(dat$Region), "\\.\\.") , "[", 1)[1])
#[1] 25

[Note: "\\.\\." is needed as the 'split' argument because 'split' is interpreted as a regular expression.] The vector of all differences:

as.numeric(sapply( strsplit(as.character(dat$Region), "\\.\\.") , "[", 1)[-1]) - 
as.numeric(sapply( strsplit(as.character(dat$Region), "\\.\\.") , "[", 1)[-length(dat$Region)])
#[1]   25 3177 2324 1808 1201 1325 1498 1500
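
The question asks for the gap between the second element of one row and the first element of the next, which differs from the first-element differences above. A minimal sketch of that calculation follows; the vector names `region`, `open`, `gap` and `same_sign` are assumptions, and `fixed = TRUE` is used as an alternative to escaping the dots.

```r
# Region and Open columns copied from the question.
region <- c("120..141", "145..2115", "3322..5643", "5646..7451",
            "7454..8641", "8655..9860", "9980..11413", "11478..1261",
            "12978..1318")
open <- c("+", "+", "+", "-", "-", "-", "+", "-", "+")

# fixed = TRUE treats ".." as a literal string, so no regex escaping is needed.
parts  <- strsplit(region, "..", fixed = TRUE)
first  <- as.numeric(sapply(parts, "[", 1))
second <- as.numeric(sapply(parts, "[", 2))

n <- length(region)

# Gap between the end of row i and the start of row i + 1.
gap <- first[-1] - second[-n]

# TRUE where neighbouring rows carry the same sign.
same_sign <- open[-1] == open[-n]

gap
#[1]     4  1207     3     3    14   120    65 11717
same_sign
#[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE
```

The first gap (4) and the pair 3, 14 match the numbers in the question's desired output; the second gap of 3 (Wales to Scotland) is excluded there because the signs differ.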

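The rest of your question (and the edits) fails to convey your intent, possibly because we lack a shared natural language. (I do not know what the phrases "all their stores" and "store sign" might mean.) Please make an effort to communicate idiomatically.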

Answer 1 (score: 0)

This works on the data you provided. I am new to R, so apologies if it is messy.

# Split the string in the first column to make it easier to compare
library(stringr)
# The separator ".." is escaped because the pattern is a regular expression
parts <- str_split_fixed(data$Region, "\\.\\.", 2)
regionl <- parts[,1]
regionr <- parts[,2]

data$regionl <- regionl
data$regionr <-regionr

# We set a threshold for comparison
threshold = 100

# Loop through the data, comparing the right value of each row with the left value of the next row
# Check whether the difference is within the threshold and the signs match
# Keep adding to the group until a pair fails the check, then print the group

currentGroup = NULL

for(i in 1:(nrow(data)-1))
{

  # Boolean variables checking against signs and thresholds

  difference <- abs(as.numeric(data$regionr[i])-as.numeric(data$regionl[i+1])) <= threshold
  signs <- (data$Open[i] ==  data$Open[i+1])

  # Group things together
  if(difference & signs)
  {
    currentGroup <- c(currentGroup,as.character(data$Store[i]),as.character(data$Store[i+1]))
  }
  else
  {
    # If it's in a group alone, do not print
    if(is.null(currentGroup))
    {
      # Do nothing
    }else
    {
      # Print groups
      print(unique(currentGroup))
    }
    # Reset the group holder
    currentGroup<-NULL
  }
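}

# Print the final group if the data ends inside one
if(!is.null(currentGroup))
{
  print(unique(currentGroup))
}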
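
The same grouping can also be sketched without an explicit loop. The version below is an alternative, not the answerer's method: it uses base R only, assumes the cutoff of 40 stated in the question rather than the 100 used above, and all variable names (`linked`, `group_id`, `groups`) are hypothetical.

```r
# Columns copied from the question.
region <- c("120..141", "145..2115", "3322..5643", "5646..7451",
            "7454..8641", "8655..9860", "9980..11413", "11478..1261",
            "12978..1318")
open  <- c("+", "+", "+", "-", "-", "-", "+", "-", "+")
store <- c("France", "Germany", "Wales", "Scotland", "Mexico",
           "India", "Zambia", "Nicaragua", "Sweeden")

threshold <- 40  # assumed cutoff, taken from the question text

# Split each Region on the literal ".." and convert both halves to numbers.
parts  <- strsplit(region, "..", fixed = TRUE)
first  <- as.numeric(sapply(parts, "[", 1))
second <- as.numeric(sapply(parts, "[", 2))
n <- length(region)

# TRUE where row i + 1 continues the group of row i:
# the gap is below the threshold and the signs match.
linked <- abs(first[-1] - second[-n]) < threshold & open[-1] == open[-n]

# A new group starts at row 1 and wherever the link is broken.
group_id <- cumsum(c(TRUE, !linked))

# Collect stores per group and keep only groups with more than one store.
groups <- split(store, group_id)
groups <- unname(groups[lengths(groups) > 1])
groups
```

On the sample data this yields two groups, France with Germany and Scotland with Mexico and India, which agrees with the desired output in the question. Because the group ids are computed for all rows at once, a group that runs to the last row is kept without any special handling.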