Subset data by inverse pairs

Date: 2017-01-07 04:37:50

Tags: r loops subset

The following data.frame should be subset by inverse pairs, subject to some conditions:

> foo
   ID Day  Period            Start              End
1  11   1 morning     Central Park Alphabet Village
2  11   1 morning     Central Park Alphabet Village
3  11   1 evening Alphabet Village        Grammercy
4  54   1 morning     Union Square        Chinatown
5  67   1 morning          Midtown           Harlem
6  67   1 morning           Harlem          Midtown
7  69   1 morning       Greenpoint Prospect Heights
8  54   1 evening        Chinatown     Union Square
9  77   1 morning       Park Slope     Williamsburg
10 73   1 evening     Williamsburg       Park Slope
11 88   2 morning        Grammercy     Battery Park
12 88   2 morning     Battery Park             SoHo
13 88   2 evening     Battery Park        Grammercy
14 69   2 evening Prospect Heights       Greenpoint
15 88   2 evening        Grammercy     Battery Park

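For example, a Start-End station inverse pair must fall on the same Day and have the same ID, and the first record must occur in the morning and the second in the evening. *Edit: It should be noted that only one Start-End can be paired with a given End-Start. That is, once a pair is formed, the original Start-End cannot be used again to form another pair. For example, record 15 cannot be paired with record 13, because 13 is already "taken".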
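The pairing rule above can be sketched as a predicate over two rows. This is only an illustration: `is_inverse_pair` is a hypothetical helper name, and it does not handle the "each record can be used only once" requirement, which needs bookkeeping across rows.

```r
# Hypothetical helper: TRUE when row `a` (morning) and row `b` (evening)
# form an inverse pair under the rules described above.
is_inverse_pair <- function(a, b) {
  a$ID == b$ID &&
    a$Day == b$Day &&
    a$Period == "morning" &&
    b$Period == "evening" &&
    # as.character() avoids errors when Start/End are factors
    # with different level sets
    as.character(a$Start) == as.character(b$End) &&
    as.character(a$End) == as.character(b$Start)
}

# Two one-row data frames copied from records 4 and 8 of `foo`
r4 <- data.frame(ID = 54, Day = 1, Period = "morning",
                 Start = "Union Square", End = "Chinatown")
r8 <- data.frame(ID = 54, Day = 1, Period = "evening",
                 Start = "Chinatown", End = "Union Square")

is_inverse_pair(r4, r8)  # morning first, evening second
is_inverse_pair(r8, r4)  # order reversed, so the Period check fails
```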

The subset output will always have an even number of rows. In this case, it would be:

   ID Day  Period        Start          End
3  54   1 morning Union Square    Chinatown
7  54   1 evening    Chinatown Union Square
10 88   2 morning    Grammercy Battery Park
11 88   2 evening Battery Park    Grammercy

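I'm not sure whether I should use the subset() function together with a for loop, or how to construct the loop. It should say: if Start-End equals the End-Start of the following row, and ID = ID, Day = Day, and the Period of the first record = "morning" while that of the second record = "evening".

I think the code should start with something like this, but I'm not sure:

if ((foo[i-1, "Start"] == foo[i, "End"]) && (foo[i-1, "End"] == foo[i, "Start"]))

The idea is to keep all inverse pairs that meet these conditions. Any guidance and explanation of the steps would be greatly appreciated.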
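For reproducibility, the printed `foo` table can be rebuilt by hand as below. This is a reconstruction from the printout, not the asker's original sample data; the column types (and whether the text columns were factors) are assumptions.

```r
# Reconstruction of `foo` from the printed table in the question.
# stringsAsFactors = FALSE is an assumption; set it to TRUE to mimic
# factor columns such as those the first answer converts.
foo <- data.frame(
  ID     = c(11, 11, 11, 54, 67, 67, 69, 54, 77, 73, 88, 88, 88, 69, 88),
  Day    = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Period = c("morning", "morning", "evening", "morning", "morning",
             "morning", "morning", "evening", "morning", "evening",
             "morning", "morning", "evening", "evening", "evening"),
  Start  = c("Central Park", "Central Park", "Alphabet Village",
             "Union Square", "Midtown", "Harlem", "Greenpoint",
             "Chinatown", "Park Slope", "Williamsburg", "Grammercy",
             "Battery Park", "Battery Park", "Prospect Heights",
             "Grammercy"),
  End    = c("Alphabet Village", "Alphabet Village", "Grammercy",
             "Chinatown", "Harlem", "Midtown", "Prospect Heights",
             "Union Square", "Williamsburg", "Park Slope",
             "Battery Park", "SoHo", "Grammercy", "Greenpoint",
             "Battery Park"),
  stringsAsFactors = FALSE
)
```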

2 answers:

Answer 0: (score: 2)

Group by 'ID' and 'Day', filter the groups where the number of unique 'Period' elements (n_distinct) is greater than 1, then convert the factor columns to character and filter on the condition from the OP's post.

library(dplyr)
foo %>%
    group_by(ID, Day) %>%
    filter(n_distinct(Period) > 1) %>%
    mutate(Start = as.character(Start), End = as.character(End)) %>%
    filter(Start[1] == End[n()] & Start[n()] == End[1])
#     ID   Day  Period        Start          End
#  (int) (int)  (fctr)        (chr)        (chr)
#1    54     1 morning Union Square    Chinatown
#2    54     1 evening    Chinatown Union Square
#3    88     2 morning    Grammercy Battery Park
#4    88     2 evening Battery Park    Grammercy

In dplyr version 0.5.0 and later, we can use mutate_if to convert the factor columns.
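A minimal sketch of how `mutate_if` could replace the explicit `as.character()` calls. The `trips` data frame is a hypothetical four-row extract of `foo`, and converting before grouping is a choice made here for simplicity, not necessarily what the answer intended. Like the pipeline above, it compares only the first and last row of each group, so it assumes each ID/Day group holds a single candidate pair with the morning row first. In later dplyr releases `mutate_if` is superseded by `across()`, though it still works.

```r
library(dplyr)

# Hypothetical extract of `foo` (records 1, 3, 4 and 8), with factor columns
trips <- data.frame(
  ID     = c(11, 11, 54, 54),
  Day    = c(1, 1, 1, 1),
  Period = c("morning", "evening", "morning", "evening"),
  Start  = c("Central Park", "Alphabet Village", "Union Square", "Chinatown"),
  End    = c("Alphabet Village", "Grammercy", "Chinatown", "Union Square"),
  stringsAsFactors = TRUE
)

res <- trips %>%
  mutate_if(is.factor, as.character) %>%  # every factor column becomes character
  group_by(ID, Day) %>%
  filter(n_distinct(Period) > 1,          # group has both morning and evening
         Start[1] == End[n()],            # first Start matches last End
         Start[n()] == End[1]) %>%        # last Start matches first End
  ungroup()

res
```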

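Answer 1: (score: 0)

In SQL, you would use a self-join with a union query. Consider the same approach in base R: split the data into morning and evening subsets, merge them on ID, Day, Start, and End (reverse-paired), and finally rbind the two halves back together after selecting the corresponding columns: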

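# `df` is assumed to be the question's data frame (called `foo` there).
# Split by period and suffix the column names so the halves can sit side by side.
mdf <- setNames(df[df$Period=='morning',], paste0(colnames(df), "_m"))
edf <- setNames(df[df$Period=='evening',], paste0(colnames(df), "_e"))

# Join morning rows to evening rows with Start/End swapped, then stack the
# morning half of each match on top of the rebuilt evening half.
rbind(setNames(merge(mdf, edf,
                    by.x=c("ID_m", "Day_m", "Start_m", "End_m"), 
                    by.y=c("ID_e", "Day_e", "End_e", "Start_e"))[colnames(mdf)], colnames(df)),
      setNames(merge(mdf, edf,
                     by.x=c("ID_m", "Day_m","Start_m", "End_m"), 
                     by.y=c("ID_e", "Day_e", "End_e", "Start_e"))[c("ID_m", "Day_m", "Period_e", "End_m", "Start_m")], colnames(df)))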

#   ID Day  Period        Start          End
# 1 54   1 morning Union Square    Chinatown
# 2 88   2 morning    Grammercy Battery Park
# 3 54   1 evening    Chinatown Union Square
# 4 88   2 evening Battery Park    Grammercy
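A plain join can match one morning record to several identical evening records. One possible way to respect the question's "each record can be used only once" rule, at least for exact duplicates, is to number repeated rows and join on that counter as well. This is a sketch under assumptions: `trips` holds hypothetical rows in the shape of `foo`, and `k` is an illustrative column name.

```r
# Hypothetical data: one morning trip for ID 88 but two identical
# evening return trips, plus one clean pair for ID 54.
trips <- data.frame(
  ID     = c(88, 88, 88, 54, 54),
  Day    = c(2, 2, 2, 1, 1),
  Period = c("morning", "evening", "evening", "morning", "evening"),
  Start  = c("Grammercy", "Battery Park", "Battery Park",
             "Union Square", "Chinatown"),
  End    = c("Battery Park", "Grammercy", "Grammercy",
             "Chinatown", "Union Square"),
  stringsAsFactors = FALSE
)

# Occurrence counter within identical (ID, Day, Period, Start, End) rows
trips$k <- ave(seq_len(nrow(trips)), trips$ID, trips$Day, trips$Period,
               trips$Start, trips$End, FUN = seq_along)

m <- trips[trips$Period == "morning", ]
e <- trips[trips$Period == "evening", ]

# Reverse-pair join; matching on k means the n-th duplicate morning row
# can only pair with the n-th duplicate evening row.
pairs <- merge(m, e,
               by.x = c("ID", "Day", "Start", "End", "k"),
               by.y = c("ID", "Day", "End", "Start", "k"))

cols <- c("ID", "Day", "Period", "Start", "End")
morning <- setNames(pairs[, c("ID", "Day", "Period.x", "Start", "End")], cols)
# Swap Start/End back to recover the evening rows
evening <- setNames(pairs[, c("ID", "Day", "Period.y", "End", "Start")], cols)
result <- rbind(morning, evening)
result
```

Here the second evening return trip for ID 88 stays unmatched because no second morning trip exists to pair with it.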

SQL equivalent (works in MS Access and returns exactly the same output)

SELECT t1.*
FROM
   (SELECT m.ID, m.Day, m.Period, m.[Start], m.[End]
    FROM RDataSet AS m
    WHERE (((m.Period)='morning'))) As t1
INNER JOIN
   (SELECT e.ID, e.Day, e.Period, e.[Start], e.[End]
    FROM RDataSet AS e
    WHERE (((e.Period)='evening'))) As t2
ON t1.ID = t2.ID AND t1.Day = t2.Day AND t1.[Start] = t2.[End] AND t1.[End] = t2.[Start]

UNION

SELECT t2.*
FROM
   (SELECT m.ID, m.Day, m.Period, m.[Start], m.[End]
    FROM RDataSet AS m
    WHERE (((m.Period)='morning'))) As t1
INNER JOIN
   (SELECT e.ID, e.Day, e.Period, e.[Start], e.[End]
    FROM RDataSet AS e
    WHERE (((e.Period)='evening'))) As t2
ON t1.ID = t2.ID AND t1.Day = t2.Day AND t1.[Start] = t2.[End] AND t1.[End] = t2.[Start]