Looping through a table in R

Time: 2017-07-10 15:55:32

Tags: r loops dataframe indexing import-from-csv

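I am trying to write a program in R that creates a table from a .csv file; the table will have 1856 x 9 entries. This part works. I then want to loop through each cell of the table, starting at the top right corner, moving along the row, then dropping down to the next row and doing the same.

If a row is all zeros, or looks like 1 1 1 0 0 0 or similar, I want to delete it. In other words, if a row has all of its non-zero values first and only zeros to their right, it should be deleted.

If there is a non-zero value in a cell to the right of a cell containing zero, I want to keep that row in the table.

Example: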

Beginning of my table

After my code runs, I only want rows 1, 2, 3 and 7 to be kept.

2 answers:

Answer 0: (score: 2)

You can use apply instead of a loop:

# recreate your example
DF <- 
read.csv(
text="Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
"
)

# What the next line does:
# - for each row of DF, excluding the first column (DF[,-1]),
# - take the row without its last value, x[-length(x)], and the
#   row without its first value, x[-1], so each cell is paired with its right-hand neighbour
# - build a logical vector that is TRUE where x[-length(x)] == 0 AND x[-1] != 0,
#   i.e. wherever a zero is immediately followed by a non-zero
# - any() returns TRUE if at least one such pair exists in the row
# rowCondition is TRUE for rows that meet the condition and FALSE otherwise
rowCondition <- apply(DF[,-1],1,function(x) any(x[-length(x)] == 0 & x[-1] != 0))

# we use the condition to filter the necessary rows
subsetDF <- DF[rowCondition,]



> subsetDF
   Company.Name Seed Series.A Series.B Series.C Series.D Series.E Series.F Series.G Series.H
1        Aetion    0        1        0        0        0        0        0        0        0
2  Aspier Healt    1        0        1        0        0        0        0        0        0
3      Evariant    0        1        1        2        0        0        0        0        0
7 Network Locum    0        0        1        0        0        0        0        0        0
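The same pairwise comparison can also be written without apply by comparing two shifted copies of the value matrix. This is only a sketch of an alternative, assuming every column after the first is numeric; the names m, left, right, keep and subsetDF2 are made up for illustration.

```r
# Recreate the example data from this answer
DF <- read.csv(
  text = "Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
"
)

# Numeric matrix of everything except the company name
m <- as.matrix(DF[, -1])

# left holds every cell except the last column, right every cell except the
# first, so left[i, j] and right[i, j] are horizontal neighbours in row i
left  <- m[, -ncol(m), drop = FALSE]
right <- m[, -1, drop = FALSE]

# A row is kept if at least one zero is immediately followed by a non-zero
keep <- rowSums(left == 0 & right != 0) > 0

subsetDF2 <- DF[keep, ]
print(subsetDF2)
```

On a table of 1856 rows either version should be fast enough; the matrix form simply avoids calling an R function once per row.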

Answer 1: (score: 1)

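Since you are looking for any row that has a 0 followed by a non-zero character, you can use a regular expression to do this. The grepl function returns a TRUE/FALSE vector according to whether the specified pattern matches each element: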

examples <- c("100", "000", "001")
grepl(pattern = "0[1-9]", x = examples)
## [1] FALSE FALSE  TRUE

This regular expression explicitly looks for a digit 1-9 after a zero. If you want any possible character other than zero, use pattern = "0[^0]" instead.
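To make the difference between the two patterns concrete, here is a small comparison. The extra string "0000m0000" is added for illustration and stands for a row that contains a stray letter; the variable names are hypothetical.

```r
# Three strings from the example above plus one containing a letter
examples <- c("100", "000", "001", "0000m0000")

# Zero followed by a digit 1-9 only
digits_only <- grepl(pattern = "0[1-9]", x = examples)

# Zero followed by anything that is not a zero (letters included)
any_nonzero <- grepl(pattern = "0[^0]", x = examples)

print(data.frame(examples, digits_only, any_nonzero))
```

Only the last string differs: "0[1-9]" rejects it, while "0[^0]" accepts it because "m" counts as a non-zero character.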

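Using the dplyr library, which is loaded by calling library("tidyverse"), it is very simple to concatenate the columns of interest and then apply our regular expression to this new column.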

First, save the following as a .csv file (the code below reads it as example_data.csv):

Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
Martin Company,0,0,0,0,0,0,0,0,1
Other Company,1,1,1,2,1,3,6,7,9
Strange Company,0,0,0,0,m,0,0,0,0

Then import the data using read_csv:

library("tidyverse")
example_data <- read_csv("example_data.csv")

Now let's create a new column that contains, for each row, the concatenation of the columns Seed through Series.H:

example_data <- example_data %>%
  mutate(test_col = paste0(Seed,
                           Series.A,
                           Series.B,
                           Series.C,
                           Series.D,
                           Series.E,
                           Series.F,
                           Series.G,
                           Series.H))

Let's look at the value of the new column for the first row:

example_data %>%
  select(test_col) %>%
  slice(1)
## 010000000

Good! There is a non-zero character to the right of a zero, so this row should be included in the output.
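One caveat with plain concatenation: it is only unambiguous when every cell is a single character. The sketch below uses a hypothetical two-row table (toy) containing the value 10 to show how a separator plus an anchored pattern avoids a false match. The pattern "(^|,)0,[1-9]" is an assumption made for this illustration and expects non-negative whole numbers without leading zeros.

```r
# Hypothetical data: row 1 is 10, 1, 0 (should be dropped),
#                    row 2 is 0, 10, 0 (should be kept)
toy <- data.frame(Seed = c(10, 0), Series.A = c(1, 10), Series.B = c(0, 0))

# Plain concatenation loses the cell boundaries: "1010" and "0100"
plain <- do.call(paste0, toy)
plain_match <- grepl("0[1-9]", plain)

# Joining with a comma keeps the boundaries: "10,1,0" and "0,10,0"
separated <- do.call(paste, c(toy, sep = ","))

# A cell that is exactly 0, followed by a cell starting with 1-9
separated_match <- grepl("(^|,)0,[1-9]", separated)

print(data.frame(plain, plain_match, separated, separated_match))
```

With single-digit counts like the ones in the example data, both variants give the same answer, so the simpler paste0 version is fine there.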

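We can use the mutate verb to apply the grepl test to every row, storing the result in a new column called include. Let's print the whole column to see which rows meet your condition: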

example_data %>%
    mutate(include = grepl("0[1-9]", test_col)) %>%
    select(include)
## output
# A tibble: 10 x 1
   include
     <lgl>
 1    TRUE
 2    TRUE
 3    TRUE
 4   FALSE
 5   FALSE
 6   FALSE
 7    TRUE
 8    TRUE
 9   FALSE
10   FALSE

To keep only the rows where the condition is true, we use the filter verb:

example_data %>%
  mutate(include = grepl("0[1-9]", test_col)) %>%
  filter(include)

Of course, we now have two columns in the data that you don't want! So let's write all of this concisely:

example_data %>%
  mutate(test_col = paste0(Seed,
                           Series.A,
                           Series.B,
                           Series.C,
                           Series.D,
                           Series.E,
                           Series.F,
                           Series.G,
                           Series.H),
         include = grepl("0[1-9]", test_col)) %>%
  filter(include) %>%
  select(-include, -test_col)
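For comparison, the same regular-expression filter can be sketched in base R without creating helper columns. This is an illustrative equivalent rather than part of the tidyverse approach above; it reads the same ten rows from an inline string instead of a file, and the names example_df, value_cols, row_strings and result are made up for the sketch.

```r
# The same ten rows as the .csv above, read from an inline string
example_df <- read.csv(
  text = "Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
Martin Company,0,0,0,0,0,0,0,0,1
Other Company,1,1,1,2,1,3,6,7,9
Strange Company,0,0,0,0,m,0,0,0,0
",
  stringsAsFactors = FALSE
)

# Every column except the company name takes part in the test
value_cols <- setdiff(names(example_df), "Company.Name")

# One string per row, e.g. "010000000" for the first row
row_strings <- do.call(paste0, example_df[value_cols])

# Keep rows where a zero is immediately followed by a digit 1-9
result <- example_df[grepl("0[1-9]", row_strings), ]
print(result)
```

Because the concatenated strings live in a separate vector, there is nothing to remove from the data frame afterwards.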