Question

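I am trying to write a program in R that creates a table from a .csv file; the table will be 1856 x 9 entries. That part works. I then want to loop through each cell of that table, starting in the upper-right corner, moving along the row, then dropping down to the next row and doing the same.

If a row is all zeros, or looks like 1 1 1 0 0 0 or similar, I want to remove it. In other words, if a row has only non-zero values followed by zeros to their right, it should be removed.

If there is a non-zero value in a cell to the right of a cell holding a zero, I want to keep that row in the table.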

Example:

After my code runs, I only want rows 1, 2, 3, and 7 to remain.

Answer 1

You can use apply instead of a loop:

# recreate your example
DF <- 
read.csv(
text="Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
"
)

# The apply() call below does the following:
# - for each row of DF, excluding the first column (DF[,-1])
# - take the row without its last value, x[-length(x)], and the
#   row without its first value, x[-1]
# - build a logical vector that is TRUE where x[-length(x)] == 0 AND x[-1] != 0,
#   i.e. where a zero is immediately followed by a non-zero
# - if "any" (see the function) element is TRUE, the condition is met
# rowCondition will contain TRUE where the row condition is met, and FALSE otherwise
rowCondition <- apply(DF[,-1],1,function(x) any(x[-length(x)] == 0 & x[-1] != 0))

# we use the condition to filter the necessary rows
subsetDF <- DF[rowCondition,]



> subsetDF
   Company.Name Seed Series.A Series.B Series.C Series.D Series.E Series.F Series.G Series.H
1        Aetion    0        1        0        0        0        0        0        0        0
2  Aspier Healt    1        0        1        0        0        0        0        0        0
3      Evariant    0        1        1        2        0        0        0        0        0
7 Network Locum    0        0        1        0        0        0        0        0        0
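
As a possible alternative to apply, the same zero-followed-by-non-zero test can be written as one comparison between two shifted copies of the numeric matrix. This is only a sketch under the assumption that every column after the first is numeric; the names m, left, right, keep and subsetDF2 are made up for illustration.

```r
# Recreate the example data from this answer.
DF <- read.csv(
  text = "Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
"
)

# Numeric part of the table (assumes all columns after the first are numeric).
m <- as.matrix(DF[, -1])

# left holds every cell except the last column, right holds every cell
# except the first column, so left[i, j] sits directly left of right[i, j].
left  <- m[, -ncol(m), drop = FALSE]
right <- m[, -1, drop = FALSE]

# A row is kept when at least one zero has a non-zero neighbour to its right.
keep <- rowSums(left == 0 & right != 0) > 0
subsetDF2 <- DF[keep, ]
```

Checking adjacent cells is enough here: if a zero has any non-zero value somewhere to its right, then at some point between them a zero must sit directly next to a non-zero.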

Answer 2

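Since you are looking for any row in which a 0 is followed by a non-zero character, you can use a regular expression to do this. The grepl function returns a TRUE/FALSE vector depending on whether the specified pattern matches: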

examples <- c("100", "000", "001")
grepl(pattern = "0[1-9]", x = examples)
## [1] FALSE FALSE  TRUE

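This regular expression explicitly looks for the digits 1-9 after a zero. If you want any possible character other than zero, use pattern = "0[^0]" instead.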
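
To make the difference between the two patterns concrete, here is a small sketch; the input strings are made up for illustration, and "0m0" stands in for a row containing a non-numeric entry.

```r
# Hypothetical test strings; "0m0" contains a letter after a zero.
examples2 <- c("100", "000", "001", "0m0")

# Digits 1-9 only: the letter in "0m0" does not count as a match.
digits_only <- grepl(pattern = "0[1-9]", x = examples2)

# Any character other than "0": the letter in "0m0" now counts.
any_non_zero <- grepl(pattern = "0[^0]", x = examples2)

print(digits_only)
print(any_non_zero)
```

Which pattern is appropriate depends on whether a non-numeric entry should count as a non-zero value.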

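Using the dplyr library, which is loaded by calling library("tidyverse"), it is quite simple to concatenate the columns of interest and then apply our regular expression to this new column.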

First, save the following as a .csv file (the import step uses the name example_data.csv):

Company.Name,Seed,Series.A,Series.B,Series.C,Series.D,Series.E,Series.F,Series.G,Series.H
Aetion,0,1,0,0,0,0,0,0,0
Aspier Healt,1,0,1,0,0,0,0,0,0
Evariant,0,1,1,2,0,0,0,0,0
iHealth,0,0,0,0,0,0,0,0,0
Inuition Robotics,0,0,0,0,0,0,0,0,0
Kali Care,0,0,0,0,0,0,0,0,0
Network Locum,0,0,1,0,0,0,0,0,0
Martin Company,0,0,0,0,0,0,0,0,1
Other Company,1,1,1,2,1,3,6,7,9
Strange Company,0,0,0,0,m,0,0,0,0

Then import the data using read_csv:

library("tidyverse")
example_data <- read_csv("example_data.csv")

Now let's create a new column containing the row-wise concatenation of the columns Seed through Series.H:

example_data <- example_data %>%
  mutate(test_col = paste0(Seed,
                           Series.A,
                           Series.B,
                           Series.C,
                           Series.D,
                           Series.E,
                           Series.F,
                           Series.G,
                           Series.H))
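
If typing out every column name becomes tedious, one possible base R shortcut is to hand all columns except the first to paste0 through do.call. The sketch below uses a small made-up data frame named df rather than example_data, and assumes the first column is the only one to leave out.

```r
# Small stand-in for example_data (hypothetical values, base R data frame).
df <- data.frame(
  Company.Name = c("A", "B"),
  Seed         = c(0, 1),
  Series.A     = c(1, 0),
  Series.B     = c(0, 0)
)

# paste0() is vectorised, so passing each remaining column as a separate
# argument concatenates the values row by row. unname() drops the column
# names so none of them can be mistaken for a named paste0() argument.
df$test_col <- do.call(paste0, unname(as.list(df[, -1])))

print(df$test_col)
```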

Let's look at the new column's value for the first row:

example_data %>%
  select(test_col) %>%
  slice(1)
## 010000000

Good! There is a non-zero character to the right of a zero, so this row should be included in the output.

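We can use the mutate verb to apply the grepl test to every row, storing the result in a new column called include. Let's print the whole column to see which rows meet your criteria: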

example_data %>%
    mutate(include = grepl("0[1-9]", test_col)) %>%
    select(include)
## output
# A tibble: 10 x 1
   include
     <lgl>
 1    TRUE
 2    TRUE
 3    TRUE
 4   FALSE
 5   FALSE
 6   FALSE
 7    TRUE
 8    TRUE
 9   FALSE
10   FALSE

To keep only the rows where the condition is true, we use the filter verb:

example_data %>%
  mutate(include = grepl("0[1-9]", test_col)) %>%
  filter(include)

Of course, we now have two columns in the data that you don't want! So let's write all of this concisely:

example_data %>%
  mutate(test_col = paste0(Seed,
                           Series.A,
                           Series.B,
                           Series.C,
                           Series.D,
                           Series.E,
                           Series.F,
                           Series.G,
                           Series.H),
         include = grepl("0[1-9]", test_col)) %>%
  filter(include) %>%
  select(-include, -test_col)
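
One caveat with plain concatenation: once a cell holds a value of 10 or more, the digits of neighbouring cells run together and the pattern can match inside a single value. A possible workaround, sketched here with made-up values and a hypothetical comma separator, is to join the cells with a delimiter and match whole cells (this needs perl = TRUE for the lookahead).

```r
# Hypothetical row: 10 and 1 are both non-zero, so no zero cell sits to the
# left of a non-zero cell, yet plain concatenation yields "101".
vals <- c(10, 1, 0)
plain <- paste0(vals, collapse = "")
false_hit <- grepl("0[1-9]", plain)   # matches the "01" inside "101"

# Joining with a separator keeps the cell boundaries visible to the regex.
sep_joined <- paste(vals, collapse = ",")

# A cell that is exactly "0", followed by a cell that is not exactly "0".
pattern <- "(^|,)0,(?!0(,|$))"
no_hit <- grepl(pattern, sep_joined, perl = TRUE)

# A row that really has a zero before a non-zero still matches.
real_hit <- grepl(pattern, "0,0,5", perl = TRUE)

print(c(false_hit, no_hit, real_hit))
```

For data that only ever contains single-digit values, as in the example above, the simpler concatenation approach behaves the same.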

Looping through a table in R
