How to separate a column in dplyr based on a regular expression

Time: 2017-09-11 03:05:12

Tags: r regex dplyr tidyverse

I have the following data frame:

df <- structure(list(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC", 
"BB_139.combined.HVMSC", "BB_140.combined.HVMSC", "BB_141.HVMSC", 
"BB_142.combined.HMSC-bm")), .Names = "X2", row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

It looks like this:

> df
# A tibble: 6 x 1
                       X2
                    <chr>
1            BB_137.HVMSC
2   BB_138.combined.HVMSC
3   BB_139.combined.HVMSC
4   BB_140.combined.HVMSC
5            BB_141.HVMSC
6 BB_142.combined.HMSC-bm

What I want to do is split the column into two (using . as the separator), keeping only the last field in the second column:

           col1    col2
         BB_137   HVMSC
BB_138.combined   HVMSC
BB_139.combined   HVMSC
BB_140.combined   HVMSC
         BB_141   HVMSC
BB_142.combined HMSC-bm

What is the correct way to do this?

My attempt looks like this:

> df %>% separate(X2, into = c("sid","status", "tiss"), sep = "[.]") 
# A tibble: 6 x 3
     sid   status    tiss
*  <chr>    <chr>   <chr>
1 BB_137    HVMSC    <NA>
2 BB_138 combined   HVMSC
3 BB_139 combined   HVMSC
4 BB_140 combined   HVMSC
5 BB_141    HVMSC    <NA>
6 BB_142 combined HMSC-bm
  

Warning message: Too few values at 2 locations: 1, 5
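The warning comes from sep = "[.]" splitting on every dot: rows 1 and 5 contain only one dot, so they produce two pieces for three into columns and tidyr pads them with NA. A small base-R check, added here for illustration (the variable names are hypothetical), makes this visible:

```r
# Rebuild the example column from the question
x <- c("BB_137.HVMSC", "BB_138.combined.HVMSC", "BB_139.combined.HVMSC",
       "BB_140.combined.HVMSC", "BB_141.HVMSC", "BB_142.combined.HMSC-bm")

# Count literal dots per value; fixed = TRUE treats "." as a plain character
dot_counts <- lengths(regmatches(x, gregexpr(".", x, fixed = TRUE)))
print(dot_counts)
# [1] 1 2 2 2 1 2

# Rows with fewer than 2 dots cannot fill three columns
print(which(dot_counts < 2))
# [1] 1 5
```

Because the number of fields varies by row, splitting on every dot cannot give a fixed two-column result; the split has to target only the last dot.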

2 answers:

Answer 0 (score: 10)

We can use a negative lookahead as the separator in separate().
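A minimal sketch of this idea, assuming a tidyr version where separate() treats a character sep as a regular expression (tidyr uses the stringi/ICU engine, which supports lookaheads). The exact pattern from the original answer is not shown here, so "\\.(?!.*\\.)" is an assumed pattern that matches a dot with no further dot after it, i.e. the last dot:

```r
library(dplyr)
library(tidyr)

df <- tibble(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC",
                    "BB_139.combined.HVMSC", "BB_140.combined.HVMSC",
                    "BB_141.HVMSC", "BB_142.combined.HMSC-bm"))

# "\\." matches a literal dot; "(?!.*\\.)" is a negative lookahead that
# rejects any dot that still has another dot somewhere to its right,
# so only the last dot in each value is used as the separator.
res <- df %>%
  separate(X2, into = c("col1", "col2"), sep = "\\.(?!.*\\.)")

print(res)
```

Every row now splits into exactly two pieces, so no "too few values" warning is raised.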


The regex comes from this answer.

Answer 1 (score: 1)

We can also use tidyr::extract().
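A minimal sketch of the extract() approach; the capture-group pattern "(.*)\\.([^.]+)$" is an assumption, since the answer's own code is not shown here. extract() is called with the tidyr:: prefix to avoid a clash with magrittr::extract if magrittr is attached:

```r
library(dplyr)
library(tidyr)

df <- tibble(X2 = c("BB_137.HVMSC", "BB_138.combined.HVMSC",
                    "BB_139.combined.HVMSC", "BB_140.combined.HVMSC",
                    "BB_141.HVMSC", "BB_142.combined.HMSC-bm"))

# Group 1 "(.*)" is greedy, so it consumes everything up to the LAST dot;
# group 2 "([^.]+)$" takes the dot-free tail at the end of the string.
res <- df %>%
  tidyr::extract(X2, into = c("col1", "col2"), regex = "(.*)\\.([^.]+)$")

print(res)
```

Unlike the lookahead in separate(), this relies on greedy matching plus an anchored, dot-free final group; each capture group becomes one output column, and rows that do not match would get NA.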
