Splitting a string in a specific context in R

Time: 2015-09-23 10:53:57

Tags: r

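I need to split the column named RefSeq at each _ that occurs before NM, without splitting at the _ between NM and the digits. I need the output to go into new columns of the input data frame.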

I tried something like this, but it splits at every underscore:

strsplit(as.character(TargetScan$RefSeq),"_")

Data

> head(TargetScan)
  Gene         miRNA    Site cont.score cont.score.perc
1 A1CF hsa-let-7a-5p 8mer-1a     -0.051              12
2 A1CF hsa-let-7b-5p 8mer-1a     -0.051              12
3 A1CF hsa-let-7c-5p 8mer-1a     -0.051              12
4 A1CF hsa-let-7d-5p 8mer-1a     -0.062              12
5 A1CF hsa-let-7e-5p 8mer-1a     -0.051              12
6 A1CF hsa-let-7f-5p 8mer-1a     -0.051              12
                                                                RefSeq
1 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933
2 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933
3 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933
4 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933
5 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933
6 NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933

Desired output

> head(TargetScan)
  Gene         miRNA    Site cont.score cont.score.perc
1 A1CF hsa-let-7a-5p 8mer-1a     -0.051              12
2 A1CF hsa-let-7b-5p 8mer-1a     -0.051              12
3 A1CF hsa-let-7c-5p 8mer-1a     -0.051              12
4 A1CF hsa-let-7d-5p 8mer-1a     -0.062              12
5 A1CF hsa-let-7e-5p 8mer-1a     -0.051              12
6 A1CF hsa-let-7f-5p 8mer-1a     -0.051              12
  new1         new2      new3      new4         new5         new6
1 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933
2 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933
3 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933
4 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933
5 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933
6 NM_001198820 NM_014576 NM_138932 NM_001198819 NM_001198818 NM_138933

3 answers:

Answer 0 (score: 3)

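x <- "NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933"  # one RefSeq value from the question
strsplit(x, "(?<=\\d)_", perl = TRUE)[[1]]
#[1] "NM_001198820" "NM_014576"    "NM_138932"    "NM_001198819"
#[5] "NM_001198818" "NM_138933"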

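This approach uses a lookbehind. With the pattern "(?<=\\d)_", we match only an underscore that is preceded by a digit, so the underscore inside each ID (which follows the letters NM) is left alone.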
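As a small illustration of the difference, the sketch below compares a plain split on "_" with the lookbehind split, using a shortened two-ID string. The variable names are illustrative only, and the lookahead variant "_(?=NM)" is an assumed alternative that is not part of the answer above.

```r
# Shortened example string with two RefSeq IDs (illustrative input)
x <- "NM_001198820_NM_014576"

# Plain split: every underscore is a separator, so the IDs are broken apart
plain <- strsplit(x, "_")[[1]]
# "NM" "001198820" "NM" "014576"

# Lookbehind split: only an underscore preceded by a digit is a separator
lookbehind <- strsplit(x, "(?<=\\d)_", perl = TRUE)[[1]]
# "NM_001198820" "NM_014576"

# Assumed alternative: lookahead, splitting at an underscore followed by "NM"
lookahead <- strsplit(x, "_(?=NM)", perl = TRUE)[[1]]
# "NM_001198820" "NM_014576"
```

Both look-around patterns need perl = TRUE, because R's default regex engine does not support them.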

Wrapped in a function that produces the desired output:

library(tidyr)
separate(TargetScan, RefSeq, paste0("new", 1:6), "(?<=\\d)_")
#   Gene         miRNA    Site cont.score cont.score.perc         new1      new2
# 1 A1CF hsa-let-7a-5p 8mer-1a     -0.051              12 NM_001198820 NM_014576
# 2 A1CF hsa-let-7b-5p 8mer-1a     -0.051              12 NM_001198820 NM_014576
# 3 A1CF hsa-let-7c-5p 8mer-1a     -0.051              12 NM_001198820 NM_014576
# 4 A1CF hsa-let-7d-5p 8mer-1a     -0.062              12 NM_001198820 NM_014576
# 5 A1CF hsa-let-7e-5p 8mer-1a     -0.051              12 NM_001198820 NM_014576
# 6 A1CF hsa-let-7f-5p 8mer-1a     -0.051              12 NM_001198820 NM_014576
#        new3         new4         new5      new6
# 1 NM_138932 NM_001198819 NM_001198818 NM_138933
# 2 NM_138932 NM_001198819 NM_001198818 NM_138933
# 3 NM_138932 NM_001198819 NM_001198818 NM_138933
# 4 NM_138932 NM_001198819 NM_001198818 NM_138933
# 5 NM_138932 NM_001198819 NM_001198818 NM_138933
# 6 NM_138932 NM_001198819 NM_001198818 NM_138933
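If tidyr is not available, roughly the same result could be built in base R. The following is only a sketch: the two-row TargetScan stand-in, the padding with NA, and the names parts, padded and new_cols are assumptions made for illustration.

```r
# Small stand-in for the TargetScan data frame (hypothetical, shortened rows)
TargetScan <- data.frame(
  Gene = c("A1CF", "A1CF"),
  RefSeq = c("NM_001198820_NM_014576_NM_138932", "NM_001198820_NM_014576"),
  stringsAsFactors = FALSE
)

# Split each RefSeq value at underscores that follow a digit
parts <- strsplit(as.character(TargetScan$RefSeq), "(?<=\\d)_", perl = TRUE)

# Pad shorter rows with NA so that every row has the same number of IDs
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Bind the rows into a data frame and name the columns new1, new2, ...
new_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(new_cols) <- paste0("new", seq_len(n))

# Append the new columns to the original data
result <- cbind(TargetScan, new_cols)
```

Unlike the separate() call, this keeps the original RefSeq column and does not require knowing the number of IDs in advance.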

Answer 1 (score: 0)

I would try using gsub to replace the underscores before NM with a different separator, and then call strsplit on the resulting values.
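A minimal sketch of the gsub-then-strsplit idea, assuming "|" never occurs in a RefSeq ID. The separator and the variable names are illustrative choices, not taken from the answer.

```r
# One RefSeq value from the question
x <- "NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933"

# Step 1: replace each underscore that precedes "NM" with the assumed separator "|"
marked <- gsub("_NM", "|NM", x, fixed = TRUE)

# Step 2: split on that separator; fixed = TRUE treats "|" literally, not as regex alternation
parts <- strsplit(marked, "|", fixed = TRUE)[[1]]
```

Both functions are vectorized over their input, so the same two steps could be applied to a whole column such as as.character(TargetScan$RefSeq), dropping the [[1]].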

Answer 2 (score: 0)

Use a regular expression to match exactly the text you want and be done with it. I suggest stringr::str_match_all.

library(stringr)
s <- c('NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933',
       'NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933')
str_match_all(s, '([A-Za-z]{2}_\\d+)_?')

This yields:

[[1]]
     [,1]            [,2]          
[1,] "NM_001198820_" "NM_001198820"
[2,] "NM_014576_"    "NM_014576"   
[3,] "NM_138932_"    "NM_138932"   
[4,] "NM_001198819_" "NM_001198819"
[5,] "NM_001198818_" "NM_001198818"
[6,] "NM_138933"     "NM_138933"   

[[2]]
     [,1]            [,2]          
[1,] "NM_001198820_" "NM_001198820"
[2,] "NM_014576_"    "NM_014576"   
[3,] "NM_138932_"    "NM_138932"   
[4,] "NM_001198819_" "NM_001198819"
[5,] "NM_001198818_" "NM_001198818"
[6,] "NM_138933"     "NM_138933"   

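Afterwards, you can organize the data from the returned list into a data.frame. Note that the second column of each matrix contains the information you want.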
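One possible way to do that last step is sketched below. It assumes every input string yields the same number of IDs (so the rows can be bound into a matrix); the names matches, ids and result, and the new1 to new6 column names borrowed from the question, are illustrative.

```r
library(stringr)

s <- c("NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933",
       "NM_001198820_NM_014576_NM_138932_NM_001198819_NM_001198818_NM_138933")

# One matrix per input string: column 1 is the full match, column 2 the capture group
matches <- str_match_all(s, "([A-Za-z]{2}_\\d+)_?")

# Keep only the capture group (column 2) and stack the results, one row per string
ids <- do.call(rbind, lapply(matches, function(m) m[, 2]))
colnames(ids) <- paste0("new", seq_len(ncol(ids)))

# Convert to a data frame; this could then be cbind()-ed onto the original data
result <- as.data.frame(ids, stringsAsFactors = FALSE)
```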