I'm using Perl, and I want to append a substring inside a string.
What I have:
<users used_id="222" user_base="0" user_name="Mike" city="Chicago"/>
<users used_id="333" user_base="0" user_name="Jim Beans" city="Ann Arbor"/>
What I want is to add ", Mr" after the user name, like this:
<users used_id="222" user_base="0" user_name="Mike, Mr" city="Chicago"/>
<users used_id="333" user_base="0" user_name="Jim Beans, Mr" city="Ann Arbor"/>
The problem is that I don't know how to do this. Here is what I have so far. Please, no XML libraries.
#!/usr/bin/perl
use strict;
use warnings;
print "\nPerl Starting ... \n\n";
while (my $recordLine = <DATA>)
{
    chomp($recordLine);
    #print "$recordLine ...\n";
    if (index($recordLine, "user_name") != -1)
    {
        # Found user_name tag ... now append Mr. after the name at the end ... how?
        $recordLine =~ s/user_name=".*?"/user_name=" Mr"/g;
        print "recordLine: $recordLine ...\n";
    }
}
print "\nPerl End ... \n\n";
__DATA__
<users used_id="222" user_base="0" user_name="Mike" city="Chicago"/>
<users used_id="333" user_base="0" user_name="Jim Beans" city="Ann Arbor"/>
Answer 0 (score: 3)
You're almost there.
The .*? sequence in your regular expression stands for some characters from the original input, and you also have to find a way to include those characters in the output.
This is done with a capture group in the pattern (a part of the regular expression enclosed in parentheses) and a reference to $1 (meaning the contents of the first capture group) in the replacement pattern.
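A minimal sketch of the capture-group approach, assuming each record sits on a single line and the name contains no double quotes. The helper name add_title and the @records array are hypothetical, introduced only for illustration; the array takes the place of the __DATA__ section from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): appends ", Mr" to the
# value of the user_name attribute and returns the modified line.
sub add_title {
    my ($line) = @_;
    # (.*?) captures the existing name; $1 reinserts it in the replacement.
    $line =~ s/user_name="(.*?)"/user_name="$1, Mr"/g;
    return $line;
}

# Sample records copied from the question.
my @records = (
    '<users used_id="222" user_base="0" user_name="Mike" city="Chicago"/>',
    '<users used_id="333" user_base="0" user_name="Jim Beans" city="Ann Arbor"/>',
);

print add_title($_), "\n" for @records;
```

Writing the capture group as ([^"]*) instead of (.*?) would also work here and makes it explicit that the match stops at the closing quote.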