I am trying to use gsub() to clean a text dataset in CSV format. Right now, a sample row of my data looks like this:
"5.0\t/gp/customer-reviews/R3M62HO4M6LXE6?ASIN=0439023521\tEngaging. Brutal but engaging!\t\"Wow. I was barely able to put this book down for a second after the first few pages got me completely hooked.
I want to remove the leading strings that carry no useful content, and remove every \t\ or \t, so that I get the expected result, for example:
"Engaging. Brutal but engaging!"Wow. I was barely able to put this book down for a second after the first few pages got me completely hooked.
I tried using
gsub('\\t\\', "", comment, fix=TRUE)
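to remove the \t\, but it did not work. The leading string is also too complex, and I am having trouble writing a correct pattern expression for it.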
Answer 0 (score: 3)
We can try
SELECT *
FROM (
SELECT rank() OVER (ORDER BY x) AS dr, x
FROM (
SELECT
trunc(random()*1000) AS x
FROM generate_series(1,100)
) AS t
) AS t
WHERE dr BETWEEN 80-10 AND 80+10;
dr | x
----+-----
70 | 702
71 | 706
72 | 718
73 | 734
74 | 751
75 | 756
76 | 774
77 | 778
78 | 805
79 | 813
80 | 829
81 | 833
82 | 839
83 | 852
84 | 853
85 | 872
86 | 884
86 | 884
88 | 892
89 | 897
90 | 905
(21 rows)
Answer 1 (score: 1)
If you want to use the stringr library:
library(stringr)
str_replace(val,".*\\t(?=[:alnum:])","")
Using gsub:
gsub(".*\\t(?=[a-zA-Z0-9])", "", val,perl=T)
or gsub(".*\\t(?=[[:alnum:]])", "", val,perl=T)
Output:
> str_replace(val,".*\\t(?=[:alnum:])","")
[1] "Engaging. Brutal but engaging!\t\"Wow. I was barely able to put this book down for a second after the first few pages got me completely hooked."
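Note that the output above still contains one \t before the quoted sentence, because the lookahead only matches a tab that is followed by a letter or digit. A minimal base R sketch of a second step that removes the remaining tab follows. It assumes that the \t shown in the printed sample is a real tab character rather than a backslash followed by t; the names trimmed and cleaned are introduced here for illustration only, and val is reconstructed from the sample row in the question.

```r
# Sample value reconstructed from the question; "\t" here is a real tab character (assumption).
val <- "5.0\t/gp/customer-reviews/R3M62HO4M6LXE6?ASIN=0439023521\tEngaging. Brutal but engaging!\t\"Wow. I was barely able to put this book down for a second after the first few pages got me completely hooked."

# Step 1: drop everything up to the last tab that is followed by a letter or digit.
trimmed <- sub(".*\\t(?=[[:alnum:]])", "", val, perl = TRUE)

# Step 2: remove any tab characters that remain; fixed = TRUE treats the pattern literally.
cleaned <- gsub("\t", "", trimmed, fixed = TRUE)

print(cleaned)
```

If the data really contained the two literal characters backslash and t instead of a tab, the pattern in step 2 would need to be "\\t" with fixed = TRUE.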