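I have a large data set containing a column of text, 20,000 rows long. I want to remove the first x characters (for example, 3) from the start of every row in that particular column. Thanks for your help.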
Answer 0: (score: 2)
You can do this with the gsub function and a simple regular expression. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters (or fewer, if the string is shorter) with an empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
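Since the question asks about "the first x characters", here is a minimal sketch of how the same regex could be built for any x. The helper name drop_first is hypothetical (not from the answer above), and it uses sub rather than gsub, because a pattern anchored with ^ can only match once per string anyway.

```r
# Hypothetical helper: drop the first x characters of each string in s
drop_first <- function(s, x) {
  # Build the pattern, e.g. "^.{0,3}" when x = 3
  pattern <- sprintf("^.{0,%d}", as.integer(x))
  sub(pattern, "", s)
}

df <- data.frame(text_col = c("abcd", "abcde", "abcdef"),
                 stringsAsFactors = FALSE)
df$text_col <- drop_first(df$text_col, 3)
df$text_col
#> [1] "d"   "de"  "def"
```

Strings shorter than x simply become empty, because the quantifier {0,x} matches as many characters as are available.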
Answer 1: (score: 2)
As usual, there are many ways to do things in R!
You can also try ?substring:
> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+                          column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+                          stringsAsFactors=FALSE)
> head(lotsofdata)
column.1 column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"
Or for column 1, [,1]:
> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
Then assign the result back to replace the column:
x<-substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
lotsofdata$column.1<-x
> head(lotsofdata)
column.1 column2
1 aPoint1 MoreData1
2 aPoint2 MoreData2
3 aPoint3 MoreData3
4 aPoint4 MoreData4
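A slightly shorter variant, as a sketch: substring has a default for its last argument (1000000L), so the nchar() call can be left out for ordinary-length strings. The variable x below is my own name for the number of characters to drop.

```r
x <- 3  # number of leading characters to drop (assumed name, for illustration)

lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2"),
                         column2 = c("MoreData1", "MoreData2"),
                         stringsAsFactors = FALSE)

# Keep everything from position x + 1 onward; `last` uses its default
lotsofdata$column.1 <- substring(lotsofdata$column.1, x + 1)
lotsofdata$column.1
#> [1] "aPoint1" "aPoint2"
```

Note that this shortcut applies to substring; substr requires an explicit stop argument.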
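Answer 2: (score: 1)
Using the tidyverse, we can use str_sub (with the built-in fruit vector as sample text strings) to do this by specifying the start and end positions directly: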
library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#> some_fruit
#> <chr>
#> 1 apple
#> 2 apricot
#> 3 avocado
#> 4 banana
#> 5 bell pepper
#> 6 bilberry
#> 7 blackberry
#> 8 blackcurrant
#> 9 blood orange
#> 10 blueberry
#> # … with 70 more rows
tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#> some_fruit chopped_fruit
#> <chr> <chr>
#> 1 apple le
#> 2 apricot icot
#> 3 avocado cado
#> 4 banana ana
#> 5 bell pepper l pepper
#> 6 bilberry berry
#> 7 blackberry ckberry
#> 8 blackcurrant ckcurrant
#> 9 blood orange od orange
#> 10 blueberry eberry
#> # … with 70 more rows
Created on 2019-02-22 by the reprex package (v0.2.1)
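For a variable number of characters, the same idea can be sketched with only dplyr and stringr loaded. The end argument of str_sub defaults to -1 (the last character), so it can be omitted. The three-row tibble and the name x are assumptions made for this illustration.

```r
library(dplyr)    # provides tibble(), mutate() and %>%
library(stringr)  # provides str_sub()

x <- 3  # number of leading characters to drop (assumed name)

tbl <- tibble(some_fruit = c("apple", "apricot", "avocado"))

# Refer to the column by name inside mutate(); start at position x + 1
result <- tbl %>%
  mutate(chopped_fruit = str_sub(some_fruit, x + 1))

result$chopped_fruit
#> [1] "le"   "icot" "cado"
```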