Question

I am facing the following problem. My table Books has a column named title.

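The title column contains rows with values such as To kill a mockingbird (1960).

So basically the format of the column is [title] ([year]). What I need is two columns: title and year, with the year stripped of its parentheses.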

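Another issue is that some rows contain a title that itself includes parentheses. But basically the last 6 characters of every row are the year wrapped in parentheses.

How do I create the two columns year and title?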

What I have is:

Books$title <- c("To kill a mockingbird (1960)", "Harry Potter and the order of the phoenix (2003)", "Of mice and men (something something) (1937)")

title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)

What I need is:

title                                      year
To kill a mockingbird                      1960
Harry Potter and the order of the phoenix  2003
Of mice and men (something something)      1937

Answer 1

We can handle this by applying substr to the last 6 characters.

First, we recreate your data.frame:

df <- read.table(header = TRUE, sep = "\n", stringsAsFactors = FALSE,
text="
Title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)")

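Then we create a new one. The first column, Title, is everything in df$Title except the last 7 characters (so the trailing space is removed as well). The second column, Year, is taken from the end of df$Title (the space plus the 6 characters of the parenthesized year), from which we remove any space and any opening or closing parenthesis. (Applying gsub("[[:punct:]]", ...) to the last 6 characters would also have worked.)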

data.frame(Title=substr(df$Title, 1, nchar(df$Title)-7),
           Year=gsub(" |\\(|\\)", "", substr(df$Title, nchar(df$Title)-6, nchar(df$Title))))


                                      Title Year
1                     To kill a mockingbird 1960
2 Harry Potter and the order of the phoenix 2003
3     Of mice and men (something something) 1937

Does this solve your problem?
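The [[:punct:]] variant mentioned in this answer can be sketched as follows. This is a minimal sketch rather than the answer's own code: it assumes every title ends in a four-digit year wrapped in parentheses, and the names n, last6, year, title and result are only illustrative.

```r
# Recreate the example data (same titles as in the question)
df <- data.frame(
  Title = c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)"),
  stringsAsFactors = FALSE
)

n <- nchar(df$Title)

# The last 6 characters are "(YYYY)"; stripping punctuation leaves the digits
last6 <- substr(df$Title, n - 5, n)
year  <- gsub("[[:punct:]]", "", last6)

# Everything except the trailing " (YYYY)" (7 characters) is the title
title <- substr(df$Title, 1, n - 7)

result <- data.frame(Title = title, Year = year, stringsAsFactors = FALSE)
print(result)
```

Because only the last 6 characters are passed to gsub, the parentheses inside the third title are left untouched.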

Answer 2

Try using substrRight(df$Title, 6) in a loop to extract the last 6 characters, so that the year, including its parentheses, is saved as a new column.

Extracting the last n characters from a string in R
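Note that substrRight is not a base R function; as far as I can tell it is a small helper of the kind proposed in the linked question. A minimal sketch of such a helper, applied to the titles from the question, might look like this. Since substr() and nchar() are vectorized, an explicit loop should not be needed. The column name YearInParens is only illustrative.

```r
# Helper (assumed definition): return the last n characters of each string in x
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

df <- data.frame(
  Title = c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)"),
  stringsAsFactors = FALSE
)

# Vectorized: a single call handles the whole column
df$YearInParens <- substrRight(df$Title, 6)
print(df$YearInParens)
```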

Answer 3

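Similar to @Vincent Bonhomme:

I assume the data lives in some text file, which I will call so.dat, and I read it into a data.frame that also contains two columns for the extracted title and year. I then use substring() to separate the title from the fixed-length year, leaving the () in place, since the OP apparently needs them: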

titles      <- data.frame( orig = readLines( "so.dat" ), 
               text = "", yr = "", stringsAsFactors = FALSE )
titles$text <- substring( titles[ , 1 ], 
               1, nchar( titles[ , 1 ] ) - 7 )
titles$yr   <- substring( titles[ , 1 ], 
               nchar( titles[ , 1 ] ) - 5, nchar( titles[ , 1 ] ) )

The original data can be dropped or kept, depending on what is needed further on.
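To try this approach without an existing so.dat, a self-contained sketch could first write the example titles to a temporary file. The temporary file stands in for so.dat and is an assumption for illustration only, as is the helper variable len; the second-to-last step shows one way to drop the original column if it is no longer needed.

```r
# Stand-in for so.dat: write the example titles to a temporary file
path <- tempfile(fileext = ".dat")
writeLines(c("To kill a mockingbird (1960)",
             "Harry Potter and the order of the phoenix (2003)",
             "Of mice and men (something something) (1937)"),
           path)

titles <- data.frame(orig = readLines(path),
                     text = "", yr = "", stringsAsFactors = FALSE)

len <- nchar(titles$orig)
titles$text <- substring(titles$orig, 1, len - 7)    # title without " (YYYY)"
titles$yr   <- substring(titles$orig, len - 5, len)  # "(YYYY)", parentheses kept

# Optionally drop the original column once it is no longer needed
titles$orig <- NULL
print(titles)

unlink(path)  # clean up the temporary file
```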

Split column in R
