Question

My data looks like this:

13  EDHEC Business School
14  Columbia U and IZA
15  Yale U and Abdul Latif Jameel Poverty Action Lab
16  Carnegie Mellon U
17  Columbia U

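As you can see, some entries contain multiple entities, which I don't want. Since the separate_rows function cannot handle separators made up of multiple characters (or so I gather), I planned to use gsub to convert every instance of " and" into the letter "ö" (a letter unlikely to occur naturally in the material). I could then use "ö" as the separator in separate_rows.

I started by running: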

distinctAF <- gsub("and", "ö", distinctAF)

This seemed to work, but it turned my data frame into a character vector. I tried to change it back with as.data.frame, to no avail:

distinctAF <- as.data.frame(distinctAF)

distinctAF

1   c("MIT", "NBER", "U MI", "Cornell U", "U VA", "Harvard....

As a first step, I tried converting the vector to a matrix, but that did not seem to work either:

distinctAF <- matrix(distinctAF, ncol = 1, byrow = TRUE)

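I also tried binding the character vector to a numeric vector of the same length, hoping to produce a matrix. Strangely, this created a matrix with one copy of the character vector for each number in the numeric vector.

How can I convert the character vector back into a data frame (one value per row) so that I can separate the rows as intended?

I feel like I have tried everything, and this shouldn't be this hard ^^

Link to the file: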

https://www.dropbox.com/s/d4z58w6xvmkyepy/affiliations.csv?dl=0

Answer 1

Perhaps stringr will help.

library(data.table) # I prefer data.table to data.frame
library(stringr)    # Used for string operations

# Read the data, skipping the first line, and name the two columns
data <- fread("affiliations.csv", skip = 1)
setnames(data, c("id", "aff"))

# Replace every " and " with " ö " in a new column
data[, mod_aff := str_replace_all(aff, " and ", " ö ")]

# Check that it worked: show the first rows that contain the placeholder
head(data[str_detect(mod_aff, "ö")])
# id                                              aff                                        mod_aff
# 1: 14                               Columbia U and IZA                               Columbia U ö IZA
# 2: 15 Yale U and Abdul Latif Jameel Poverty Action Lab Yale U ö Abdul Latif Jameel Poverty Action Lab
# 3: 21                            ETH Zurich and CESifo                            ETH Zurich ö CESifo
# 4: 22                          U Copenhagen and CESifo                          U Copenhagen ö CESifo
# 5: 26                                U Chicago and IZA                                U Chicago ö IZA
# 6: 28                              Bocconi U and IGIER                              Bocconi U ö IGIER
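
To round this out, here is a minimal base-R sketch of why the original gsub call collapsed the data frame, and of one way to get one affiliation per row. The sample data and the names aff_df, collapsed, parts and long_df are assumptions made for illustration. They are not taken from affiliations.csv.

```r
# Hypothetical sample standing in for affiliations.csv (column names assumed)
aff_df <- data.frame(
  id  = 13:17,
  aff = c("EDHEC Business School",
          "Columbia U and IZA",
          "Yale U and Abdul Latif Jameel Poverty Action Lab",
          "Carnegie Mellon U",
          "Columbia U"),
  stringsAsFactors = FALSE
)

# gsub() coerces a non-character argument with as.character(). For a data
# frame that gives one deparsed string per column, not one value per row.
# "\u00f6" is the letter ö, written as an escape to avoid encoding issues.
collapsed <- gsub(" and ", " \u00f6 ", aff_df)
length(collapsed)  # same as ncol(aff_df), not nrow(aff_df)

# Working on the column instead leaves the data frame alone. strsplit()
# accepts a multi-character separator, so no placeholder letter is needed.
parts <- strsplit(aff_df$aff, " and ", fixed = TRUE)

# Repeat each id once per piece so the rows stay aligned
long_df <- data.frame(
  id  = rep(aff_df$id, lengths(parts)),
  aff = unlist(parts),
  stringsAsFactors = FALSE
)
long_df
```

Matching " and " with the surrounding spaces, rather than "and", avoids touching names that merely contain those three letters. As far as I know, the sep argument of tidyr's separate_rows is a regular expression, so passing sep = " and " directly may also work without any placeholder.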
