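I have a character vector x that contains tweets about flights from a source city to a destination city, along with their fares. It looks like this: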
x <- c('RT @airfarewatchdog: Los Angeles Los Angeles LAX to Cabo #SJD for $234',
'RT @TheFlightDeal: Airfare Deal: [AA] New York - Mexico City, Mexico. $270',
'SOME JUNK HERE',
'RT @airfarewatchdog: Los Angeles Los Angeles LAX to New York')
I am basically trying to extract the source and destination cities from each row and store them in another variable.
My code looks like this:
toMatch <- (data$City_Airport)
a <- sapply(1:length(x), function(i) {
res <- c(i, paste(ex_dollar(x)), unlist(stringr::str_extract_all(x[i], paste(toMatch, collapse = "|"))))
if (length(res) > 1 ) {res
} else NULL
})
a <- plyr::ldply(a, rbind)
a[] <- lapply(a, as.character)
a[is.na(a)] <- ""
names(a)[1] <- "row"
My output looks like this:
row 2 3 4 5 6 7 8 9
1 1 $234 $270 NA NA Los Angeles Los Angeles LAX SJD
2 2 $234 $270 NA NA New York Mexico City
3 3 $234 $270 NA NA SOM JUN HER
4 4 $234 $270 NA NA Los Angeles Los Angeles LAX New York
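What happens here is that the fares are extracted from all rows, and all of them are pasted into every row.
I assume the problem is the call to paste(ex_dollar(x)) inside the loop. I tried moving this call elsewhere, but that did not work.
I would like my output to look like this: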
row 2 3 4 5 6
1 1 $234 Los Angeles Los Angeles LAX SJD
2 2 $270 New York Mexico City
3 3 NA SOM JUN HER
4 4 NA Los Angeles Los Angeles LAX New York
Answer 0 (score: 2)
One way to extract the fare is with a regular expression.
Using your data:
x <- data.frame(text = c("RT @airfarewatchdog: Los Angeles Los Angeles LAX to Cabo #SJD for $234",
"RT @TheFlightDeal: Airfare Deal: [AA] New York - Mexico City, Mexico. $270",
"SOME JUNK HERE",
"RT @airfarewatchdog: Los Angeles Los Angeles LAX to New York"))
The approach is:
x$value = sapply(x,FUN = function(i){regmatches(i,gregexpr("\\$\\d+",i))})
This regular expression matches a $ followed by one or more digits. If your fares contain decimals, use "\\$[0-9.]+" instead.
Result:
text value
1 RT @airfarewatchdog: Los Angeles Los Angeles LAX to Cabo #SJD for $234 $234
2 RT @TheFlightDeal: Airfare Deal: [AA] New York - Mexico City, Mexico. $270 $270
3 SOME JUNK HERE
4 RT @airfarewatchdog: Los Angeles Los Angeles LAX to New York
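The same extraction can also be sketched on a plain character vector so that rows without a fare get an explicit NA instead of an empty entry. This is only a minimal base R sketch under a stated assumption: each tweet contains at most one fare, so only the first match per element is kept. The names tweets, fare and result are illustrative and not part of the original code.

```r
# Tweets from the question
tweets <- c(
  "RT @airfarewatchdog: Los Angeles Los Angeles LAX to Cabo #SJD for $234",
  "RT @TheFlightDeal: Airfare Deal: [AA] New York - Mexico City, Mexico. $270",
  "SOME JUNK HERE",
  "RT @airfarewatchdog: Los Angeles Los Angeles LAX to New York"
)

# regexpr() locates the first match in each element; -1 means "no match"
m <- regexpr("\\$[0-9]+", tweets)
matched <- as.integer(m) != -1L

# Start with NA everywhere, then fill in only the elements that matched.
# With regexpr() data, regmatches() returns just the matched substrings.
fare <- rep(NA_character_, length(tweets))
fare[matched] <- regmatches(tweets, m)

result <- data.frame(text = tweets, value = fare, stringsAsFactors = FALSE)
print(result$value)
```

If a tweet could contain several fares, gregexpr() would be needed instead and the result would have to stay a list.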
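Answer 1 (score: 2)
Assuming you already have a function ex_dollar() that extracts dollar values from a string (your code calls ex_dollar(), although you did not provide its code), simply apply it row by row inside the loop rather than to the whole text. That is, use ex_dollar(x[i]) instead of ex_dollar(x).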
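To illustrate the per-row idea, here is a rough base R sketch. Since the code of ex_dollar() and the contents of data$City_Airport are not shown in the question, both are replaced by stand-ins: extract_dollar() is a hypothetical helper, and to_match is a made-up list of places. Neither is the asker's actual function or data.

```r
x <- c(
  "RT @airfarewatchdog: Los Angeles Los Angeles LAX to Cabo #SJD for $234",
  "RT @TheFlightDeal: Airfare Deal: [AA] New York - Mexico City, Mexico. $270",
  "SOME JUNK HERE",
  "RT @airfarewatchdog: Los Angeles Los Angeles LAX to New York"
)

# Hypothetical stand-in for ex_dollar(): dollar amounts found in ONE string,
# or NA when the string contains none
extract_dollar <- function(s) {
  hits <- regmatches(s, gregexpr("\\$[0-9]+", s))[[1]]
  if (length(hits) == 0) NA_character_ else hits
}

# Hypothetical lookup list standing in for data$City_Airport
to_match <- c("Los Angeles", "New York", "Mexico City", "LAX", "SJD")
pattern <- paste(to_match, collapse = "|")

# Key point: both extractions work on x[i], never on the whole vector x
rows <- lapply(seq_along(x), function(i) {
  places <- regmatches(x[i], gregexpr(pattern, x[i]))[[1]]
  c(row = i, fare = extract_dollar(x[i])[1], places)
})

print(rows)
```

Each element of rows now carries only its own fare, so binding the elements together afterwards (for example with plyr::ldply(rows, rbind) as in the question) should no longer repeat every fare on every line.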
Answer 2 (score: 1)
Here is an approach that works on your existing output, stored in a data frame named df:
# extract dollars columns as a matrix
myMat <- as.matrix(df[, 2:5])
# pull off diagonal (the data you want)
myDollars <- diag(myMat)
# construct new data.frame
dfNew <- cbind(df[, -(2:5)], myDollars)
This returns the data frame:
# set names of columns and print result
setNames(dfNew, c("row", 2:5, "myDollars"))
row 2 3 4 5 myDollars
1 1 Los_Angeles Los_Angeles LAX SJD $234
2 2 New_York Mexico_City <NA> <NA> $270
3 3 SOM JUN HER <NA> <NA>
4 4 Los_Angeles Los_Angeles LAX New_York <NA>
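The reason the diagonal works can be seen on a toy reconstruction. Because every row holds the fares of all tweets in the same order, the value at position (i, i) is the fare belonging to row i. The sketch below assumes that the number of dollar columns equals the number of rows and that each tweet yields exactly one entry (a fare or NA). The data frame is built by hand for illustration and is not the asker's actual df.

```r
# Fares of ALL tweets, in tweet order (NA where a tweet has no fare)
fares <- c("$234", "$270", NA, NA)

# Toy version of the faulty output: each row repeats the full set of fares
df <- data.frame(
  row = 1:4,
  matrix(fares, nrow = 4, ncol = 4, byrow = TRUE),
  stringsAsFactors = FALSE
)

# Treat the four dollar columns as a square character matrix
my_mat <- as.matrix(df[, 2:5])

# Element (i, i) is the fare that belongs to row i
my_dollars <- diag(my_mat)
print(my_dollars)
```

If a tweet could contain more than one fare, the matrix would no longer be square and this shortcut would not apply.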