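I am trying to split a string into three parts: the name, the time stamp (day and time), and the generic text. Initially the data looks like this: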
data = c(
  "JENNIFER [Day 1, 9:00 A.M.]: Generic text, it doesn't matter what is going on here. There are more than 2 lines.",
  "SAM [Day 2, 10:15 A.M.]: This doesn't matter. It has a lot of lines.",
  "DAN'S [Day 4, 12:00 P.M.]: It doesn't really matter what's going on in this part.")
I was able to extract the first part of the data, NAME [TIME]:, but what I am struggling with is separating NAME from TIME.
match = regexpr("^[A-Z].*:", data)
regmatches(data, match)
This gives me
JENNIFER [Day 1, 9:00 A.M.]:
SAM [Day 2, 10:15 A.M.]:
DAN'S [Day 4, 12:00 P.M.]:
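I can see that the names are all in capital letters, so I would use "^[A-Z]", but that would also match every other sentence that starts with a capital letter.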
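One way around this, sketched here under the assumption that every line has the shape NAME [Day N, time]: text, is to anchor on the square brackets rather than on capital letters. The sample vector below is a shortened stand-in for the real data, and the variable names name and stamp are only illustrative.

```r
# Shortened sample lines in the same "NAME [Day N, time]: text" shape.
data <- c(
  "JENNIFER [Day 1, 9:00 A.M.]: Generic text. Another sentence.",
  "SAM [Day 2, 10:15 A.M.]: This doesn't matter. It has a lot of lines.",
  "DAN'S [Day 4, 12:00 P.M.]: It doesn't really matter."
)

# Name: drop everything from the first " [" to the end of the string.
name <- sub(" \\[.*$", "", data)

# Time stamp: the text between the first "[" and the first "]".
# perl = TRUE is needed for the lazy quantifiers (.*?).
stamp <- sub("^.*?\\[(.*?)\\].*$", "\\1", data, perl = TRUE)

print(name)
print(stamp)
```

Because the match is tied to the brackets, capitalised sentences later in the text are not picked up.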
I want to create a data frame:
Name      Date           Content
JENNIFER  Day 1 9:00A.M  "combined text"
Answer 0 (score: 2)
Fixing data so that it is valid R code, as shown in the Note at the end, we can use strcapture from base R. The call, followed by the data frame it gives:
strcapture("^(.*) \\[(.*)\\]: (.*)", data,
list(Name = character(0), Date = character(0), Text = character(0)))
  Name     Date              Text
1 JENNIFER Day 1, 9:00 A.M.  Blablablablablablbalbllalbalbalbl. Balalalbablablabl.
2 SAM      Day 2, 10:15 A.M. Balblablablabalbalbalblabalblablabl. Balaldfkemfeke.
3 DAN'S    Day 4, 12:00 P.M. DFnerke"dfsdf"
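If the day and the time of day should end up in separate columns, the same idea can probably be extended with one more capture group. The following is a minimal sketch, not part of the original answer: it assumes the stamp always reads "Day N, time", uses a shortened copy of the question's data, and the column names Day, Time and Content as well as the helper names proto and parsed are my own choices.

```r
# Shortened copy of the question's data, written as valid R code.
data <- c(
  "JENNIFER [Day 1, 9:00 A.M.]: Generic text. There are more than 2 lines.",
  "SAM [Day 2, 10:15 A.M.]: This doesn't matter. It has a lot of lines.",
  "DAN'S [Day 4, 12:00 P.M.]: It doesn't really matter."
)

# Empty prototype: one column per capture group, all character.
proto <- data.frame(Name = character(0), Day = character(0),
                    Time = character(0), Content = character(0),
                    stringsAsFactors = FALSE)

# Four capture groups: name, "Day N", time of day, remaining text.
parsed <- strcapture("^(.*) \\[(Day [0-9]+), (.*)\\]: (.*)$", data, proto)

print(parsed)
```

Lines that do not match the pattern come back as rows of NA, which makes malformed input easy to spot.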