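If R isn't the right tool for this job, fair enough, but I believe it should be.
I'm calling an API and dumping the result into Postman's JSON viewer. The result I get looks like this: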
"results": [
{
"personUuid": "***",
"synopsis": {
"fullName": "***",
"headline": "***",
"location": "***",
"image": "***",
"skills": [
"*",
"*",
"*",
"*.",
"*"
],
"phoneNumbers": [
"***",
"***"
],
"emailAddresses": [
"***"
],
"networks": [
{
"name": "linkedin",
"url": "***",
"type": "canonicalUrl",
"lastAccessed": null
},
{
"name": "***",
"url": "***",
"type": "cvUrl",
"lastAccessed": "*"
},
{
"name": "*",
"url": "***",
"type": "cvUrl",
"lastAccessed": "*"
}
]
}
},
{
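First, I'm not sure how to get this into R, since I mostly work with CSV files. I've seen other questions where people use a JSON package to call the URL directly, but that doesn't fit what I'm doing, so I'd like to know how to read a CSV file that contains JSON.
I used: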
x <- fromJSON(file="Z:/json.csv")
But maybe there is a better way. Once that is done, the JSON looks more like:
...$results[[9]]$synopsis$emailAddresses
[1] "***" "***"
[3] "***" "***"
$results[[9]]$synopsis$networks...
What I then want, for each result, is to store the headline and the email addresses in a data table.
I tried:
str_extract_all(x, 'emailAddresses*$')
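I thought * would stand for everything between emailAddresses and $, including newlines and so on, but this doesn't work. I've also found that even when you get * to work, extract doesn't return what the * stands for.
For example: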
> y <- 'some text. email "oli@oli.o" other text'
> y
[1] "some text. email \"oli@oli.o\" other text"
> str_extract_all(y, 'email \"*"')
[[1]]
[1] "email \""
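A note on why this happens: in a regular expression `*` is a quantifier meaning "zero or more of the preceding element", not a wildcard, so `\"*` only matches a run of quotation marks. Below is a minimal sketch of one way to pull out the quoted value with a capture group in base R. The variable names `m` and `email` are illustrative only, and `stringr::str_match` with the same pattern should behave similarly.

```r
y <- 'some text. email "oli@oli.o" other text'

# [^"]* matches any run of characters that are not a double quote;
# the parentheses capture that run so it can be pulled out on its own.
m <- regmatches(y, regexec('email "([^"]*)"', y))

# m[[1]][1] is the full match, m[[1]][2] is the captured group.
email <- m[[1]][2]
print(email)
# [1] "oli@oli.o"
```

That said, for JSON input a parser is usually a more robust choice than a regular expression.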
Part 2:
The answer below works, but what if I call the API directly:
body ='{"start": 0,"count": 105,...}'
x <- POST(url="https://live.*.me/api/v3/person", body=body, add_headers(Accept="application/json", 'Content-Type'="application/json", Authorization = "id=*, apiKey=*"))
y <- content(x)
and then using
fromJSON(y, flatten=TRUE)$results[c("synopsis.headline",
"synopsis.emailAddresses")]
doesn't work. I tried the following:
z <- NULL
zz <- NULL
for(i in 1:y$count){
z=rbind(z,data.table(job = y$results[[i]]$synopsis$headline))
}
for(i in 1:y$count){
zz=rbind(zz,data.table(job = y$results[[i]]$synopsis$emailAddresses))
}
df <- cbind(z,zz)
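However, in the JSON list that comes back, some people have multiple emails, so the approach above only records the first email for each person. How can I store multiple emails as a vector (rather than as multiple columns)?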
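One possible approach, sketched under assumptions: `content(x)` from httr typically returns an already parsed nested list rather than JSON text, which would explain why passing it to `fromJSON` fails (requesting the raw text with `content(x, as = "text")` and parsing that may be another route). Working with the parsed list directly, a list column can hold one character vector of emails per person. The list `y` below is a hypothetical stand-in for the real API response, and every record is assumed to have a `headline`.

```r
# Hypothetical stand-in for the parsed list returned by content(x)
y <- list(
  count = 2,
  results = list(
    list(synopsis = list(headline = "Analyst",
                         emailAddresses = list("a@example.com", "b@example.com"))),
    list(synopsis = list(headline = "Engineer",
                         emailAddresses = list("c@example.com")))
  )
)

# One headline per person (assumes headline is always present)
headlines <- vapply(y$results, function(r) r$synopsis$headline, character(1))

# One character vector of emails per person, however many there are
emails <- lapply(y$results, function(r) unlist(r$synopsis$emailAddresses))

df <- data.frame(headline = headlines, stringsAsFactors = FALSE)
df$emails <- emails  # list column: each cell holds a whole vector

df$emails[[1]]
```

Iterating over `y$results` instead of `1:y$count` avoids an index error if the response holds fewer records than `count` reports.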
Answer 0 (score: 2)
UPDATE 1: To read JSON from a URL, you can simply use the fromJSON function, passing it a string containing the URL of your JSON data:
library(jsonlite)
url <- 'http://you.url.com/data.json'
# in this case we pass an URL to the fromJSON function instead of the actual content we want to parse
fromJSON(url, flatten=TRUE)$results[c("synopsis.headline", "synopsis.emailAddresses")]
// end UPDATE 1
You can also pass the flatten argument to fromJSON and then use the 'results' data frame.
fromJSON(json.data, flatten=TRUE)$results[c("synopsis.headline",
"synopsis.emailAddresses")]
synopsis.headline synopsis.emailAddresses
1 *** jane.doe@boo.com
2 *** john.doe@foo.com
Here is how I defined json.data. Note that I deliberately added 1 record to your sample input JSON.
json.data <- '{
"results":[
{
"personUuid":"***",
"synopsis":{
"fullName":"***",
"headline":"***",
"location":"***",
"image":"***",
"skills":[
"*",
"*",
"*",
"*.",
"*"
],
"phoneNumbers":[
"***",
"***"
],
"emailAddresses":[
"jane.doe@boo.com"
],
"networks":[
{
"name":"linkedin",
"url":"***",
"type":"canonicalUrl",
"lastAccessed":null
},
{
"name":"***",
"url":"***",
"type":"cvUrl",
"lastAccessed":"*"
},
{
"name":"*",
"url":"***",
"type":"cvUrl",
"lastAccessed":"*"
}
]
}
},
{
"personUuid":"***",
"synopsis":{
"fullName":"***",
"headline":"***",
"location":"***",
"image":"***",
"skills":[
"*",
"*",
"*",
"*.",
"*"
],
"phoneNumbers":[
"***",
"***"
],
"emailAddresses":[
"john.doe@foo.com"
],
"networks":[
{
"name":"linkedin",
"url":"***",
"type":"canonicalUrl",
"lastAccessed":null
},
{
"name":"***",
"url":"***",
"type":"cvUrl",
"lastAccessed":"*"
},
{
"name":"*",
"url":"***",
"type":"cvUrl",
"lastAccessed":"*"
}
]
}
}
]
}'
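As a rough illustration of what happens when a person has more than one email, here is a minimal sketch using a trimmed-down, hypothetical version of the input (the names `multi.json` and `res` and the example addresses are made up). With `flatten=TRUE`, the `synopsis.emailAddresses` column should come back as a list column, so each row keeps all of its addresses as a character vector.

```r
library(jsonlite)

# Hypothetical trimmed-down input: the first person has two emails
multi.json <- '{
  "results":[
    {"synopsis":{"headline":"Analyst",
                 "emailAddresses":["a@example.com","b@example.com"]}},
    {"synopsis":{"headline":"Engineer",
                 "emailAddresses":["c@example.com"]}}
  ]
}'

res <- fromJSON(multi.json, flatten=TRUE)$results[c("synopsis.headline",
                                                   "synopsis.emailAddresses")]

# Each element of the list column is a character vector of emails
res$synopsis.emailAddresses[[1]]

# Optionally collapse each vector into a single string per person
sapply(res$synopsis.emailAddresses, paste, collapse = "; ")
```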
Answer 1 (score: 1)
Additional test data may be helpful.
Consider:
library(jsonlite)
library(dplyr)
json_data = "{\"results\": [\n {\n\"personUuid\": \"***\",\n\"synopsis\": {\n\"fullName\": \"***\",\n\"headline\": \"***\",\n\"location\": \"***\",\n\"image\": \"***\",\n\"skills\": [\n\"*\",\n\"*\",\n\"*\",\n\"*.\",\n\"*\"\n],\n\"phoneNumbers\": [\n\"***\",\n\"***\"\n],\n\"emailAddresses\": [\n\"***\"\n],\n\"networks\": [\n{\n \"name\": \"linkedin\",\n \"url\": \"***\",\n \"type\": \"canonicalUrl\",\n \"lastAccessed\": null\n},\n {\n \"name\": \"***\",\n \"url\": \"***\",\n \"type\": \"cvUrl\",\n \"lastAccessed\": \"*\"\n },\n {\n \"name\": \"*\",\n \"url\": \"***\",\n \"type\": \"cvUrl\",\n \"lastAccessed\": \"*\"\n }\n ]\n}\n}]}"
(df <- jsonlite::fromJSON(json_data, simplifyDataFrame = TRUE, flatten = TRUE))
#> $results
#> personUuid synopsis.fullName synopsis.headline synopsis.location
#> 1 *** *** *** ***
#> synopsis.image synopsis.skills synopsis.phoneNumbers
#> 1 *** *, *, *, *., * ***, ***
#> synopsis.emailAddresses
#> 1 ***
#> synopsis.networks
#> 1 linkedin, ***, *, ***, ***, ***, canonicalUrl, cvUrl, cvUrl, NA, *, *
df$results %>%
select(headline = synopsis.headline, emails = synopsis.emailAddresses)
#> headline emails
#> 1 *** ***
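If one row per email is preferred over a list column, the selected data frame could be expanded into long format. The following is a minimal base R sketch under assumptions: `picked` is a hypothetical stand-in for the result of the `select()` call above, built by hand with made-up values so the example is self-contained.

```r
# Hypothetical stand-in for the selected data frame, with a list column
picked <- data.frame(headline = c("Analyst", "Engineer"),
                     stringsAsFactors = FALSE)
picked$emails <- list(c("a@example.com", "b@example.com"), "c@example.com")

# Repeat each headline once per email, then flatten the list column
long <- data.frame(
  headline = rep(picked$headline, lengths(picked$emails)),
  email    = unlist(picked$emails),
  stringsAsFactors = FALSE
)

long
```

A person with no emails would drop out of `long`, since `lengths()` returns 0 for that row.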