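I want to retrieve information from Wikidata and store it in a data frame. For simplicity, suppose I want to get the genres of the following movies and then keep only the ones that are science fiction: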
movies = c("Star Wars Episode IV: A New Hope", "Interstellar",
"Happythankyoumoreplease")
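I know there is a package called WikidataR. If I'm not mistaken, according to its vignettes, two functions might be useful: find_item and find_property, which let you retrieve a set of Wikidata items or properties whose aliases or descriptions match a given search term. They seem to fit my purpose, so I thought of doing something like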
library(WikidataR)
info = list()
for (i in movies) {
  info[[i]] = find_item(i)  # store each movie's results instead of overwriting info
}
This is what I get for each item, for example:
> find_item("Interstellar")
Wikidata item search
Number of results: 10
Results:
1 Interstellar (Q13417189) - 2014 US science fiction film
2 Interstellar (Q6057099)
3 interstellar medium (Q41872) - matter and fields (radiation) that exist in the space between the star systems in a galaxy;includes gas in ionic, atomic or molecular form, dust and cosmic rays. It fills interstellar space and blends smoothly into the surrounding intergalactic space
4 space colonization (Q686876) - concept of permanent human habitation outside of Earth
5 rogue planet (Q167910) - planetary-mass object that orbits the galaxy directly
6 interstellar cloud (Q1054444) - accumulation of gas, plasma and dust in a galaxy
7 interstellar travel (Q834826) - term used for hypothetical manned or unmanned travel between stars
8 Interstellar Boundary Explorer (Q835898)
9 starship (Q2003852) - spacecraft designed for interstellar travel
10 interstellar object (Q2441216) - astronomical object in interstellar space, such as a comet
>
Unfortunately, the information I get from find_item (shown above) has two problems:
Similarly, find_property returns metadata about a property. find_property("genre") retrieves the following:
> find_property("genre")
Wikidata property search
Number of results: 4
Results:
1 genre (P136) - a creative work's genre or an artist's field of work (P101). Use main subject (P921) to relate creative works to their topic
2 radio format (P415) - describes the overall content broadcast on a radio station
3 sex or gender (P21) - sexual identity of subject: male (Q6581097), female (Q6581072), intersex (Q1097630), transgender female (Q1052281), transgender male (Q2449503). Animals: male animal (Q44148), female animal (Q43445). Groups of same gender use "subclass of" (P279)
4 gender of a scientific name of a genus (P2433) - determines the correct form of some names of species and subdivisions of species, also subdivisions of a genus
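This has similar problems: … associated with each object in the movies vector. Is there a way to end up with a data frame containing the genres of these movies? (Or a data frame with all the Wikidata information, which I would then have to manipulate to filter or select the data I want?)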
Answer (score: 1)
These are just lists. For example, you can use str to get an overview of their structure:
str(find_item("Interstellar"))
Then you can go through each element of the list and pick out the fields you want, e.g. the title (the item ID) and the label. This is straightforward for regular data; if some elements are missing, you will have to handle that, since, for example, some items have no description. Here is one way to do it:
library(WikidataR)
a <- find_item("Interstellar")
# one row per search hit: item ID (title) and label
b <- do.call(rbind, lapply(a, function(x) cbind(x$title, x$label)))
data.frame(b)
## X1 X2
## 1 Q13417189 Interstellar
## 2 Q6057099 Interstellar
## 3 Q41872 interstellar medium
## 4 Q686876 space colonization
## 5 Q167910 rogue planet
## 6 Q1054444 interstellar cloud
## 7 Q834826 interstellar travel
## 8 Q835898 Interstellar Boundary Explorer
## 9 Q2003852 starship
## 10 Q2441216 interstellar object
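The same idea can be extended to all movies at once, with missing fields turned into NA. The following is a minimal sketch, assuming (as in the answer's WikidataR version) that each find_item() hit is a list with title, label and, only when Wikidata has one, description; check str() on your own output, since the structure may differ between package versions. The helper names get_field, hits_to_df and results_to_df are hypothetical, and mock_results reuses three hits from the Interstellar output above so the code runs offline. Filtering on the description text is only a rough heuristic, not a lookup of the genre property (P136).

```r
# Hypothetical helpers; assumes each find_item() hit is a list with
# title (item ID), label and, only if Wikidata has one, description.

# Return one field of a search hit, or NA when the hit lacks it.
get_field <- function(hit, name) {
  value <- hit[[name]]
  if (is.null(value)) NA_character_ else as.character(value)
}

# Flatten the hits for one search term into a data frame.
hits_to_df <- function(hits, search_term) {
  data.frame(
    search      = rep(search_term, length(hits)),
    id          = vapply(hits, get_field, character(1), name = "title"),
    label       = vapply(hits, get_field, character(1), name = "label"),
    description = vapply(hits, get_field, character(1), name = "description"),
    stringsAsFactors = FALSE
  )
}

# Stack the per-movie data frames; results is a list named by search term.
results_to_df <- function(results) {
  out <- do.call(rbind, Map(hits_to_df, results, names(results)))
  rownames(out) <- NULL
  out
}

# Offline stand-in for find_item() output, using hits shown in the question.
mock_results <- list(
  Interstellar = list(
    list(title = "Q13417189", label = "Interstellar",
         description = "2014 US science fiction film"),
    list(title = "Q6057099", label = "Interstellar"),  # no description
    list(title = "Q686876", label = "space colonization",
         description = "concept of permanent human habitation outside of Earth")
  )
)

df <- results_to_df(mock_results)
print(df)

# Rough heuristic, not the genre property (P136): keep hits whose
# description mentions "science fiction" (NA descriptions never match).
scifi <- df[grepl("science fiction", tolower(df$description), fixed = TRUE), ]
print(scifi)
```

With the real package you would build the input as lapply(setNames(movies, movies), find_item) instead of using the mock list. Keeping a search column makes it possible to tell which movie each hit came from once all results are stacked into one data frame.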