My data is a list of the academic institutions affiliated with the authors of the articles I am working on, and it looks like this:
1 MIT
2 NBER; NBER
3 U MI; Cornell U; U VA
4 Harvard U; U Chicago
5 U OR; U CA, Davis; U British Columbia
6 World Bank; Dartmouth College; EDHEC Business School; Harvard U
7 Columbia U and IZA; Columbia U and IZA
8 World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College
9 Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U
10 Columbia U; U CA, San Diego
11 U CA, Berkeley; McMaster U; McMaster U
12 ETH Zurich and CESifo; U Copenhagen and CESifo
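I would like to split the rows at the semicolons (and ideally also at "and") so that I can find out which academic institutions are unique.
I tried to do this with the separate_rows function from the tidyr package: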
Affiliation<-separate_rows(Affiliation, sep=";")
or:
Affiliation<-separate_rows(Affiliation, sep="; | and")
Neither of these works; my data looks exactly the same afterwards. What am I doing wrong?
The dput output is attached below:
structure(list(AF = c("MIT", "NBER; NBER", "U MI; Cornell U; U VA",
"Harvard U; U Chicago", "U OR; U CA, Davis; U British Columbia",
"World Bank; Dartmouth College; EDHEC Business School; Harvard U",
"Columbia U and IZA; Columbia U and IZA", "World Bank; Yale U and Abdul Latif Jameel Poverty Action Lab; Dartmouth College",
"Carnegie Mellon U; Carnegie Mellon U; Carnegie Mellon U", "Columbia U; U CA, San Diego",
"U CA, Berkeley; McMaster U; McMaster U", "ETH Zurich and CESifo; U Copenhagen and CESifo",
"U MN, St Paul; Compass Lexecon, Washington, DC; Harvard U",
"U WI", "U Chicago and IZA; Harvard U; Harvard U")), row.names = c(NA,
15L), class = "data.frame")
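
For reference, here is a minimal sketch of the result I am after, using a small subset of the data. It assumes the column to split is AF (as in the dput output) and passes that column name to separate_rows explicitly, which neither of my calls above does. The names split_af and unique_af and the regular expression are only illustrative, and the pattern would also split an institution name that itself contains " and ".

```r
library(tidyr)

# Small subset of the data from the dput output above
Affiliation <- data.frame(
  AF = c("MIT",
         "NBER; NBER",
         "Columbia U and IZA; Columbia U and IZA",
         "ETH Zurich and CESifo; U Copenhagen and CESifo"),
  stringsAsFactors = FALSE
)

# Name the column to split (AF); split on ";" plus optional whitespace,
# or on the word "and" surrounded by whitespace
split_af <- separate_rows(Affiliation, AF, sep = ";\\s*|\\s+and\\s+")

# Trim stray whitespace and keep each institution once
unique_af <- unique(trimws(split_af$AF))
print(unique_af)
```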