I am trying to format a messy data frame in R by creating new rows from column values. A snippet of the data is shown below.
id producer pcountry collaborator ccountry val
1  J&J      USA      Pfizer       USA      25
2  Biodiem  AUS      PhaseBio     USA      65
                     GeneScience  China
3  Shire    Ireland  N/A          N/A      54
4  Sanofi   France   N/A          N/A      64
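Basically, I want to use the values in the collaborator and ccountry columns to create new rows in the data frame. So far I have this code, which uses the splitstackshape package.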
df2 <- cSplit(df, 4, "\r", "long")
This does the job for entries with multiple values in the collaborator column (e.g. row 2 above). Running my code gives me:
id producer pcountry collaborator ccountry val
1  J&J      USA      Pfizer       USA      25
2  Biodiem  AUS      PhaseBio     USA      65
                                  China
3  Biodiem  AUS      Genescience  USA      65
                                  China
4  Shire    Ireland  N/A          N/A      54
5  Sanofi   France   N/A          N/A      64
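However, there is more I need to do with the data. I want the values in the collaborator column to line up with the values in the ccountry column, so that row 3 here has China in the ccountry column while row 2 has USA. I tried adding both columns to the code, as in df2 <- cSplit(df, c(4,5), "\r", "long"), but that only made a big mess.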
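Finally, because the code only creates new entries where it finds the newline delimiter, it ignores rows that have only one value (such as row 1), since they contain no newline. I would like those to be included as well.
Is there a way to change this code to perform these two additional steps, or do I need to write a function for this?
Edit: here is a sample of the data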
id producer pcountry collaborator ccountry val
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 1 J&J USA Pfizer USA 25
2 2 Biodiem AUS "PhaseBio\r\nGenescience" "USA\r\nChina" 65
3 3 Shire Ireland NA NA 54
4 4 Sanofi France NA NA 64
structure(list(id = c(1, 2, 3, 4), producer = c("J&J", "Biodiem",
"Shire", "Sanofi"), pcountry = c("USA", "AUS", "Ireland", "France"
), collaborator = c("Pfizer", "PhaseBio\r\nGenescience", NA,
NA), ccountry = c("USA", "USA\r\nChina", NA, NA), val = c(25,
65, 54, 64)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
Here is the expected result
id producer pcountry collaborator ccountry val
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 1 J&J USA NA NA 25
2 2 J&J USA Pfizer USA 25
3 3 Biodiem AUS NA NA 65
4 4 Biodiem AUS PhaseBio USA 65
5 5 Biodiem AUS Genescience China 65
6 6 Shire Ireland NA NA 54
7 7 Sanofi France NA NA 64
structure(list(id = c(1, 2, 3, 4, 5, 6), producer = c("J&J",
"J&J", "Biodiem", "Biodiem", "Biodiem", "Shire"), pcountry = c("USA",
"USA", "AUS", "AUS", "AUS", "Ireland"), collaborator = c(NA,
"Pfizer", NA, "PhaseBio", "Genescience", NA), ccountry = c(NA,
"USA", NA, "USA", "China", NA), val = c(25, 25, 65, 65, 65, 54
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Answer 0: (score: 1)
This is super easy with tidyr:
require(tidyr)
separate_rows(df, collaborator,ccountry, sep="\r\n")
# A tibble: 5 x 6
id producer pcountry collaborator ccountry val
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 1 J&J USA Pfizer USA 25
2 2 Biodiem AUS PhaseBio USA 65
3 2 Biodiem AUS Genescience China 65
4 3 Shire Ireland NA NA 54
5 4 Sanofi France NA NA 64
If you also want all those extra rows with NA for collaborator and ccountry, you can do the following:
require(tidyr)
require(dplyr)
df %>%
  # Create extra rows before non NA rows
  mutate(collaborator = ifelse(is.na(collaborator), NA, paste0("\r\n", collaborator)),
         ccountry = ifelse(is.na(ccountry), NA, paste0("\r\n", ccountry))) %>%
  separate_rows(collaborator, ccountry, sep = "\r\n") %>%
  # change empty strings to NAs
  mutate(collaborator = ifelse(collaborator == "", NA, collaborator),
         ccountry = ifelse(ccountry == "", NA, ccountry))
# A tibble: 7 x 6
id producer pcountry collaborator ccountry val
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 1 J&J USA NA NA 25
2 1 J&J USA Pfizer USA 25
3 2 Biodiem AUS NA NA 65
4 2 Biodiem AUS PhaseBio USA 65
5 2 Biodiem AUS Genescience China 65
6 3 Shire Ireland NA NA 54
7 4 Sanofi France NA NA 64
Answer 1: (score: 1)
Consider a base R approach using strsplit within by grouping:
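A minimal sketch of what a by plus strsplit approach might look like. It is an illustration under stated assumptions, not necessarily the code this answer originally contained. It assumes df is the sample data from the question (rebuilt here as a plain data.frame) and that collaborator and ccountry in a given row always hold the same number of "\r\n"-separated pieces. The names pieces and result are made up for the example.

```r
# Sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  id = c(1, 2, 3, 4),
  producer = c("J&J", "Biodiem", "Shire", "Sanofi"),
  pcountry = c("USA", "AUS", "Ireland", "France"),
  collaborator = c("Pfizer", "PhaseBio\r\nGenescience", NA, NA),
  ccountry = c("USA", "USA\r\nChina", NA, NA),
  val = c(25, 65, 54, 64),
  stringsAsFactors = FALSE
)

# Handle each id on its own: split both columns on "\r\n" and let
# data.frame() recycle the single-valued columns across the pieces.
# Assumption: both columns split into the same number of pieces.
pieces <- by(df, df$id, function(sub) {
  collab  <- strsplit(sub$collaborator, "\r\n", fixed = TRUE)[[1]]
  country <- strsplit(sub$ccountry, "\r\n", fixed = TRUE)[[1]]
  data.frame(
    id = sub$id, producer = sub$producer, pcountry = sub$pcountry,
    collaborator = collab, ccountry = country, val = sub$val,
    stringsAsFactors = FALSE
  )
})

# Stack the per-id pieces back into one data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Rows whose collaborator is NA pass through unchanged, because strsplit returns NA for an NA input. This sketch pairs collaborators with countries but does not add the extra NA rows from the expected result.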