I want to extract every token that contains a digit, for example "US6184521-B1" and "US3967255-A", from the following string:
US6184521-B1 -- US3967255-A DELPHIAN FOUNDATION (DELP-Non-standard); Q2 CORP (QTWO-Non-standard) OLIVER S M, PROUD R A, PARSONS S J; US3973118-A LAMONTAGNE J A (LAMO-Individual) LAMONTAGNE J A; US4303855-A IBM CORP (IBMC) BAPST U H, GFELLER F, VETTIGER P; US4394572-A BIOX TECH INC (BIOX-Non-standard) WILBER S; US4407290-A BIOX TECH INC (BIOX-Non-standard); BOC GROUP PLC (BRTO) WILBER S A; US4633087-A TREBOR INDS INC (TREB-Non-standard) ROSENTHAL G K, STEPHENS J D, ROSENTHAL R D; US4678921-A NIPPONDENSO CO LTD (NPDE) NAKAMURA T, SATO S, HATTORI T, NABETA T, KATO M; US4864126-A HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERYESZI J, PETRILLA J F, PERNYESZI J; US4865038-A NOVAMETRIX MED SYST INC (NOVA-Non-standard) RICH D, THOMAS S; US4907594-A NICOLAY GMBH (NICO-Non-standard) MUZ E; US4939375-A HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERNYESZI J, PETRILLA J F; US5036437-A LECTRON PRODUCTS IN (LECT-Non-standard) MACKS H R; US5209230-A NELLCOR INC (NELL-Non-standard) SWEDLOW D B, WARING J, DELONZO R; US5237994-A SQUARE ONE TECHNOLOGY (SQUA-Non-standard) GOLDBERGER D S; US5239169-A MICROSCAN SYSTEMS INC (MICR-Non-standard) THOMAS J E; US5325192-A TEKTRONIX INC (TEKT) ALLEN D W; US5373102-A US SEC OF ARMY (USSA) DAVENPORT W E, EHRLICH J J, TAYLOR T S; US5561295-A LITTON SYSTEMS INC (LITO) PREIS M K, JACKSEN N F; US5629517-A XEROX CORP (XERO) JACKSON W B, BIEGELSEN D K, STREET R A, WEISFIELD R L; US5752914-A NELLCOR PURITAN BENNETT INC (MLCW) DELONZOR R, NAMY A; US5786592-A HOEK INSTR AB (HOEK-Non-standard) HOEK B
This should be similar to what is shown here, but I want to extract the letters along with the digits. How can I do this in R?
Answer 0 (score: 2)
Try this:
test<-c("aa1","aaa")
test[grepl("[0-9]", test)]  # keep the elements that contain any digit, including 0
[1] "aa1"
Using your data:
input<-"US6184521-B1 -- US3967255-A DELPHIAN FOUNDATION (DELP-Non-standard); Q2 CORP (QTWO-Non-standard) OLIVER S M, PROUD R A, PARSONS S J; US3973118-A LAMONTAGNE J A (LAMO-Individual) LAMONTAGNE J A; US4303855-A IBM CORP (IBMC) BAPST U H, GFELLER F, VETTIGER P; US4394572-A BIOX TECH INC (BIOX-Non-standard) WILBER S; US4407290-A BIOX TECH INC (BIOX-Non-standard); BOC GROUP PLC (BRTO) WILBER S A; US4633087-A TREBOR INDS INC (TREB-Non-standard) ROSENTHAL G K, STEPHENS J D, ROSENTHAL R D; US4678921-A NIPPONDENSO CO LTD (NPDE) NAKAMURA T, SATO S, HATTORI T, NABETA T, KATO M; US4864126-A HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERYESZI J, PETRILLA J F, PERNYESZI J; US4865038-A NOVAMETRIX MED SYST INC (NOVA-Non-standard) RICH D, THOMAS S; US4907594-A NICOLAY GMBH (NICO-Non-standard) MUZ E; US4939375-A HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERNYESZI J, PETRILLA J F; US5036437-A LECTRON PRODUCTS IN (LECT-Non-standard) MACKS H R; US5209230-A NELLCOR INC (NELL-Non-standard) SWEDLOW D B, WARING J, DELONZO R; US5237994-A SQUARE ONE TECHNOLOGY (SQUA-Non-standard) GOLDBERGER D S; US5239169-A MICROSCAN SYSTEMS INC (MICR-Non-standard) THOMAS J E; US5325192-A TEKTRONIX INC (TEKT) ALLEN D W; US5373102-A US SEC OF ARMY (USSA) DAVENPORT W E, EHRLICH J J, TAYLOR T S; US5561295-A LITTON SYSTEMS INC (LITO) PREIS M K, JACKSEN N F; US5629517-A XEROX CORP (XERO) JACKSON W B, BIEGELSEN D K, STREET R A, WEISFIELD R L; US5752914-A NELLCOR PURITAN BENNETT INC (MLCW) DELONZOR R, NAMY A; US5786592-A HOEK INSTR AB (HOEK-Non-standard) HOEK B"
input<-unlist(strsplit(input,split=" "))
Your output:
input[grepl("[0-9]", input)]  # [0-9] rather than [1-9], so a token whose only digit is 0 is still kept
[1] "US6184521-B1" "US3967255-A" "Q2" "US3973118-A" "US4303855-A" "US4394572-A" "US4407290-A"
[8] "US4633087-A" "US4678921-A" "US4864126-A" "US4865038-A" "US4907594-A" "US4939375-A" "US5036437-A"
[15] "US5209230-A" "US5237994-A" "US5239169-A" "US5325192-A" "US5373102-A" "US5561295-A" "US5629517-A"
[22] "US5752914-A" "US5786592-A"
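Note that the result above also contains "Q2", which belongs to a company name rather than to a patent number. If only identifiers shaped like the two examples in the question are wanted, a stricter pattern is one option. The following is a minimal sketch, assuming every identifier consists of two capital letters, a run of digits, a hyphen, and a kind code of one capital letter optionally followed by one digit. That format is an assumption read off the sample data, and the names tokens and patent_like are only illustrative.

```r
# A shortened sample of the tokens produced by strsplit() above.
tokens <- c("US6184521-B1", "--", "US3967255-A", "DELPHIAN", "Q2", "CORP")

# Assumed format: two capital letters, digits, "-", one capital letter,
# then an optional single digit. Anchors (^ and $) force a whole-token match.
patent_like <- grep("^[A-Z]{2}[0-9]+-[A-Z][0-9]?$", tokens, value = TRUE)
patent_like
```

With this pattern "Q2" is dropped because it does not start with two letters. If your real data contains identifiers in other formats, the pattern would need to be loosened accordingly.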
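Answer 1 (score: 1)
A simple grep will do it. Note that the argument value is set to TRUE; its default is FALSE, in which case grep returns the indices of the matching elements instead of the elements themselves.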
grep("[[:digit:]]", s, value = TRUE)
# [1] "US6184521-B1" "US3967255-A" "Q2" "US3973118-A" "US4303855-A"
# [6] "US4394572-A" "US4407290-A" "US4633087-A" "US4678921-A" "US4864126-A"
#[11] "US4865038-A" "US4907594-A" "US4939375-A" "US5036437-A" "US5209230-A"
#[16] "US5237994-A" "US5239169-A" "US5325192-A" "US5373102-A" "US5561295-A"
#[21] "US5629517-A" "US5752914-A" "US5786592-A"
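To make the effect of value concrete, here is a small sketch on a hand-made vector (s_demo is an illustrative name, not the s used above). It shows that the default call returns positions, and that indexing with those positions gives the same result as value = TRUE.

```r
# Illustrative sample, not the full data from the question.
s_demo <- c("US6184521-B1", "--", "US3967255-A", "DELPHIAN", "Q2")

# Default (value = FALSE): integer positions of the matching elements.
idx <- grep("[[:digit:]]", s_demo)

# value = TRUE: the matching elements themselves.
vals <- grep("[[:digit:]]", s_demo, value = TRUE)

# Both routes select the same strings.
identical(s_demo[idx], vals)
```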
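Data.
The following uses scan to read the data you provided. It splits the string on whitespace (spaces and line breaks), so your own strings may end up tokenized differently. It is only here to test the code above.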
s <-
scan(what = character(),
text = "US6184521-B1 -- US3967255-A DELPHIAN FOUNDATION (DELP-Non-standard);
Q2 CORP (QTWO-Non-standard) OLIVER S M, PROUD R A, PARSONS S J;
US3973118-A LAMONTAGNE J A (LAMO-Individual) LAMONTAGNE J A; US4303855-A
IBM CORP (IBMC) BAPST U H, GFELLER F, VETTIGER P; US4394572-A BIOX TECH INC
(BIOX-Non-standard) WILBER S; US4407290-A BIOX TECH INC (BIOX-Non-standard);
BOC GROUP PLC (BRTO) WILBER S A; US4633087-A TREBOR INDS INC (TREB-Non-standard)
ROSENTHAL G K, STEPHENS J D, ROSENTHAL R D; US4678921-A NIPPONDENSO CO LTD
(NPDE) NAKAMURA T, SATO S, HATTORI T, NABETA T, KATO M; US4864126-A
HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERYESZI J, PETRILLA J F, PERNYESZI
J; US4865038-A NOVAMETRIX MED SYST INC (NOVA-Non-standard) RICH D,
THOMAS S; US4907594-A NICOLAY GMBH (NICO-Non-standard) MUZ E;
US4939375-A HEWLETT-PACKARD CO (HEWP) WALTERS M D, PERNYESZI J,
PETRILLA J F; US5036437-A LECTRON PRODUCTS IN (LECT-Non-standard)
MACKS H R; US5209230-A NELLCOR INC (NELL-Non-standard) SWEDLOW D B,
WARING J, DELONZO R; US5237994-A SQUARE ONE TECHNOLOGY (SQUA-Non-standard)
GOLDBERGER D S; US5239169-A MICROSCAN SYSTEMS INC (MICR-Non-standard)
THOMAS J E; US5325192-A TEKTRONIX INC (TEKT) ALLEN D W; US5373102-A
US SEC OF ARMY (USSA) DAVENPORT W E, EHRLICH J J, TAYLOR T S;
US5561295-A LITTON SYSTEMS INC (LITO) PREIS M K, JACKSEN N F;
US5629517-A XEROX CORP (XERO) JACKSON W B, BIEGELSEN D K, STREET R A,
WEISFIELD R L; US5752914-A NELLCOR PURITAN BENNETT INC (MLCW)
DELONZOR R, NAMY A; US5786592-A HOEK INSTR AB (HOEK-Non-standard)
HOEK B")
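If the data is already held in a single character string, splitting it on runs of whitespace with strsplit should give the same tokens as scan. This is a minimal sketch on a shortened sample; x, via_scan and via_split are illustrative names, and it assumes the string has no leading whitespace and no quote characters (scan treats quotes specially by default).

```r
# Shortened sample containing both a space and a line break as separators.
x <- "US6184521-B1 -- US3967255-A DELPHIAN\nFOUNDATION; Q2 CORP"

# scan() splits on any whitespace; quiet = TRUE suppresses the "Read N items" message.
via_scan <- scan(text = x, what = character(), quiet = TRUE)

# strsplit() with a regex for one or more whitespace characters.
via_split <- strsplit(x, "[[:space:]]+")[[1]]

# Compare the two tokenizations, then filter as in the answer above.
identical(via_scan, via_split)
grep("[[:digit:]]", via_split, value = TRUE)
```

Splitting on "[[:space:]]+" rather than on a single " " matters when the text contains line breaks or repeated spaces, since a single-space split would leave those inside the tokens or produce empty strings.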