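gsub a string up to the first number, with uppercase and lowercase characters

Time: 2019-01-28 19:19:04

Tags: r

I want to remove everything after the first number. My data looks like this: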

[1] NA                                   "ITEM 1. BUSINESS"                  
[3] "ITEM 1A. RISK FACTORS"              "ITEM 1B. UNRESOLVED STAFF COMMENTS"
[5] "ITEM 2. PROPERTIES"                 "ITEM 3. LEGAL PROCEEDINGS"       

I am trying to keep only:

NA           ITEM1
ITEM1A      ITEM1B
ITEM2       ITEM3

(or even keep the space between ITEM and the number, as in ITEM 1, ITEM 2, etc.)

I tried the following, without any luck.

x <- toupper(x)
x <- gsub("[^[:alnum:][:space:]]","", x)
x <- gsub(" ", "", x)
x <- substr(x, start = 1, stop = 7)
x <- gsub("\\[digits]*","", x)

I also tried:

library(stringr)  # str_extract() comes from stringr
y <- str_extract(x, "Item")
y <- str_extract(toupper(words$item), "ITEM")

Data:

c(NA, "ITEM 1. BUSINESS", "ITEM 1A. RISK FACTORS", "ITEM 1B. UNRESOLVED STAFF COMMENTS", 
"ITEM 2. PROPERTIES", "ITEM 3. LEGAL PROCEEDINGS", "ITEM 4. MINE SAFETY DISCLOSURES", 
"ITEM 5. MARKET FOR REGISTRANT’S COMMON EQUITY, RELATED STOCKHOLDER MATTERS AND ISSUER PURCHASES OF EQUITY SECURITIES", 
"ITEM 6. SELECTED FINANCIAL DATA ", "ITEM 7. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS ", 
"ITEM 7A. QUANTITATIVE AND QUALITATIVE DISCLOSURES ABOUT MARKET RISK", 
"ITEM 8. FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA", "ITEM 9. CHANGES IN AND DISAGREEMENTS WITH ACCOUNTANTS ON ACCOUNTING AND FINANCIAL DISCLOSURE", 
"ITEM 9A. CONTROLS AND PROCEDURES", "ITEM 9B.  OTHER INFORMATION", 
"ITEM 10. DIRECTORS, EXECUTIVE OFFICERS AND CORPORATE GOVERNANCE", 
"ITEM 11. EXECUTIVE COMPENSATION", "ITEM 12. SECURITY OWNERSHIP OF CERTAIN BENEFICIAL OWNERS AND MANAGEMENT AND RELATED STOCKHOLDER MATTERS", 
"ITEM 13. CERTAIN RELATIONSHIPS AND RELATED TRANSACTIONS, AND DIRECTOR INDEPENDENCE", 
"ITEM 14. PRINCIPAL ACCOUNTING FEES AND SERVICES", "ITEM 15. EXHIBITS, FINANCIAL STATEMENT SCHEDULE", 
"Item 1.    Business", "Item 1A.    Risk Factors", "Item 1B.    Unresolved Staff Comments", 
"Item 2.    Properties", "Item 3.    Legal Proceedings", "Item 4.    Mine Safety Disclosure", 
"Item 5.    Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities", 
"Item 6.    Selected Financial Data", "Item 7.    Management’s Discussion and Analysis of Financial Condition and Results of Operations", 
"Item 7A.    Quantitative and Qualitative Disclosures About Market Risk", 
"Item 8.    Financial Statements and Supplementary Data", "Item 9.    Changes in and Disagreements with Accountants on Accounting and Financial Disclosure", 
"Item 9A.    Controls and Procedures", "Item 9B.    Other Information", 
"Item 10.    Directors, Executive Officers and Corporate Governance", 
"Item 11.    Executive Compensation", "Item 12.    Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters", 
"Item 13.    Certain Relationships and Related Transactions, and Director Independence", 
"Item 14.    Principal Accountant Fees and Services", "Item 15.    Exhibits and Financial Statement Schedules(a)(1) and (2).  The following documents have been included in Part II, Item 8. Report of Ernst & Young LLP, Independent Registered Public Accounting Firm, on Financial Statements Consolidated Statements of Financial Position — As of December 31, 2017 and 2016 Consolidated Statements of Income — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Comprehensive Income — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Shareholders’ Equity — Years Ended December 31, 2017, 2016 and 2015 Consolidated Statements of Cash Flows — Years Ended December 31, 2017, 2016 and 2015 Notes to Consolidated Financial Statements", 
"Item 1.  Business.", "Item 1A.  Risk Factors.", "Item 1B.  Unresolved Staff Comments.", 
"Item 2.  Properties.", "Item 3.  Legal Proceedings.", "Item 4.  Mine Safety Disclosures.", 
"Item 5.  Market for Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities.", 
"Item 6.  Selected Financial Data.", "Item 7.  Management's Discussion and Analysis of Financial Condition and Results of Operations. ", 
"Item 7A.  Quantitative and Qualitative Disclosures About Market Risk.", 
"Item 8.  Financial Statements and Supplementary Data.", "Item 9.  Changes in and Disagreements with Accountants on Accounting and Financial Disclosure.", 
"Item 9A.  Controls and Procedures.", "Item 9B.  Other Information.", 
"Item 10.  Directors, Executive Officers and Corporate Governance.", 
"Item 11.  Executive Compensation.", "Item 12.  Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters.", 
"Item 13.  Certain Relationships and Related Transactions, and Director Independence.", 
"Item 14.  Principal Accounting Fees and Services.", "Item 15.  Exhibits, Financial Statement Schedules.", 
"Item 16. Form 10-K Summary.", "Item 4.    Mine Safety Disclosures", 
"Item 4A.    Executive Officers", "Item 5.    Market for the Registrant's Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities", 
"Item 6.    Selected Financial Data", "Item 7.   Management's Discussion and Analysis of Financial Condition and Results of Operations", 
"Item 8.   Financial Statements and Supplementary Data", "Item 15.    Exhibits, Financial Statement Schedules"
)

2 answers:

Answer 0: (score: 2)

We can use sub to capture one or more non-digit characters, followed by one or more digits and any letters directly after them, as a group, and in the replacement use the backreference to the captured group (\\1). This keeps the space between ITEM and the number. If you want to remove all spaces as well, apply gsub to the result.

x1 <- sub("^([^0-9]+[0-9]+[A-Za-z]*).*", "\\1", x)
x1
#[1] NA        "ITEM 1"  "ITEM 1A" "ITEM 1B" "ITEM 2"  "ITEM 3"  "ITEM 4"  "ITEM 5"  "ITEM 6"  "ITEM 7"  "ITEM 7A" "ITEM 8"  "ITEM 9" 
#[14] "ITEM 9A" "ITEM 9B" "ITEM 10" "ITEM 11" "ITEM 12" "ITEM 13" "ITEM 14" "ITEM 15" "Item 1"  "Item 1A" "Item 1B" "Item 2"  "Item 3" 
#[27] "Item 4"  "Item 5"  "Item 6"  "Item 7"  "Item 7A" "Item 8"  "Item 9"  "Item 9A" "Item 9B" "Item 10" "Item 11" "Item 12" "Item 13"
#[40] "Item 14" "Item 15" "Item 1"  "Item 1A" "Item 1B" "Item 2"  "Item 3"  "Item 4"  "Item 5"  "Item 6"  "Item 7"  "Item 7A" "Item 8" 
#[53] "Item 9"  "Item 9A" "Item 9B" "Item 10" "Item 11" "Item 12" "Item 13" "Item 14" "Item 15" "Item 16" "Item 4"  "Item 4A" "Item 5" 
#[66] "Item 6"  "Item 7"  "Item 8"  "Item 15"
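
The space-free variant mentioned above is not spelled out in the answer, so the following is only a minimal sketch of one way to do it. It assumes a small subset of the question's data as x, and the gsub("\\s+", ...) and toupper calls are illustrative choices, not the answerer's exact code.

```r
# Small subset of the question's data (chosen here for illustration)
x <- c(NA, "ITEM 1. BUSINESS", "ITEM 1A. RISK FACTORS",
       "Item 9B.    Other Information",
       "Item 10.  Directors, Executive Officers and Corporate Governance.")

# Keep the leading non-digits, the first run of digits, and any letters right after it
x1 <- sub("^([^0-9]+[0-9]+[A-Za-z]*).*", "\\1", x)

# Assumed follow-up step: drop all whitespace and normalise to upper case
x2 <- toupper(gsub("\\s+", "", x1))
print(x2)
# [1] NA       "ITEM1"  "ITEM1A" "ITEM9B" "ITEM10"
```

NA values pass through sub, gsub and toupper unchanged, so no special handling is needed for the missing first element.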

Answer 1: (score: 2)

Here is another way. We can use the \\U flag together with perl = TRUE to convert the match to uppercase:

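# test is the input vector; lazily capture everything before the first period and uppercase it
s1 <- gsub("^(.*?)\\..*", "\\U\\1", test, perl = TRUE)
# Remove all whitespace
s2 <- gsub("\\s+", "", s1)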

[1] NA       "ITEM1"  "ITEM1A" "ITEM1B" "ITEM2"  "ITEM3"  "ITEM4"  "ITEM5"  "ITEM6"  "ITEM7"  "ITEM7A"

The first expression cuts each item off at the position of the first period.
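
Because this approach depends on the period, it may help to see what happens when one is missing. The sketch below is only an illustration: the vector test holds two strings from the question plus one hypothetical entry without a period, which is not part of the original data.

```r
# Two entries from the question plus one hypothetical entry with no period
test <- c(NA,
          "Item 1.    Business",
          "Item 7A.  Quantitative and Qualitative Disclosures About Market Risk.",
          "Item 16 Form 10-K Summary")  # hypothetical: the period after 16 is missing

# Lazy capture up to the first period; \\U uppercases the replacement (needs perl = TRUE)
s1 <- gsub("^(.*?)\\..*", "\\U\\1", test, perl = TRUE)

# Remove all whitespace
s2 <- gsub("\\s+", "", s1)
print(s2)
# [1] NA                      "ITEM1"                 "ITEM7A"
# [4] "Item16Form10-KSummary"
```

When the pattern does not match, gsub returns the string unchanged, so the last entry is neither truncated nor uppercased. Every item in the question's data has a period after its number, so this only matters for less regular input.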