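I am trying to match a pattern in a text file. It works fine as long as the pattern sits on a single line. In some cases, however, the pattern may span two lines. I have the following code: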
#indicate the Name pattern to R
name_pattern = '<nameOfIssuer>([^<]*)</nameOfIssuer>'
#Collect the lines that match the pattern we are looking for
datalines = grep(name_pattern, thepage[1:length(thepage)], value = TRUE)
#We will use gregexpr and gsub to extract the information without the html tags
#create a function first
getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
gg = gregexpr(name_pattern, datalines)
matches = mapply(getexpr, datalines, gg)
result = gsub(name_pattern, '\\1', matches)
result <- gsub("&amp;", "&", result)
names(result) = NULL
It works well when the text is:
<nameOfIssuer>Posco ADR</nameOfIssuer>
It does not work when the text looks like this:
<nameOfIssuer>Bank of
America Corp</nameOfIssuer>
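A small reproduction of the failure, as a sketch: the vector `sample_lines` is made up for illustration and stands in for `thepage`. Because `grep` tests each element of the vector separately, neither half of the split tag matches the pattern.

```r
# Hypothetical three-line input mirroring the two cases above.
sample_lines <- c("<nameOfIssuer>Posco ADR</nameOfIssuer>",
                  "<nameOfIssuer>Bank of",
                  "America Corp</nameOfIssuer>")
name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# grep() checks every element on its own, so only the single-line tag is kept.
hits <- grep(name_pattern, sample_lines, value = TRUE)
print(hits)
```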
Does anyone know how to handle both cases dynamically?
The full text is as follows:
<SEC-DOCUMENT>0001437749-18-018038.txt : 20181009
<SEC-HEADER>0001437749-18-018038.hdr.sgml : 20181009
<ACCEPTANCE-DATETIME>20181005183736
ACCESSION NUMBER: 0001437749-18-018038
CONFORMED SUBMISSION TYPE: 13F-HR
PUBLIC DOCUMENT COUNT: 2
CONFORMED PERIOD OF REPORT: 20180930
FILED AS OF DATE: 20181009
DATE AS OF CHANGE: 20181005
EFFECTIVENESS DATE: 20181009
FILER:
COMPANY DATA:
COMPANY CONFORMED NAME: DAILY JOURNAL CORP
CENTRAL INDEX KEY: 0000783412
STANDARD INDUSTRIAL CLASSIFICATION: NEWSPAPERS: PUBLISHING OR PUBLISHING & PRINTING [2711]
IRS NUMBER: 954133299
STATE OF INCORPORATION: SC
FISCAL YEAR END: 0930
FILING VALUES:
FORM TYPE: 13F-HR
SEC ACT: 1934 Act
SEC FILE NUMBER: 028-15782
FILM NUMBER: 181111587
BUSINESS ADDRESS:
STREET 1: 915 EAST FIRST STREET
CITY: LOS ANGELES
STATE: CA
ZIP: 90012
BUSINESS PHONE: 2132295300
MAIL ADDRESS:
STREET 1: 915 EAST FIRST STREET
CITY: LOS ANGELES
STATE: CA
ZIP: 90012
FORMER COMPANY:
FORMER CONFORMED NAME: DAILY JOURNAL CO
DATE OF NAME CHANGE: 19870427
</SEC-HEADER>
<DOCUMENT>
<TYPE>13F-HR
<SEQUENCE>1
<FILENAME>primary_doc.xml
<TEXT>
<XML>
<?xml version="1.0" encoding="UTF-8"?>
<edgarSubmission xmlns="http://www.sec.gov/edgar/thirteenffiler" xmlns:com="http://www.sec.gov/edgar/common">
<headerData>
<submissionType>13F-HR</submissionType>
<filerInfo>
<liveTestFlag>LIVE</liveTestFlag>
<flags>
<confirmingCopyFlag>false</confirmingCopyFlag>
<returnCopyFlag>true</returnCopyFlag>
<overrideInternetFlag>false</overrideInternetFlag>
</flags>
<filer>
<credentials>
<cik>0000783412</cik>
<ccc>XXXXXXXX</ccc>
</credentials>
</filer>
<periodOfReport>09-30-2018</periodOfReport>
</filerInfo>
</headerData>
<formData>
<coverPage>
<reportCalendarOrQuarter>09-30-2018</reportCalendarOrQuarter>
<filingManager>
<name>DAILY JOURNAL CORP</name>
<address>
<com:street1>915 EAST FIRST STREET</com:street1>
<com:city>LOS ANGELES</com:city>
<com:stateOrCountry>CA</com:stateOrCountry>
<com:zipCode>90012</com:zipCode>
</address>
</filingManager>
<reportType>13F HOLDINGS REPORT</reportType>
<form13FFileNumber>028-15782</form13FFileNumber>
<provideInfoForInstruction5>N</provideInfoForInstruction5>
</coverPage>
<signatureBlock>
<name>Gerald L. Salzman</name>
<title>Chief Executive Officer, President, CFO, Treasurer</title>
<phone>213-229-5300</phone>
<signature>/s/ Gerald L. Salzman</signature>
<city>Los Angeles</city>
<stateOrCountry>CA</stateOrCountry>
<signatureDate>10-05-2018</signatureDate>
</signatureBlock>
<summaryPage>
<otherIncludedManagersCount>0</otherIncludedManagersCount>
<tableEntryTotal>4</tableEntryTotal>
<tableValueTotal>159459</tableValueTotal>
<isConfidentialOmitted>false</isConfidentialOmitted>
</summaryPage>
</formData>
</edgarSubmission>
</XML>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>INFORMATION TABLE
<SEQUENCE>2
<FILENAME>rdgit100518.xml
<TEXT>
<XML>
<?xml version="1.0" encoding="us-ascii"?>
<informationTable xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
<infoTable>
<nameOfIssuer>Bank of
America Corp</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>060505104</cusip>
<value>67758</value>
<shrsOrPrnAmt>
<sshPrnamt>2300000</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>2300000</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>Posco ADR</nameOfIssuer>
<titleOfClass>Sponsored ADR</titleOfClass>
<cusip>693483109</cusip>
<value>643</value>
<shrsOrPrnAmt>
<sshPrnamt>9745</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>9745</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>US Bancorp</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>902973304</cusip>
<value>7393</value>
<shrsOrPrnAmt>
<sshPrnamt>140000</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>140000</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
<infoTable>
<nameOfIssuer>Wells Fargo &amp; Co</nameOfIssuer>
<titleOfClass>Common Stock</titleOfClass>
<cusip>949746101</cusip>
<value>83665</value>
<shrsOrPrnAmt>
<sshPrnamt>1591800</sshPrnamt>
<sshPrnamtType>SH</sshPrnamtType>
</shrsOrPrnAmt>
<investmentDiscretion>SOLE</investmentDiscretion>
<votingAuthority>
<Sole>1591800</Sole>
<Shared>0</Shared>
<None>0</None>
</votingAuthority>
</infoTable>
</informationTable>
</XML>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
Answer 0 (score: 2)
Assuming your document may contain several <nameOfIssuer> tags and you want to match all of them, we can try using gregexpr together with regmatches.
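A minimal sketch of that approach, not the answerer's exact code. It assumes the whole document is held in one string; the sample string `x` and the variable `issuers` are made up for illustration. The negated class `[^<]` also matches a newline, so a tag that spans two lines is found once the text is a single string.

```r
# Hypothetical sample: the document as one string with an embedded newline.
x <- "<nameOfIssuer>Bank of\nAmerica Corp</nameOfIssuer>\n<nameOfIssuer>Posco ADR</nameOfIssuer>"

name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# gregexpr() locates every match; regmatches() extracts the matched text.
matches <- regmatches(x, gregexpr(name_pattern, x))[[1]]

# Keep only the captured group, then fold the line break into a single space.
issuers <- gsub(name_pattern, "\\1", matches)
issuers <- gsub("[[:space:]]+", " ", issuers)
print(issuers)
```

Taking `[[1]]` is appropriate here because `x` has length one; with a longer input vector, `regmatches` returns one list element per input string.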
Answer 1 (score: 0)
Using Tim's solution together with the collapse option of paste, the program works correctly.
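A sketch of how the two ideas might be combined, again not the poster's exact code. It assumes `thepage` is a character vector with one element per line, as `readLines` would return; the short vector below and the name `onestring` are illustrative only.

```r
# Stand-in for the vector of lines read from the file; contents are illustrative.
thepage <- c("<infoTable>",
             "<nameOfIssuer>Bank of",
             "America Corp</nameOfIssuer>",
             "<nameOfIssuer>Wells Fargo &amp; Co</nameOfIssuer>",
             "</infoTable>")

name_pattern <- "<nameOfIssuer>([^<]*)</nameOfIssuer>"

# Join all lines with a space so a tag split across lines becomes contiguous.
onestring <- paste(thepage, collapse = " ")

matches <- regmatches(onestring, gregexpr(name_pattern, onestring))[[1]]
result <- gsub(name_pattern, "\\1", matches)
result <- gsub("&amp;", "&", result, fixed = TRUE)  # decode the XML entity
print(result)
```

Collapsing with a space rather than an empty string keeps "Bank of" and "America Corp" from running together.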