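I am reading a file line by line. Some lines contain multi-line values, as shown below, which makes my loop break and return unexpected results.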
string[] articles = File.ReadAllLines(myFile);
char[] separators = { '\t' };
for (int i = 0; i < articles.Length; i++)
{
    string[] article = articles[i].Split(separators);
    string code = article[0];
    string name = article[1];
    string val1 = article[2];
    string val2 = article[3];
    //do something with these values
}
Here TSNK/Metadata/tk_ISIN and TSNK/Metadata/tk_oneTISNumber_TEXT have multi-line values. When reading the file line by line, how can I read each of these fields as a single line?
I tried my own logic for this, but it did not produce the expected result.
Answer 0 (score: 0)
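The solution uses a Scanner and a regular expression compiled with DOTALL so that it can match across line breaks. It assumes that every record in your file starts with TSNK/Metadata/: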
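import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each token is the text between two "TSNK/Metadata/" prefixes, i.e. one key=value record.
Scanner scanner = new Scanner(new File("file.txt"));
scanner.useDelimiter("TSNK/Metadata/");
// DOTALL lets '.' match line breaks, so a value may span several lines.
Pattern p = Pattern.compile("(.*)=(.*)", Pattern.DOTALL | Pattern.MULTILINE);
// Loop on hasNext(): the original do/while never ended once the input was exhausted.
while (scanner.hasNext()) {
    String s = scanner.next();
    Matcher matcher = p.matcher(s);
    if (matcher.find()) {
        System.out.println("key = '" + matcher.group(1) + "'");
        String[] values = matcher.group(2).split("[,\n]");
        int i = 0; // start at 0 so the numbering matches the output shown
        for (String value : values) {
            System.out.println(String.format(" val(%d)='%s',", (i++), value));
        }
    }
}
scanner.close();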
The above produces the following output:
key = 'tk.filename'
val(0)='PZSIIF-anefnsadual-rasdfepdasdort.pdf',
key = 'tk_ISIN'
val(0)='LU0291600822',
val(1)='LU0871812862',
val(2)='LU0327774492',
val(3)='LU0291601986',
val(4)='LU0291605201',
val(5)='',
val(6)='LU0291595725',
val(7)='LU0291599800',
val(8)='LU0726995649',
val(9)='LU0726996290',
val(10)='LU0726995995',
val(11)='LU0726995136',
val(12)='LU0726995482',
val(13)='LU0726995219',
val(14)='LU0855227368',
key = 'tk_GroupCode'
val(0)='PZSIIF',
key = 'tk_GroupCode/PZSIIF'
val(0)='y',
key = 'tk_oneTISNumber'
val(0)='16244',
val(1)='17007',
val(2)='16243',
val(3)='11520',
val(4)='19298',
val(5)='18247',
val(6)='20755',
key = 'tk_oneTISNumber_TEXT'
val(0)='Neo Emerging Market Corporate Debt ',
val(1)='Neo Emerging Market Debt Opportunities II ',
val(2)='Neo Emerging Market Investment Grade Debt ',
val(3)='Neo Floating Rate II ',
val(4)='Neo Upper Tier Floating Rate ',
val(5)='Global Balanced Regulation 28 ',
val(6)='Neo Multi-Sector Credit Income',
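Note the empty entry (val(5) of key tk_ISIN): it appears because that value contains a line break next to a comma, so splitting on either character yields an empty string between them. It can easily be sorted out by rejecting empty strings or by adjusting the split pattern.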
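Both remedies can be combined. The following is only a minimal sketch: the class name SplitValues, the helper splitValues and the sample string in main are hypothetical and not part of the original code. It splits on runs of commas and line breaks (the adjusted pattern) and additionally skips anything that is empty after trimming (the rejection of empty strings).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitValues {
    // Splits a raw multi-line value on runs of commas and line breaks,
    // dropping entries that are empty after trimming.
    static List<String> splitValues(String raw) {
        List<String> values = new ArrayList<>();
        for (String part : raw.split("[,\\r\\n]+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical value with a comma directly followed by a line break.
        String raw = "LU0291605201,\nLU0291595725,LU0291599800\n";
        List<String> values = splitValues(raw);
        for (int i = 0; i < values.size(); i++) {
            System.out.println(String.format(" val(%d)='%s',", i, values.get(i)));
        }
    }
}
```

Note that trimming also removes the trailing spaces visible in some of the tk_oneTISNumber_TEXT values; drop the trim() call if that whitespace matters to you.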
Hope this helps!