Question

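I am processing a tab-delimited string. I am doing this with the split function, and it works in most cases. The problem occurs when a field is missing: instead of getting null for that field, I get the next value. I store the parsed values in a string array.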

String[] columnDetail = new String[11];
columnDetail = column.split("\t");

Any help would be appreciated. If possible, I would like to store the parsed strings in a string array so that I can easily access the parsed data.

Answer 1

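String.split uses regular expressions, and you don't need to allocate an extra array for the split.

The split method gives you an array. The problem is that you are trying to predefine the number of tab occurrences, but how would you really know that? Try using Scanner or StringTokenizer, and just learn how splitting strings works.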

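Let me explain how the escaping works and why you need \\\\ to match a single \.

When you use split, it actually takes a regex (regular expression), so the pattern passes through two layers of escaping. The Java compiler handles the string literal first: if you write "\t", the regex processor receives a real tab character. If you write "\\t", it receives the two characters \ and t, and the regex processor then interprets \t as "a tab". Notice the difference? \ means "escape something" at both layers, and each layer consumes one level of backslashes.

So, to hand the escape to the regex processor, you write:

\\t

This tells the regex processor to look for \t. Why two backslashes? The first \ escapes the second, so the pattern looks like \t by the time the text is processed. Both "\t" and "\\t" end up matching a tab character.

Now let's say you want to split on \ itself.

The regex needs \\, because a single \ would try to escape the character that follows it. For the regex processor to receive \\, the string literal needs \\\\.

I really hope the examples above help you understand how escaping works in a split pattern and how to handle other cases!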
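The two escaping layers can be checked directly. The following is a minimal sketch, not part of the original answer; the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class EscapeLayers {
    // "\t": the compiler produces a real tab; the regex matches it literally.
    static String[] splitOnLiteralTab(String s) {
        return s.split("\t");
    }

    // "\\t": the compiler produces backslash + t; the regex engine reads it as a tab.
    static String[] splitOnRegexTab(String s) {
        return s.split("\\t");
    }

    // Splitting on a backslash itself needs four backslashes in the literal.
    static String[] splitOnBackslash(String s) {
        return s.split("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLiteralTab("a\tb\tc")));      // [a, b, c]
        System.out.println(Arrays.toString(splitOnRegexTab("a\tb\tc")));        // [a, b, c]
        System.out.println(Arrays.toString(splitOnBackslash("C:\\temp\\data"))); // [C:, temp, data]
    }
}
```

Since both tab forms produce the same array, the missing fields in the question are more likely explained by the limit parameter than by escaping.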

Now, I have given you this answer before; maybe you should start reading them.

Other methods

StringTokenizer

You should look into StringTokenizer; it is a very handy tool for this kind of work.

Example

StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}

This will output

this
is
a
test

Use the second constructor of StringTokenizer to set the delimiter:

StringTokenizer(String str, String delim)

Scanner

You could also use a Scanner, as one of the commenters said. It could look something like this:

Example

String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();

The output is

1
2
red
blue

Meaning that it cuts out the word "fish" and gives you the rest, using "fish" as the delimiter.

Examples taken from the Java API.
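One caveat is worth checking before applying StringTokenizer to the data in the question: it treats a run of delimiters as a single separator, so empty fields disappear rather than showing up as empty tokens. A minimal sketch of that behaviour follows; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerDemo {
    // Collects the tokens StringTokenizer produces for a tab-delimited line.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The second field is empty, but only two tokens come back.
        System.out.println(tokenize("a\t\tc")); // [a, c]
    }
}
```

If the position of each field matters, as it does in the question, this makes StringTokenizer a poor fit unless empty fields never occur.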

Answer 2

Try this:

String[] columnDetail = column.split("\t", -1);

Read the Javadoc for String.split(java.lang.String, int) for an explanation of the limit parameter of the split function:

split

public String[] split(String regex, int limit)
Splits this string around matches of the given regular expression.
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

The string "boo:and:foo", for example, yields the following results with these parameters:

Regex   Limit   Result
:   2   { "boo", "and:foo" }
:   5   { "boo", "and", "foo" }
:   -2  { "boo", "and", "foo" }
o   5   { "b", "", ":and:f", "", "" }
o   -2  { "b", "", ":and:f", "", "" }
o   0   { "b", "", ":and:f" }

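When the last few fields are missing (my guess is that this is your case), you will get a row like this:

field1\tfield2\tfield3\t\t

If no limit is passed to split(), the limit is 0, which means "trailing empty strings will be discarded". So you only get 3 fields, {"field1", "field2", "field3"}.

When limit is set to -1, a non-positive value, trailing empty strings are not discarded. So you get 5 fields, the last two being empty strings, {"field1", "field2", "field3", "", ""}.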
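The difference between the two limits can be reproduced with the example row above. This is a small sketch; the class and method names are illustrative.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Default split: equivalent to limit 0, trailing empty strings are dropped.
    static String[] splitDefault(String row) {
        return row.split("\t");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String row) {
        return row.split("\t", -1);
    }

    public static void main(String[] args) {
        String row = "field1\tfield2\tfield3\t\t";
        System.out.println(splitDefault(row).length);      // 3
        System.out.println(splitKeepTrailing(row).length); // 5
        System.out.println(Arrays.toString(splitKeepTrailing(row)));
        // [field1, field2, field3, , ]
    }
}
```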

Answer 3

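Nobody answered this part, which is partly the question's fault: the input string contains 11 fields (that much can be inferred), but how many tabs? Most probably exactly 10. Then the answer is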

String s = "\t2\t\t4\t5\t6\t\t8\t\t10\t";
String[] fields = s.split("\t", -1);  // in your case s.split("\t", 11) might also do
for (int i = 0; i < fields.length; ++i) {
    if ("".equals(fields[i])) fields[i] = null;
}
System.out.println(Arrays.asList(fields));
// [null, 2, null, 4, 5, 6, null, 8, null, 10, null]
// with s.split("\t") : [null, 2, null, 4, 5, 6, null, 8, null, 10]

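If the fields happen to contain tabs, this of course will not work as expected. The -1 means: apply the pattern as many times as needed, so the trailing field (the 11th) is preserved (as an empty string ("") if absent, which has to be converted to null explicitly).

On the other hand, if there are no tabs for the missing fields, so that "5\t6" is a valid input string containing only fields 5 and 6, there is no way to recover the fields array via split.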
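One way to guard against the second case is to check the field count after splitting. The helper below is a hypothetical sketch, not code from the answer: `FieldParser.parse` and its `expected` parameter are names introduced here for illustration.

```java
import java.util.Arrays;

final class FieldParser {
    // Hypothetical helper: splits a tab-delimited line into exactly `expected`
    // fields and maps empty strings to null.
    static String[] parse(String line, int expected) {
        String[] fields = line.split("\t", -1); // keep trailing empty fields
        if (fields.length != expected) {
            // Tabs are missing, so field positions cannot be recovered.
            throw new IllegalArgumentException(
                    "expected " + expected + " fields but found " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].isEmpty()) {
                fields[i] = null;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.asList(parse("\t2\t\t4", 4))); // [null, 2, null, 4]
    }
}
```

Calling `parse("5\t6", 11)` throws, which makes the ambiguous input visible instead of silently shifting values into the wrong columns.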

Answer 4

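If the data in your tab-delimited fields can itself contain newlines, tabs, and possibly " characters, then a String.split implementation will have serious limitations.

Tab-delimited formats have been around for donkey's years, but the format is not standardised and varies. Many implementations do not escape characters (newlines and tabs) that appear within a field. Instead, they follow CSV conventions and wrap any non-trivial field in "double quotes". Then they escape only the double quotes. As a result, a "line" can extend over multiple lines.

Reading around, I heard "just reuse the Apache tools", which sounds like good advice.

In the end I personally chose opencsv. I found it lightweight, and since it provides options for the escape and quote characters, it should cover most popular comma- and tab-delimited data formats.

Example: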

CSVReader tabFormatReader = new CSVReader(new FileReader("yourfile.tsv"), '\t');
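To illustrate why a plain split falls short on quoted fields, here is a minimal hand-written sketch of the quoting convention described above. It is not how opencsv works internally, and it assumes that a quote inside a quoted field is written as a doubled quote (""), as in common CSV practice. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedTsv {
    // Splits one record on tabs, honoring double-quoted fields.
    // Assumption: a quote inside a quoted field is written as "" (CSV style).
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // tabs and newlines are kept as data
                }
            } else if (c == '"' && current.length() == 0) {
                inQuotes = true;             // opening quote at field start
            } else if (c == '\t') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (may be empty)
        return fields;
    }

    public static void main(String[] args) {
        // Four fields: a, a field containing a tab, a field containing quotes, and an empty one.
        System.out.println(parseRecord("a\t\"b\tc\"\t\"say \"\"hi\"\"\"\t").size()); // 4
    }
}
```

A real file also needs record boundaries detected while respecting quotes, which is one reason to prefer an existing library over a sketch like this.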

Answer 5

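I had the same question a while ago and found the answer in a tutorial. In general, you need to use the second form of the split method:

split(regex, limit)

Here is the full tutorial: http://www.rgagnon.com/javadetails/java-0438.html

If you pass a negative number as the limit parameter, you get empty strings in the array where actual values are missing. For this to work, your input string must contain two copies of the delimiter, i.e. \t\t, wherever a value is missing.

Hope this helps :)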

Answer 6

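You can use yourstring.split("\\x09"), where \x09 is the regex hexadecimal escape for the tab character (in a Java string literal the backslash has to be doubled). I tested it and it works.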
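A short sketch of the hexadecimal form, combined with the -1 limit from the other answers so that empty fields survive. The class and method names are illustrative.

```java
import java.util.Arrays;

final class HexTabDemo {
    static String[] splitOnHexTab(String s) {
        // "\\x09" reaches the regex engine as \x09, the hex escape for a tab.
        return s.split("\\x09", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnHexTab("a\t\tc"))); // [a, , c]
    }
}
```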

Answer 7

String[] columnDetail;
columnDetail = column.split("\t", -1); // no limit; trailing empty strings are kept
// OR
columnDetail = column.split("\t", 11); // if you are sure about the limit

 * The {@code limit} parameter controls the number of times the
 * pattern is applied and therefore affects the length of the resulting
 * array.  If the limit <i>n</i> is greater than zero then the pattern
 * will be applied at most <i>n</i>&nbsp;-&nbsp;1 times, the array's
 * length will be no greater than <i>n</i>, and the array's last entry
 * will contain all input beyond the last matched delimiter.  If <i>n</i>
 * is non-positive then the pattern will be applied as many times as
 * possible and the array can have any length.  If <i>n</i> is zero then
 * the pattern will be applied as many times as possible, the array can
 * have any length, and trailing empty strings will be discarded.

String parsing in Java with the tab delimiter "\t" using split

7 answers: