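I need my Java program to take a string like:
"This is a sample sentence."
and turn it into a string array like:
{"this","is","a","sample","sentence"}
with no periods or punctuation (preferably). By the way, the string input is always a single sentence.
Is there an easy way to do this that I'm not seeing? Or do we really have to search for spaces repeatedly and create new strings from the regions between the spaces (which are the words)?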
Answer 0: (score: 54)
String.split() will do most of what you want. You may then need to loop over the words to strip out any punctuation.
For example:
String s = "This is a sample sentence.";
String[] words = s.split("\\s+");
for (int i = 0; i < words.length; i++) {
// You may want to check for a non-word character before blindly
// performing a replacement
// It may also be necessary to adjust the character class
words[i] = words[i].replaceAll("[^\\w]", "");
}
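The approach above can be put together as a small runnable sketch. The class name `SentenceSplitter` and the method name `toWords` are made up for illustration, and the lower-casing step is an assumption based on the lower-case output shown in the question; it is not part of the answer's code.

```java
import java.util.Arrays;
import java.util.Locale;

public class SentenceSplitter {
    // Hypothetical helper: splits on whitespace, strips non-word characters,
    // and lower-cases each word (the lower-casing is an assumption).
    static String[] toWords(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = words[i].replaceAll("[^\\w]", "").toLowerCase(Locale.ROOT);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("This is a sample sentence.")));
        // prints [this, is, a, sample, sentence]
    }
}
```

A token made up only of punctuation, such as a standalone dash, becomes an empty string with this approach, so filter those out if they matter.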
Answer 1: (score: 18)
Now, this can be done with split alone, since it takes a regex:
String s = "This is a sample sentence with []s.";
String[] words = s.split("\\W+");
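This will give the words as: {"This","is","a","sample","sentence","with","s"}
\\W+ matches one or more consecutive non-word characters (anything other than a letter, digit, or underscore), so there is no need for a replacement step. You can try other patterns as well.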
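One caveat worth knowing: if the input begins with a non-word character, `split("\\W+")` returns an empty string as the first element, because Java keeps leading empty strings and only drops trailing ones. Below is a minimal sketch of one way to handle that, assuming Java 8 or later; the class name `WordSplit` and the method name `words` are illustrative only.

```java
import java.util.Arrays;

public class WordSplit {
    // Hypothetical helper: splits on runs of non-word characters and
    // drops any empty strings left behind by leading separators.
    static String[] words(String s) {
        return Arrays.stream(s.split("\\W+"))
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("[]s and more.".split("\\W+")));
        // prints [, s, and, more]
        System.out.println(Arrays.toString(words("[]s and more.")));
        // prints [s, and, more]
    }
}
```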
Answer 2: (score: 12)
You can use BreakIterator.getWordInstance to find all the words in a string.
public static List<String> getWords(String text) {
List<String> words = new ArrayList<String>();
BreakIterator breakIterator = BreakIterator.getWordInstance();
breakIterator.setText(text);
int lastIndex = breakIterator.first();
while (BreakIterator.DONE != lastIndex) {
int firstIndex = lastIndex;
lastIndex = breakIterator.next();
if (lastIndex != BreakIterator.DONE && Character.isLetterOrDigit(text.charAt(firstIndex))) {
words.add(text.substring(firstIndex, lastIndex));
}
}
return words;
}
Test:
public static void main(String[] args) {
System.out.println(getWords("A PT CR M0RT BOUSG SABN NTE TR/GB/(G) = RAND(MIN(XXX, YY + ABC))"));
}
Output:
[A, PT, CR, M0RT, BOUSG, SABN, NTE, TR, GB, G, RAND, MIN, XXX, YY, ABC]
Answer 3: (score: 11)
Answer 4: (score: 7)
You can split your string using this regular expression:
String l = "sofia, malgré tout aimait : la laitue et le choux !";
l.split("[[ ]*|[,]*|[\\.]*|[:]*|[/]*|[!]*|[?]*|[+]*]+");
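Answer 5: (score: 5)
The simplest and best answer I can think of is to use the following method defined on Java's String:
String[] split(String regex)
Just call "This is a sample sentence".split(" "). Because it takes a regular expression, you can also do more complex splits, including stripping out unwanted punctuation and other such characters.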
Answer 6: (score: 5)
Try the following:
String str = "This is a simple sentence";
String[] strgs = str.split(" ");
This splits the string at each space and stores each resulting substring at its own index of the string array.
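Answer 7: (score: 4)
Use string.replace(".", "").replace(",", "").replace("?", "").replace("!", "").split(" ")
to split the string into an array with no periods, commas, question marks, or exclamation marks. You can add or remove as many replace calls as you need.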
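The chain of replace calls can also be collapsed into a single regex replacement. This is a sketch of that alternative, not the answer's own code; the class name `StripPunctuation` and the method name `toWords` are hypothetical, and the character class lists only the four punctuation marks mentioned above.

```java
import java.util.Arrays;

public class StripPunctuation {
    // Hypothetical helper: removes periods, commas, question marks and
    // exclamation marks in one pass, then splits on single spaces.
    static String[] toWords(String sentence) {
        return sentence.replaceAll("[.,?!]", "").split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toWords("Hello, world! How are you?")));
        // prints [Hello, world, How, are, you]
    }
}
```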
Answer 8: (score: 3)
Try this:
// requires: import java.util.regex.Pattern;
String[] stringArray = Pattern.compile("\\s+").split(
    "This is a sample sentence"
        .replaceAll("[^\\p{Alnum}]+", " ") // replace each run of non-alphanumeric chars with one space
        .trim()                            // drop any leading or trailing space left behind
);
for (int j = 0; j < stringArray.length; j++) {
    System.out.println(j + " \"" + stringArray[j] + "\"");
}
Answer 9: (score: 2)
Here is a code snippet that splits a sentence into words and also gives the count of each word.
import java.util.HashMap;
import java.util.Map;
public class StringToword {
    public static void main(String[] args) {
        String s = "a a a A A";
        String[] splitedString = s.split(" ");
        // Map each distinct word to the number of times it occurs
        Map<String, Integer> m = new HashMap<>();
        for (String s1 : splitedString) {
            // Use the count stored for this word, not the previous word's count
            Integer previous = m.get(s1);
            m.put(s1, previous == null ? 1 : previous + 1);
        }
        // Print each word=count entry
        for (Map.Entry<String, Integer> entry : m.entrySet()) {
            System.out.println(entry);
        }
    }
}
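Answer 10: (score: 1)
I already posted this answer somewhere else, and I'll post it again here. This version doesn't use any major built-in methods. You get a char array back; convert it to a String. Hope it helps!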
import java.util.Scanner;
public class SentenceToWord
{
public static int getNumberOfWords(String sentence)
{
int counter=0;
for(int i=0;i<sentence.length();i++)
{
if(sentence.charAt(i)==' ')
counter++;
}
return counter+1;
}
public static char[] getSubString(String sentence,int start,int end) //method to give substring, replacement of String.substring()
{
int counter=0;
char charArrayToReturn[]=new char[end-start];
for(int i=start;i<end;i++)
{
charArrayToReturn[counter++]=sentence.charAt(i);
}
return charArrayToReturn;
}
public static char[][] getWordsFromString(String sentence)
{
int wordsCounter=0;
int spaceIndex=0;
int length=sentence.length();
char wordsArray[][]=new char[getNumberOfWords(sentence)][];
for(int i=0;i<length;i++)
{
if(sentence.charAt(i)==' ' || i+1==length)
{
int end=(sentence.charAt(i)==' ')?i:i+1; //exclude the space itself from the word
wordsArray[wordsCounter++]=getSubString(sentence, spaceIndex,end); //get each word as substring
spaceIndex=i+1; //increment space index
}
}
return wordsArray; //return the 2 dimensional char array
}
public static void main(String[] args)
{
System.out.println("Please enter the String");
Scanner input=new Scanner(System.in);
String userInput=input.nextLine().trim();
int numOfWords=getNumberOfWords(userInput);
char words[][]=new char[numOfWords+1][];
words=getWordsFromString(userInput);
System.out.println("Total number of words found in the String is "+(numOfWords));
for(int i=0;i<numOfWords;i++)
{
System.out.println(" ");
for(int j=0;j<words[i].length;j++)
{
System.out.print(words[i][j]);//print out each char one by one
}
}
}
}
Answer 11: (score: 1)
Another way to do this is with StringTokenizer. For example:
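// requires: import java.util.StringTokenizer;
public static void main(String[] args) {
    String str = "This is a sample string";
    StringTokenizer st = new StringTokenizer(str, " ");
    String[] starr = new String[st.countTokens()];
    int i = 0; // index of the next free slot in starr
    while (st.hasMoreTokens()) {
        // nextToken() returns a String; nextElement() returns Object
        starr[i++] = st.nextToken();
    }
}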
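Answer 12: (score: 1)
string.replaceAll() does not work correctly with a locale different from the predefined one. At least in jdk7u10.
This example creates a word dictionary from a text file in the Windows Cyrillic charset CP1251:
public static void main (String[] args) {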
String fileName = "Tolstoy_VoinaMir.txt";
try {
List<String> lines = Files.readAllLines(Paths.get(fileName),
Charset.forName("CP1251"));
Set<String> words = new TreeSet<>();
for (String s: lines ) {
for (String w : s.split("\\s+")) {
w = w.replaceAll("\\p{Punct}","");
words.add(w);
}
}
for (String w: words) {
System.out.println(w);
}
} catch (Exception e) {
e.printStackTrace();
}
}
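For text with non-ASCII letters, note that `\w` and `\W` match only ASCII word characters by default in Java regular expressions, so a `\W+` split treats accented or Cyrillic letters as separators. One alternative is to split on anything that is not a Unicode letter or decimal digit. The sketch below assumes that words consist only of letters and digits; the class name `UnicodeWords` and the method name `words` are illustrative, and the sample sentence is a shortened version of the French one from an earlier answer.

```java
import java.util.Arrays;

public class UnicodeWords {
    // Hypothetical helper: splits on runs of characters that are neither
    // Unicode letters (\p{L}) nor decimal digits (\p{Nd}).
    static String[] words(String line) {
        return line.trim().split("[^\\p{L}\\p{Nd}]+");
    }

    public static void main(String[] args) {
        // The escape below is the letter e with an acute accent
        String s = "sofia, malgr\u00e9 tout aimait : la laitue !";
        // Unicode-aware split keeps the accented word whole
        System.out.println(Arrays.toString(words(s)));
        // Default \W+ split cuts the word at the accented letter
        System.out.println(Arrays.toString(s.split("\\W+")));
    }
}
```

As with any split, a line that starts with punctuation still produces an empty first element, so filter empty strings if that can happen in the input.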
Answer 13: (score: 0)
You can use the following simple code:
String str= "This is a sample sentence.";
String[] words = str.split("[[ ]*|[//.]]");
for(int i=0;i<words.length;i++)
System.out.print(words[i]+" ");
Answer 14: (score: 0)
Most of the answers here convert the String to a String array. But we generally use a List, so converting to a List would be more useful.
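A minimal sketch of getting a List instead of an array, assuming the `\\W+` split from the earlier answer; the class name `WordsAsList` and the method name `toList` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordsAsList {
    // Hypothetical helper: Arrays.asList returns a fixed-size view of the
    // array, so it is copied into an ArrayList to make the result resizable.
    static List<String> toList(String sentence) {
        return new ArrayList<>(Arrays.asList(sentence.split("\\W+")));
    }

    public static void main(String[] args) {
        System.out.println(toList("This is a sample sentence."));
        // prints [This, is, a, sample, sentence]
    }
}
```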
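Answer 15: (score: 0)
Here is a solution in simple C++ code, with no advanced features, that uses DMA (dynamic memory allocation) to allocate a dynamic string array and then fills the array word by word, ending each word when it finds a space. Please refer to the commented code below. Hope it helps.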
#include<bits/stdc++.h>
using namespace std;
int main()
{
string data="hello there how are you"; // a_size=5, char count =23
//getline(cin,data);
int count=0; // initialize a count to count total number of spaces in string.
int len=data.length();
for (int i = 0; i < (int)data.length(); ++i)
{
if(data[i]==' ')
{
++count;
}
}
//declare a string array +1 greater than the size
// num of space in string.
string* str = new string[count+1];
int i, start=0;
for (int index=0; index<count+1; ++index) // index array to increment index of string array and feed data.
{ string temp="";
for ( i = start; i <len; ++i)
{
if(data[i]!=' ') //increment temp stored word till you find a space.
{
temp=temp+data[i];
}else{
start=i+1; // increment i counter to next to the space
break;
}
}str[index]=temp;
}
//print data
for (int i = 0; i < count+1; ++i)
{
cout<<str[i]<<" ";
}
return 0;
}
Answer 16: (score: 0)
This should help,
This will create an array whose elements are the strings that were separated by " ".