Grab a random sequence of words from a string?

Time: 2015-12-30 03:20:43

Tags: c# vb.net string split

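I'm looking for the most efficient way to grab a given number of words (in order) from a string.

So, let's say I have a paragraph of text:

"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

I want to be able to grab a variable number of words at a random position in the paragraph. So if 5 words were wanted, some example outputs could be:

  • "release of Letraset sheets containing"
  • "Lorem Ipsum is simply dummy"
  • "only five centuries, but also"

What would be the best way to do this?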

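5 Answers:

Answer 0 (score: 5)

Split the data on spaces to get a list of words, then pick a random starting position (at least 5 words before the end), and then join the selected words back together.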
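
The split, pick and join steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `WordSequence`, the method `GrabRandomSequence` and its parameters are hypothetical names, and returning the whole text when it has too few words is a choice made for this sketch, not something the answer specifies.

```csharp
using System;
using System.Linq;

public static class WordSequence
{
    // Hypothetical helper: returns `count` consecutive words starting at a random position.
    public static string GrabRandomSequence(string text, int count, Random rng)
    {
        // Split on spaces, dropping empty entries produced by repeated spaces.
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumption: if there are too few words, return everything that is available.
        if (words.Length <= count)
            return string.Join(" ", words);

        // Random.Next's upper bound is exclusive, so add 1 to allow the last window.
        int start = rng.Next(0, words.Length - count + 1);

        // Take `count` words from the start position and join them with spaces.
        return string.Join(" ", words.Skip(start).Take(count));
    }
}
```

Passing the `Random` instance in, instead of creating a new one on every call, avoids getting identical results from calls made in quick succession on older .NET Framework versions, where `new Random()` is seeded from the system clock.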

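Answer 1 (score: 1)

For the sequence variation, I would do this:

  1. Put the words into an Array with split(' ')
  2. Generate a random value from 0 to the length of the Array minus 5, using Random
  3. Put them in a sentence, separated by spaces.

    VB version + test result

    (This may be what you are more interested in)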

    Imports System
    Imports System.Text
    
    Public Module Module1
        Public Sub Main()
            ' Seed the Rnd() generator; without this every run yields the same sequence.
            Randomize()
            Dim str As String = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
            Console.WriteLine(GrabRandSequence(str))
            Console.WriteLine(GrabRandSequence(str))
            Console.WriteLine(GrabRandSequence(str))
            Console.ReadKey()
        End Sub
    
        ' Returns 5 consecutive words starting at a random position.
        Public Function GrabRandSequence(inputstr As String) As String
            Try
                Dim words As String() = inputstr.Split(New Char() {" "c})
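                Dim index As Integer
                ' Valid start indices run from 0 to words.Length - 5 inclusive,
                ' which is words.Length - 4 possible values.
                index = CInt(Math.Floor((words.Length - 4) * Rnd()))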
                Return [String].Join(" ", words, index, 5)
    
            Catch e As Exception
                Return e.ToString()
            End Try
        End Function    
    End Module
    

    C# version

    string[] words = input.Split(' '); // Step 1.
    int val = (new Random()).Next(0, words.Length - 5 + 1); // Step 2. The upper bound is exclusive, so + 1 lets the last window be picked.
    string result = string.Join(" ", words, val, 5); // Step 3. Improved by Enigmativity's suggestion.
    

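    Extra try

    For the random variation, I would do this:

    1. Clean up all the unnecessary characters (., etc.)
    2. Put the words into a List with LINQ split(' ')
    3. Select the Distinct ones with LINQ (optional, to avoid a result like Lorem Lorem Lorem Lorem Lorem)
    4. Generate 5 distinct random values from 0 to the size of the List with Random (repeat the pick when a value is not distinct)
    5. Pick the words from the List according to the random values
    6. Put them in a sentence, separated by spaces.

    Warning: the sentence may not make any sense at all!!

    C# version (only)

    string input = "the input sentence, blabla";
    input = input.Replace(",", "").Replace(".", ""); // Step 1. Add as many Replace calls as you want.
    List<string> words = input.Split(' ').Distinct().ToList(); // Steps 2 and 3.
    Random rand = new Random();
    List<int> vals = new List<int>();
    do { // Step 4. Assumes at least 5 distinct words, otherwise this never ends.
        int val = rand.Next(0, words.Count);
        if (!vals.Contains(val))
            vals.Add(val);
    } while (vals.Count < 5);
    string result = "";
    for (int i = 0; i < 5; ++i)
        result += words[vals[i]] + (i == 4 ? "" : " "); // Steps 5 and 6.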

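
As an alternative to retrying until five distinct indices turn up, the random variation could also be sketched with a partial Fisher-Yates shuffle, which needs exactly one random draw per word and cannot loop forever on short inputs. This is a swapped-in technique, not the method used in this answer, and the names `RandomWords` and `PickDistinctWords` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomWords
{
    // Hypothetical helper: picks up to `count` distinct words in random order.
    public static string PickDistinctWords(string text, int count, Random rng)
    {
        // Treat spaces and basic punctuation as separators, then drop duplicates.
        char[] separators = { ' ', ',', '.' };
        List<string> words = text
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToList();

        // Never ask for more words than exist, so the loop always terminates.
        int take = Math.Min(Math.Max(count, 0), words.Count);

        // Partial Fisher-Yates shuffle: move a random remaining word into slot i.
        for (int i = 0; i < take; i++)
        {
            int j = rng.Next(i, words.Count);
            string tmp = words[i];
            words[i] = words[j];
            words[j] = tmp;
        }

        // The first `take` slots now hold the randomly chosen words.
        return string.Join(" ", words.Take(take));
    }
}
```

As with the original version, the resulting words are not in sentence order, so the output may not read as meaningful text.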

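Answer 2 (score: 0)

        string input = "Your long sentence here";
        int noOfWords = 5;

        string[] arr = input.Split(' ');

        Random rnd = new Random();
        // The upper bound is exclusive; + 1 allows the final window of words.
        // Assumes the input contains at least noOfWords words.
        int start = rnd.Next(0, arr.Length - noOfWords + 1);

        // Join avoids the trailing space left by appending word + " " in a loop.
        string output = string.Join(" ", arr, start, noOfWords);

        Console.WriteLine(output);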

Answer 3 (score: 0)

    string sentence = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.";
    string[] wordCollections = sentence.Split(' ');
    Random rnd = new Random();
    // Limit the start index so that 5 words always remain after it;
    // otherwise String.Join throws ArgumentOutOfRangeException.
    int randomPos = rnd.Next(0, wordCollections.Length - 5 + 1);
    string grabAttempt1 = String.Join(" ", wordCollections, randomPos, 5);
    // Gives you a random string of 5 words
    randomPos = rnd.Next(0, wordCollections.Length - 5 + 1);
    string grabAttempt2 = String.Join(" ", wordCollections, randomPos, 5);
    // Gives you another random string of 5 words

Answer 4 (score: 0)

This might do the trick for you
