How to find the words that can follow a given word in a large string in Python

Time: 2018-04-25 11:53:57

Tags: python python-3.x python-2.7

str1 = """Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. An interpreted language, Python has a design philosophy which emphasizes code readability (notably using whitespace indentation to delimit code blocks rather than curly braces or keywords), and a syntax which allows programmers to express concepts in fewer lines of code than possible in languages such as C++ or Java. The language provides constructs intended to enable writing clear programs on both a small and large scale .Python features a dynamic type system and automatic memory management and supports multiple programming paradigms, including object-oriented, imperative, functional programming, and procedural styles. It has a large and comprehensive standard library. Python interpreters are available for many operating systems, allowing Python code to run on a wide variety of systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation."""
The output should be:

python : [is, has, features, interpreters, code, Software]

2 answers:

Answer 0: (score: 0)

You can use a regular expression to find the words that follow the word Python, for example


>>> import re
>>> re.findall(r'Python (\w+)', str1)
['is', 'has', 'features', 'interpreters', 'code', 'is', 'Software']

Since this list may contain duplicates, you can create a set if you want a collection of unique words.
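A minimal sketch of the de-duplication step, using a short hypothetical sample string (`text`) rather than the question's `str1`. Two details here are assumptions that go beyond the answer: the `\b` word boundary, which stops `CPython` from being counted as `Python`, and `dict.fromkeys`, which removes duplicates while keeping first-seen order (guaranteed only on Python 3.7+).

```python
import re

# Hypothetical sample text; the question's str1 would work the same way.
text = "Python is fun. CPython is fast. Python has fans. Python is popular."

# \b requires a word boundary before 'Python', so 'CPython' is skipped.
# Drop the \b to reproduce the answer's original behaviour.
followers = re.findall(r'\bPython (\w+)', text)

# A set removes duplicates but does not keep any particular order.
unique_unordered = set(followers)

# dict.fromkeys removes duplicates and keeps first-seen order (Python 3.7+).
unique_ordered = list(dict.fromkeys(followers))

print(followers)
print(unique_ordered)
```

Whether `CPython` should count as an occurrence of `Python` depends on the intended result; the expected output in the question does not settle it.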

Answer 1: (score: 0)

You can use a generator:

def yielder(x, value='Python'):
    # True when the previous token was exactly `value`
    match = False
    for word in x.split():
        if match:
            yield word  # this token directly follows `value`
            match = False
        if word == value:
            match = True

res = list(yielder(str1))

['is', 'has', 'interpreters', 'code', 'Software']
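This result lacks `features` because `str.split` leaves punctuation attached to tokens, so `.Python` and `Python,` never compare equal to `Python`. One possible variant, sketched below, strips leading and trailing punctuation from each token before comparing. The function name `yielder_stripped` and the sample string are hypothetical, and the choice to strip punctuation is an assumption, not part of the answer.

```python
import string

def yielder_stripped(x, value='Python'):
    """Like the generator above, but ignores punctuation around each token."""
    match = False
    for word in x.split():
        # Remove leading/trailing punctuation: '.Python' -> 'Python', 'Python,' -> 'Python'
        token = word.strip(string.punctuation)
        if match:
            yield token
            match = False
        if token == value:
            match = True

# Hypothetical sample mirroring the punctuation quirks in the question's text.
sample = "large scale .Python features a type system. Python, is open source"
result = list(yielder_stripped(sample))
print(result)
```

Note that this also treats `Python, is` as "Python followed by is", even though a comma separates the two words; whether that is desirable depends on the use case.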

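The advantage of this approach is that it is lazy once the string has been split. For long strings, you can extract results as you iterate.

To learn more, see the following:

  • Generators: how the yield statement works
  • Splitting strings: how str.split works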
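To make the laziness concrete: `str.split` still builds the full list of tokens up front, so only the matching stage is lazy. The sketch below is one way to avoid the intermediate list by scanning with `re.finditer`, and it uses `itertools.islice` to take just the first few results. The name `following_words` and the sample string are hypothetical, and the regex approach is a swapped-in technique, not the answer's own method.

```python
import re
from itertools import islice

def following_words(text, value='Python'):
    """Lazily yield the whitespace-delimited token after each exact `value` token."""
    # (?<!\S)   : `value` must start the string or follow whitespace
    # \s+       : at least one whitespace character after `value`
    # (?=(\S+)) : capture the next token without consuming it, so that
    #             consecutive occurrences of `value` are each handled
    pattern = re.compile(r'(?<!\S)' + re.escape(value) + r'\s+(?=(\S+))')
    for m in pattern.finditer(text):
        yield m.group(1)

# Hypothetical sample text.
sample = "Python is nice .Python features Python, is Python has"

# Take only the first result; the rest of the string is not scanned.
first_only = list(islice(following_words(sample), 1))

# Consume everything.
all_words = list(following_words(sample))
print(first_only, all_words)
```

Like the split-based generator, this matches only tokens that are exactly `Python`, so `.Python` and `Python,` are skipped.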