Question

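I haven't been able to find a way to extract specific text information from the Guardian API for my dissertation. I've managed to get all of my text into Python, but how do you clean it up so that it contains, say, only the headlines of the news articles?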

Here is a snippet of the API result I want to extract information from:

public static double calcFeetAndInchesToCentimetres(double feet, double inches) {
    double centimetres = 0.0;
    // Accept only non-negative feet and an inch part in the range [0, 12].
    if (feet >= 0 && inches >= 0 && inches <= 12) {
        double footToInches = feet * 12;
        centimetres = (inches + footToInches) * 2.54;  // 1 inch = 2.54 cm
        System.out.println("The value in centimetres is " + centimetres + " cm.");
    } else {
        centimetres = -1;  // sentinel value for invalid input
    }
    return centimetres;
}


public static double calcFeetAndInchesToCentimetres(double inches) {
    double centimetres = 0;
    if (inches >= 0) {
        // Split the total into whole feet and the remaining inches.
        // The % operator keeps the remainder exact and in [0, 12).
        double inchesRemain = inches % 12;
        double wholeFeet = (inches - inchesRemain) / 12;
        centimetres = calcFeetAndInchesToCentimetres(wholeFeet, inchesRemain);
    } else {
        centimetres = -1;  // sentinel value for invalid input
    }
    return centimetres;
}

Answer 1

Hopefully the OP will add the code they used to the question.

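One solution in Python: whatever you get (from a method provided by the requests module?) will either already be a deeply nested structure that you can index into nicely, or you can easily map it into one (via json.loads(the_string_you_displayed)).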

Sample:

import json

d = json.loads(the_string_you_displayed)
head_line = d['response']['results'][0]['webTitle']

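This stores in head_line the title found in the first dict of the results "array" (index 0), which is the value of the response entry. (Now that the question has been updated, the full path is visible.)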
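To make the indexing path concrete, here is a minimal runnable sketch. The payload is a hypothetical, heavily trimmed stand-in that keeps only the response → results → webTitle path used above; the headline strings are invented for illustration and are not real Guardian API output.

```python
import json

# Hypothetical, trimmed payload: only the keys on the indexing path are kept.
the_string_you_displayed = """
{
  "response": {
    "results": [
      {"webTitle": "First example headline"},
      {"webTitle": "Second example headline"}
    ]
  }
}
"""

# json.loads maps JSON objects to dicts and JSON arrays to lists,
# so the parsed result can be indexed like any nested Python structure.
d = json.loads(the_string_you_displayed)
results = d['response']['results']   # a list of dicts
head_line = results[0]['webTitle']   # title of the first result

print(type(d).__name__, type(results).__name__)  # dict list
print(head_line)                                 # First example headline
```

If the data was fetched with the requests library, calling `response.json()` on the response object performs this parsing step for you.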

This assumes I read the given sample snippet correctly and that it was cut off during copy and paste, because the sample as given is not valid JSON.

If the text does not represent valid JSON, you are left with filtering it by substrings or pattern matching, which can be very brittle...
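A minimal sketch of that trade-off, assuming the text is either valid JSON or a truncated copy of it: try real parsing first and fall back to a regular expression only on a JSONDecodeError. The helper name `extract_headlines` and the regex are hypothetical illustrations, not part of any Guardian library, and the fallback shows exactly why pattern matching is fragile.

```python
import json
import re

# Matches "webTitle": "..." pairs, allowing backslash-escaped characters inside.
WEB_TITLE_RE = re.compile(r'"webTitle"\s*:\s*"((?:[^"\\]|\\.)*)"')

def extract_headlines(text):
    """Hypothetical helper: prefer JSON parsing, fall back to a regex."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Brittle fallback: grabs every complete "webTitle": "..." pair it sees.
        # Escape sequences such as \" or \u2019 are returned undecoded.
        return WEB_TITLE_RE.findall(text)
    # Valid JSON: walk the structure as in the answer above.
    return [result['webTitle'] for result in data['response']['results']]
```

With valid JSON the structured path is used; with a truncated snippet the regex still recovers the complete pairs, but it would silently miss or mangle anything that deviates from the expected pattern.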

Update: assuming the complete response structure is stored in a variable named data:

result_seq = data['response']['results']  # yields a list here
headlines = [result['webTitle'] for result in result_seq]

The last line works as follows: it is a list comprehension that builds a list from all entries in result_seq by selecting the value of the key webTitle in each dict.

An explicit for-loop solution that selects them would be:

result_seq = data['response']['results'] 
headlines = []
for result in result_seq:
    headlines.append(result['webTitle'])

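This does not check for errors such as a result dict that lacks the webTitle key, but Python will raise a matching exception (a KeyError), and you can decide whether to wrap the processing in a try: except block or simply hope for the best...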
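If you do want the try: except route, a minimal sketch is shown below. The function name `headlines_or_skip` and the choice to skip (rather than abort on) results without a webTitle are illustrative assumptions; `result.get('webTitle')` would be an alternative if you prefer a placeholder value.

```python
def headlines_or_skip(data):
    """Hypothetical helper: collect webTitle values, skipping results without one."""
    headlines = []
    for result in data['response']['results']:
        try:
            headlines.append(result['webTitle'])
        except KeyError:
            # This result dict has no webTitle key: skip it instead of aborting.
            continue
    return headlines


sample = {"response": {"results": [{"webTitle": "A"}, {"id": "no-title"}, {"webTitle": "C"}]}}
print(headlines_or_skip(sample))  # ['A', 'C']
```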

Cleaning API results to get the headlines of news articles?
