Preparing a string for use in a URL (Python)

Posted: 2015-12-06 15:09:00

Tags: python json csv url

This is my first post here, so I hope I have included the information needed to point me in the right direction.

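I am new to Python and am working on a script that takes a string from a CSV file and sends it to an API through a URL. The API returns some JSON, which is then stored in another CSV file.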
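For illustration, the "string into a URL" step described above could be sketched as follows. This is a minimal sketch, not the API's documented usage: `build_url` and `BASE_URL` are hypothetical names, and it assumes the API accepts UTF-8, percent-encoded text in the `txt` parameter. Only the parameter values are encoded, so the scheme and the `?`, `&` and `=` separators stay intact.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2.7

# Assumed endpoint, taken from the URL used in the question
BASE_URL = 'http://api.sociallytic.dk/'

def build_url(api_key, comment):
    """Hypothetical helper: percent-encode only the parameter values."""
    # Encode text to UTF-8 bytes so non-ASCII letters (æ, ø, å) are
    # quoted the same way on Python 2.7 and Python 3.
    if not isinstance(comment, bytes):
        comment = comment.encode('utf-8')
    return (BASE_URL
            + '?key=' + quote(api_key, safe='')
            + '&txt=' + quote(comment, safe=''))

url = build_url('xxxxxx', u'Jylland er p\u00e5')
print(url)
```

With this approach the comment would not need the manual `%20` replacement beforehand, since `quote` already encodes spaces.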

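The problem is that only about 10% of the rows sent to the API come back as the expected JSON. The rest produce an error which, because of the exception handler in my code, is stored as "-----" in output.csv.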
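Since a catch-all handler hides why each request failed, one way to narrow this down might be to write the exception type and message to the output instead of a fixed placeholder. The sketch below assumes a hypothetical helper named `error_row`; it is not part of the script. Note also that passing a plain string such as '-----' to `csv.writer.writerow` writes each character as its own field, because a string is iterated character by character.

```python
def error_row(status_id, exc):
    """Hypothetical helper: build an output row that records what went wrong."""
    return [status_id, type(exc).__name__, str(exc)]

# Simulate a failing request and capture the row that would be written
try:
    raise ValueError('unknown url type')
except Exception as exc:
    row = error_row('123', exc)

print(row)
```

In the script, the row returned by `error_row` could be passed to `w.writerow(...)` inside the `except` branch, so the failing status_id and the error end up next to each other in output.csv.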

Python version: 2.7.10

Length of input.csv: 74826 rows

Here is the code:

import csv
import urllib2
import json
import datetime
import re

# Sociallytic API-key - reeeaally secret :-)
api_key = 'xxxxxxx' # Not the real key

# File destinations
input_file = 'input.csv'
output_file = 'output.csv'

def prepareRequest(comment):
    # Preparing string for parsing to URL
    space = '%20'

    comment = re.sub('[^A-ZÆØÅa-zæøå0-9]+', ' ', comment)
    comment = comment.replace('\t', ' ')
    comment = comment.replace('\n', ' ')
    comment = comment.replace(' ', space)

    return comment

def getAPIFeedback(ready_comment):
    # Construct the request
    preparedRequest = urllib2.quote('"""http://api.sociallytic.dk/?key=xxxxxx&txt=' + ready_comment + '"""')

    request = urllib2.Request(preparedRequest)

    # Making the request
    json_reply = urllib2.urlopen(request).read()
    loaded_json = json_reply
    loaded_json = json.loads(json_reply)

    return loaded_json


def processAPIFeeback(loaded_json):
    # Preparing comment word count
    word_count = loaded_json['count_of_words']

    # Preparing comment sentiment score - SUM OF INDIVIDUAL WORD SENTIMENT SCORES
    sent_score = loaded_json['sentiment_score']

    # Preparing comment sentiment score words - SENTIMENT SCORE / SQ(COUNT OF WORDS WITH SENTIMENT SCORE)
    sent_score_words = loaded_json['sentiment_score_words']

    # Preparing brand sentiment - NEGATIVE, NEUTRAL OR POSITIVE
    brand_sentiment = loaded_json['sentiment']

    return (word_count, sent_score, sent_score_words, brand_sentiment)


def mainFunction(input_file, output_file, api_key):
    # Create new CSV file
    with open(output_file, 'wb') as file:
        w = csv.writer(file, delimiter=';')
        w.writerow(['status_id', 'word_count', 'sent_score', 'sent_score_words', 'brand_sentiment'])

        # Counter for returning status to user
        counter = 0

        # Open and read the CSV file
        f = open(input_file, 'r')
        csv_f = csv.reader(f, delimiter=';')

        # Skip status_id header
        next(csv_f, None)

        # Displayed loading text
        print 'Processing... Please wait...'

        # Getting data from CSV file
        for row in csv_f:
            # For each iteration the counter increases with 1
            counter += 1

            # Storing status_id
            id = row[0]

            # Storing comment
            comment = row[1]


            ready_comment = prepareRequest(comment)

            # Making the API-request and writing result to output.csv
            try:
                # Parsing comment to API
                loaded_json = getAPIFeedback(ready_comment)

                # Writing status_id and API feedback to CSV file
                id_feedback = (id,) + processAPIFeeback(loaded_json)

                w.writerow(id_feedback)

            except Exception:
                w.writerow('-----')

            # Output counter to user for each 100 comments processed
            if counter % 100 == 0:
                print counter, 'comments processed.'

        # Closing the CSV files
        f.close()

if __name__ == '__main__':
    mainFunction(input_file, output_file, api_key)

An example of a request that returns an error in my script: http://api.sociallytic.dk/?key=xxxxxxx&txt=%22Haha%20Kenneth,%20det%20er%20ellers%20et%20meget%20godt%20bud%20%E2%80%93%20det%20kunne%20ogs%C3%A5%20v%C3%A6re%20en%20kat:)%20Buddene%20er%20der%20helt%20sikkert%20mange%20af%E2%80%A6%20Jeg%20er%20dog%20ret%20sikker%20p%C3%A5,%20at%20Jylland%20er%20det%20rigtige%20svar;)%22

But when I enter the same request directly into a browser, the result is returned as expected.
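One difference between the browser and the script that may be worth checking is what `quote` does when it is applied to the entire URL, as in `getAPIFeedback`, rather than to the comment alone. The sketch below reproduces that call on a shortened, made-up comment. It only demonstrates the encoding and does not contact the API.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2.7 (urllib2.quote is the same function)

base = 'http://api.sociallytic.dk/?key=xxxxxx&txt='
ready_comment = 'Haha%20Kenneth'  # already has %20 for spaces, as prepareRequest returns

# Same construction as in getAPIFeedback: triple quotes inside the string literal,
# and the whole URL passed through quote()
whole = quote('"""' + base + ready_comment + '"""')
print(whole)
# %22%22%22http%3A//api.sociallytic.dk/%3Fkey%3Dxxxxxx%26txt%3DHaha%2520Kenneth%22%22%22
```

The result no longer starts with `http://`: the colon, `?`, `&` and `=` are percent-encoded, the existing `%20` becomes `%2520`, and the literal `"""` characters become `%22%22%22`. A URL in that form is unlikely to be accepted by `urllib2.Request`, whereas a browser receives the URL with its separators intact.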

What am I doing wrong here?

0 answers