I am using Java to read a large JSON file and POST each line to my localhost, where a PHP script decodes the JSON into an object and stores parts of that object in a MySQL database.
This is a very slow process.
How can I optimize it?
<?php
error_reporting(0);
@ini_set('display_errors', 0);

$json = $_POST['data'];
if (!empty($json)) {
    $obj = json_decode($json);

    $con = mysqli_connect("localhost", "root", "", "cohort");
    // Check connection
    if (mysqli_connect_errno()) {
        echo "Failed to connect to MySQL: " . mysqli_connect_error();
    }

    // Free-text fields are escaped so quotes in names, descriptions or tweet
    // text cannot break the INSERT statements (or inject SQL).
    $user_id = $obj->interaction->author->id;
    $user_link = mysqli_real_escape_string($con, $obj->interaction->author->link);
    $name = mysqli_real_escape_string($con, $obj->interaction->author->name);
    $user_name = mysqli_real_escape_string($con, $obj->interaction->author->username);
    $user_gender = $obj->demographic->gender;
    $user_language = $obj->twitter->lang;
    $user_image = mysqli_real_escape_string($con, $obj->interaction->author->avatar);
    $user_klout = $obj->klout->score;
    $user_confidence = $obj->language->confidence;
    $user_desc = mysqli_real_escape_string($con, $obj->twitter->user->description);
    $user_timezone = mysqli_real_escape_string($con, $obj->twitter->user->time_zone);
    $user_tweet_count = $obj->twitter->user->statuses_count;
    $user_followers_count = $obj->twitter->user->followers_count;
    $user_friends_count = $obj->twitter->user->friends_count;
    $user_location = mysqli_real_escape_string($con, $obj->twitter->user->location);
    $user_created_at = $obj->twitter->user->created_at;
    $tweet_id = $obj->twitter->id;
    $tweet_text = mysqli_real_escape_string($con, $obj->interaction->content);
    $tweet_link = $obj->interaction->link;
    $tweet_created_at = $obj->interaction->created_at;
    $tweet_location = $user_location; // same field as above, already escaped
    //$tweet_geo_lat = $obj->interaction->geo->latitude;
    //$tweet_geo_long = $obj->interaction->geo->longitude;

    $sql = "INSERT INTO tweeters (user_id, screen_name, name, profile_image_url, location, url,
                                  description, created_at, followers_count,
                                  friends_count, statuses_count, time_zone,
                                  last_update, klout, confidence, gender)
            VALUES ('$user_id', '$user_name', '$name',
                    '$user_image', '$user_location', '$user_link',
                    '$user_desc', '$user_created_at', '$user_followers_count',
                    '$user_friends_count', '$user_tweet_count', '$user_timezone',
                    '', '$user_klout', '$user_confidence', '$user_gender')";
    if (!mysqli_query($con, $sql)) {
        //die('Error: ' . mysqli_error($con));
    }

    $sql = "INSERT INTO search_tweets (tweet_id, tweet_text, created_at_date,
                                       created_at_time, location, geo_lat,
                                       geo_long, user_id, is_rt)
            VALUES ('$tweet_id', '$tweet_text', '',
                    '$tweet_created_at', '$tweet_location', '',
                    '', '$user_id', '')";
    if (!mysqli_query($con, $sql)) {
        //die('Error: ' . mysqli_error($con));
    }

    mysqli_close($con);
    echo json_encode(array("id" => $user_id));
}
?>
Java:
String inputfile = "D:\\Datasift\\Tweets.json"; // source file name
File file = new File(inputfile);
int count = 0;
System.out.println("Storing file in stack");

// ReverseLineInputStream is a custom stream that yields the file's lines last-to-first.
BufferedReader in = new BufferedReader(new InputStreamReader(new ReverseLineInputStream(file)));
while (true) {
    String line = in.readLine();
    if (line == null) {
        break;
    }
    count++;

    // POST this line to the PHP script.
    URL url1 = new URL("http://localhost/json/");
    URLConnection urlConn = url1.openConnection();
    urlConn.setDoInput(true);    // we want to read the response
    urlConn.setDoOutput(true);   // we want to send a request body
    urlConn.setUseCaches(false); // no caching, we want the real thing
    urlConn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

    DataOutputStream printout = new DataOutputStream(urlConn.getOutputStream());
    String content = "data=" + URLEncoder.encode(line, "UTF-8");
    printout.writeBytes(content);
    printout.flush();
    printout.close();

    // Read (and discard) the response so the request completes.
    BufferedReader input = new BufferedReader(new InputStreamReader(urlConn.getInputStream()));
    String str;
    while ((str = input.readLine()) != null) {
        //System.out.println(str);
    }
    input.close();
}
in.close();
System.out.println("Lines in the file: " + count);
Answer 0 (score: 0)
I don't mean to sidestep the question in any way, but why not read the file with PHP in the first place? That removes the HTTP request for every single line.
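A minimal sketch of that idea, assuming the Tweets.json dump has one JSON document per line and reusing the cohort database from the question (only two fields shown; the rest would be extracted the same way):

<?php
// Sketch: read the dump directly with PHP and insert into MySQL,
// skipping the per-line HTTP request entirely.
$con = mysqli_connect("localhost", "root", "", "cohort");
$fh = fopen("Tweets.json", "r"); // hypothetical local path to the dump
while (($line = fgets($fh)) !== false) {
    $obj = json_decode($line);
    if ($obj === null) {
        continue; // skip blank or malformed lines
    }
    $user_id = mysqli_real_escape_string($con, $obj->interaction->author->id);
    $user_name = mysqli_real_escape_string($con, $obj->interaction->author->username);
    // ... extract and escape the remaining fields as in the original script ...
    mysqli_query($con, "INSERT INTO tweeters (user_id, screen_name)
                        VALUES ('$user_id', '$user_name')");
}
fclose($fh);
mysqli_close($con);
?>

Run once from the command line, this opens a single MySQL connection instead of one HTTP request and one database connection per line.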
Answer 1 (score: 0)
If you are repeating this process many times, one approach is to batch the rows into multi-row inserts.
So, instead of executing INSERT INTO table (field, field) VALUES (value, value) over and over in a loop, you would set $ins = "INSERT INTO table (field, field) VALUES "; then, inside your foreach loop (or each time the code is called), build up an array with $ins_values[] = "(escaped_value, escaped_value)"; and finally run the single query $ins . implode(',', $ins_values), as in the sketch below.
MySQL runs much faster this way, but be aware that MySQL caps the size of a single statement via max_allowed_packet, which may need to be raised accordingly.
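A rough sketch of that batching approach, assuming $con is an open mysqli connection, $lines holds the raw JSON lines, and using a shortened column list from the question's tweeters table; the chunk size of 1000 is an arbitrary example:

<?php
// Sketch: accumulate escaped rows and send them in one multi-row INSERT.
$ins = "INSERT INTO tweeters (user_id, screen_name, name) VALUES ";
$ins_values = array();
foreach ($lines as $line) { // $lines: the raw JSON lines, however you read them
    $obj = json_decode($line);
    $user_id = mysqli_real_escape_string($con, $obj->interaction->author->id);
    $user_name = mysqli_real_escape_string($con, $obj->interaction->author->username);
    $name = mysqli_real_escape_string($con, $obj->interaction->author->name);
    $ins_values[] = "('$user_id', '$user_name', '$name')";
    // Flush in chunks so the query stays under max_allowed_packet.
    if (count($ins_values) >= 1000) {
        mysqli_query($con, $ins . implode(',', $ins_values));
        $ins_values = array();
    }
}
if (!empty($ins_values)) {
    mysqli_query($con, $ins . implode(',', $ins_values)); // insert the remainder
}
?>

One INSERT of 1000 rows costs roughly one round-trip and one statement parse instead of a thousand, which is where most of the speed-up comes from.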
Hope that helps, and that I have understood your question correctly.