如何将aws公共数据集导入mysql实例?

时间:2016-04-20 19:27:11

标签: mysql ubuntu amazon-web-services amazon-ec2

我正在尝试在Amazon AWS中使用名为Twilio/Wigle.net Street Vector Data Set的公共数据集。此数据集包含美国街道名称和地址范围的数据。它的大小约为10 GiB。当我们看一下linux时,它看起来如下:

ubuntu@ip-172-31-xxx-xxx:/data-us-street$ pwd
/data-us-street
ubuntu@ip-172-31-xxx-xxx:/data-us-street$ ll
total 20576
drwxr-xr-x  6   27 sudo     4096 May 19  2009 ./
drwxr-xr-x 23 root root     4096 Apr 20 18:10 ../
-rw-r--r--  1 root root     8339 May 19  2009 README
drwx------  2   27 sudo     4096 Mar 18  2009 addresses/
-rw-rw----  1   27 sudo  5242880 Mar 18  2009 ib_logfile0
-rw-rw----  1   27 sudo  5242880 Mar  8  2009 ib_logfile1
-rw-rw----  1   27 sudo 10485760 Mar 18  2009 ibdata1
drwx------  2   27 sudo    16384 Mar  8  2009 lost+found/
drwx------  2   27 sudo     4096 Mar  8  2009 mysql/
-rw-rw----  1   27 sudo      117 Mar 18  2009 mysql-bin.000001
-rw-rw----  1   27 sudo       19 Mar 18  2009 mysql-bin.index
drwx------  2   27 sudo     4096 Mar  8  2009 test/

为了使用它的数据,我想将它链接到同一主机中的MySQL数据库。有人能告诉我怎么做吗?

我尝试了什么

我尝试覆盖位于datadir的mysql存储目录/etc/mysql/my.cnf,但有以下更改:

#datadir        = /var/lib/mysql
datadir         = /data-us-street

我停止了服务器,更改为值然后重新启动MySQL服务器。但是,它不起作用。

自述

/*
    Copyright (c) 2009 Twilio, Inc.

    Permission is hereby granted, free of charge, to any person
    obtaining a copy of this software and associated documentation
    files (the "Software"), to deal in the Software without
    restriction, including without limitation the rights to use,
    copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the
    Software is furnished to do so, subject to the following
    conditions:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
    OTHER DEALINGS IN THE SOFTWARE.
*/

1. The Twilio/Wigle.net public address dataset 

This public dataset contains the street data for the U.S., based on 
the geodata published by the U.S. Census Bureau's TIGER project.   
We have reformated the data from GIS friendly shapefiles to a more 
generally accessible MySQL database.   

This dataset covers all the streets, roads, and highways in the U.S.  
These streets are represented as "polylines" which are shapes made up
of individual line segments.    Each polyline has its own unique ID, 
and each segment that makes up the polyline has a sequence number.  
The combination of this ID and sequence number is the primary key of 
the address table.   Each row in the database represents one of these 
line segments.

2. Description of Columns 

     +-------------+------------+------+-----+------------+-------+
     | Field       | Type       | Null | Key | Default    | Extra |
     +-------------+------------+------+-----+------------+-------+
     | id          | char(10)   | NO   | PRI |            |       | 
     | seq         | char(3)    | NO   | PRI |            |       | 
     | name        | char(30)   | YES  | MUL | NULL       |       | 
     | prefix      | char(2)    | YES  | MUL | NULL       |       | 
     | type        | char(4)    | YES  |     | NULL       |       | 
     | startlat    | float(12,8)| NO   |     | 0.00000000 |       | 
     | startlong   | float(12,8)| NO   |     | 0.00000000 |       | 
     | endlat      | float(12,8)| NO   |     | 0.00000000 |       | 
     | endlong     | float(12,8)| NO   |     | 0.00000000 |       | 
     | leftzip     | int(5)     | YES  | MUL | NULL       |       | 
     | rightzip    | int(5)     | YES  | MUL | NULL       |       | 
     | leftaddr1   | char(11)   | YES  |     | NULL       |       | 
     | leftaddr2   | char(11)   | YES  |     | NULL       |       | 
     | rightaddr1  | char(11)   | YES  |     | NULL       |       | 
     | rightaddr2  | char(11)   | YES  |     | NULL       |       | 
     | name_dtmf   | char(30)   | YES  |     | NULL       |       | 
     | prefix_dtmf | char(2)    | YES  |     | NULL       |       | 
     +-------------+------------+------+-----+------------+-------+

   id - this is the unique ID of a street, road, or highway polyline,
     according to US census blocks

   seq - this is the sequence number of this segment in the polyline,
     used to correlate 2 segments of the same street within a US
     census block

   name - the name of this street, road, or highway.  example: Main

   prefix - the prefix of the street name.  examples: N,S,E,W

   type - the suffix of the street name.  examples: Blvd, St, Rd 

   startlat/startlong - the lat/long pair of this segments the starting 
point

   endlat/endlong - the lat/long pair of this segments the ending point

   leftzip/rightzip - the zipcode of the addresses on the corresponding
      side of the street

   leftaddr1/leftaddr2/rightaddr1/rightaddr2 - the starting and ending 
addresses
      numbers for each side of the street

   name_dtmf/prefix_dtmf/type_dtmf - the name, prefix and type columns, 
represented  
      as DTMF encoded numbers.  For example, (A,B,C) = 2, (D,E,F) = 3, 
etc.
      Useful for telephony applications. 


3. Example Segment 

    --------------------------------------------------------
             id: 111710515
            seq: 0
           name: Main
         prefix: 
           type: St
       startlat: 41.49493408
      startlong: -87.70324707
         endlat: 41.49483490
        endlong: -87.70324707
        leftzip: 60466
       rightzip: 60443
      leftaddr1: 21801
      leftaddr2: 21805
     rightaddr1: 21800
     rightaddr2: 21804
      name_dtmf: 6246
    prefix_dtmf: 
    --------------------------------------------------------

      Name: Main St.

        Left Zip: 60466
      21801                                21805
        X------------------------------------>
      21800                                21804
        Right Zip: 60443




4. Common Queries

   a. Find the street segment record for 30 Rockefeller Plaza, NY, NY, 
10020 

   SELECT * FROM address WHERE leftzip=10020 AND name LIKE 'Rockefeller' 
AND 30 BETWEEN leftaddr1 AND leftaddr2  
   UNION 
   SELECT * FROM address WHERE rightzip=10020 AND name LIKE 
'Rockefeller' AND 30 BETWEEN rightaddr1 AND rightaddr2

    --------------------------------------------------------
             id: 59657155
            seq: 0
           name: Rockefeller
         prefix: 
           type: Plz
       startlat: 40.7585
      startlong: -73.979
         endlat: 40.7592
        endlong: -73.9786
        leftzip: 10020
       rightzip: 10020
      leftaddr1: 22
      leftaddr2: 38
     rightaddr1: 21
     rightaddr2: 39
      name_dtmf: 7625335537
    prefix_dtmf:  
    --------------------------------------------------------


   b.  Find all the street names within a zipcode. 

   SELECT DISTINCT prefix, name, type FROM address WHERE leftzip=10009 
OR rightzip = 10009

     +--------+----------+------+
     | prefix | name     | type |
     +--------+----------+------+
     |        | 1st      | Ave  | 
     |        | Avenue A |      | 
     | E      | 13th     | St   | 
     | E      | 14th     | St   | 
     | E      | 12th     | St   | 
     | E      | 11th     | St   | 
     | E      | 10th     | St   | 
     | E      | 2nd      | St   | 
     | E      | 3rd      | St   | 
     | E      | 4th      | St   | 
     | E      | 6th      | St   | 
     | E      | 7th      | St   | 
     |        | St Marks | Pl   | 
     | E      | 9th      | St   | 
     |        | Avenue B |      | 
     | E      | 5th      | St   | 
     |        | Avenue C |      | 
     | E      | 20th     | St   | 
     | E      | 15th     | St   | 
     | E      | 16th     | St   | 
     | E      | 8th      | St   | 
     |        | Avenue D |      | 
     |        | Szold    | Pl   | 
     +--------+----------+------+

   c.  Streets starting with F in a zipcode 

   SELECT DISTINCT prefix, name, type from address WHERE leftzip = 94117 
AND name LIKE 'F%'
   UNION
   SELECT DISTINCT prefix, name, type from address WHERE rightzip = 
94117 AND name LIKE 'F%'

     +--------+------------+------+
     | prefix | name       | type |
     +--------+------------+------+
     |        | Fulton     | St   | 
     |        | Fell       | St   | 
     |        | Frederick  | St   | 
     |        | Farnsworth | Ln   | 
     |        | Fillmore   | St   | 
     |        | Friendship | Ct   | 
     +--------+------------+------+

   d. Streets starting with "FRE" as DTMF digits in a zipcode: 

   SELECT DISTINCT prefix, name, type from address WHERE leftzip = 94117 
AND name_dtmf LIKE '373%'
   UNION  
   SELECT DISTINCT prefix, name, type from address WHERE rightzip = 
94117 AND name_dtmf LIKE '373%'

     +--------+------------+------+
     | prefix | name       | type |
     +--------+------------+------+
     |        | Fulton     | St   | 
     |        | Fell       | St   | 
     |        | Frederick  | St   | 
     |        | Farnsworth | Ln   | 
     |        | Fillmore   | St   | 
     |        | Friendship | Ct   | 
     +--------+------------+------+

1 个答案:

答案 0 :(得分:0)

根文件夹public class JSONParser { String charset = "UTF-8"; HttpURLConnection conn; DataOutputStream wr; StringBuilder result; URL urlObj; JSONObject jObj = null; StringBuilder sbParams; String paramsString; public JSONObject makeHttpRequest(String url, String method, HashMap<String, String> params) { sbParams = new StringBuilder(); int i = 0; for (String key : params.keySet()) { try { if (i != 0){ sbParams.append("&"); } sbParams.append(key).append("=") .append(URLEncoder.encode(params.get(key), charset)); } catch (UnsupportedEncodingException e) { e.printStackTrace(); } i++; } if (method.equals("POST")) { // request method is POST try { urlObj = new URL(url); conn = (HttpURLConnection) urlObj.openConnection(); conn.setDoOutput(true); conn.setRequestMethod("POST"); conn.setRequestProperty("Accept-Charset", charset); conn.setReadTimeout(10000); conn.setConnectTimeout(15000); conn.connect(); paramsString = sbParams.toString(); wr = new DataOutputStream(conn.getOutputStream()); wr.writeBytes(paramsString); wr.flush(); wr.close(); } catch (IOException e) { e.printStackTrace(); } } else if(method.equals("GET")){ // request method is GET if (sbParams.length() != 0) { url += "?" + sbParams.toString(); } try { urlObj = new URL(url); conn = (HttpURLConnection) urlObj.openConnection(); conn.setDoOutput(false); conn.setRequestMethod("GET"); conn.setRequestProperty("Accept-Charset", charset); conn.setConnectTimeout(15000); conn.connect(); } catch (IOException e) { e.printStackTrace(); } } try { //Receive the response from the server InputStream in = new BufferedInputStream(conn.getInputStream()); BufferedReader reader = new BufferedReader(new InputStreamReader(in)); result = new StringBuilder(); String line; while ((line = reader.readLine()) != null) { result.append(line); } Log.d("JSON Parser", "result: " + result.toString()); } catch (IOException e) { e.printStackTrace(); } conn.disconnect(); // try parse the string to a JSON object try { jObj = new JSONObject(result.toString()); } catch (JSONException e) { Log.e("JSON Parser", "Error parsing data " + e.toString()); } // return JSON Object return jObj; } } 下的每个子文件夹都代表一个数据库。因此,如果我想使用数据库/data-us-street,我需要使用文件夹addresses

  1. 将目标数据库文件夹复制到当前MySQL数据库服务器实例的数据目录中,例如的/ var / lib中/ MySQL的

    ./addresses
  2. 递归更改文件夹所有者

    $ cp -r /data-us-street/addresses /var/lib/mysql/addresses
    
  3. 使用数据库$ chown mysql:mysql /var/lib/mysql/addresses/* 并享受!