用mongo查询python

时间:2016-07-26 15:54:09

标签: python mongodb

在我的python代码中,我查询了mongodb。但是,数据库的名称包含连字符" - "。我该如何访问它?

from pymongo import MongoClient
import sys

client = MongoClient()

db = client.customer-care

cursor = db.interactions.find()

for document in cursor:
    sys.stdout=open("test.txt","w")
    print(document)
    sys.stdout.close()

但是这段代码给了我以下错误:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    db = client.customer-care
NameError: name 'care' is not defined

4 个答案:

答案 0 :(得分:2)

Python identifiers不能包含短划线字符。

如果你必须使用带有破折号的数据库名称(我建议你考虑驼峰式),请使用以下语法:

db = client['customer-care']

答案 1 :(得分:1)

以下是使用MongoDB的示例:

try:

    from pymongo import MongoClient

    conn = MongoClient("some host", port)

    db = conn.get_database('rest-mongo-practice')

    db.authenticate('data', 'data123')

    brooklyn_rests = db.get_collection('restaurants').find({"borough": "Brooklyn"}).limit(10)

    for rest in brooklyn_rests:

        print(rest)

    conn.close()

except IOError:

    print("Error: cannot connect to rest-mongo-practice")

答案 2 :(得分:0)

    MongoDB Queries
Structure:
db.collection.find(query, projection)
● MongoDB is queried using Javascript.
● The find() method is equivalent to the sql SELECT
● The find() method receives multiple arguments,
comparable to sql SELECT and WHERE clauses

Simple find() examples:
● db.restaurants.find()
● db.restaurants.find({"restaurant_id" : "30075445" })
● db.restaurants.find({"_id":
ObjectId("593fdedd378594b41f771794")})
● db.restaurants.find({"cuisine": {$in: ["Bakery","Chinese"]}})

MongoDB provides a few dozens of operators. Some query
operators examples:
● Comparison Operators $eq, $gt, $gte, $lt, $lte, $ne, $in, $nin
● Logical Operators $and, $or, $not, $nor
● Element Operators $type, $exists
● Evaluation Operators $where, $text

Find the restaurant named “Mrs. Maxwell'S Bakery”:
db.restaurants.find({‘name’ : “Mrs. Maxwell'S Bakery”})
Find all the students that pass the test:
db.students.find( {"grade": { $gt : 60 }})

Find all the bakeries in Brooklyn:
db.restaurants.find( {$and: [
 {"cuisine" : "Bakery"} ,
 {"borough" : "Brooklyn"}
 ]
})

Example: Find all restaurants in zip code 10462:
db.restaurants.find({"address.zipcode" : "10462"})

Example: Find all restaurants that got a C grade:
db.restaurants.find({"grades.grade" : "C"})
* Note that ‘grades’ is an array

The Projection parameter is used to specify the fields in the
documents of the result set. (Equivalent to sql SELECT clause)
● To specify fields, set their value to 1 in the projection
object.
● If a projection object is not specified, all fields are
returned.
● The _id field is returned by default unless set to 0
db.collection.find(query, projection)

Example: Find all the names of the restaurants in zip code
10462
db.restaurants.find({"address.zipcode" : "10462"}, {"name" : 1})
Example: Find the names of all restaurants
db.restaurants.find({"address.zipcode" : "10462"}, {"_id" : 0, "name" : 1})

In MongoDB, the returned value of a query is a database Cursor.
● A cursor is a pointer to the result set, stored in the database’s memory.
● The cursor object provides methods for handling the result set.
● MongoDB are iterable.
● Some cursor functions
○ count(), limit(), sort(), max(), min(), next(), hasNext(), skip(), toArray(),
forEach(function)

Example: limit the result of the previous query
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).limit(3)

The sort() cursor function accepts an object that specifies the sort fields and
orientation.
Example: Sort the result of the previous query by name
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).sort({"name":1})

The sort() cursor function accepts an object that specifies the sort fields and
orientation.
Example: Sort the result of the previous query by name in descending order
db.restaurants.find({"address.zipcode":"10462"},{"_id":0,"name":1}).sort({"name":-1})

Example: Sort all restaurants by zip code and name
db.restaurants.find({},{"_id":0,"name":1,"address.zipcode": 1})
.sort({"address.zipcode": 1, "name": 1 })

The count() cursor function returns the number of records in the result set.
Example: How many restaurants are in the restaurant's collection?
db.restaurants.find().count()

Example: How many restaurants are in the brooklyn?
db.restaurants.find({"borough" : "Brooklyn"}).count()

Reminder I: MongoDB shell is based on Javascript.
Reminder II: The cursor object is iterable.
This means that we can do:
var cursor = db.restaurants.find(); // pointer to the result set
while(cursor.hasNext()){ // Iterate over the result set
 var document = cursor.next(); // Treat the documents as JS objects
 var restName = document['name'];
 print(restName);
}

The MongoDB cursor object also supports the javascript forEach function.
● In javascript, forEach is used to iterate over an array and apply a function
on each element.
● Javascript Example:
var arr = [1, 2, 3, 4]; //just an array of numbers
var printDouble = function(num){ // save the function as a variable
print(num * 2);
 };
arr.forEach(printDouble); // apply printDouble for each element in arr

MongoDB shell Example:
db.restaurants.find().forEach(function(document){
print(document.name + "," +
 document.cuisine + "," +
 document.borough);
 });

MongoDB has 2 approaches to aggregations
● Aggregation Pipeline - using the built in aggregate() function (and others)
● MapReduce - defining two functions, map() and reduce(), usually to handle
larger data sets. 

Syntax:
db.myCollection.aggregate([{<stage1>}, {<stage2>}...])
Example:
db.orders.aggregate([
{$match : {“status: “A”} },
 {$group : {_id: “$cust_id, total: {$sum : “$amount”}
 ])

$match - Filters documents by specified condition(s) to be passed to the next pipeline stage.
● $group - Groups documents by some expression and output one document per group.
● $limit - Limits the number of documents passed to the next stage in the pipeline.
● $skip - Skips over N documents before the passing the documents to the next pipeline stage.
● $sort - Sorts all input documents and returns them to the pipeline in sorted order.
● $unwind - Deconstructs an array field from the documents. Outputs a document per element.
● $count - Output document contains a count of the number of documents input to the stage.
● $project - Passes the documents with specified, or computed fields to the next pipeline
stage.


db.articles.aggregate( [ {$match: { author : "dave" }} ] );

db.collection.aggregate( [ { $group: { _id: <expression>,
 <field1>: { <accumulator1> : <expression1> }, ... }
}] );
● The _id field is mandatory. If _id is null, all documents will be aggregated
into a single output document.
● The <accumulator> operator can perform various operators such as $sum,
$avg, $first, $last, $max, $min, $push, $addToSet.

db.sales.aggregate( [ { $group : { _id : "$item" } } ] );

db.sales.aggregate( [{$group: { _id: "$item" , total_sold: {$sum: "$quantity" }}}]);

db.sales.aggregate([{$match: {"date": {$gt: new Date("2014-04-01")}}},
 {$group: { _id: "$item", total_sold: {$sum: "$quantity"}}}]);

db.sales.aggregate([{ $sort : { price : -1}},
 { $limit : 3 },
 { $skip : 1 }]);

MongoDB also provides the count() and distinct() functions.
db.collection.count()
db.collection.distinct(field)
db.orders.distinct(“cust_id”)

db.orders.count({<query>},{projection}) is equivalent to
db.orders.find({<query>},{projection}).count()

MongoDB handles CRUD operations on documents using insert(), find(),
update() and remove().
* Collections can be implicitly created by inserting a document.

CRUD Insert
Example:
db.mycol.insert({
 _id: ObjectId(7df78ad8902c),
 title: 'MongoDB Overview',
 description: 'MongoDB is no sql database',
 by: 'tutorials point',
 url: 'http://www.tutorialspoint.com',
 tags: ['mongodb', 'database', 'NoSQL'],
 likes: 100
})
* if an _id is not set, MongoDB will generate an ObjectId. 

CRUD Update
Syntax:
db.collection.update(<query>, <update>, <options)
Some Update operators:
● $set / $unset Sets/Removes the value of a field in a document.
● $inc / $mul Increments/Multiplies the value of the field by the specified amount.
● $pull / $push Adds/Removes an item to an array.

db.tutorials.update({'title':'MongoDB Overview'},
 {$set:{'title':'New MongoDB Tutorial'}})

db.sales.update({'_id': 1},
 {$inc:{'price': 5, 'quantity' : 1 }})

Update options:
db.collection.update(
 <query>,
 <update>,
 {
 upsert: <boolean>, // create a document if non found
 multi: <boolean>, // update all documents that match the query
 writeConcern: <document>, // set the level of write concern
 collation: <document> //specify language-specific rules for string comparison
 }
)

Example: db.sales.update({'item': 'abc'}, {$inc:{'price': 5}, {multi: true} })

CRUD remove
Syntax:
db.collection.remove({<query>})
● Remove is will delete by default all the documents that match the query.
● If a query object is not specified, all the documents will be deleted.
(e.g. SQL TRUNCATE) 

var result = db.users.findOne({"name":"Tom Benzamin"},{"address_ids": 1})
var addresses = db.address.find({"_id":{"$in": result["address_ids"]}})

答案 3 :(得分:-2)

     """1) read all the comments from the api and write them into a temporary file. 
       Read that file's content and concert it into a list of dictionaries.
       insert those dictionaries into a new mongodb collection."""

    import requests
    from pymongo import MongoClient

    r = requests.get("http://jsonplaceholder.typicode.com/posts")
    comments = r.json()

    client = MongoClient("ec2-52-91-68-76.compute-1.amazonaws.com:27017")
    db = client['elevation']
    for comment in comments:
     db.my_comments.insert(comment)
    client.close()

2) Read all Invoices from the chinook database. Pivot the records by creating a dictionary, 
       with Countries as keys and a list of invoices as their values
       Print a list of countries and their amount of invoices they have. 

    import sqlite3
    conn = sqlite3.connect("Chinook_Sqlite.sqlite")
    c = conn.cursor()
    c.execute("""select BillingCountry from invoice""")
    all_rows = c.fetchall()

    country_dict = {}
    for row_tuple in all_rows:
        country = row_tuple[0]
        if country_dict.get(country) == None:
             country_dict[country] = 0

         country_dict[country] += 1

     for key, val in country_dict.items():
         print key, " - ", val

3) Find all the Donats restaurants from Manhattan in the MongoDB colletions. Print their names, restaurant_id and address (street and number) into a CSV file named donuts_in_manhatten.csv

     from pymongo import MongoClient
     client = MongoClient("___")
     db = client['______']
cursor = db.restaurants.find({'cuisine': 'Donuts', 'borough':'Manhattan'})
     rests = []
     for rest in cursor:
         rests.append(rest)
     client.close()

     csv_str = "name, restaurant_id, street, number\n"
     for rest in rests:
         csv_str += rest['name'] + ',' + rest['restaurant_id'] + ',' + rest['address']['street'] + ',' + rest['address']['building'] + "\n"

     hello_file = open("donuts_in_manhatten.csv", 'w+')
     hello_file.write(csv_str)
     hello_file.close()

print the second element of ['a', 1, "foo", ['a', 'b', 3]]
    my_list = ['a', 1, "foo", ['a', 'b', 3]]
    print my_list[2]

print the second element of the fourth element of ['a', 1, "foo", ['a', 'b', 3]]
     print my_list[3][1]
print the sum of all the numbers under 1000 that can be evenly divided by 13     
 my_sum = 0
 for i in range(1000):
     if i % 13 == 0:
         my_sum += i

   print my_sum 
 print the second highest number of [1, 4, 3, 21, 12, 8]

     my_list = [1, 4, 3, 21, 12, 8]
     my_list.sort()
     print my_list[-2]
     my_list.reverse()
     print my_list[1]
print the age of people who were born in the following years [1984, 1982, 1990, 1985, 1992, 1991] """
    years_of_birth = [1984, 1982, 1990, 1985, 1992, 1991]
    ages = []
    for year in years_of_birth:
        ages.append(2017 - year)

     print ages
write an infinite loop that adds elements into a list and see what happens :)
  my_list = []
  i = 0
  while True:
      my_list.append(i)
      i += 1

Define two functions named 'foo' and 'bar'. make 'foo' call the 'bar' function and make 'bar' call 'foo'. Call the 'foo' function and see what happens
def foo():
      bar()
def bar():
     foo()
foo()  
now wrap that code with error handling and print a custom error message
def foo():
   bar()
def bar():
    foo()
try:
     foo()
except:
   print "caught a stack overflow error"
print "continuing flow"
convert the following numeric grades into letters and print them for each student. A for > 90, B for 80-90, C for 
    70-80, D for 60-70 and F for < 60. The grades are Jhon = 87, Alice = 98, Bob = 55, Charlie = 45, Dave = 72
    # my_dict = {'Jhon':  87, 'Alice': 98, 'Bob': 65, 'Charlie': 45, 'Dave': 72}
    #
    # for key, val in my_dict.items():
    #     if val >= 90:
    #         print key + ": A"
    #     elif val >= 80:
    #         print key + ": B"
    #     elif val >= 70:
    #         print key + ": C"
    #     elif val >= 60:
    #         print key + ": D"
    #     else:
    #         print key + ": F"

read the file named rest.json and print the borough
    # from json import loads
    # rest_file = open("rest.json", 'r')
    # rest_file_content = rest_file.read()
    # rest_file_json = loads(rest_file_content)
    # print rest_file_json['borough']
    # rest_file.close()

Write 'hello file! into a file named hello.txt
    # hello_file = open("hello.txt", 'w+')
    # hello_file.write("hello file!")
    # hello_file.close()

send a GET http request to http://jsonplaceholder.typicode.com/posts and print its content:
    # import requests
    # r = requests.get("http://jsonplaceholder.typicode.com/posts")
    # print r.text

send the same request and convert the content into a json. print the id of the last object in the response:
    # import requests
    # r = requests.get("http://jsonplaceholder.typicode.com/posts")
    # response = r.json()
    # response_as_list = list(response)
    # print response_as_list[-1]['id']


 connect to the chinook sqlite db. List all genres and their track count:
    # import sqlite3
    # conn = sqlite3.connect("Chinook_Sqlite.sqlite")
    # c = conn.cursor()
    # c.execute("""select genre.genreId, count(track.trackId)
    #              from track, genre
    #              where track.genreId = genre.genreId
    #              group by genre.genreId""")
    # all_rows = c.fetchall()
    # for row in all_rows:
    #     genre_id = row[0]
    #     track_count = row[1]
    #     print genre_id, "-", track_count

    #### with a dictionary example ####
    #
    # import sqlite3
    # def dict_factory(cursor, row):
    #     d = {}
    #     for idx, col in enumerate(cursor.description):
    #         d[col[0]] = row[idx]
    #     return d
    #
    # conn = sqlite3.connect("Chinook_Sqlite.sqlite")
    # conn.row_factory = dict_factory
    # c = conn.cursor()
    # c.execute("""select genre.genreId, count(track.trackId) as total
    #              from track, genre
    #              where track.genreId = genre.genreId
    #              group by genre.genreId""")
    # all_rows = c.fetchall()
    # all_rows = all_rows
    # for row_dict in all_rows:
    #     print 'GenreId - ', row_dict['GenreId'], " total: ", row_dict['total']

    **/*Find queries** 
    // find all restaurants in the Brooklyn borough
    db.restaurants.find( {'borough' : 'Brooklyn'})

    // find the restaurant with a restaurant_id of "40357217"
    db.restaurants.find( {restaurant_id : "40357217"})

    // find all Donuts shops in the Brooklyn borough
    db.restaurants.find( {$and : [ {'cuisine' : 'Donuts'}, {'borough' : 'Brooklyn'}]})

    // find all restaurants on 11 Avenue
    db.restaurants.find( {'address.street' : '11 Avenue'})

    // find all restaurants with a restaurant_id greater than "50000000"
    db.restaurants.find( {restaurant_id : { $gt : "50000000" }})

    // find all restaurants in zipcodes smaller than "10500"
    db.getCollection('restaurants').find({ 'address.zipcode' : {$lt : "10500"}})

    // find all restaurant in Manhatten from the following cuisines: Italian, American, Seafood, Continental, Bakery.  
    db.restaurants.find( {borough : "Manhattan", cuisine : {$in : ['Italian', 'American', 'Seafood', 'Continental', 'Bakery']}} )

    *   Simple Aggregations 

    // Count all restaurants in Manhattan 
    db.restaurants.count( {borough : "Manhattan"} ) / db.restaurants.find( {borough : "Manhattan"} ).count()

    // find the first restaurant on West 57 Street
    db.restaurants.find( {"address.street" : "West 57 Street"} ).sort({"address.building" : 1}).limit(1)


    // find a distinct list of cuisines
    db.restaurants.distinct("cuisine")


    /*
    *
    *   Aggregation Pipeline 
    *
    */


    // match all restaurants in Queens 
    db.restaurants.aggregate({$match : {borough : 'Queens'}})

    // group and count all restaurants in Queens by cuisine 
    db.restaurants.aggregate({$match : {borough : 'Queens'}}, {$group : {_id : "$cuisine", total : { $sum : 1} }})


    // group all restaurants by zipcodes and it's list of cuisines
    // result document example: 
    // {
    //     "_id" : "10123",
    //     "cuisines" : [ 
    //         "Juice, Smoothies, Fruit Salads"
    //     ]
    // }

    db.restaurants.aggregate({$group : {_id : "$address.zipcode",  cuisines : { $addToSet : "$cuisine"} }})



    // find the top 10 popular cuisines in zipcode "11369" 
    db.restaurants.aggregate({$match : {"address.zipcode" : "11369"}}, 
                             {$group : {_id : "$cuisine", total : { $sum : 1} }},
                             {$sort : {total : -1}}, 
                             {$limit : 10} )



    // list all grades and a list of restaurants in Manhatten that got that grade. (hint: use the $unwind stage) 
    db.restaurants.aggregate( {$match : {borough : "Manhattan"}}, 
                              {$unwind : '$grades'}, 
                              {$group : {_id : "$grades.grade",  cuisines : { $addToSet : "$cuisine"} }})






    /*
    *
    *   Crud operations 
    *
    */


    // delete the restaurant with the restaurant_id of "40363630"
    db.restaurants.remove({restaurant_id : "40363630"})


    // update the name of "Wild Asia" restaurant to be "Wild Wild Asia"
    db.restaurants.update({"name" : "Wild Asia"}, {$set : {"name" : "Wild Wild Asia"}})


    // set the cuisine of *all* the restaurants in Brooklyn to be "Brooklyn Style"
    db.restaurants.update({borough : "Brooklyn"}, {$set : {cuisine: "Brooklyn Style"} }, {multi : true})

    -- add a new artist named 'mashina', and their album 'Mashina 2' and their track called  'send me an angel' ( gegnre is Rock, media type MPEG audio file, the song is 4:10 minutes long and it is free, rest of the fields are null)

    insert into artist values('276', 'Mashina');
    insert into album values('348', 'Mahina 2', '276');
    insert into track values('3504', 'send me an angel', 1,1,1,null,(4* 60 + 10) * 1000, null, 0);
    select * from track where trackId = 3504;

    -- fire all the it staff
    delete from employee where title = 'IT Staff';

    -- delete all tracks and albums by Green Day (if you fail, explain why)
    delete 
    from track 
    where albumId in (select albumId 
                                    from album 
                                    where artistId in (select artistId 
                                                                  from artist 
                                                                  where name = 'Green Day'));

    delete
    from album 
    where artistId in (select artistId 
                                  from artist 
                                  where name = 'Green Day');                                                              



    -- double the quantity of in orderslines of rock tracks. 
    update invoiceLine 
    set quantity = quantity * 2
    where trackId in (select trackId
                                  from track, genre
                                  where track.genreId = genre.genreId
                                  and genre.Name = 'Rock')

    -- set all customer's company column to be 'Acme' if it is not set. 

    update customer 
    set company = 'Acme'
    where company is null

    -- a) create a temporary table of top 10 popular genres (genreId, genreName, total_tracks) 
    create temporary table top_genres as select genre.genreId, genre.name, count(*) as total_tracks
    from track, genre
    where track.genreId = genre.genreId
    group by genre.genreId, genre.name
    order by total_tracks desc 
    limit 10;

    select * from top_genres;

    -- b) use that table to query 100 top sold tracks from the top 10 popular genres. 

    select track.trackId, track.name, sum(invoiceLine.quantity) as total_sold
    from invoiceLine, track
    where invoiceLine.trackId = track.trackId
    and track.genreId in (select genreId from top_genres)
    group by track.trackId
    order by total_sold desc
    limit 100

    -- create a view that represents the eployees sales by genre (employeeId, genre, sold)
    create view employee_genre_sales as 
    select employee.employeeId, genre.genreId, sum(invoiceLine.quantity) as sold
    from employee, customer, invoice, invoiceLine, track, genre
    where employee.employeeId = customer.supportRepId 
    and invoice.customerId = customer.customerId
    and invoice.invoiceId =  invoiceLine.invoiceId 
    and invoiceLine.trackId = track.trackId
    and track.genreId = genre.genreId
    group by employee.employeeId, genre.genreId;

    -- use that view to find the how many items did each employee sell from their favorite genre? 

    select employeeId, max(sold) as max_sale
    from employee_genre_sales
    group by employeeId;

    -- What is the median time for a user to post since joining the app in 2015

    -- step 1: find the first post time of each user
    create temp table first_post as 
    select user_id, min(create_date) as first
    from posts
    group by user_id;

    -- step 2: calc the time diff between user creation and first post (not sure whats julianday? google it :) ) 
    create temp table time_to_first as
    select users.id, (julianday(first_post.first) - julianday(users.create_date))  as days_to_first_post
    from users, first_post
    where users.id = first_post.user_id
    and strftime('%Y' , users.create_date) = "2015"
    order by days_to_first_post;

    -- step 3: calc the median time in days

    select round(days_to_first_post)
    from time_to_first
    order by days_to_first_post 
    limit 1 
    offset (select count(*) / 2 from time_to_first)

    /*/
    -- What was the DAU/MAU of visitors in May 2016?

    -- step 1: calcualte the MAU
    create temp table MAU as 
    select count(distinct(user_id)) 
    from user_visits 
    where time between datetime('2016-05-01') and datetime('2016-05-31')

    -- step 2: calcuate the DAU
    create temp table dau_mau as 
    select strftime('%d', time) as day, count(distinct(user_id))   * 1.0 / (select * from MAU) as DAU_MAU
    from user_visits
    where time between datetime('2016-05-01') and datetime('2016-05-31')
    group by day


    -- step 3: present value as percentage 
    select day, round(DAU_MAU, 2)  * 100   as dau_mau 
    from dau_mau
 --

    -- What was the median life span of users that were not seen in the last 6 months and joined in 2015? 

    -- step 1: users that joined in 2015 and not visited in 6 months
    drop table churned_2015;
    create temp table churned_2015 as 
    select *
    from users, user_visits
    where users.id = user_visits.user_id
    and strftime('%Y' , create_date) = "2015"
    and users.id not in (select distinct user_id from user_visits where time > date('now', '-6 month'));

    -- step 2: what was the lifetime of each user in that segment
    create temp table churned_2015_lifetime as
    select users.id, (strftime('%s', last_seen)  - strftime('%s', create_date)) / (60 * 60 * 24) as lifetime
    from users, (select user_id , max(time) as last_seen 
                          from churned_2015 
                          group by user_id) as churned_last_seen
    where users.id = churned_last_seen.user_id
    order by lifetime;

    -- step 3: the median lifetime (skip half the records)
    select lifetime as median_lifetime
    from churned_2015_lifetime
    order by lifetime 
    limit 1
    offset (select count(*) /2 from churned_2015_lifetime);

   /*/
    -- What is the likelihood for a user to post a second, third, forth and fifth post?

    -- step 1: users post count
    create temp table  user_posts_cnt as
    select user_id, count(*) total_user_posts
    from posts
    group by user_id

    -- 
    -- step 2: how many users posted more than X times

    create temp table users_posts_stopping_point as
    select 
    (select count(distinct(user_id)) from posts) as total_users_posted,
    (select count(*) from user_posts_cnt where total_user_posts > 1) as posted_more_than_once,
    (select count(*) from user_posts_cnt where total_user_posts > 2) as posted_more_than_twice,
    (select count(*) from user_posts_cnt where total_user_posts > 3) as posted_more_than_thrice,
    (select count(*) from user_posts_cnt where total_user_posts > 4) as posted_more_than_quad;


    -- step 3: calculate the conversion ratio beyween each next post
    select 
    posted_more_than_once * 1.0 / total_users_posted as second_post_likelihood,
    posted_more_than_twice * 1.0 / total_users_posted as third_post_likelihood,
    posted_more_than_thrice * 1.0 / total_users_posted as forth_post_likelihood,
    posted_more_than_quad * 1.0 / total_users_posted as fifth_post_likelihood
    from users_posts_stopping_point

    use soccer;

    -- List all players and their properties
    SELECT *
    FROM Player;

    -- List all Players names and birthdays
    SELECT player_name, birthday
    FROM Player;

    -- list all team names and their short names

    select team_long_name, team_short_name 
    from team;

    -- list all players that are over 180 cm in height. 

    SELECT * 
    FROM Player 
    WHERE height > 180;

    -- how many players are above 180cm in height? 
    SELECT count(*)
    FROM Player 
    WHERE height > 180;

    -- what is the average height of players?
    SELECT AVG(height)
    FROM Player;

    -- what is the average height of players that are under 25 years old?
    SELECT AVG(weight) as avg_weight
    FROM Player
    where cast(strftime(birthday) as integer) < strftime('%Y',date('now')) - 25;

    -- what is the average BMI of players? 
    select AVG(weight * 0.453 / ((height / 100)*(height / 100))) as BMI
    from Player;

    -- list the youngest 10 players 
    SELECT player_name, birthday
    FROM Player
    ORDER BY birthday DESC
    LIMIT 10;

    -- list all the games
    Select * from match;

    -- list all the distinct game seasons 
    select distinct(season)
    from match;

    -- how many games ended with a goal gap of over 3? 
    select count(*)
    from match 
    where ABS(home_team_goal - away_team_goal) > 3;

    -- what is the maximun number of times two groups played against each other? 
    select home_team_api_id, away_team_api_id,count(*)
    from match 
    group by home_team_api_id, away_team_api_id;


    -- how many games ended in a draw?
    select count(*) 
    from match 
    where home_team_goal = away_team_goal;

    -- how many games had goals but ended in a draw?
    select count(*) 
    from match 
    where home_team_goal = away_team_goal
    and home_team_goal != 0;


    -- what is the ratio between home team goals and away team goals 
    select sum(home_team_goal) / cast(sum(away_team_goal) as double)
    from match ;

    -- what month hosts the most games
    select strftime('%m',date) as game_month,date, count(*) 
    from match 
    group by strftime('%m',date)
    order by game_month desc
    limit 1;


    -- which leage had the most games in 2008
    select league_id,count(*) as total
    from match
    where strftime('%Y',date) = '2008'
    group by league_id
    order by total desc
    limit 1;


    -- which season had the most goals
    select season, sum(home_team_goal) + sum(away_team_goal) as total_goals
    from match
    group by season
    order by total_goals desc
    limit 1;


    -- which season had the most goals per game? 
    select season, (sum(home_team_goal) + sum(away_team_goal) ) / count(*) as total_goals_per_game
    from match
    group by season
    order by total_goals_per_game desc
    limit 1;

    -- what was the date of the first gmae in each season?
    select season, min(date) 
    from match
    group by season;

    -- list all players names and categorize their height as 'short' (under 170) 'medium' (between 170 and 179) and 'high' (over 180)

    select player_name, height,
              case when height > 180 then 'high'
                       when height between 179 and 170 then 'medium'
                       else 'short' end as height_category
    from player;


    -- what is the avg number of goals in a game that didn't end in a draw?
    select AVG(home_team_goal + away_team_goal)
    from match
    where home_team_goal != away_team_goal;


    -- list all seasons with an avg goal per game rate of over 1, in games that didnt end with a draw
    select season, ROUND((SUM(home_team_goal+away_team_goal) *1.0) /Count(id),3) as ratio
    from match
    where home_team_goal != away_team_goal
    group by season 
    having  ratio > 1