MongoDB - First try

My first experience with MongoDB. I will install MongoDB and PyMongo, insert some data, and query it. The next step will be to try out Monary, but that is out of scope for this notebook.

Installation steps

Run the following commands to install MongoDB.

jitsejan@jjvps:/$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
jitsejan@jjvps:/$ echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
jitsejan@jjvps:/$ sudo apt-get update
jitsejan@jjvps:/$ sudo apt-get install -y mongodb-org

Start MongoDB

jitsejan@jjvps:/$ sudo service mongod start

Check the end of the log to confirm that the server is waiting for connections.

In [1]:
!tail /var/log/mongodb/mongod.log
2017-04-05T10:33:15.620-0400 W FTDC     [initandlisten] Error checking directory '/sys/block': No such file or directory
2017-04-05T10:33:15.621-0400 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/var/lib/mongodb/diagnostic.data'
2017-04-05T10:33:15.621-0400 I NETWORK  [thread1] waiting for connections on port 27017
2017-04-05T10:34:27.470-0400 I NETWORK  [conn4] received client metadata from 127.0.0.1:34370 conn4: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian stretch/sid", architecture: "x86_64", version: "2.6.32-042stab120.16" }, platform: "CPython 3.6.0.final.0" }
2017-04-05T10:40:10.716-0400 I COMMAND  [conn4] dropDatabase nintendo_db starting
2017-04-05T10:40:10.723-0400 I COMMAND  [conn4] dropDatabase nintendo_db finished
2017-04-05T10:40:26.823-0400 I NETWORK  [conn5] received client metadata from 127.0.0.1:42910 conn5: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian stretch/sid", architecture: "x86_64", version: "2.6.32-042stab120.16" }, platform: "CPython 3.6.0.final.0" }
2017-04-05T10:40:26.831-0400 I NETWORK  [conn6] received client metadata from 127.0.0.1:42912 conn6: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian stretch/sid", architecture: "x86_64", version: "2.6.32-042stab120.16" }, platform: "CPython 3.6.0.final.0" }
2017-04-05T10:53:00.408-0400 I NETWORK  [conn7] received client metadata from 127.0.0.1:35700 conn7: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian stretch/sid", architecture: "x86_64", version: "2.6.32-042stab120.16" }, platform: "CPython 3.6.0.final.0" }
2017-04-05T10:53:00.410-0400 I NETWORK  [conn8] received client metadata from 127.0.0.1:35702 conn8: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian stretch/sid", architecture: "x86_64", version: "2.6.32-042stab120.16" }, platform: "CPython 3.6.0.final.0" }

Install Python module

Use PyMongo to communicate with MongoDB from Python.

In [2]:
!pip install pymongo
Requirement already satisfied: pymongo in /home/jitsejan/anaconda3/lib/python3.6/site-packages

Connect to MongoDB

In [3]:
import pymongo
print(pymongo.version)
client = pymongo.MongoClient('mongodb://localhost:27017/')
3.4.0
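
Since PyMongo 3, creating a MongoClient does not wait for the server, so the cell above succeeds even when mongod is not running. A minimal sketch of an explicit check, assuming the server's `ping` command; `is_reachable` is a hypothetical helper name, not part of PyMongo:

```python
def is_reachable(client):
    """Return True if the server answers a ping (hypothetical helper)."""
    try:
        # `ping` is a cheap command that forces a real round trip to the server.
        client.admin.command('ping')
        return True
    except Exception:
        # Broad catch keeps the sketch short; real code would more likely
        # catch pymongo.errors.ConnectionFailure.
        return False
```

With the client from the cell above, this would be called as `is_reachable(client)`.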

Check which databases already exist

In [4]:
client.database_names()
Out[4]:
['admin', 'local', 'nintendo_db']

Create a new database

You can create a database simply by selecting a name that does not exist yet. MongoDB only creates the database physically once the first document is written to it.

In [5]:
db = client.nintendo_db
db
Out[5]:
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'nintendo_db')
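
The lazy creation described above can be observed by listing the databases before and after a first write. A small sketch, assuming the PyMongo 3.4 call `database_names()` used in this notebook; the helper name `appears_after_first_write` is hypothetical:

```python
def appears_after_first_write(client, db_name, collection_name, document):
    """Report whether `db_name` is listed before and after one insert."""
    # database_names() is the PyMongo 3.4 spelling used in this notebook;
    # newer releases call it list_database_names().
    before = db_name in client.database_names()
    # Selecting the database and collection by name does not create them;
    # the insert is the first operation that writes anything.
    client[db_name][collection_name].insert_one(document)
    after = db_name in client.database_names()
    return before, after
```

On a server without a database of the given name, the expected result would be `(False, True)`.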

Create a new collection

In [6]:
characters = db.characters
characters
Out[6]:
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'nintendo_db'), 'characters')

Create documents

For simplicity, I will create the documents from data that I used in another notebook.

In [7]:
import pandas as pd
character_df = pd.read_csv('../data/nintendo_characters.csv')
character_df
Out[7]:
   id  name   description       color  occupation  picture
0  2   Luigi  This is Luigi     green  plumber     https://upload.wikimedia.org/wikipedia/en/f/f1...
1  1   Mario  This is Mario     red    plumber     https://upload.wikimedia.org/wikipedia/en/9/99...
2  3   Peach  My name is Peach  pink   princess    https://s-media-cache-ak0.pinimg.com/originals...
3  4   Toad   I like funghi     red    NaN         https://upload.wikimedia.org/wikipedia/en/d/d1...
In [8]:
import json
characters_dict = character_df.to_dict(orient='records')
print(json.dumps(characters_dict[0], indent=4))
{
    "id": 2,
    "name": "Luigi",
    "description": "This is Luigi",
    "color": "green",
    "occupation": "plumber",
    "picture": "https://upload.wikimedia.org/wikipedia/en/f/f1/LuigiNSMBW.png"
}
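
Toad has no occupation, and the dataframe represents that as a float NaN, which would be stored as a numeric NaN rather than as a missing field. A sketch of dropping such fields before inserting; `drop_missing` is a hypothetical helper and the sample record is typed in by hand:

```python
import math

def drop_missing(record):
    """Return a copy of `record` without float NaN values."""
    return {
        key: value
        for key, value in record.items()
        if not (isinstance(value, float) and math.isnan(value))
    }

# Hand-written stand-in for the Toad row of the dataframe.
toad = {"id": 4, "name": "Toad", "color": "red", "occupation": float("nan")}
print(drop_missing(toad))
```

Applied to the whole list, this would be `[drop_missing(c) for c in characters_dict]`.
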
In [9]:
for character in characters_dict:
    character_id = characters.insert_one(character).inserted_id
    print(character_id)
58e504df6221ac77482eae4e
58e504df6221ac77482eae4f
58e504df6221ac77482eae50
58e504df6221ac77482eae51
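
The loop above makes one round trip per character. A sketch of the batched alternative, assuming PyMongo's `insert_many`; the wrapper name `insert_all` is hypothetical:

```python
def insert_all(collection, records):
    """Insert every record in one call and return the generated ids."""
    # insert_many sends the documents as a batch; the result object
    # exposes the new _id values in the order the documents were given.
    result = collection.insert_many(records)
    return result.inserted_ids
```

With the notebook's variables this would be called as `insert_all(characters, characters_dict)`.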

Verify the new collection has been created

In [10]:
db.collection_names(include_system_collections=False)
Out[10]:
['characters']

Verify the characters have been added

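Check the number of documents in the characters collection. The result is 8 rather than 4 because the database already held the four characters from an earlier run of the insert cell (nintendo_db is already listed in Out[4]).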

In [11]:
characters.count()
Out[11]:
8
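
Re-running the insert cell adds the same characters again, which is one way the count can end up above 4. A sketch of a repeatable load, assuming the CSV `id` column uniquely identifies a character and using PyMongo's `replace_one` with `upsert=True`; `upsert_all` is a hypothetical helper name:

```python
def upsert_all(collection, records, key="id"):
    """Insert or overwrite each record, matching on `key`."""
    for record in records:
        # insert_one adds an _id to the dict it is given; drop it here so the
        # replacement never conflicts with the _id of the stored document.
        document = {k: v for k, v in record.items() if k != "_id"}
        collection.replace_one({key: document[key]}, document, upsert=True)
```

Starting from an empty collection, calling `upsert_all(characters, characters_dict)` several times would leave one document per `id`.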

Check if Luigi is in the database.

In [12]:
characters.find_one({"name": "Luigi"})
Out[12]:
{'_id': ObjectId('58e501db6221ac72c8a3106b'),
 'color': 'green',
 'description': 'This is Luigi',
 'id': 2,
 'name': 'Luigi',
 'occupation': 'plumber',
 'picture': 'https://upload.wikimedia.org/wikipedia/en/f/f1/LuigiNSMBW.png'}

Retrieve all documents in the characters collection

In [13]:
characters_from_db = list(characters.find({}))
characters_from_db[0]
Out[13]:
{'_id': ObjectId('58e501db6221ac72c8a3106b'),
 'color': 'green',
 'description': 'This is Luigi',
 'id': 2,
 'name': 'Luigi',
 'occupation': 'plumber',
 'picture': 'https://upload.wikimedia.org/wikipedia/en/f/f1/LuigiNSMBW.png'}

Find the red characters

Retrieve only the name and description of each character.

In [14]:
red_characters = list(characters.find({"color": "red"}, {"name":1, "description":1, "_id":0}))
red_characters
Out[14]:
[{'description': 'This is Mario', 'name': 'Mario'},
 {'description': 'I like funghi', 'name': 'Toad'},
 {'description': 'This is Mario', 'name': 'Mario'},
 {'description': 'I like funghi', 'name': 'Toad'}]
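
The second argument to find is a projection: fields set to 1 are returned, and `_id` is returned unless it is explicitly set to 0. As an illustration only (this is not how MongoDB evaluates queries), the same equality filter and inclusion projection can be mimicked on plain dictionaries; `find_in_memory` and the sample documents are hypothetical:

```python
def find_in_memory(documents, query, projection):
    """Mimic an equality filter plus an inclusion projection on dicts."""
    wanted = [field for field, flag in projection.items() if flag and field != "_id"]
    keep_id = projection.get("_id", 1)
    results = []
    for doc in documents:
        # Keep the document only if every queried field has the given value.
        if all(doc.get(field) == value for field, value in query.items()):
            projected = {field: doc[field] for field in wanted if field in doc}
            if keep_id and "_id" in doc:
                projected["_id"] = doc["_id"]
            results.append(projected)
    return results

# Hand-written sample documents, not read from the database.
docs = [
    {"_id": 1, "name": "Mario", "description": "This is Mario", "color": "red"},
    {"_id": 2, "name": "Luigi", "description": "This is Luigi", "color": "green"},
]
print(find_in_memory(docs, {"color": "red"}, {"name": 1, "description": 1, "_id": 0}))
```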

Create a dataframe from the results

In [15]:
import pandas as pd
red_characters_df = pd.DataFrame(red_characters)
red_characters_df
Out[15]:
   description    name
0  This is Mario  Mario
1  I like funghi  Toad
2  This is Mario  Mario
3  I like funghi  Toad

Drop the database

In [16]:
client.drop_database('nintendo_db')