MongoDB Cheatsheet
MongoDB - Terminology
| Term | Description | 
|---|---|
| MongoDB | A general-purpose document database | 
| MongoDB Atlas | A cloud-based MongoDB service with enterprise add-ons | 
| Document | The basic unit of data in MongoDB (similar to a record in SQL databases) | 
| Collection | A grouping of documents (similar to a table in SQL databases) | 
| Database | A container for collections | 
| BSON | Binary JSON, the storage format used by MongoDB | 
| _id | A mandatory field in every MongoDB document; it is auto-generated if not provided | 
| Data Modelling | The process of defining how data is stored and how pieces of data relate to each other | 
| Schema | The organisation of data in your database | 
| Unbounded Documents | Documents that keep growing without limit, for example through an ever-growing array | 
MongoDB CLI Reference Commands
| Command | Description | 
|---|---|
| db.help() | Get a list of commonly used commands | 
| use <database> | Select the Database to use | 
| db.<collection>.insertOne() | Insert one document to collection | 
| db.<collection>.insertMany([]) | Insert many documents in an array to the collection | 
| db.<collection>.find() | Find one or more documents in a collection | 
| it | Iterate to the next batch of results | 
| db.<collection>.replaceOne(<filter>, <replacement document>) | Replace the first document that matches the filter | 
| db.<collection>.find(<query>).sort(<sort>) | Find documents and sort the results | 
| db.<collection>.find(<query>).limit(<number>) | Limit the number of documents returned | 
| db.<collection>.countDocuments(<filter>, <options>) | Count the number of documents in a collection that match the filter | 
| db.<collection>.countDocuments({ items: { $elemMatch: { name: "laptop", price: { $lt: 600 }}}}) | Count Documents with Element Match | 
| db.<collection>.createIndex({ email:1 }, {unique:true}) | Create a unique index on the email field | 
| db.<collection>.getIndexes() | Return the indexes that were created | 
| db.<collection>.explain().find({ email: "test@test.com" }) | Return the query plan for the query | 
| db.<collection>.hideIndex(<index>) | Hide an index from the query planner before dropping it; unhiding an index is faster than recreating it | 
| db.<collection>.hideIndex('active_1_birthday_-1_name_1') | Hide an index by name (Scenario 1) | 
| db.<collection>.hideIndex({active:1, birthday:-1, name:1 }) | Hide an index by key pattern (Scenario 2) | 
| db.<collection>.dropIndex(<index>) | Drop a single index | 
| db.<collection>.dropIndexes() | Drop all indexes on the collection except the _id index | 
| db.<collection>.dropIndex('active_1_birthday_-1_name_1') | Drop an index by name | 
| db.<collection>.dropIndex({active:1, birthday:-1, name:1 }) | Drop an index by key pattern | 
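The index name used in the hideIndex() and dropIndex() rows above follows MongoDB's default naming rule: each indexed field and its direction, joined with underscores. The following is a minimal sketch in plain JavaScript; `defaultIndexName` is a hypothetical helper (not a shell method), and it only covers the default name. An index created with an explicit name option keeps that name instead.

```javascript
// Hypothetical helper: derive the default name MongoDB gives an index
// from its key pattern (field_direction pairs joined with underscores).
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

const keyPattern = { active: 1, birthday: -1, name: 1 };
console.log(defaultIndexName(keyPattern)); // active_1_birthday_-1_name_1
```

Either form (the name or the key pattern) identifies the same index, so use whichever is easier to read from the output of getIndexes().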
MongoDB Operators
| Operator | Example | Description | 
|---|---|---|
| { <field>: <value> } | { age: 11 } | Field Equals | 
| $eq | { age: { $eq: 11 }} | Field Equals using $eq | 
| $in | { city: { $in: [ "PHOENIX", "CHICAGO" ]}} | Matches values equal to any value in the given array | 
| $gt | { "items.price": { $gt: 50 }} | Greater than | 
| $lt | { "items.price": { $lt: 50 }} | Less than | 
| $lte | { "items.price": { $lte: 50 }} | Less than or Equal to | 
| $gte | { "items.price": { $gte: 50 }} | Greater than or Equal to | 
| $elemMatch | { product: { $elemMatch: { $eq: "InvestmentStock" }}} | Matches documents where the field is an array containing the value | 
| $elemMatch | { items: { $elemMatch: { name: "Laptop", price: { $gt: 800 }, quantity: { $gte: 1 }}}} | Matches documents where a single array element satisfies all of the conditions | 
| $and | { $and: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents for which every expression in the array evaluates to true | 
| $or | { $or: [{ <field>: <value> }, { <field>: <value> }]} | Matches documents for which at least one expression in the array evaluates to true | 
| $set | { $set: { <field>: <value> }} | Adds new fields and values to a document, or replaces the value of a field with a specific value | 
| $push | { $push: { <field>: <value> }} | Appends a value to an array. If the array field is absent, $push adds it with the value as its only element | 
Example Operator Searches
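To show how the filters in the operator table evaluate, here is a simplified in-memory sketch in plain JavaScript. It is an illustration only, not MongoDB's implementation: `matches` and `matchesCondition` are hypothetical helpers, the sample `orders` data is invented, dot notation is not supported, and $elemMatch is only handled for arrays of sub-documents.

```javascript
// Comparison operators from the table, expressed as plain functions.
const comparators = {
  $eq: (a, b) => a === b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $in: (a, list) => list.includes(a),
};

// Evaluate one field condition: either an implicit equality or an operator object.
function matchesCondition(value, condition) {
  const isOperatorObject =
    condition !== null &&
    typeof condition === "object" &&
    !Array.isArray(condition) &&
    Object.keys(condition).some((key) => key.startsWith("$"));
  if (!isOperatorObject) return value === condition; // { <field>: <value> }
  return Object.entries(condition).every(([operator, operand]) => {
    if (operator === "$elemMatch") {
      // At least one array element must satisfy every condition in the operand.
      return Array.isArray(value) && value.some((element) => matches(element, operand));
    }
    const compare = comparators[operator];
    if (!compare) throw new Error(`Unsupported operator in this sketch: ${operator}`);
    return compare(value, operand);
  });
}

// Evaluate a whole filter document against one document.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    return matchesCondition(doc[key], condition);
  });
}

// Invented sample data.
const orders = [
  { customer: "ann", city: "PHOENIX", items: [{ name: "Laptop", price: 900, quantity: 1 }] },
  { customer: "bob", city: "CHICAGO", items: [{ name: "Laptop", price: 500, quantity: 2 }] },
  { customer: "cat", city: "DENVER", items: [{ name: "Mouse", price: 20, quantity: 3 }] },
];

const inCities = orders.filter((o) => matches(o, { city: { $in: ["PHOENIX", "CHICAGO"] } }));
const expensiveLaptops = orders.filter((o) =>
  matches(o, { items: { $elemMatch: { name: "Laptop", price: { $gt: 800 }, quantity: { $gte: 1 } } } })
);
const denverOrAnn = orders.filter((o) => matches(o, { $or: [{ city: "DENVER" }, { customer: "ann" }] }));

console.log(inCities.map((o) => o.customer)); // [ 'ann', 'bob' ]
console.log(expensiveLaptops.map((o) => o.customer)); // [ 'ann' ]
console.log(denverOrAnn.map((o) => o.customer)); // [ 'ann', 'cat' ]
```

In the shell the same filter documents would be passed straight to db.<collection>.find(<filter>).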
MongoDB Atlas Commands
| Command | Description | 
|---|---|
| atlas auth login | Log in to the Atlas CLI | 
| atlas auth logout | Log out of the Atlas CLI | 
| atlas auth whoami | Display the currently logged-in user | 
| atlas auth register | Register with MongoDB Atlas | 
| atlas config set | Change a setting in the Atlas CLI config file | 
| atlas setup --clusterName myAtlasClusterEDU --provider AWS --currentIp --skipSampleData --username myAtlasDBUser --password password --projectId project_id \| tee atlas_cluster_details.txt | Set up a cluster from the Atlas CLI and save the output to a file | 
| atlas clusters sampleData load myAtlasClusterEDU | Load Atlas sampleData to your cluster | 
MongoDB Data Types
| Data Type | Data Example | 
|---|---|
| ObjectId | "_id": ObjectId("507f1f77bcf86cd799439011") | 
| Int32 | "value": Int32(1) | 
| Double | "value": Double(1) | 
Data Modelling
*** Guiding Principles for Data ***
- Data that is accessed together should be stored together - avoid searching multiple collections
- Structure your data to match the way that your application queries and updates it
Questions:
- What does my application do?
- What data will I store?
- How will users access this data?
- What data will be the most valuable to me?
Answers will determine the following:
- Your tasks as well as those of the users
- What your data looks like
- The relationship among the data
- The tooling you plan to have
- The access patterns that might emerge
Benefits of a Data Model
- Make it easier to manage data
- Make queries more efficient
- Use less memory and CPU
- Reduce Costs
Relationship Types
- One to One
- One to Many
- Many to Many
| Type | Description | 
|---|---|
| One to One | A relationship where a data entity in one set is connected to exactly one data entity in another set. | 
| One to Many | A relationship where a data entity in one set is connected to any number of data entities in another set. | 
| Many to Many | A relationship where any number of data entities in one set are connected to any number of data entities in another set. | 
Ways to Model Relationships
- Embedding
- Referencing
| Type | Description | 
|---|---|
| Embedding | We take related data and insert it into our document | 
| Referencing | We refer to documents in another collection in our document by ObjectId | 
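The difference between the two approaches is easiest to see in the shape of the documents. The sketch below uses plain JavaScript objects; the customer and address data, and field names such as `address_ids`, are invented for illustration, and plain numbers stand in for the ObjectId values a real collection would normally use.

```javascript
// Embedding: the related addresses live inside the customer document.
const embeddedCustomer = {
  _id: 1,
  name: "Ann",
  addresses: [
    { street: "1 Main St", city: "PHOENIX" },
    { street: "9 Lake Ave", city: "CHICAGO" },
  ],
};

// Referencing: addresses live in their own collection and the customer
// document stores only their identifiers.
const addresses = [
  { _id: 101, street: "1 Main St", city: "PHOENIX" },
  { _id: 102, street: "9 Lake Ave", city: "CHICAGO" },
];
const referencedCustomer = { _id: 1, name: "Ann", address_ids: [101, 102] };

// With references, the application (or a $lookup stage) has to resolve the
// identifiers, which is the extra read cost that embedding avoids.
const resolved = referencedCustomer.address_ids.map((id) =>
  addresses.find((address) => address._id === id)
);
console.log(resolved.map((address) => address.city)); // [ 'PHOENIX', 'CHICAGO' ]
```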
Embedded Documents
Use Cases:
- Ideal for one-to-many and many-to-many relationships among data
Benefits of Embedding:
- Avoids application joins
- Provides better performance for read operations
- Allows developers to update related data in a single write operation
Downsides of Embedding:
- Embedding data into a single document can create large documents - large documents have to be read into memory in full, which can result in slow application performance for your end user.
- Continuously adding data without limit creates unbounded documents - unbounded documents may exceed the BSON document size limit of 16 MB
*** See Schema Patterns ***
Referencing Documents
Using references is called linking or data normalisation
Use Cases:
- Ideal for information that needs to be stored in different collections but may need to be retrieved together.
Benefits of Referencing:
- No duplication of data
- Smaller documents
Downsides of Referencing:
- Querying from multiple documents costs extra resources and impacts read performance
Schema Anti-Patterns
Common Schema Anti-Patterns include:
- Massive Arrays
- Massive Number of Collections
- Bloated Documents
- Unnecessary Indexes
- Queries without Indexes
- Data that is accessed together, but is stored in different collections
Tools to identify Anti-Patterns in MongoDB Atlas:
- Data Explorer
  - Available with Free Tier Atlas
  - Shows schema anti-patterns
  - Collections and index stats for each collection
- Performance Advisor
  - Available with M10 tier or higher DB Cluster
  - Tool analyzes the most active collections
  - Recommends schema improvements
MongoDB Connection Strings
Connection Strings can be used to connect from the following:
- Shell
- MongoDB Compass (Or Other Clients)
- and the Application
Connection Strings come in two formats
| Format | Description | 
|---|---|
| Standard | Used to connect to standalone clusters, replica sets or sharded clusters | 
| DNS Seed list | Provides a DNS server list to our connection string | 
DNS Seed List Benefits:
- Gives more flexibility of deployment
- Ability to change servers in rotation without reconfiguring clients
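As a rough illustration of the two formats, the sketch below assembles a standard and a DNS seed list (mongodb+srv) connection string in plain JavaScript. `buildStandardUri` and `buildSrvUri` are hypothetical helpers, and the host names, password and replica set name are made-up placeholders. The one detail worth copying is that credentials are percent-encoded, so reserved characters such as "@" or ":" in a password do not break the URI.

```javascript
// Hypothetical helper: standard format lists every host explicitly.
function buildStandardUri({ username, password, hosts, options = {} }) {
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  const query = new URLSearchParams(options).toString();
  return `mongodb://${credentials}@${hosts.join(",")}/${query ? "?" + query : ""}`;
}

// Hypothetical helper: DNS seed list format names a single host whose DNS
// records supply the server list.
function buildSrvUri({ username, password, host }) {
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  return `mongodb+srv://${credentials}@${host}/`;
}

const standardUri = buildStandardUri({
  username: "myAtlasDBUser",
  password: "p@ss:word", // placeholder password containing reserved characters
  hosts: ["host1.example.com:27017", "host2.example.com:27017"],
  options: { replicaSet: "rs0" },
});
const srvUri = buildSrvUri({
  username: "myAtlasDBUser",
  password: "p@ss:word",
  host: "cluster0.example.mongodb.net",
});

console.log(standardUri);
// mongodb://myAtlasDBUser:p%40ss%3Aword@host1.example.com:27017,host2.example.com:27017/?replicaSet=rs0
console.log(srvUri);
// mongodb+srv://myAtlasDBUser:p%40ss%3Aword@cluster0.example.mongodb.net/
```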
Connecting with Mongo Shell
The MongoDB shell is a Node.js REPL environment, giving us access to JavaScript:
- Variables
- Functions
- Conditions
- Loops
- Control flow statements
❯ mongosh "mongodb+srv://myatlasclusteredu.za9nd.mongodb.net/" --apiVersion 1 --username myAtlasDBUser
Enter password: ********
Current Mongosh Log ID:	673aad59ff831302936ee88f
Connecting to:		mongodb+srv://<credentials>@myatlasclusteredu.za9nd.mongodb.net/?appName=mongosh+2.3.3
Using MongoDB:		7.0.15 (API Version 1)
Using Mongosh:		2.3.3
Example Commands
// Create an array
const greetArray = [ "hello", "world", "welcome" ];
// Create an arrow function that logs each element of an array
const loopArray = (array) => array.forEach(el => console.log(el));
// Pass greetArray to the loopArray function
loopArray(greetArray)
Atlas atlas-xp0h5c-shard-0 [primary] test> const greetArray = [ "hello", "world", "welcome" ];
Atlas atlas-xp0h5c-shard-0 [primary] test> const loopArray = (array) => array.forEach(el => console.log(el));
Atlas atlas-xp0h5c-shard-0 [primary] test> loopArray(greetArray)
hello
world
welcome
Atlas atlas-xp0h5c-shard-0 [primary] test>
Connecting with MongoDB Compass
Features:
- Documents tab shows the records in a collection
- Aggregations tab composes aggregations to run against collections
- Schema tab helps to analyse and optimise the structure of the documents
- Explain Plan tab helps to analyse the performance of specific queries running against your collections
- Indexes tab shows the indexes that are available in your collections
- Validation tab helps us to enforce the data structure of documents on updates and inserts
Connect with MongoDB Drivers
[MongoDB Programming Drivers](https://www.mongodb.com/docs/drivers/)
Best Practices:
- An application should use a single MongoClient instance for all database requests
- Creating MongoClients is resource intensive, so these need to be minimised
- Creating a new MongoClient for each request will degrade the application’s performance
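One common way to follow these practices is to create the client lazily and hand the same instance to every caller. The sketch below shows that pattern in plain JavaScript. To keep it runnable without a database, `createClient` is a stand-in (an assumption of this example) for `new MongoClient(uri)` from the official Node.js driver, and the localhost URI is a placeholder.

```javascript
// Stand-in for `new MongoClient(uri)`; counts how many clients get created.
let creations = 0;
function createClient(uri) {
  creations += 1;
  return { uri };
}

let sharedClient = null;

// Return the one shared client, creating it only on first use.
function getClient(uri) {
  if (sharedClient === null) {
    sharedClient = createClient(uri);
  }
  return sharedClient;
}

// Simulate three request handlers that each need database access.
const clients = [1, 2, 3].map(() => getClient("mongodb://localhost:27017/"));
console.log(creations); // 1
console.log(clients.every((client) => client === clients[0])); // true
```

In a real application the module that owns `getClient` would be imported wherever database access is needed, so the expensive client creation happens once per process.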
Benefits of BSON
- Extension of JSON
- Optimized for storage, retrieval, and transmission across the wire
- More secure than plaintext JSON
- Supports more data types than JSON
Benefits of Python Driver for BSON
- Represents BSON documents as Python dictionaries
- You can work with native Python data types
- PyMongo automatically converts Python data types to and from BSON
- Using PyMongo we can work with native Python data types like string, floats, lists and dictionaries
Be aware of:
- To work with the BSON ObjectId data type in Python, use the bson package in PyMongo - bson.objectid.ObjectId
- A few data types need the bson package to work with BSON types in Python, including:
  - Int64
  - Decimal128
  - Regex
MongoDB Transactions
- Define the callback function that specifies the sequence of operations to perform inside the transaction.
- Start a client session.
- Start the transaction by calling the with_transaction() method on the session object.
Note: MongoDB automatically cancels any multi-document transaction that runs for more than 60 seconds.
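The steps above name PyMongo's with_transaction(); the Node.js driver exposes the same flow as session.withTransaction(). The sketch below follows the three steps using that API. It assumes `client` is an already-connected MongoClient, and the `bank` database, `accounts` collection, `balance` field and `transferFunds` function are hypothetical names chosen for illustration.

```javascript
// Move `amount` between two accounts so that both updates succeed or fail together.
async function transferFunds(client, fromId, toId, amount) {
  const accounts = client.db("bank").collection("accounts");

  // Step 1: the callback lists the operations that belong to the transaction.
  // Each operation must be passed the session to take part in it.
  const callback = async (session) => {
    await accounts.updateOne({ _id: fromId }, { $inc: { balance: -amount } }, { session });
    await accounts.updateOne({ _id: toId }, { $inc: { balance: amount } }, { session });
  };

  // Step 2: start a client session.
  const session = client.startSession();
  try {
    // Step 3: run the callback in a transaction; the driver commits on
    // success and aborts if the callback throws.
    await session.withTransaction(callback);
  } finally {
    await session.endSession();
  }
}
```

Keeping the callback short matters because of the 60 second limit noted above: long-running work is better done before the transaction starts.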
MongoDB Aggregation
- Aggregation: the collection, analysis and summarisation of data
- Stage: an aggregation operation performed on the data (a pipeline has one or more stages)
- Aggregation Pipeline: a series of stages completed one at a time, in order
*** Note *** In an Aggregation Pipeline data can be Filtered, Sorted, Grouped and Transformed
| Stage | Description | 
|---|---|
| $match | Filters for data that matches criteria | 
| $group | Groups documents based on criteria | 
| $sort | Puts the documents in a specified order | 
| $limit | Limit the returned results | 
| $project | Include or Exclude fields from the results | 
| $set | Adds or modifies fields in the pipeline | 
| $count | Counts the documents in the pipeline and returns the total number of documents counted | 
| $out | Writes the documents that are returned by an aggregation pipeline into a collection (must be the last stage in the pipeline). | 
Note: $out creates a new collection if it doesn't already exist; if the collection does exist, $out replaces its contents with the new data
Aggregation Examples
Example: Get the number of zip codes in each Californian city (Match and Group)
db.zips.aggregate([
  {
    $match: {
      "state": "CA"
    }
  },
  {
    $group: {
      _id: "$city",
      totalZips: { $count: {} }
    }
  }
])
Example: Get the number of sightings of Eastern Bluebirds by location coordinates (Match and Group)
db.sightings.aggregate([
  {
    $match: { 
      "species_common": "Eastern Bluebird" 
    }
  },
  {
    $group: {
      _id: "$location.coordinates",
      sightings: { $count: {} }
    }
  }
])
Example: Sort by population in descending order and return the top three results (Sort and Limit)
db.zips.aggregate([
  {
    $sort: { 
      pop: -1
    }
  },
  {
    $limit: 3
  }
])
Example: Sort the bird sightings by the second location coordinate in descending order and return the top four results (Sort and Limit)
db.sightings.aggregate([
  {
    $sort: {
      "location.coordinates.1": -1
    }
  },
  {
    $limit: 4
  }
])
Example: Project only the state, zip and population fields (population is taken from pop) and exclude _id
db.zips.aggregate([
  {
    $project: {
      state: 1,
      zip: 1,
      population: "$pop",
      _id: 0 
    }
  }
])
Example: Add a field holding the projected population for 2022
db.zips.aggregate([
  {
    $set: {
      pop_2022: { $round: { $multiply: [ 1.0031, "$pop" ] } }
    }
  }
])
Example: Count the total number of zip codes in the collection
db.zips.aggregate([
  {
    $count: "total_zips"
  }
])
Example: Project only the date and species_common fields
db.sightings.aggregate([
  {
    $project: {
      date: 1,
      species_common: 1,
      "_id": 0
    }
  }
])
Example: Create a new field called class and set the value to “birds”
db.birds.aggregate([
  {
    $set: {
      class: "birds"
    }
  }
])
Example: Count the Eastern Bluebird sightings in 2022 and return the total in a field called bluebird_sightings_2022
db.sightings.aggregate([
  {
    $match: {
      "species_common": "Eastern Bluebird",
      "date": { 
        $gt:ISODate('2022-01-01'), 
        $lt:ISODate('2023-01-01')}
    }
  },
  {
    $count: "bluebird_sightings_2022"
  }
])
Example: Write the 2022 sightings to a new collection with $out
db.sightings.aggregate([
  {
    $match: {
      date: {
        $gte: ISODate('2022-01-01'), 
        $lt: ISODate('2023-01-01')
      }
    }
  },
  {
    $out: "sightings_2022"
  }
])
Example: Write the pipeline results to a collection in the current (default) database
db.sightings.aggregate([
  {
    // The string form writes to a collection in the current database
    $out: "test"
  }
])
MongoDB Indexes
What are indexes?
- Special data structures
- Store a small portion of the data
- Ordered and easy to search efficiently
- Point to the document identifier
Benefits of Indexes:
- Speed up queries
- Reduce disk I/O
- Reduce resources required
- Support equality matches and range-based operations
- Return sorted results
Disadvantages of an Index
- Come with a write performance cost
Without Indexes
- MongoDB reads all documents (collection scan - inefficient)
- Sorts results in memory
With Indexes:
- MongoDB only fetches the documents identified by the index based on the query
By Default:
- There is one default index per collection, which includes only the _id field
- Every query should use an index
Types of Indexes:
- Single field - Indexes on one field only
- Compound - More than one field included in the index
*** Note *** Indexes that operate on an array field are known as Multi-Key indexes