MongoDB
dev
web
MongoDB
- NoSQL document store
- Stores documents as BSON (Binary JSON)
CRUD
Create
db.collection.insertOne()
db.collection.insertMany()
Update
db.collection.updateOne(<filter>, <update>)
db.collection.updateMany(<filter>, <update>)
db.collection.replaceOne(<filter>, <replacement document>)
Read
db.collection.find(<filter>), optionally chained with modifiers such as .limit(5)
Delete
db.collection.deleteOne()
db.collection.deleteMany()
There is also a db.collection.bulkWrite()
which takes an array of write operations, e.g. [{ insertOne: { ... } }, { updateOne: { ... } }]
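A minimal mongosh sketch of the CRUD calls above, using a hypothetical pizzas collection (collection name, fields, and values are illustrative only):
// Create
db.pizzas.insertOne({ name: "Margherita", size: "medium", quantity: 2 })
db.pizzas.insertMany([{ name: "Hawaiian", size: "small" }, { name: "Veggie", size: "large" }])
// Read (first 5 medium pizzas)
db.pizzas.find({ size: "medium" }).limit(5)
// Update
db.pizzas.updateOne({ name: "Margherita" }, { $set: { quantity: 3 } })
db.pizzas.replaceOne({ name: "Hawaiian" }, { name: "Hawaiian", size: "large", quantity: 1 })
// Delete
db.pizzas.deleteMany({ size: "small" })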
Indexing
Indexes are data structures that store a small portion of the collection's data in an easy-to-traverse form.
_id is used as the default index.
- Single field indexes
- Compound indexes
- Can specify 2 or more fields in a single index
- Data is sorted by the 1st field -> nth field
- e.g. { CustomerId: 1, SystemTypeId: 1, CompanyTypeId: 1 } (see the createIndex sketch after this list)
- Can also define a sort order (ascending/descending) on each field
- Multikey indexes
- Index keys are created for each element of an array field
- Text indexes
- Support text search queries on string content
- A collection can only have one text index
- Wildcard indexes
- Index fields whose names we don't know up front or that are likely to change
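A few hedged createIndex examples for the index types above (collection and field names are placeholders):
// Single field index (1 = ascending, -1 = descending)
db.orders.createIndex({ CustomerId: 1 })
// Compound index: data is sorted by CustomerId, then SystemTypeId, then CompanyTypeId
db.orders.createIndex({ CustomerId: 1, SystemTypeId: 1, CompanyTypeId: -1 })
// Multikey index: created automatically when the indexed field holds arrays
db.orders.createIndex({ tags: 1 })
// Text index (at most one per collection)
db.restaurants.createIndex({ name: "text", cuisine: "text" })
// Wildcard index over all fields under a subdocument
db.orders.createIndex({ "attributes.$**": 1 })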
Index notes
- You can hide an index instead of deleting it straight away
- This stops queries from using the index but keeps it around, so you can see the impact of removing it before actually deleting it
db.restaurants.hideIndex({ borough: 1, ratings: 1 })
or db.restaurants.hideIndex("borough_1_ratings_1")
- Unhiding works in the same way but uses unhideIndex rather than hideIndex
- Indexes should be able to fit in memory
- Measure index use
db.orders.aggregate([{$indexStats: { }}])
- See other indexing stats using
db.collection.stats(<option>)
- Includes index sizes etc
- Another way to see index sizes is:
db.collection.totalIndexSize()
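A sketch of the hide -> measure -> drop workflow described above, assuming a hypothetical borough_1_ratings_1 index on restaurants:
// Check how often the index is used before touching it
db.restaurants.aggregate([{ $indexStats: {} }])
// Hide the index: the planner stops using it, but it is still maintained on writes
db.restaurants.hideIndex("borough_1_ratings_1")
// If query performance is still fine, drop it; otherwise unhide it
db.restaurants.dropIndex("borough_1_ratings_1")
// db.restaurants.unhideIndex("borough_1_ratings_1")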
WiredTiger
https://www.mongodb.com/docs/manual/core/wiredtiger/
- Default MongoDB storage engine since version 3.2
- WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.
- MultiVersion Concurrency Control (MVCC). At the start of an operation, WiredTiger provides a point-in-time snapshot of the data to the operation. A snapshot presents a consistent view of the in-memory data.
- Uses a write-ahead log (i.e. journal) in combination with checkpoints to ensure data durability.
- Persists all data modifications between checkpoints
- Responsible for caching data
- By default, the internal cache can use the larger of 50% of (RAM - 1 GB) or 256 MB; data not in the cache is read from disk via the filesystem cache
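A quick way to check the configured WiredTiger cache size from mongosh (field names below come from the serverStatus output, assuming a WiredTiger deployment):
// Maximum cache size in bytes
db.serverStatus().wiredTiger.cache["maximum bytes configured"]
// Bytes currently held in the cache
db.serverStatus().wiredTiger.cache["bytes currently in the cache"]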
Read Concern/Write Concern/Read Preference
Replication
- Provides redundancy & high availability
- Should be used in production
- Writes go to the primary database & the primary replicates them to the secondary members of the replica set
- Provides automatic failover
- When the primary stops responding to heartbeats, the remaining members elect a new primary
- Can read from a secondary server if the read preference allows it (see the sketch below)
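A hedged sketch of routing reads to a secondary, assuming a replica-set connection (collection name and filter are illustrative):
// Prefer a secondary for this query, falling back to the primary if none is available
db.orders.find({ status: "shipped" }).readPref("secondaryPreferred")
// Read preference can also be set on the connection string, e.g.
// mongodb://host1,host2,host3/?replicaSet=rs0&readPreference=secondaryPreferred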
Sharding
https://www.mongodb.com/resources/products/capabilities/database-sharding-explained
https://www.mongodb.com/docs/manual/sharding/
- Sharding is a method for distributing data across multiple machines.
- Used for horizontal scaling across servers
- Shard keys
- Single indexed field or compound indexed fields
sh.shardCollection()
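A minimal sharding sketch (database, collection, and shard key are placeholders):
// Enable sharding for the database (implicit on recent versions when sharding a collection)
sh.enableSharding("shop")
// Shard the orders collection on a hashed CustomerId key for even distribution
sh.shardCollection("shop.orders", { CustomerId: "hashed" })
// Or use a ranged compound shard key instead
// sh.shardCollection("shop.orders", { CustomerId: 1, SystemTypeId: 1 })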
Aggregation
https://www.mongodb.com/docs/manual/reference/aggregation/
- Aggregation pipeline builds up a set of stages that process documents.
- Each stage performs an operation on the input documents (filter, group, calculate, etc.)
- Once a stage has processed its documents, the results are passed on to the next stage
- Can return results for groups of documents (i.e. counts, calculations etc)
Example:
db.orders.aggregate( [
// Stage 1: Filter pizza order documents by pizza size
{
$match: { size: "medium" }
},
// Stage 2: Group remaining documents by pizza name and calculate total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
] )