MongoDB - Database Design

There are generally two ways to design databases in MongoDB:

  • Using embedded relationships
  • Using referenced relationships

Let's have a look at both of them and their pros and cons!

Getting Started

Select (use) database testrela:

MongoDB
use testrela

In this tutorial we will not create collections explicitly, because MongoDB creates a collection automatically when the first document is inserted into it.

1 / 2: Referenced Relationships

The referenced relationship design works similarly to normalized (relational) database tables: each entity is stored separately in its own collection, and documents point to each other by id. This approach is not the best fit for document-based databases, because it is not how they were designed to store data, and joining collections at read time can cause major slowdowns when retrieving data.

Let's add some data into our database to see how this design approach works:

MongoDB
// Declare items
db.items.insertMany([
  {
    // 24 characters
    _id: ObjectId("200000000000000000000001"),
    name: "Computer",
    price: 25
  }
])

// Declare users and their relationships to items
db.users_referenced.insertMany([
  {
    // 24 characters
    "_id": ObjectId("100000000000000000000001"),
    name: "Testy mcTester",
    purchases: [
      ObjectId("200000000000000000000001")
    ]
  }
])

Note

In the example above the ObjectIds were created manually. This is allowed, but the value passed to ObjectId must be exactly 24 hexadecimal characters long.
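The length rule can be checked in plain JavaScript before inserting data. This is only a minimal sketch: an ObjectId is 12 bytes, written as exactly 24 hexadecimal characters, and the helper name isValidObjectIdString is hypothetical, not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): checks whether a string
// has the shape ObjectId(...) expects, i.e. exactly 24 hex characters.
function isValidObjectIdString(value) {
  return typeof value === "string" && /^[0-9a-fA-F]{24}$/.test(value);
}

// The id used for the item in this tutorial passes the check.
console.log(isValidObjectIdString("200000000000000000000001"));

// Too short, so it is rejected.
console.log(isValidObjectIdString("abc"));

// Right length, but "z" is not a hexadecimal digit, so it is rejected.
console.log(isValidObjectIdString("z".repeat(24)));
```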

If we try to find all the users with a normal find, the purchases array contains only the stored ObjectIds instead of the actual item documents:

MongoDB
db.users_referenced.find()

To solve this, things get a little bit ugly. We need to use the aggregation pipeline to fetch the related data. Inside aggregate, the $lookup stage joins matching documents from another collection into the documents of the current collection. The syntax for $lookup looks like this:

MongoDB - Syntax Example
$lookup: {
  from: <another collection to join from>,
  localField: <field from the input documents>,
  foreignField: <field from the documents of the "from" collection>,
  as: <output field>
}

So the final code for finding the users and their purchased items would look like this:

MongoDB
db.users_referenced.aggregate([{
  $lookup: {
    from: "items",
    localField: "purchases",
    foreignField: "_id",
    as: "purchases"
  }
}])
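To make it clearer what this stage does, here is a rough in-memory sketch in plain JavaScript. It is not how MongoDB implements $lookup. The lookup function is a hypothetical helper, the ids are shortened to plain strings, the second item is made-up sample data, and from is passed as an array rather than a collection name.

```javascript
// In-memory stand-ins for the two collections (ids shortened to plain strings).
const items = [
  { _id: "item1", name: "Computer", price: 25 },
  { _id: "item2", name: "Keyboard", price: 5 } // made-up extra item, never purchased
];
const usersReferenced = [
  { _id: "user1", name: "Testy mcTester", purchases: ["item1"] }
];

// Hypothetical helper that mimics the basic behavior of the $lookup stage:
// for every input document, collect the "from" documents whose foreignField
// equals the localField value (or any element of it, when it is an array).
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => {
    const local = doc[localField];
    const wanted = Array.isArray(local) ? local : [local];
    const matches = from.filter((other) => wanted.includes(other[foreignField]));
    // Return a copy; the "as" field is added or, as here, overwritten.
    return { ...doc, [as]: matches };
  });
}

const result = lookup(usersReferenced, {
  from: items,
  localField: "purchases",
  foreignField: "_id",
  as: "purchases"
});

// purchases now holds the full item document instead of just its id.
console.log(JSON.stringify(result, null, 2));
```

Because as uses the same name as localField, the array of ids is replaced by the array of matched item documents in the output.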

2 / 2: Embedded Approach

The embedded approach is the design that is more frequently used with NoSQL databases. The data is not normalized at all: related data is stored directly inside the document that uses it.

See the example below of this behavior:

MongoDB
// Embedded approach
db.users.insertMany([
  {
    name: "Testy mcTester",
    purchases: [
      {
        name: "Computer",
        price: 25
      }
    ]
  }
])

Notice that all items are now stored directly under the user's purchases. What if the item's name changes? In that case we would need to update the name in every document where a copy of the item is stored.
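Such a rename could look roughly like the sketch below, which uses updateMany with the filtered positional operator $[<identifier>] and the arrayFilters option (available since MongoDB 3.6). The new name "Laptop" and the identifier p are example values, not something from the data above.

```javascript
// Rename every embedded copy of the item "Computer" to "Laptop" (example values).

// Select the users that hold at least one copy of the item.
const filter = { "purchases.name": "Computer" };

// Rewrite the name of each array element matched by the identifier "p".
const update = { $set: { "purchases.$[p].name": "Laptop" } };

// Define which array elements the identifier "p" refers to.
const options = { arrayFilters: [{ "p.name": "Computer" }] };

// `db` only exists inside the MongoDB shell, so the call is guarded
// to keep this snippet harmless when run elsewhere.
if (typeof db !== "undefined") {
  db.users.updateMany(filter, update, options);
}
```

With the referenced design the same rename would be a single update of one document in the items collection, which is the trade-off between the two approaches.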

Pros and Cons

Both of these database design options have their own pros and cons. I would highly recommend using the embedded approach when working with NoSQL databases, as that is how these databases have been designed to work.

This brings us to the question of what kind of data should be stored in NoSQL databases. My personal opinion is that any app with many relationships (users can purchase items, items can belong to categories, categories can have tags...) is better served by a relational database.

MongoDB is better suited for storing data where you do not have relationships or do not need to maintain them. Good use cases include collecting log data, analytics data, caching, collecting data from sensors, and collecting data from sources whose data models you do not know in advance.

Of course MongoDB can be used for building standard apps, but I have seen the same problem while building multiple large production apps on MongoDB: the relationships slow down the system and make the queries more complicated than they should be.