3 different ways of saving data in non-relational databases & the effect of each on query performance — Part 1 of 3

Published in

Towards Dev

6 min readFeb 2, 2022

Prerequisite: This article assumes you have some basic knowledge about node.js applications & databases (both relational & non-relational).

Working examples use mongodb and the mongoose library as an ODM.

Part 1: Using normalization

Part 2: Using embedded documents

Part 3: Using a hybrid approach

There are three methods of storing data in non-relational databases, namely using normalization, using embedded documents & using a hybrid approach-a mix of the first two methods.

How you choose to save data influences which side of the data-consistency V query performance equation you lean.

Data consistency according to Oracle docs means that each user sees a consistent view of data including visible changes made by the user’s own transactions & transactions of other users.

Put simply, if user A & B perform the same query on the same database, they should both receive the same data and if user B updates this data, user A should upon request get the updated copy of this data.

To ensure data consistency, concepts such as database transactions, normalization etc were created to manage & organise connected data in databases.

In this article we will discuss the concept of normalization.

Normalization simply means don’t duplicate data across your database.

If the same data is required multiple times across the database, store this data in a different collection, give it a unique id and use this id as a reference in other areas of your database where this information is required.

This eliminates duplication of data & maintenance problems since data updates are made in only one place.

In relational databases, in a single SQL query, you can join data from different tables therefore, normalization (separation of data) is undoubtedly good for relational databases, however, with non-relational databases, it is not as simple as that.

Normalization might not be so great for non-relational databases.

This might not be the most popular thing to say in 2022 but don’t kill the messenger 😆.

Project setup.

You can access all the code examples from this repository. Files are saved in directories (Embedded, Normalization & Hybrid). Each has a separate data.json and index.js file.

Setup a node.js project and install the mongoose package. This package makes it easy to interact with mongoDB.

This example uses mongoose version 6.1.8 as an ODM library

Check if you have a local installation of mongodb. If not, please download and install mongodb. You can follow this article for guidance.

Consider this scenario as our example going forward. An e-commerce platform stores records of its users, purchases and products. We are tasked with designing a non-relational database.

Normalization will guarantee data consistency, however, as your queries grow in complexity, it will have a negative impact on query performance.

Given our context, we can say that users & products are distinct types and each purchase is a composition of users and products. Let’s look at an example.

Create models of each category. These are representations of the data expected in each collection.

// User model
const mongoose = require('mongoose')

module.exports = mongoose.model('User', mongoose.Schema({
    name: {
        type: String,
        trim: true,
        required: true
    },
    address: {
        type: String,
        required: true,
        trim: true
    },
    category: {
        type: String,
        enum: ['Platinum', 'Gold', 'Silver', 'Bronze'],
        default: 'Bronze'
    }
}))

// Product model
const mongoose = require('mongoose')

module.exports = mongoose.model('Product', mongoose.Schema({
    name: {
        type: String,
        trim: true,
        required: true
    },
    category: {
        type: String,
        required: true,
        trim: true,
        enum: ['Electronics', 'Toys', 'Kitchen']
    },
    price: {
        type: Number,
        required: true,
        default: 0
    },
    quantity: {
        type: Number,
        required: true,
        min: 0,
        default: 0
    }
}))

// Purchase model
const mongoose = require('mongoose')

module.exports = mongoose.model('Purchase', mongoose.Schema({
    buyer: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        ref: 'User'
    },
    product: {
        type: mongoose.Schema.Types.ObjectId,
        required: true,
        ref: 'Product'
    },
    quantity: {
        type: Number,
        required: true,
        min: 1
    }
}))

Notice the two properties buyer and product are of type ObjectId. When a purchase is made, instead of saving all the details about the registered user or the product, we will extract the respective ids & save them.

With this approach, all our information about products is saved in one collection and users in another. We use ids to map purchases to buyers and products.

The ref option is what tells mongoose which collection is holding the information about a particular field & mongoose will use this path to populate a query.

Let’s create some data.

const mongoose = require('mongoose')
const data = require('./data.json')
const User = require('./models/user')
const Product = require('./models/product')
const Purchase = require ('./models/purchase')

mongoose.connect('mongodb://localhost:27017/normalize')

mongoose.connection
    .on('open', () => {
      console.log('Mongoose connection open');
    })
    .on('error', (error) => {
      console.log(`Connection error: ${error.message}`);
    });

// Add users to DB.
async function createUsers(users) {
    for(let user of users) {
        try {
            const _user = await User(user).save()
            console.log(`${ _user.name } added to users collection`)
        } catch(error) {
            console.log(`Error: ${ error.message }`)
        }
    }
}

// Add products to DB.
async function createProducts(products) {
    for (let product of products) {
        try {
            const _product = await Product(product).save()
            console.log(`${ _product.name } product added to products collection`)
        } catch(error) {
            console.log(`Error: ${ error.message }`)
        }
    }
}

// Add product order to DB.
async function createPurchases(purchases) {
    for(let purchase of purchases) {
        try {
            const _user = await User.findOne({ name: purchase.user })
            const _product = await Product.findOne({ name: purchase.product })

            await Purchase({
                buyer: _user._id,
                product: _product._id,
                quantity: purchase.quantity
            }).save()

            console.log(`${_product.name} product ordered by ${_user.name}`)

        } catch(error) {
            console.log(`Error: ${ error.message }`)
        }
    }
}

// Get unpopulated purchases data object
async function getPurchases() {
    try {
        const result = await Purchase.find().select('-__v').limit(1)
        console.log(result)
    } catch(error) {
        console.log(`Error: ${ error.message }`)
    }
}

// Get populated purchases data object
async function getPopulatedPurchasesData() {
    try {
        const result = await Purchase.find().populate('buyer').populate('product').limit(1)
        console.log(result)
    } catch(error) {
        console.log(`Error: ${ error.message }`)
    }
}

// Generate all sample data.
async function generateSampleData() {
    await createUsers(data.users)
    await createProducts(data.products)
    await createPurchases(data.purchases)
}

// generateSampleData()
// getPurchases()
// getPopulatedPurchasesData()

Step 1.

Scroll to the bottom of the index.js file and uncomment generateSampleData function. Run node index.js. Pay attention to your current folder, if you cloned the repository and are currently in the root directory, then you will run node normalization/index.js

You should see event logs on your terminal, informing you of the users, products and orders added to the database.

Step 2.

Comment out generateSampleData function and uncomment getPurchases function. Run node index.js or node normalization/index.js. You should see data printed on your terminal. Document ids will be different from those in the screen shot but that doesn’t matter.

{
    _id: new ObjectId("63ddfb2af1c0621a61eda7fb"),
    buyer: new ObjectId("63ddfb2af1c0621a61eda7f0"),
    product: new ObjectId("63ddfb2af1c0621a61eda7f7"),
    quantity: 30
}

Your ids will be different from those above but the general structure should be identical.

As you can see above, in the place of our buyer & product, we have ids that map to a user or product in a different collection.

Step 3.

Comment out getPurchases function and uncomment getPopulatedPurchasesData function. Run node index.js or node normalization/index.js. You should see data printed on your terminal.

{
    _id: new ObjectId("63ddfb2af1c0621a61eda7fb"),
    buyer: {
      _id: new ObjectId("63ddfb2af1c0621a61eda7f0"),
      name: 'John Cena',
      address: 'California',
      category: 'Bronze'
    },
    product: {
      _id: new ObjectId("63ddfb2af1c0621a61eda7f7"),
      name: 'Buzz Lightyear',
      category: 'Toys',
      price: 10,
      quantity: 50
    },
    quantity: 30
}

The result of that query is an object with four fields, _id, buyer, product & quantity. However, the buyer & product fields have been populated with information from the User & Product collections respectively.

Advantages of using normalization

Eliminates data redundancy. This means you don’t have duplicate data stored in different locations in your database. In our example, if John Cena makes 100 purchases, his personal information isn’t repeatedly stored for each purchase.
Saves disk space. Because personal information is stored once in the Users collection & we are using an id to make a reference, it saves space unlike if information is duplicated for each purchase.
It is easy to update information. If user John Cena changes address, we only have to edit his information in the user collection. This guarantees data consistency.

Disadvantages of using normalization

Each time populate is called, a new query is started. The query process uses the defined type to search & compare corresponding types in the collection until a match is found.

This is the biggest drawback, the more fields we have to populate the longer the query execution because it has to go through the search, match and return process.

Therefore before you choose to use normalization in non-relational databases, it is important to consider how many fields you have to populate, is this information required at a critical juncture in your user’s journey where speed is paramount?, is there an alternative method to save information that will best meet the required needs? etc, etc.

Next we will discuss how to use embedded documents, look at the advantages it has over normalization and which context it best suits, however, till then thank you for reading, hopefully you will spare time for the next article.

You can find me on linkedIn and medium