MongoDB - Importing Dataset(s)

Data is imported into MongoDB databases with the mongoimport command. More about the command, along with usage examples, can be found in the official MongoDB documentation.

Data can also be exported from your database with the mongoexport command.

Example: Movielens CSV Database

Data imported into a MongoDB database can be in JSON, CSV, or TSV format. In this example, we will import a small MovieLens dataset that contains movies, ratings of the movies, links to IMDb pages, and tags of movies.

The dataset is available on the grouplens.org website. We will use the small dataset named ml-latest-small.zip, which is about 1 MB.

Info

At this point, make sure that you are connected to the Docker container that runs your databases (using the ./start command) and are not inside the mongosh shell.

Downloading the database

In your terminal, first make sure that you are not currently in mongosh. You should be inside the Docker container that runs your databases. Then run this command to download the file:

Terminal
curl https://files.grouplens.org/datasets/movielens/ml-latest-small.zip --output ml-latest-small.zip

The downloaded file is a zip archive. If you do not have unzip installed, install it first:

Terminal
sudo apt update
sudo apt install unzip
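
Before installing, it may be worth checking whether unzip is already available. The following is a minimal sketch, assuming a POSIX-compatible shell; the helper name have_cmd is made up for illustration and is not part of any tool:

```shell
# have_cmd is a hypothetical helper: it succeeds if the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd unzip; then
  echo "unzip is already installed"
else
  echo "unzip is missing; install it with the apt commands above"
fi
```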

After this, you can extract the archive:

Terminal
unzip ml-latest-small.zip

This creates a folder called ml-latest-small.

Now, navigate to the folder ml-latest-small:

Terminal
cd ml-latest-small

Importing CSV Files

Now, when running ls in the folder, we can see that the movie data is in CSV files:

Example listing
README.txt  links.csv  movies.csv  ratings.csv  tags.csv
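
Before importing, it can help to look at the first line of each CSV file, because the --headerline option of mongoimport uses that line as the field names. A small sketch, assuming a POSIX-compatible shell; show_headers is a hypothetical helper name:

```shell
# show_headers is a hypothetical helper: it prints "<file>: <first line>"
# for every CSV file in the given directory (default: current directory).
show_headers() {
  dir="${1:-.}"
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when no CSV files exist
    printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
}

show_headers .
```

For movies.csv the header should list movieId, title, and genres, the same field names that appear in the imported documents.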

Next, we need to import these CSV files one by one into a database on our MongoDB server.

To do so, run these commands one by one:

Terminal
mongoimport --type=csv --db=movies --collection=links --file=links.csv --headerline
mongoimport --type=csv --db=movies --collection=movies --file=movies.csv --headerline
mongoimport --type=csv --db=movies --collection=ratings --file=ratings.csv --headerline
mongoimport --type=csv --db=movies --collection=tags --file=tags.csv --headerline

The commands above should import all 4 CSV files into a database called movies, one collection per file.
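
Since the four commands differ only in the file and collection name, the same import could also be written as a loop. This is a sketch rather than the required approach: import_csvs is a hypothetical helper, and the MONGOIMPORT variable is an assumption added here so that the real command can be swapped out for a dry run.

```shell
# import_csvs is a hypothetical helper: it imports each given CSV file into a
# collection named after the file (links.csv -> links) in the given database.
# MONGOIMPORT is an assumed override; when unset, the real mongoimport is used.
import_csvs() {
  db="$1"
  shift
  for f in "$@"; do
    collection="$(basename "$f" .csv)"
    # Stop at the first file that fails to import.
    "${MONGOIMPORT:-mongoimport}" --type=csv --db="$db" --collection="$collection" \
      --file="$f" --headerline || return 1
  done
}

# Dry run: print the arguments that would be passed to mongoimport.
MONGOIMPORT=echo import_csvs movies links.csv movies.csv ratings.csv tags.csv
```

Dropping the MONGOIMPORT=echo prefix from the last line runs the actual imports.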

Reviewing the Data in MongoDB

Now, start mongosh:

Terminal
mongosh

After that, list all databases to verify that you have the movies database:

MongoDB
show dbs;
Example Output
admin       40.00 KiB
analytics  184.00 KiB
config     108.00 KiB
local       72.00 KiB
movies       4.14 MiB

Next, switch to the movies database:

MongoDB
use movies;

And after that, verify that you have all the required collections:

MongoDB
show collections;
Example Output
links
movies
ratings
tags

Great!

Finding Data

To get a first look at the documents in our collections, we can list the first 3 documents of each collection, like this:

MongoDB
db.movies.find({}).limit(3);
Example Output
[
  {
    _id: ObjectId("633d60d15fb45b6e341a683a"),
    movieId: 1,
    title: 'Toy Story (1995)',
    genres: 'Adventure|Animation|Children|Comedy|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683b"),
    movieId: 2,
    title: 'Jumanji (1995)',
    genres: 'Adventure|Children|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683c"),
    movieId: 3,
    title: 'Grumpier Old Men (1995)',
    genres: 'Comedy|Romance'
  }
]

In the command above, just replace movies with links, ratings or tags.
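
The same preview can also be run for all four collections in one go from the container terminal (outside mongosh), using the --eval option of mongosh. The sketch below makes a few assumptions: preview_collections is a hypothetical helper, MONGOSH is an assumed override used only for testing, and the server is reachable with the default mongosh connection settings.

```shell
# preview_collections is a hypothetical helper: for each collection name it
# prints the first 3 documents of that collection in the given database.
# MONGOSH is an assumed override; when unset, the real mongosh is used.
preview_collections() {
  db="$1"
  shift
  for collection in "$@"; do
    echo "== $collection =="
    "${MONGOSH:-mongosh}" --quiet --eval \
      "printjson(db.getSiblingDB('$db').getCollection('$collection').find({}).limit(3).toArray())"
  done
}

# Usage (from the container terminal, not from inside mongosh):
#   preview_collections movies links movies ratings tags
```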

To view the number of documents in each collection, we can use countDocuments (the older cursor count method is deprecated in mongosh):

MongoDB
db.movies.countDocuments({});

Ratings for a Movie

Notice in the example above that each movie has its own unique movieId. Using this movieId, we can check how many ratings, and what kind of ratings, have been given to each movie. For example, the movie Toy Story (1995) has a movieId of 1, so we could run:

MongoDB
db.ratings.countDocuments({ movieId: 1 });
db.ratings.find({ movieId: 1 });
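
To summarise the ratings of one movie instead of listing them all, an aggregation could be run from the container terminal (outside mongosh). This is a sketch under stated assumptions: rating_summary is a hypothetical helper, MONGOSH is an assumed override used only for testing, and the ratings documents are assumed to have a numeric rating field, as in the header of the MovieLens ratings.csv file.

```shell
# rating_summary is a hypothetical helper: it prints the number of ratings and
# the average rating for one movieId in the movies database.
# Assumes each ratings document has a numeric "rating" field.
rating_summary() {
  movie_id="$1"
  # Accept digits only, so the value is safe to place inside the query.
  case "$movie_id" in
    ''|*[!0-9]*) echo "movieId must be a whole number" >&2; return 1 ;;
  esac
  "${MONGOSH:-mongosh}" --quiet --eval "
    printjson(db.getSiblingDB('movies').ratings.aggregate([
      { \$match: { movieId: $movie_id } },
      { \$group: { _id: '\$movieId', count: { \$sum: 1 }, average: { \$avg: '\$rating' } } }
    ]).toArray())"
}

# Usage (from the container terminal, not from inside mongosh):
#   rating_summary 1
```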