MongoDB - Importing Dataset(s)

Data is imported into MongoDB databases with the mongoimport command. More about the command, including usage examples, can be found in the official MongoDB documentation.

Data can also be exported from your database with the mongoexport command.

Example: MovieLens CSV Dataset

Data imported into a MongoDB database can be in JSON, CSV, or TSV format. In this example, we will import a small MovieLens dataset that contains movies, ratings of the movies, links to IMDb pages, and tags of movies.

The dataset is available on the grouplens.org website. We will use the small dataset named ml-latest-small.zip, which is about 1 MB.

Info

At this point, make sure that you are connected to your databases Docker container (using the ./start command) and are not inside the mongosh shell.

Downloading the Dataset

In your terminal, first make sure that you are not inside mongosh. You should be inside your databases Docker container. Then run this command to download the file:

Terminal

curl https://files.grouplens.org/datasets/movielens/ml-latest-small.zip --output ml-latest-small.zip

The downloaded file is a zip archive. If you do not have unzip installed, install it first:

Terminal

sudo apt update
sudo apt install unzip

After this, you can extract the archive:

Terminal

unzip ml-latest-small.zip

This creates a folder called ml-latest-small.

Now, navigate to the folder ml-latest-small:

Terminal

cd ml-latest-small

Importing CSV Files

Running ls in the folder shows that the movie data is stored in CSV files:

Example listing

README.txt  links.csv  movies.csv  ratings.csv  tags.csv

Next, we need to import each of the CSV files into a database on our MongoDB server.

To do so, run these commands one by one:

Terminal

mongoimport --type=csv --db=movies --collection=links --file=links.csv --headerline
mongoimport --type=csv --db=movies --collection=movies --file=movies.csv --headerline
mongoimport --type=csv --db=movies --collection=ratings --file=ratings.csv --headerline
mongoimport --type=csv --db=movies --collection=tags --file=tags.csv --headerline

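The commands above import the four CSV files into a database called movies, one collection per file. The --headerline option tells mongoimport to use the first line of each file as the field names.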
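To make the effect of --type=csv together with --headerline more concrete, here is a minimal sketch in plain JavaScript of how a header line and data rows can become documents. It is not how mongoimport is implemented. The function name csvToDocuments is hypothetical, the sample rows mirror the movies shown in the example output later on this page, and the parser assumes no quoted fields or embedded commas (the real CSV files may contain them).

```javascript
// Simplified illustration of importing CSV text with a header line.
// Assumption: fields contain no quotes or embedded commas.
function csvToDocuments(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const fieldNames = lines[0].split(','); // header line gives the field names

  return lines.slice(1).map((line) => {
    const values = line.split(',');
    const doc = {};
    fieldNames.forEach((name, i) => {
      const raw = values[i];
      // Numeric-looking values are stored as numbers, other values as strings.
      doc[name] = raw !== '' && !Number.isNaN(Number(raw)) ? Number(raw) : raw;
    });
    return doc;
  });
}

// Sample rows written to match the movies in the example output on this page.
const sampleCsv = [
  'movieId,title,genres',
  '1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy',
  '2,Jumanji (1995),Adventure|Children|Fantasy',
].join('\n');

console.log(csvToDocuments(sampleCsv));
```

Each data row becomes one document whose keys come from the header line, which is why the imported movies have the fields movieId, title, and genres.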

Reviewing the Data in MongoDB

Now, start mongosh:

Terminal

mongosh

Then list all databases to verify that the movies database exists:

MongoDB

show dbs;

Example Output

admin       40.00 KiB
analytics  184.00 KiB
config     108.00 KiB
local       72.00 KiB
movies       4.14 MiB

Next, switch to the movies database:

MongoDB

use movies;

Then verify that you have all the required collections:

MongoDB

show collections;

Example Output

links
movies
ratings
tags

Great!

Finding Data

To get a first look at the documents in our collections, we can list the first three documents of a collection, like this:

MongoDB

db.movies.find({}).limit(3);

Example Output

[
  {
    _id: ObjectId("633d60d15fb45b6e341a683a"),
    movieId: 1,
    title: 'Toy Story (1995)',
    genres: 'Adventure|Animation|Children|Comedy|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683b"),
    movieId: 2,
    title: 'Jumanji (1995)',
    genres: 'Adventure|Children|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683c"),
    movieId: 3,
    title: 'Grumpier Old Men (1995)',
    genres: 'Comedy|Romance'
  }
]

To view the other collections, replace movies in the command above with links, ratings, or tags.
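Note that in the example output the genres field is a single string with the genres separated by the | character. As a small sketch of how such a value could be turned into a list, the plain JavaScript below splits it. The helper name genreList is hypothetical, and the sample document is copied from the example output (without the _id field).

```javascript
// Sample document copied from the example output (the _id field is omitted).
const toyStory = {
  movieId: 1,
  title: 'Toy Story (1995)',
  genres: 'Adventure|Animation|Children|Comedy|Fantasy',
};

// Hypothetical helper: returns the genres of a movie document as an array.
function genreList(movie) {
  return movie.genres.split('|');
}

console.log(genreList(toyStory));
// prints [ 'Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy' ]
```

Because mongosh accepts JavaScript, the same helper could be pasted there and applied to a fetched document, for example genreList(db.movies.findOne({ movieId: 1 })).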

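To view the number of documents in a collection, we can run .count. Newer MongoDB versions mark count() as deprecated; there, db.movies.countDocuments({}) returns the same number: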

MongoDB

db.movies.find({}).count();

Ratings for a Movie

Notice from the example above that each movie has its own unique movieId. Using this movieId, we can check how many ratings a movie has received and what those ratings are. For example, Toy Story (1995) has a movieId of 1, so we can run:

MongoDB

db.ratings.find({ movieId: 1 }).count();
db.ratings.find({ movieId: 1 });
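The two queries above return the number of ratings and the rating documents themselves. As a rough sketch of what could be done with such documents, the plain JavaScript below computes the count and the average rating for one movie. The helper name summarizeRatings is hypothetical, the sample ratings are made up for illustration, and the field names (userId, movieId, rating, timestamp) are assumed to follow the header line of ratings.csv.

```javascript
// Made-up rating documents for illustration only. The field names are
// assumed to match the header line of ratings.csv.
const sampleRatings = [
  { userId: 1, movieId: 1, rating: 4.0, timestamp: 964982703 },
  { userId: 5, movieId: 1, rating: 3.0, timestamp: 847434962 },
  { userId: 1, movieId: 3, rating: 5.0, timestamp: 964981247 },
];

// Hypothetical helper: counts and averages the ratings of one movie.
function summarizeRatings(ratings, movieId) {
  const matching = ratings.filter((r) => r.movieId === movieId);
  const total = matching.reduce((sum, r) => sum + r.rating, 0);
  return {
    count: matching.length,
    // Without any ratings there is no meaningful average.
    average: matching.length > 0 ? total / matching.length : null,
  };
}

console.log(summarizeRatings(sampleRatings, 1));
// prints { count: 2, average: 3.5 }
```

In mongosh, the same helper could be given the real documents, for example summarizeRatings(db.ratings.find({ movieId: 1 }).toArray(), 1).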