MongoDB - Importing Dataset(s)
Data is imported into MongoDB databases with the mongoimport command. More details about the command, along with usage examples, can be found in the official MongoDB documentation.
Data can also be exported from your database with the mongoexport command.
Example: MovieLens CSV Database
Data imported into a MongoDB database can be in JSON, CSV, or TSV format. In this example, we will import a small MovieLens dataset that contains movies, user ratings of the movies, links to IMDb pages, and tags for the movies.
The dataset is available on the grouplens.org website. We will use the small dataset named ml-latest-small.zip, which is about 1 MB.
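To make the CSV format concrete, here is a minimal sketch of a CSV file with a header row. The file name sample_movies.csv is hypothetical and only used for illustration; the three rows mirror the movies shown in the example output later in this guide.

```shell
# Hypothetical sample file, not part of the MovieLens download.
cat > sample_movies.csv <<'EOF'
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
EOF

# The first line is the header row. The --headerline option of mongoimport
# uses this line as the field names of the imported documents.
head -n 1 sample_movies.csv
```

The header row is why the import commands later in this guide pass --headerline: without it, mongoimport would need the field names from another option.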
Info
At this point, make sure that you are connected to your databases Docker container (using the ./start command) and are not inside the mongosh shell.
Downloading the database
In your terminal, first make sure that you are not currently inside mongosh. You should be in the shell of your databases Docker container. After that, run this command to download the file:
curl https://files.grouplens.org/datasets/movielens/ml-latest-small.zip --output ml-latest-small.zip
The downloaded file is a zip archive. If you do not have unzip installed, install it first:
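The exact install command depends on the container image. As a sketch, assuming a Debian or Ubuntu based container where apt-get is available and you are running as root, the following installs unzip only when it is missing:

```shell
# Assumption: Debian/Ubuntu-based image with apt-get, running as root.
if command -v unzip >/dev/null 2>&1; then
  echo "unzip is already installed"
else
  apt-get update && apt-get install -y unzip
fi
```

On other base images, use that image's own package manager instead.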
After this, you can extract the archive:
This creates a folder called ml-latest-small.
Now, navigate to the ml-latest-small folder:
Importing CSV Files
Running ls in the folder shows that the movie data is stored in CSV files:
Next, we need to import each of these files into a database on our MongoDB server.
To do so, run these commands one by one:
mongoimport --type=csv --db=movies --collection=links --file=links.csv --headerline
mongoimport --type=csv --db=movies --collection=movies --file=movies.csv --headerline
mongoimport --type=csv --db=movies --collection=ratings --file=ratings.csv --headerline
mongoimport --type=csv --db=movies --collection=tags --file=tags.csv --headerline
The commands above import all four files into a database called movies, with one collection per file.
Reviewing the Data in MongoDB
Now, start the shell by running the mongosh command.
After that, list all databases to verify that the movies database exists:
Example Output
Finally, switch to the movies database by running use movies.
After that, verify that all the required collections exist:
Great!
Finding Data
To get a first look at the documents in our collections, we can list the first three documents of a collection, like this:
Example Output
[
  {
    _id: ObjectId("633d60d15fb45b6e341a683a"),
    movieId: 1,
    title: 'Toy Story (1995)',
    genres: 'Adventure|Animation|Children|Comedy|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683b"),
    movieId: 2,
    title: 'Jumanji (1995)',
    genres: 'Adventure|Children|Fantasy'
  },
  {
    _id: ObjectId("633d60d15fb45b6e341a683c"),
    movieId: 3,
    title: 'Grumpier Old Men (1995)',
    genres: 'Comedy|Romance'
  }
]
To inspect the other collections, replace movies in the command above with links, ratings, or tags.
To view the number of documents in each collection, we can use .count():
Ratings for a Movie
Note from the example above that each movie has its own unique movieId. Using this movieId, we can check how many ratings a movie has received and what those ratings look like. For example, the movie Toy Story (1995) has a movieId of 1, so we could run: