How to use a pre-defined TensorFlow Dataset?

TensorFlow 2.0 comes with a set of pre-defined, ready-to-use datasets. They are quite easy to use and are handy when you are just playing around with new models.

In this short post, I will show you how you can use a pre-defined TensorFlow Dataset.

Prerequisite

Make sure that you have tensorflow and tensorflow-datasets installed.

pip install -q tensorflow-datasets tensorflow

Using a TensorFlow dataset

In this example, we will use the small Imagenette dataset, a subset of ImageNet.


You can visit the TensorFlow Datasets catalog to get a complete list of available datasets.

Load the dataset

We will use the tfds.builder function to load the dataset.

import tensorflow_datasets as tfds

imagenette_builder = tfds.builder("imagenette/full-size")
imagenette_info = imagenette_builder.info
imagenette_builder.download_and_prepare()
datasets = imagenette_builder.as_dataset(as_supervised=True)

Note:

  • we are setting as_supervised to True so that the dataset yields (image, label) tuples, which makes it easy to perform manipulations on the data.
  • we are creating an imagenette_info object that contains metadata about the dataset, such as its splits, features, and sizes.

Get split size

We can get the size of the train and validation set using the imagenette_info object.

train_examples = imagenette_info.splits['train'].num_examples
validation_examples = imagenette_info.splits['validation'].num_examples

This is useful when defining the steps_per_epoch and validation_steps arguments of the model.
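For example, the step counts can be derived from the split sizes and the batch size. The numbers below are placeholders for illustration; in practice you would plug in the num_examples values from imagenette_info:

```python
import math

# Placeholder values; substitute imagenette_info.splits['train'].num_examples
# and imagenette_info.splits['validation'].num_examples in real code.
train_examples = 10000
validation_examples = 1000
batch_size = 64  # arbitrary choice for illustration

# Ceil so that the final, smaller batch is still consumed each epoch.
steps_per_epoch = math.ceil(train_examples / batch_size)
validation_steps = math.ceil(validation_examples / batch_size)
print(steps_per_epoch, validation_steps)  # 157 16
```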

Create batches

Next, we will create batches so that the data can be fed to the model incrementally. On low-RAM devices, or with large datasets, it is usually not possible to load the whole dataset into memory at once.

train, test = datasets['train'], datasets['validation']

import tensorflow as tf

batch_size = 32

train_batch = train.map(
    lambda image, label: (tf.image.resize(image, (448, 448)), label)
).shuffle(100).batch(batch_size).repeat()

validation_batch = test.map(
    lambda image, label: (tf.image.resize(image, (448, 448)), label)
).shuffle(100).batch(batch_size).repeat()

Note: We are taking the train and validation splits and resizing all images to 448 x 448. You can perform any other manipulation using the map function too; it is useful for resizing or normalizing the images, or for any other preprocessing step.
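As a sketch of such a preprocessing step, the same map pattern can also scale pixel values to [0, 1]. The snippet below uses a small synthetic dataset as a stand-in for Imagenette so that it runs without downloading anything; the preprocess function name and the fake data are illustrative choices, not part of the dataset API:

```python
import tensorflow as tf

# Synthetic stand-in for the real dataset: two fake 8x8 RGB "images" with labels.
images = tf.random.uniform((2, 8, 8, 3), maxval=255, dtype=tf.float32)
labels = tf.constant([0, 1])
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

def preprocess(image, label):
    # Resize to the target size and scale pixel values from [0, 255] to [0, 1].
    image = tf.image.resize(image, (448, 448))
    image = image / 255.0
    return image, label

batch = dataset.map(preprocess).batch(2)
for images_batch, labels_batch in batch.take(1):
    print(images_batch.shape)  # (2, 448, 448, 3)
```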

That’s it! You can now use this data for your model. The complete code is also available as a Google Colab notebook.



Written on June 18, 2020 by Vivek Maskara.

Originally published on Medium
