import torch.nn as nn
import torch
from torch.autograd import Variable
from torchvision import datasets, models, transforms
model = models.resnet18(pretrained=False)
Let us first explore this model's layers and then make a decision as to which ones we want to freeze. By freeze, we mean that we want the parameters of those layers to be fixed. When fine-tuning a model, we are basically taking a model trained on Dataset A and training it on a new Dataset B. We could potentially start the training from scratch as well, but that would be like re-inventing the wheel. Let me explain why.
Suppose I want to train a network to differentiate between a car and a bicycle. Now, I could potentially gather images of both categories and train a network from scratch. But, given the amount of work already out there, it's easy to find a model trained to identify things like dogs, cats, and humans. Admittedly, none of these three look like cars or bicycles. However, it's still better than nothing. We could start with this model and train it to learn car vs. bicycle. Gains: 1) it will be faster, and 2) we need fewer images of cars and bicycles.
(If interested in knowing more, read this - http://cs231n.github.io/transfer-learning/).
Now, let's take a look at the contents of a resnet18. We use the function .children() for this purpose. This lets us look at the contents/layers of a model. Then, we use the .parameters() function to access the parameters/weights of any layer. Finally, every parameter has a property .requires_grad which defines whether a parameter is trained or frozen. By default it is True, and the network updates it in every iteration. If it is set to False, then it is not updated and is said to be "frozen".
child_counter = 0
for child in model.children():
    print(" child", child_counter, "is -")
    print(child)
    child_counter += 1
Now, you can see that some of the children are actually big chunks and have layers within them. To access one level deeper we can run .children() on a child object as well!
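For instance, here is a minimal sketch of drilling one level down. The index 6 corresponds to one of resnet18's blocks of BasicBlocks (this index is specific to resnet18's layout):

# Sketch: iterate over the children of a child to go one level deeper.
# Here we look inside child 6 of resnet18 - the index assumes resnet18's layout.
for child_counter, child in enumerate(model.children()):
    if child_counter == 6:
        for grandchild_counter, grandchild in enumerate(child.children()):
            print("child", grandchild_counter, "of child 6 is -")
            print(grandchild)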
Let's say we want to freeze all parameters up to the first BasicBlock of child 6. First, let's see what a parameter looks like -
for child in model.children():
    for param in child.parameters():
        print("This is what a parameter looks like - \n", param)
        break
    break
Evidently, training all of this will take a lot of calculations. So, by setting a bunch of these to frozen, training becomes much faster. Now, let's freeze everything up to the first BasicBlock of child 6 -
child_counter = 0
for child in model.children():
    if child_counter < 6:
        print("child ", child_counter, " was frozen")
        for param in child.parameters():
            param.requires_grad = False
    elif child_counter == 6:
        children_of_child_counter = 0
        for children_of_child in child.children():
            if children_of_child_counter < 1:
                for param in children_of_child.parameters():
                    param.requires_grad = False
                print('child ', children_of_child_counter, 'of child', child_counter, ' was frozen')
            else:
                print('child ', children_of_child_counter, 'of child', child_counter, ' was not frozen')
            children_of_child_counter += 1
    else:
        print("child ", child_counter, " was not frozen")
    child_counter += 1
Now that you have frozen part of this network, one more thing needs to change for this to work: the optimizer. The optimizer is what actually updates these parameter values. By default, models are written like this -
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.1)
But, this will give you an error, because it will try to update all the parameters of the model, and you've just set a bunch of them to frozen! The way to pass only the parameters that are still being updated is -
optimizer = torch.optim.RMSprop(filter(lambda p: p.requires_grad, model.parameters()), lr=0.1)
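As a quick sanity check (not part of the original notebook, just a common idiom), you can count how many parameters are actually trainable after the freezing loop:

# Sanity check: count trainable vs. total parameters to confirm the freeze took effect.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print("Trainable parameters:", trainable, "out of", total)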
There are two primary ways in which models are saved in PyTorch. The suggested one is using "state dictionaries". They're faster and require less space. A state dictionary knows nothing about the model structure; it only holds the values of the parameters/weights. So, you must first create your model with the required architecture and then load the values into it. The architecture is declared as we did above.
# Let's assume we will save/load from a path MODEL_PATH
# Saving a Model
torch.save(model.state_dict(), MODEL_PATH)
# Loading the model.
# First create a model and define its architecture as done above in this notebook.
# If you want a custom architecture, that's covered further below.
checkpoint = torch.load(MODEL_PATH)
model.load_state_dict(checkpoint)
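For completeness, the other way mentioned above is to save the entire model object. Here is a minimal sketch (using the same MODEL_PATH placeholder); it's less recommended because the pickled object is tied to the exact class definitions and file layout at save time:

# The second way: save/load the whole model object.
torch.save(model, MODEL_PATH)
model = torch.load(MODEL_PATH)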
Most people who come to PyTorch don't like the fact that they can't do a .pop() to remove the last layer, especially if they've used Keras. So, let's take a look at how these things can be done.
# Load the model
model = models.resnet18(pretrained=False)
# Get the number of features going in to the last layer. We need this to redefine the final layer.
num_final_in = model.fc.in_features
# The final layer of the model is model.fc, so we can simply overwrite it
# to have output = the number of classes we need. Say, 300 classes.
NUM_CLASSES = 300
model.fc = nn.Linear(num_final_in, NUM_CLASSES)
# Reload a fresh model for the next example
model = models.resnet18(pretrained=False)
We can get the layers by using model.children() as before. Then, we can convert this into a list with list(). Next, we can remove the last layer by indexing the list. Finally, we can use nn.Sequential() to stack the modified list back together into a new model. You can edit the list in any way you want; for example, you can delete the last two layers if you want the features of an image from the third-to-last layer!
You may even delete layers from the middle of the model. But obviously, this could lead to an incorrect number of features going into the layer after it, since most layers change the size of their input. In that case, you can index that specific layer of the model and overwrite it, just as I showed immediately above!
new_model = nn.Sequential(*list(model.children())[:-1])
new_model_2_removed = nn.Sequential(*list(model.children())[:-2])
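As a small illustration of overwriting an indexed layer (the index and shapes below assume resnet18's default layout; treat this as a sketch rather than code from the original notebook), here is how you could swap the very first convolution to accept single-channel grayscale images:

# Sketch: overwrite a specific layer by index (indices assume resnet18's layout).
# Layer 0 of new_model is the first conv; swap it for one that takes
# 1-channel (grayscale) input instead of 3-channel RGB.
new_model[0] = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)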
Say you want to add a fully connected layer to the model we have right now. One obvious way would be to edit the list discussed above and append another layer to it. However, often we have such a model already trained and want to see if we can load it and add just a new layer on top of it. As mentioned above, the loaded model must have the SAME architecture as the saved one, so we can't use the list method.
We need to add layers on top. The way to do this is simple in PyTorch - We just need to create a custom model! And this brings us to our next section - creating custom models!
Let's make a custom model. As mentioned above, we will load half of the model from a pre-trained network. This seems complicated, right? Half the model is trained, half is new. Further, we want some of it to be frozen and some of it to be updatable. Really, once you've done this, you can do anything with model architectures in PyTorch.
# Some imports first
import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo
import torch
from torch.autograd import Variable
from torchvision import datasets, models, transforms
# New models are defined as classes. Then, when we want to create a model we create an object instantiating this class.
class Resnet_Added_Layers_Half_Frozen(nn.Module):
    def __init__(self, LOAD_VIS_URL=None):
        super(Resnet_Added_Layers_Half_Frozen, self).__init__()
        # Start with the resnet model and swap out the final layer, because that's the model we defined above.
        model = models.resnet18(pretrained=False)
        num_final_in = model.fc.in_features
        model.fc = nn.Linear(num_final_in, 300)
        # Now that the architecture is defined the same as above, let's load the model we would have trained above.
        checkpoint = torch.load(MODEL_PATH)
        model.load_state_dict(checkpoint)
        # Let's freeze the same layers as above. Same code as above, without the print statements.
        child_counter = 0
        for child in model.children():
            if child_counter < 6:
                for param in child.parameters():
                    param.requires_grad = False
            elif child_counter == 6:
                children_of_child_counter = 0
                for children_of_child in child.children():
                    if children_of_child_counter < 1:
                        for param in children_of_child.parameters():
                            param.requires_grad = False
                    children_of_child_counter += 1
            child_counter += 1
        # Now, let's define the new layers that we want to add on top.
        # Basically, these are just objects we define here. The "adding on top" is done by the forward()
        # function, which decides the flow of the input data through the model.
        # NOTE - Even the loaded model needs to be assigned to self.
        # We drop the final fc layer here so that the 512-dimensional features coming out of the
        # average-pooling layer feed into the new layers defined below.
        self.vismodel = nn.Sequential(*list(model.children())[:-1])
        self.projective = nn.Linear(512, 400)
        self.nonlinearity = nn.ReLU(inplace=True)
        self.projective2 = nn.Linear(400, 300)

    # The forward function defines the flow of the input data and thus decides which layer/chunk goes on top of what.
    def forward(self, x):
        x = self.vismodel(x)
        x = torch.squeeze(x)
        x = self.projective(x)
        x = self.nonlinearity(x)
        x = self.projective2(x)
        return x
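As a quick sanity check (this assumes a checkpoint was already saved at MODEL_PATH as shown earlier, so it's a sketch rather than something from the original notebook), we can instantiate the class and run a dummy batch through it to confirm the output shape:

# Sanity check: instantiate the custom model and verify the output shape.
model = Resnet_Added_Layers_Half_Frozen()
dummy_input = torch.randn(4, 3, 224, 224)   # a batch of 4 RGB 224x224 images
output = model(dummy_input)
print(output.shape)  # expect torch.Size([4, 300])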
Now that we have our model all in place, we can load anything and create any architecture we want. That leaves us with two important components in any pipeline - loading the data, and the training part. Let's take a look at the training part. The two most important components of this step are the optimizer and the loss function. The loss function quantifies how far our existing model is from where we want to be, and the optimizer decides how to update the parameters so as to minimize that loss.
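To make this concrete, here is a minimal training-loop sketch. It is not from the original notebook; it assumes a DataLoader called train_loader yielding (images, labels) batches, a standard cross-entropy loss, a placeholder constant NUM_EPOCHS, and the frozen-parameter filtering shown earlier:

# Minimal training-loop sketch (assumptions: train_loader yields (images, labels),
# model is the one defined above, and frozen parameters are filtered out of the optimizer).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001)

for epoch in range(NUM_EPOCHS):
    for images, labels in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # forward pass
        loss = criterion(outputs, labels)  # how far we are from the targets
        loss.backward()                    # compute gradients
        optimizer.step()                   # update only the non-frozen parameters
    print("epoch", epoch, "loss", loss.item())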
Sometimes, we need to define our own loss functions. And here are a few things to know about this -
Here I show a custom loss called Regress_Loss which takes two inputs, x and y. It reshapes x to match the shape of y, and then returns the loss as the L2 difference between the reshaped x and y. This is a standard pattern you'll run across very often when training networks.
Consider x to be of shape (5,10) and y to be of shape (5,5,10). We need to add a dimension to x and then repeat it along the added dimension to match the shape of y. Then, (x-y) will be of shape (5,5,10). We will have to sum over all three dimensions, i.e. three torch.sum() calls, to get a scalar.
class Regress_Loss(torch.nn.Module):
    def __init__(self):
        super(Regress_Loss, self).__init__()

    def forward(self, x, y):
        # y has shape (batch, N, features); x has shape (batch, features)
        y_shape = y.size()[1]
        # add a dimension to x and repeat it N times along that dimension to match y
        x_added_dim = x.unsqueeze(1)
        x_stacked_along_dimension1 = x_added_dim.repeat(1, y_shape, 1)
        # squared L2 difference, summed over the feature dimension ...
        diff = torch.sum((y - x_stacked_along_dimension1)**2, 2)
        # ... and then over the remaining dimensions to get a scalar
        totloss = torch.sum(torch.sum(torch.sum(diff)))
        return totloss
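And here is a quick usage sketch with the example shapes from the explanation above:

# Quick usage sketch with the shapes discussed above.
loss_fn = Regress_Loss()
x = torch.randn(5, 10)       # shape (5, 10)
y = torch.randn(5, 5, 10)    # shape (5, 5, 10)
loss = loss_fn(x, y)
print(loss)  # a single scalar value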