Quiz 03: Deep Learning Intro and Neural LMs

Name:

This quiz covers material from the third lecture.

In machine learning, we are given a dataset of the form {(x_i, y_i)}, i ∈ [1..N], and aim to learn a function f(x) that maps unseen input feature vectors x to ŷ, the predicted value. Distinguish between the three types of learning problems by characterizing the mathematical type of the predicted value ŷ:
Classification:

Regression:

Ranking:

Given a training dataset D = {(x_i, y_i)}, i ∈ [1..N], we want to identify a function f_Θ(·) such that the predictions ŷ = f_Θ(x) over the training dataset are as accurate as possible. Given a loss function L(y, ŷ), write the criterion that the optimal value of Θ must satisfy:
Find Θ such that:
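One common reading of this criterion is empirical risk minimization: choose Θ to minimize the average loss over the training set. The toy sketch below illustrates that idea under assumptions not taken from the lecture: a hypothetical one-parameter model f_Θ(x) = Θ·x, squared loss, plain gradient descent with an arbitrary learning rate, and a made-up noise-free dataset.

```python
# Toy illustration: minimize the average training loss (1/N) * sum_i L(y_i, f_theta(x_i))
# for a hypothetical model f_theta(x) = theta * x with squared loss L(y, y_hat) = (y - y_hat)^2.

def average_loss(theta, data):
    """Empirical (training) loss of f_theta over the dataset."""
    return sum((y - theta * x) ** 2 for x, y in data) / len(data)

def gradient(theta, data):
    """Derivative of average_loss with respect to theta."""
    return sum(-2 * x * (y - theta * x) for x, y in data) / len(data)

def fit(data, lr=0.05, steps=500):
    """Plain gradient descent on theta, starting from 0."""
    theta = 0.0
    for _ in range(steps):
        theta -= lr * gradient(theta, data)
    return theta

# Hypothetical training set generated by y = 2x (no noise).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta_star = fit(data)
print(theta_star, average_loss(theta_star, data))
```

For this noise-free data the returned value approaches Θ = 2, where the average training loss reaches its minimum of 0.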
Write the expression of the cross-entropy loss, which is useful when the output of the learned model is interpreted as a discrete distribution p(y_c|x) for c ∈ [1..C] (a C-way classification model): f(x) = ŷ = (ŷ_1, ..., ŷ_C) is a distribution over the C possible classes.

L(y, ŷ) =

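As an illustration, the sketch below computes the standard cross-entropy L(y, ŷ) = -Σ_c y_c log ŷ_c, assuming the target y is given as a distribution over the C classes (one-hot for the true class in the single-label case). The epsilon clamp and the example numbers are assumptions added for illustration.

```python
import math

def cross_entropy(y, y_hat):
    """Cross-entropy between a target distribution y and a predicted distribution y_hat.

    Both are sequences of length C. For a one-hot target (true class c),
    this reduces to -log(y_hat[c]).
    """
    eps = 1e-12  # illustrative guard against log(0), not a standard constant
    return -sum(y_c * math.log(max(p_c, eps)) for y_c, p_c in zip(y, y_hat))

# Hypothetical 3-way prediction where the true class is index 0.
y = [1, 0, 0]
y_hat = [0.7, 0.2, 0.1]
print(cross_entropy(y, y_hat))  # equals -log(0.7)
```

Only the probability assigned to the true class contributes when y is one-hot, so the loss shrinks toward 0 as ŷ_c approaches 1.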
The deep learning approach learns a trainable non-linear mapping φ from x to a representation φ(x) that can serve as input to a linear classifier, ideally one for which the classes are linearly separable. The general form of the model we consider is:

ŷ = W φ(x) + b, where φ(x) = g(W'x + b') and g is a non-linear function.

Why do we need non-linear mappings such as g() in this formulation?
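The sketch below computes ŷ = W g(W'x + b') + b in pure Python and checks one consequence relevant to the question: if g is the identity, the two layers collapse into a single affine map of x. The weights, the input, the choice g = tanh, and all helper names are hypothetical.

```python
import math

def matvec(M, v):
    """Multiply a matrix M (list of rows) by a vector v."""
    return [sum(m * v_j for m, v_j in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply matrices A and B given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def forward(x, W_in, b_in, W_out, b_out, g=math.tanh):
    """y_hat = W_out phi(x) + b_out, with phi(x) = g(W_in x + b_in) applied element-wise.

    W_in, b_in play the role of W', b'; W_out, b_out play the role of W, b.
    """
    phi = [g(z) for z in add(matvec(W_in, x), b_in)]
    return add(matvec(W_out, phi), b_out)

def identity(z):
    return z

# Hypothetical weights: 2-d input, 2 hidden units, 1 output.
W_in, b_in = [[1.0, -1.0], [2.0, 0.5]], [0.1, -0.2]
W_out, b_out = [[0.3, 0.7]], [0.05]
x = [1.0, 2.0]

# With g = identity, the two layers reduce to one affine map W x + b.
W_collapsed = matmul(W_out, W_in)
b_collapsed = add(matvec(W_out, b_in), b_out)
two_layer = forward(x, W_in, b_in, W_out, b_out, g=identity)
one_layer = add(matvec(W_collapsed, x), b_collapsed)
print(two_layer, one_layer)  # equal up to floating-point rounding

# With g = tanh the composition is no longer affine in x.
print(forward(x, W_in, b_in, W_out, b_out))
```

Because W(W'x + b') + b = (WW')x + (Wb' + b), stacking linear layers without g yields only another linear model; the non-linearity is what allows φ(x) to reshape the input space.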
Consider the task of predicting the sentiment of a text document as one of {positive, negative, neutral}. We want to use a neural network to learn a model for this task, given a training dataset of the form {(document_i, label_i)}, i ∈ [1..N]. Each document d_i contains N_i words (w_{i,1}, ..., w_{i,N_i}), where each word w_{i,j} belongs to the vocabulary V = {w_1, ..., w_|V|}. Describe how the documents are encoded as vectors of size |V| for each of the following two methods:
Bag of Words

TF-IDF weights

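Both encodings can be sketched as follows, assuming documents are already tokenized into lists of words and the vocabulary is a fixed list. TF-IDF has several variants; this sketch assumes raw counts as tf and idf(w) = log(N / df(w)), where df(w) is the number of training documents containing w. The function names and the toy corpus are illustrative.

```python
import math
from collections import Counter

def bag_of_words(doc, vocab):
    """Count vector of size |V|: entry k is the number of occurrences of vocab[k] in doc."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]

def tf_idf(docs, vocab):
    """TF-IDF vectors of size |V|, using raw counts as tf and idf = log(N / df)."""
    n_docs = len(docs)
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    vectors = []
    for doc in docs:
        tf = bag_of_words(doc, vocab)
        vectors.append([
            tf_k * math.log(n_docs / df[w]) if df[w] > 0 else 0.0
            for tf_k, w in zip(tf, vocab)
        ])
    return vectors

# Hypothetical toy corpus and vocabulary.
docs = [["good", "movie", "good"], ["bad", "movie"], ["good", "plot"]]
vocab = ["bad", "good", "movie", "plot"]
print(bag_of_words(docs[0], vocab))  # [0, 2, 1, 0]
print(tf_idf(docs, vocab)[0])
```

Under this variant, a word that occurs in every document gets idf = log(1) = 0, so TF-IDF down-weights words that carry little information about which document (or label) they come from.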
Last modified 24 Nov 2019