
The hypothesis (model) of Logistic Regression, which is a binary classifier ($y \in \{0,1\}$), is given by the equation below:

Hypothesis

$S(z)=P(y=1 | x)=h_{\theta}(x)=\frac{1}{1+\exp \left(-\theta^{\top} x\right)}$

This calculates the probability of class 1; by setting a threshold (such as $h_{\theta}(x) > 0.5$), we can classify a sample as 1 or 0.
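The hypothesis and thresholding step can be sketched as follows (the coefficient and feature values are hypothetical; `x[0] = 1` is the bias term):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function S(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """P(y=1 | x) = h_theta(x) for a single feature vector x."""
    return sigmoid(theta @ x)

def classify(theta, x, threshold=0.5):
    """Classify as 1 if P(y=1 | x) exceeds the threshold, else 0."""
    return 1 if predict_proba(theta, x) > threshold else 0

theta = np.array([1.0, 1.0])    # hypothetical coefficients
x = np.array([1.0, 2.0])        # x[0] = 1 is the bias term
print(predict_proba(theta, x))  # ~0.9526, since theta^T x = 3
print(classify(theta, x))       # 1
```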

Cost function

The cost function for Logistic Regression, known as the binary cross-entropy loss, is defined below:

$J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left(y^{(i)} \log \left(h_{\theta}\left(x^{(i)}\right)\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right)$
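A minimal sketch of this cost computed over a batch (the 3-sample batch below is hypothetical, not the dataset from the question):

```python
import numpy as np

def cost(theta, X, y):
    """Binary cross-entropy J(theta) averaged over the m rows of X.
    X has shape (m, n+1) with a leading column of ones (bias); y has shape (m,)."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # h_theta(x^(i)) for every row
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# hypothetical 3-sample batch; first column is the bias x_0 = 1
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -1.0]])
y = np.array([0.0, 1.0, 0.0])
print(cost(np.array([1.0, 1.0]), X, y))  # ~0.8245
```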

Iterative updates

Assume we start all the model parameters at some initial value. In this case the only model parameters are the coefficients $\theta_j$, and we initialize them all to 1: $\theta_j = 1$ for all $j \in \{0, 1, \ldots, n\}$, where $n$ is the number of features.

$\theta_{j_{new}} \leftarrow \theta_{j_{old}}+\alpha \times \frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)}-\sigma\left(\theta_{old}^{\top} x^{(i)}\right)\right] x_{j}^{(i)}$

Where:
$m =$ the number of rows in the training batch
$x^{(i)} =$ the feature vector for sample $i$
$\theta =$ the coefficient vector; $\theta_j$ is the coefficient corresponding to feature $j$
$y^{(i)} =$ the actual class label for sample $i$ in the training batch
$x_{j}^{(i)} =$ element (column) $j$ of the feature vector for sample $i$
$\alpha =$ the learning rate
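The update rule above can be applied to all $\theta_j$ at once in vectorised form. A sketch with hypothetical data (the sign is $+$ because the bracketed term is already the negative gradient of the cross-entropy cost):

```python
import numpy as np

def gd_step(theta, X, y, alpha):
    """One batch-gradient-descent update of all theta_j simultaneously:
    theta_j <- theta_j + alpha * (1/m) * sum_i [y^(i) - sigma(theta^T x^(i))] * x_j^(i)."""
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))  # sigma(theta^T x^(i)) for every row
    return theta + alpha * (X.T @ (y - h)) / m

X = np.array([[1.0, 2.0], [1.0, 3.0]])  # hypothetical rows; x_0 = 1 is the bias
y = np.array([1.0, 0.0])
theta = np.ones(2)                      # all theta_j initialised to 1
theta = gd_step(theta, X, y, alpha=0.1)
print(theta)  # ~[0.9533, 0.8574]
```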

Dataset

The training dataset of pass/fail results in an exam for 5 students is given in the table below:

If we initialize all the model parameters with 1 (all $\theta_j = 1$), and the learning rate is $\alpha = 0.1$, and if we use batch gradient descent, what will be the:

$a)$ Accuracy of the model on the training set at initialization ($\text{accuracy} = \frac{\text{number of correct classifications}}{\text{all classifications}}$)?
$b)$ Cost at initialization?
$c)$ Cost after 1 epoch?
$d)$ Repeat steps $a$, $b$, and $c$ using mini-batch gradient descent with $\text{batch size} = 2$.

(Hint: For $x_{j}^{(i)}$ when $j=0$ we have $x_{0}^{(i)}  = 1$ for all $i$ )
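Since the question's table did not survive extraction here, the sketch below uses hypothetical stand-in data (hours studied vs pass/fail) to show how parts $a$ through $d$ would be computed; setting the batch size equal to the dataset size reproduces plain batch gradient descent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Binary cross-entropy averaged over the rows of X."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def epoch(theta, X, y, alpha, batch_size):
    """One epoch: walk through consecutive mini-batches, updating after each.
    batch_size = len(X) gives plain batch gradient descent (one update per epoch)."""
    for start in range(0, len(X), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        h = sigmoid(Xb @ theta)
        theta = theta + alpha * (Xb.T @ (yb - h)) / len(Xb)
    return theta

# Hypothetical stand-in for the missing table; x_0 = 1 is the bias (see the hint).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
theta0 = np.ones(2)  # all theta_j = 1

acc = np.mean((sigmoid(X @ theta0) > 0.5).astype(float) == y)            # (a)
print("accuracy at init:", acc)
print("cost at init:", cost(theta0, X, y))                               # (b)
print("cost, 1 batch epoch:", cost(epoch(theta0, X, y, 0.1, len(X)), X, y))  # (c)
print("cost, 1 mini-batch epoch:", cost(epoch(theta0, X, y, 0.1, 2), X, y))  # (d)
```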


2 Answers


Here is my attempt at the answer. Link to video solution (it also includes a short introduction to logistic regression; go to 13:00 to skip that introduction):


Here is my answer for questions a, b, c & d:

