"Rare words" on vocabulary

ntonis asked Jan 30, 2021

680 views

I am trying to create a sentiment analysis model and I have a question.

After preprocessing my tweets and building my vocabulary, I noticed that many words appear fewer than 5 times in my dataset (and a lot of them appear only once). Many of them are real words, not gibberish. My thinking is that if I keep those words, they will get unreliable "sentiment" weights and make my model worse.
Is my thinking right, or am I missing something?

My vocabulary has around 40,000 words, and about 10k of them are "rare". Should I "sacrifice" them?
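
To make the question concrete, here is a minimal sketch of what "sacrificing" the rare words could look like, assuming the tweets are already tokenized into lists of strings. The `MIN_COUNT` threshold, the `<unk>` placeholder token, and the function names are illustrative assumptions, not part of any particular library or of my actual pipeline.

```python
from collections import Counter

MIN_COUNT = 5   # assumed threshold: words seen fewer times are treated as "rare"
UNK = "<unk>"   # assumed placeholder token that all rare words are mapped to


def build_vocab(tokenized_tweets, min_count=MIN_COUNT):
    """Count every token and keep the words seen at least min_count times."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    vocab = {word for word, c in counts.items() if c >= min_count}
    return vocab, counts


def replace_rare(tokenized_tweets, vocab, unk=UNK):
    """Map every out-of-vocabulary token to a single shared placeholder."""
    return [[tok if tok in vocab else unk for tok in tweet]
            for tweet in tokenized_tweets]


def rare_token_share(counts, vocab):
    """Fraction of all token occurrences that belong to rare words."""
    total = sum(counts.values())
    rare = sum(c for word, c in counts.items() if word not in vocab)
    return rare / total if total else 0.0


# Toy data, with a lower threshold so the effect is visible on 4 tweets.
tweets = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["good", "acting"]]
vocab, counts = build_vocab(tweets, min_count=2)
cleaned = replace_rare(tweets, vocab)
share = rare_token_share(counts, vocab)
```

Mapping rare words to one shared token, rather than deleting them, keeps the tweet length and word positions intact. Checking `rare_token_share` first gives a rough idea of how much of the data the rare words actually cover before deciding whether to drop them.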

Thanks in advance.



