I am trying to build a sentiment analysis model and I have a question. After preprocessing my tweets and building my vocabulary, I noticed that many words appear fewer than 5 times in my dataset (a lot of them appear only once). Many of them are real words, not gibberish. My thinking is that if I keep those words, they will get unreliable "sentiment" weights and make my model worse. Is my thinking right, or am I missing something? My vocabulary has around 40,000 words, and about 10,000 of them are "rare". Should I "sacrifice" them? Thanks in advance.

Category: Deep Learning. Tags: sentiment-analysis, deep-learning, nlp. Asked by ntonis.
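To make the question concrete, here is a minimal sketch of what "sacrificing" the rare words could look like. It is only an illustration under assumptions: the tweets are assumed to be already tokenized into lists of strings, and the names `build_vocab`, `encode`, `min_count`, and the `<unk>` token are hypothetical, not taken from any particular library. Rare words are not deleted from the tweets; they are all mapped to one shared "unknown" id.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for pruned words


def build_vocab(tokenized_tweets, min_count=5):
    """Keep words seen at least min_count times; everything else maps to UNK."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    # Index 0 is reserved for UNK so rare and unseen words share one id.
    return {w: i for i, w in enumerate([UNK] + kept)}


def encode(tweet, vocab):
    """Replace each token with its id, falling back to the UNK id."""
    unk_id = vocab[UNK]
    return [vocab.get(tok, unk_id) for tok in tweet]


# Toy data standing in for the preprocessed tweets (assumption).
tweets = [
    ["good", "movie"],
    ["good", "plot"],
    ["bad", "movie"],
    ["good", "serendipitous", "movie"],
]
vocab = build_vocab(tweets, min_count=2)
encoded = encode(["good", "serendipitous", "movie"], vocab)
```

The threshold of 5 is just the number from my case; it is a hyperparameter and would presumably need to be compared against other values on a validation set.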