Creating a portfolio of your data science projects is generally a great idea. But as an NLP hiring manager, there is one common portfolio project that I’ve always kind of shrugged at: sentiment analysis. So when I recently analyzed a large data set of job postings and blog articles, and found evidence to confirm my negative sentiment (pun?), I knew I had to share.
What Is Sentiment Analysis?
Sentiment Analysis is a Natural Language Processing (NLP) task that involves classifying opinions expressed in a piece of text to determine the writer’s attitude towards a particular topic, product, service, event, or even person. The task is generally framed as a multi-class problem with three labels: positive, negative, or neutral.
Techniques used for sentiment analysis vary widely and may involve classical ML with Bag of Words features (e.g. TF-IDF with Logistic Regression), deep learning with a transformer-based model (e.g. BERT), or lexico-semantic pattern analysis.
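To make the classical option concrete, here is a minimal sketch of a TF-IDF plus Logistic Regression classifier. It assumes scikit-learn is installed, and the six training sentences and their labels are made up purely for illustration; this is not the setup used anywhere in this article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; a real project would need far more labeled examples.
texts = [
    "I love this phone, the battery is great",
    "Fantastic service and friendly staff",
    "Terrible quality, it broke after one day",
    "The delivery was awful and very late",
    "The package arrived on Tuesday",
    "The store opens at nine in the morning",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# TF-IDF turns each text into a sparse bag-of-words vector;
# logistic regression then learns one set of weights per class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

new_texts = ["the battery is great", "the delivery was awful"]
preds = model.predict(new_texts)
for text, pred in zip(new_texts, preds):
    print(f"{text!r} -> {pred}")
```

With this little data the predictions mean nothing. The point is only how few lines the modeling step takes.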
The primary appeal of sentiment analysis lies in its potential to provide unique insights into public opinion on a large scale. This can be extremely valuable for industries like marketing, public relations, and social media monitoring, where understanding customer sentiments can guide strategy and decision-making. However, while sentiment analysis is a popular project for aspiring data scientists, its value in a job application portfolio may not be as high as one might hope.
Let’s delve into the reasons why sentiment analysis might not be the most attractive project in your portfolio from an employer’s perspective, and what kinds of projects might make a stronger impression.
What Skills Are Employers Looking For?
To answer this question, I analyzed a data set of 2,003 job postings and 1,523 blog articles. I used ChatGPT to extract skills, then compared the top skills from each set. The full analysis is available here. Here are the top skills demanded by employers for data science jobs in June 2023, overlaid with how those same skills rank in the blog article data set.
Apparently bloggers like me should be writing less about NLP and more about statistical analysis. 🤔
Now let’s look at the reverse: the top skills mentioned in blog articles, overlaid with their rankings from job postings. This time I’ll show the top 40.
Here we can see that sentiment analysis is the 23rd most popular skill mentioned in blog articles, but its ranking in the job postings data set is lower by a whopping 1,035 places! This means that it’s very rare for employers to ask for sentiment analysis skills but very common for people to write about their sentiment analysis projects.
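For readers who want to run this kind of comparison themselves, the rank gap can be sketched in a few lines of Python. This assumes each document has already been reduced to a list of extracted skills. The toy data and the function names `rank_skills` and `rank_gap` are hypothetical and are not taken from the full analysis.

```python
from collections import Counter

def rank_skills(documents):
    """Map each skill to its rank (1 = mentioned in the most documents)."""
    counts = Counter(skill for skills in documents for skill in set(skills))
    # Break ties alphabetically so the ranking is deterministic.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {skill: rank for rank, skill in enumerate(ordered, start=1)}

def rank_gap(blog_ranks, job_ranks):
    """For each blog skill, how many places lower it ranks in job postings."""
    worst = len(job_ranks) + 1  # assumption: absent skills rank just below last place
    return {s: job_ranks.get(s, worst) - r for s, r in blog_ranks.items()}

# Hypothetical toy data: one list of extracted skills per document.
blog_docs = [
    ["sentiment analysis", "python"],
    ["sentiment analysis", "nlp"],
    ["python", "sentiment analysis"],
]
job_docs = [
    ["sql", "python"],
    ["python", "aws"],
    ["sql", "python", "sentiment analysis"],
]

gaps = rank_gap(rank_skills(blog_docs), rank_skills(job_docs))
for skill, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {gap:+d}")
```

A large positive gap marks a skill that bloggers write about far more than employers ask for it.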
Quick Aside On Large Language Model (LLM) Projects
Notice that what’s true for sentiment analysis is also true for all generative AI skills (such as chatgpt, generative ai, langchain, etc.). This was a huge surprise to me, and I wrote my thoughts about where all the missing generative AI jobs are here. But keep this in mind when choosing portfolio projects: LLM portfolio projects may not showcase the skills employers actually demand.
Best NLP Portfolio Projects
The main insight to me from this ranking of top skills mentioned in job postings is that employers want strong fundamentals. Things like coding, SQL, data viz, cloud tools (aws, azure, gcp), and predictive modeling are at the top of the list. These skills are the bread and butter of data science, and the recent generative AI revolution hasn’t (yet) disrupted that.
While some advocate for portfolio projects like text summarization and conversational bots, I would instead recommend a conventional text classification project that focuses on the end-to-end process. As a hiring manager, I’d rather see that you deployed a simple model to the cloud and can tell a good story with your data.
Pros and Cons of Sentiment Analysis Portfolio Projects
Pros
- Sentiment analysis is essentially a multi-class classification problem with 3 classes (positive, negative, neutral). This is a core task in data science, especially in NLP, which is probably why sentiment analysis has become such a popular portfolio project.
- Training data is readily available.
Cons
- As shown above, employers don’t generally ask for this skill in job postings.
- Even if a company needed you to solve a sentiment analysis task on the job, it’s unlikely you would do anything other than apply one of the hundreds of pre-trained models available on Hugging Face.
- You won’t stand out from other candidates. Everyone has a Twitter sentiment analysis project on their resume.
- Twitter and Reddit data are not so easy to come by after their recent API pricing debacles. This makes it harder to apply your toy model to real data.