LinkedIn Job Postings ML Pipeline
End-to-end ML pipeline on 123,849 LinkedIn job postings (2023–2024) stored in 7 relational CSV files. It covers salary prediction, skills demand analysis (213K skill–job pairs), and NLP on job descriptions, with salaries normalized across pay periods (hourly, weekly, monthly → yearly).
Data: 123,849 LinkedIn job postings, 7 relational CSV files
Pipeline: 7-file join → NLP feature extraction → salary regression + skills demand analysis
Dataset (7 files joined; 4 shown)
| File | Rows | Contents |
|---|---|---|
| postings.csv | 123,849 | Title, company, description, location |
| companies.csv | 24,473 | Size, industry, followers |
| salaries.csv | 40,785 | Salary ranges (cover 32.9% of postings) |
| job_skills.csv | 213,768 | Skill→job mappings |
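A minimal sketch of the postings → salaries join and the resulting coverage figure, using plain Python rows instead of the real CSVs. The `job_id` key, the column names, and the toy rows are assumptions for illustration. On the real files the same ratio would be 40,785 / 123,849 ≈ 32.9%, provided each salary row maps to a distinct posting.

```python
# Hypothetical toy rows standing in for postings.csv and salaries.csv.
# Assumption: both tables share a `job_id` key column.
postings = [
    {"job_id": 1, "title": "Data Scientist"},
    {"job_id": 2, "title": "Operations Analyst"},
    {"job_id": 3, "title": "Software Engineer"},
]
salaries = [
    {"job_id": 1, "min_salary": 120000.0, "max_salary": 150000.0, "pay_period": "YEARLY"},
]


def left_join(left, right, key):
    """Attach the matching right-hand row (or None) to every left-hand row.

    Assumes at most one right-hand row per key; duplicates keep the last one.
    """
    index = {row[key]: row for row in right}
    return [{**row, "salary": index.get(row[key])} for row in left]


def salary_coverage(joined):
    """Fraction of postings that matched a salary row."""
    if not joined:
        return 0.0
    return sum(r["salary"] is not None for r in joined) / len(joined)


joined = left_join(postings, salaries, "job_id")
print(f"{salary_coverage(joined):.1%}")  # 33.3% for this toy data
```

A left join keeps every posting, so unmatched rows stay visible as `None` instead of silently disappearing; that is what makes the coverage ratio measurable.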
Salary Coverage: Pay-Period Normalization
- Yearly: 23K (used as-is)
- Hourly: 16K (× 2,080, i.e. 40 h/week × 52 weeks)
- Monthly: 539 (× 12)
- Weekly: 180 (× 52)
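The pay-period multipliers listed here can be expressed as a single lookup table. This is a sketch; the upper-case period labels (`"HOURLY"` etc.) are an assumed encoding, not confirmed from the source files.

```python
# Multipliers from the pay-period list; HOURLY assumes 40 h/week x 52 weeks = 2,080 h.
# Assumption: pay periods are stored as strings like "HOURLY" (matched case-insensitively).
ANNUAL_MULTIPLIER = {
    "YEARLY": 1,
    "MONTHLY": 12,
    "WEEKLY": 52,
    "HOURLY": 2080,
}


def to_yearly(amount, pay_period):
    """Convert a salary amount to a yearly figure; return None for unmapped periods."""
    multiplier = ANNUAL_MULTIPLIER.get(pay_period.upper())
    if multiplier is None:
        return None
    return amount * multiplier


print(to_yearly(50, "hourly"))     # 104000
print(to_yearly(8000, "MONTHLY"))  # 96000
```

Returning `None` for periods outside the table keeps unconverted amounts out of the regression target instead of mixing, say, hourly and yearly figures.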
Task 1: Salary Prediction (Regression)
Target: yearly salary after pay-period normalization. Features: TF-IDF on descriptions, company size, seniority extracted from the job title. Key predictors: job title, company size, location, required skills, seniority.
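Seniority from the title could be extracted with ordered keyword rules, as in the sketch below. The labels and keyword lists are hypothetical; the project's actual rules are not documented here.

```python
import re

# Hypothetical keyword rules, checked in order; the first matching rule wins,
# so "Senior Director" maps to "executive" rather than "senior".
SENIORITY_RULES = [
    ("executive", r"\b(chief|vp|vice president|head of|director)\b"),
    ("senior", r"\b(senior|sr|lead|principal)\b"),
    ("junior", r"\b(junior|jr|entry[- ]level|intern|trainee)\b"),
]


def seniority_from_title(title):
    """Map a job title to a coarse seniority label, defaulting to 'mid'."""
    normalized = title.lower()
    for label, pattern in SENIORITY_RULES:
        if re.search(pattern, normalized):
            return label
    return "mid"


print(seniority_from_title("Sr. Data Scientist"))  # senior
print(seniority_from_title("Internal Auditor"))    # mid
```

Word boundaries (`\b`) stop partial matches such as "intern" inside "Internal". The resulting label would typically be one-hot encoded next to the TF-IDF description features.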
Task 2: Skills Demand Analysis
213,768 skill–job pairs → frequency counts + TF-IDF weighting. Top in-demand skills: Python, SQL, Communication, Project Management, Machine Learning. Fastest-growing from 2023 to 2024: LLMs, Prompt Engineering, Vector Databases.
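Frequency and TF-IDF weighting over skill–job pairs might look like the following. The grouping is an assumption: here each group of postings (e.g. an industry) is treated as one "document" whose terms are its skills, because the source does not say what the TF-IDF documents are. The pairs and groups are toy data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (job_id, skill) pairs, shaped like rows of job_skills.csv.
pairs = [
    (1, "Python"), (1, "SQL"), (2, "SQL"), (2, "Communication"),
    (3, "Python"), (3, "Machine Learning"), (4, "Communication"),
]
# Hypothetical grouping of postings; each group acts as one TF-IDF "document".
job_group = {1: "tech", 2: "finance", 3: "tech", 4: "finance"}


def skill_frequency(pairs):
    """Raw demand: number of distinct postings listing each skill."""
    return Counter(skill for _, skill in set(pairs))


def skill_tfidf(pairs, job_group):
    """TF-IDF of each skill within each group (tf = share of the group's skill mentions)."""
    docs = defaultdict(Counter)
    for job_id, skill in pairs:
        docs[job_group[job_id]][skill] += 1
    n_docs = len(docs)
    # Document frequency: in how many groups each skill appears at all.
    df = Counter(skill for counts in docs.values() for skill in counts)
    scores = {}
    for group, counts in docs.items():
        total = sum(counts.values())
        scores[group] = {
            skill: (count / total) * math.log(n_docs / df[skill])
            for skill, count in counts.items()
        }
    return scores


freq = skill_frequency(pairs)
scores = skill_tfidf(pairs, job_group)
print(freq.most_common(1))
print(scores["tech"]["SQL"])  # 0.0: SQL appears in every group
```

Raw frequency ranks broadly demanded skills, while IDF drives skills present in every group toward zero weight, which highlights skills that are characteristic of a particular group.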
Task 3: Market Insights
- 85%+ of postings are concentrated in top US and European cities
- Data Science premium: 3–4× the Operations base salary
- Remote premium: +$12K on average for fully remote roles
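The two premium figures can be read as simple group statistics over normalized yearly salaries: a difference of means (remote vs. non-remote) and a ratio of medians (one job family vs. another). The sketch below assumes that reading. The rows, the `family`/`remote` fields, and the choice of mean vs. median are all assumptions.

```python
from statistics import mean, median

# Hypothetical normalized yearly salaries with a job family and a fully-remote flag.
rows = [
    {"family": "Data Science", "remote": True, "salary": 150000},
    {"family": "Data Science", "remote": False, "salary": 130000},
    {"family": "Operations", "remote": False, "salary": 45000},
    {"family": "Operations", "remote": True, "salary": 50000},
]


def remote_premium(rows):
    """Mean yearly salary of fully remote roles minus that of non-remote roles."""
    remote = [r["salary"] for r in rows if r["remote"]]
    onsite = [r["salary"] for r in rows if not r["remote"]]
    return mean(remote) - mean(onsite)


def salary_ratio(rows, family_a, family_b):
    """Ratio of median yearly salaries between two job families."""
    a = [r["salary"] for r in rows if r["family"] == family_a]
    b = [r["salary"] for r in rows if r["family"] == family_b]
    return median(a) / median(b)


print(remote_premium(rows))  # 12500
print(round(salary_ratio(rows, "Data Science", "Operations"), 2))  # 2.95
```

Both are raw comparisons. They do not control for job family, location, or seniority, so a regression-adjusted premium could differ.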
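Key Caveat
Only 32.9% of postings include salary data, and postings that disclose pay may differ systematically from those that do not. This selection bias means the model may not be representative of the full job market.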