Tools: Jupyter Notebook / Python / Pandas
Objectives
1. Identify the most popular programming languages among IT professionals.
2. Analyze average salaries and income statistics for IT professionals.
3. Explore the age distribution among IT professionals.
4. Provide statistical information on working hours for part-time and full-time IT professionals.
5. Determine the most popular databases among IT professionals.
6. Examine the relationship between income and various factors such as working hours, age, education, and other variables.
Tools:
Python, Jupyter Notebook
Libraries
numpy, pandas, seaborn, matplotlib.pyplot
Dataset:
Our dataset is a survey of IT professionals, collected and published on GitHub at the link below.
It has 11551 records and 84 columns (reported shape: (11552, 85)).
1. Identify the most popular programming languages among IT professionals.
There are 28 unique programming languages. The most popular are:
- JavaScript
- HTML/CSS
- SQL
- Bash/Shell/PowerShell
- Python
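
A ranking like the one above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes each respondent's languages are stored in one text column as a delimited list, and the column name `LanguageWorkedWith`, the `;` separator, the helper `top_choices`, and the sample rows are all hypothetical.

```python
import pandas as pd


def top_choices(responses: pd.Series, sep: str = ";", n: int = 5) -> pd.Series:
    """Count how often each option appears in a delimited multi-select column."""
    return (
        responses.dropna()      # skip respondents who left the question blank
        .str.split(sep)         # "A;B" -> ["A", "B"]
        .explode()              # one row per selected option
        .str.strip()            # tolerate stray spaces around the separator
        .value_counts()         # most frequent first
        .head(n)
    )


# Made-up sample rows; the real column name and separator are assumptions.
sample = pd.DataFrame({
    "LanguageWorkedWith": [
        "JavaScript;HTML/CSS;SQL",
        "Python;SQL",
        "JavaScript;Python",
        None,
    ]
})

top_languages = top_choices(sample["LanguageWorkedWith"], n=3)
print(top_languages)
```

The same helper would apply unchanged to a multi-select database column, provided it uses the same delimiter.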

2. Analyze average salaries and income statistics for IT professionals.
3. Explore the age distribution among IT professionals.
Summary statistics:
- Mean: 30.77
- Std (standard deviation): 7.37
- Min: 16.00
- 25% (first quartile, Q1): 25.00
- 50% (median, Q2): 29.00
- 75% (third quartile, Q3): 35.00
- Max: 72.00
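
The labels above match the fields returned by pandas' `describe()`, so a summary of this kind can be reproduced with a sketch like the one below. The `Age` column name comes from the analysis; the values here are made up and are not the survey data.

```python
import pandas as pd

# Made-up ages, including one missing answer; not the survey data.
ages = pd.Series([22, 25, 29, 35, 41, None], name="Age")

# describe() ignores missing values and reports count, mean, std,
# min, the three quartiles, and max.
summary = ages.describe()
print(summary.round(2))
```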

4. Provide statistical information on working hours for part-time and full-time IT professionals.

Interpreting the data:
For full-time respondents, the maximum of 1,012.00 hours is impossible (a week has only 168 hours). This points to noisy data or data entry errors in the dataset.
For part-time respondents, the maximum of 375.00 hours is likewise impossible for a single week, again suggesting errors in the source data.
Full-time workers are a much larger group in this data and cluster tightly around a 40-hour week. Part-time workers have a much broader distribution relative to their average, typically working between 20 and 35 hours.
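
One possible way to handle such values is sketched below. The source does not say which outlier treatment was used, so the two steps here (a hard 168-hour cap followed by a per-group 1.5 × IQR rule) are an assumption, as are the column names `Employment` and `WorkWeekHrs` and the sample rows.

```python
import pandas as pd

HOURS_PER_WEEK = 7 * 24  # 168, the physical upper bound for weekly hours

# Made-up rows; column names are hypothetical.
df = pd.DataFrame({
    "Employment": ["Full-time"] * 6 + ["Part-time"] * 5,
    "WorkWeekHrs": [38.0, 40.0, 42.0, 45.0, 120.0, 1012.0,
                    20.0, 25.0, 30.0, 35.0, 375.0],
})

# Step 1: drop answers that cannot fit in a week.
plausible = df[df["WorkWeekHrs"].between(0, HOURS_PER_WEEK)]

# Step 2 (assumed method): 1.5 * IQR fences computed within each employment group.
grouped = plausible.groupby("Employment")["WorkWeekHrs"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
hours = plausible["WorkWeekHrs"]
keep = (hours >= q1 - 1.5 * iqr) & (hours <= q3 + 1.5 * iqr)
cleaned = plausible[keep]

print(cleaned.groupby("Employment")["WorkWeekHrs"].describe())
```

Computing the fences per group keeps the tightly clustered full-time hours from defining what counts as an outlier for part-time respondents.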
After applying outlier treatment:


5. Determine the most popular databases among IT professionals.
There are 13 unique databases. The most popular are:
- MySQL
- Microsoft SQL Server
- PostgreSQL
- SQLite
- MongoDB

6. Examine the relationship between income and various factors such as working hours, age, education, and other variables.
Top 3 Insights From the Pair Plot Below:
- Experience Variables Are Strongly Correlated
  - YearsCode, YearsCodePro, and Age move together in clear linear patterns.
  - This confirms that experience-related fields are consistent and reinforce each other.
  - Useful for feature engineering: you may want to avoid using all three in the same model due to multicollinearity.
- Compensation Shows Only a Weak Positive Trend With Experience
  - CompTotal increases slightly with YearsCodePro, but the scatter is wide.
  - This suggests compensation is influenced by many external factors (role, region, company size), not just experience.
  - Insight: experience alone is not a strong predictor of pay.
- Tool Knowledge Grows Slowly With Experience
  - Languages and Databases show mild upward trends with experience.
  - Most respondents know only a few tools, with a small number of outliers.
  - This indicates that tool count is not a strong differentiator across the population.
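
The multicollinearity concern can be checked numerically with a correlation matrix. The sketch below uses the column names mentioned above but made-up values, and the 0.9 cutoff is an arbitrary choice for illustration, not a figure from the analysis.

```python
import pandas as pd

# Made-up values; only the column names come from the analysis.
df = pd.DataFrame({
    "Age":          [22, 25, 29, 35, 41, 50],
    "YearsCode":    [3, 6, 9, 15, 20, 30],
    "YearsCodePro": [1, 3, 6, 11, 17, 25],
    "CompTotal":    [60000, 45000, 90000, 70000, 65000, 95000],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))

# List each pair of columns whose absolute correlation exceeds the cutoff.
threshold = 0.9  # assumed cutoff for "strongly correlated"
cols = list(corr.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

Pairs flagged this way are candidates for keeping only one of the two variables when building a model.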



