Introduction
Welcome, fellow Python enthusiasts! If you’re part of the vibrant community of 18-30-year-olds looking to become Python pros, you’re in the right place. Today, we’re diving deep into the world of machine learning with Python 3, focusing on a fundamental technique: Multiple Linear Regression. By the end of this blog post, you’ll have a solid understanding of how to use this powerful tool to make accurate predictions and solve real-world problems.
Chapter 1: Understanding Linear Regression
Before we jump into multiple linear regression, let’s ensure we’re on the same page regarding its simpler cousin, simple linear regression.
What is Simple Linear Regression?
Imagine you want to predict a student’s final exam score based on the number of hours they studied. Simple linear regression helps you establish a linear relationship between two variables: the independent variable (hours studied) and the dependent variable (exam score). In Python, you can perform this task easily using libraries like NumPy and Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
# Sample data
hours_studied = np.array([1, 2, 3, 4, 5])
exam_scores = np.array([40, 50, 60, 70, 80])
# Fit a linear regression model
coefficients = np.polyfit(hours_studied, exam_scores, 1)
slope, intercept = coefficients
# Predict exam score for 6 hours of study
predicted_score = slope * 6 + intercept
# Plot the data and regression line
plt.scatter(hours_studied, exam_scores, label='Actual Scores')
plt.plot(hours_studied, slope * hours_studied + intercept, color='red', label='Regression Line')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.legend()
plt.show()
print(f"Predicted score for 6 hours of study: {predicted_score:.2f}")
In this code snippet, we fit a simple linear regression model to predict exam scores based on hours studied. The regression line helps us make predictions, like estimating a score for 6 hours of study.
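As a quick bridge to the next chapter, here is a minimal sketch of the same fit using scikit-learn's LinearRegression, the estimator we'll rely on for multiple linear regression; it assumes the hours_studied and exam_scores arrays defined above.
from sklearn.linear_model import LinearRegression
# scikit-learn expects a 2D feature array, so reshape the 1D hours into a single column
X = hours_studied.reshape(-1, 1)
sk_model = LinearRegression()
sk_model.fit(X, exam_scores)
# The fitted slope and intercept should match np.polyfit's results
print(f"Slope: {sk_model.coef_[0]:.2f}, Intercept: {sk_model.intercept_:.2f}")
print(f"Predicted score for 6 hours of study: {sk_model.predict([[6]])[0]:.2f}")
Either way, the fitted line is the same; scikit-learn simply becomes more convenient once we add more features.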
Chapter 2: Advancing to Multiple Linear Regression
Now that you’ve grasped the basics of linear regression, let’s step up our game to multiple linear regression. In multiple linear regression, we work with several independent variables to predict a single dependent variable. It’s like adding more dimensions to our analysis, which makes it a powerful tool for a wide range of real-world scenarios.
Understanding Multiple Linear Regression
Suppose you want to predict a house’s price, considering not just the size but also the number of bedrooms and the neighborhood’s crime rate. This is where multiple linear regression shines.
Here’s a Python example of multiple linear regression using the popular library, scikit-learn:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# Sample data
data = pd.DataFrame({
'Size': [1200, 1500, 1700, 900, 1100],
'Bedrooms': [3, 3, 2, 2, 1],
'Crime_Rate': [0.05, 0.02, 0.07, 0.01, 0.04],
'Price': [220000, 300000, 280000, 150000, 180000]
})
# Split data into features (X) and target (y)
X = data[['Size', 'Bedrooms', 'Crime_Rate']]
y = data['Price']
# Create a linear regression model
model = LinearRegression()
model.fit(X, y)
# Predict the price of a new house
# Pass a DataFrame with the same column names used for training so scikit-learn matches the features correctly
new_house = pd.DataFrame({'Size': [1400], 'Bedrooms': [2], 'Crime_Rate': [0.03]})
predicted_price = model.predict(new_house)
print(f"Predicted price of the new house: ${predicted_price[0]:,.2f}")
Predicted price of the new house: $251,493.01
In this code, we use the scikit-learn library to perform multiple linear regression. We predict the price of a house based on its size, number of bedrooms, and crime rate in the neighborhood. This example illustrates how multiple linear regression considers multiple factors when making predictions.
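Under the hood, the fitted model is just an equation of the form Price ≈ intercept + b1·Size + b2·Bedrooms + b3·Crime_Rate. Here is a minimal sketch of how to inspect the learned intercept and coefficients, assuming the model and X from the snippet above:
# Pair each learned coefficient with the feature it belongs to
coefficients = pd.Series(model.coef_, index=X.columns)
print(f"Intercept: {model.intercept_:,.2f}")
print(coefficients)
A positive coefficient pushes the predicted price up and a negative one pulls it down, so this is a quick way to see how much each factor contributes.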
Chapter 3: Hands-On Multiple Linear Regression in Python
Now that you have a solid understanding of multiple linear regression, let’s get hands-on with Python. We’ll create a complete Python script to predict car prices based on various features.
Practical Example: Predicting Car Prices
For this example, we’ll use a dataset containing car information like horsepower, fuel efficiency, and engine size. Our goal is to predict a car’s price based on these features.
To follow along, make sure you have the necessary libraries installed, such as Pandas, NumPy, and scikit-learn.
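If any of them are missing, a quick pip install pandas numpy scikit-learn from your terminal will take care of it.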
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset
url = 'https://example.com/car_data.csv'
data = pd.read_csv(url)
# Split data into features (X) and target (y)
X = data[['Horsepower', 'Fuel_Efficiency', 'Engine_Size']]
y = data['Price']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
In this code, we load a car dataset, split it into training and testing sets, create a linear regression model, and evaluate its performance. This practical example demonstrates the power of multiple linear regression in Python for real-world applications.
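If you want a more stable estimate of performance than a single train/test split, here is a minimal sketch using scikit-learn's k-fold cross-validation, assuming the model, X, and y from the script above:
from sklearn.model_selection import cross_val_score
# Fit and score the model on 5 different train/test splits, then summarize the R-squared scores
cv_scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print(f"Cross-validated R-squared: {cv_scores.mean():.2f} (+/- {cv_scores.std():.2f})")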
More Examples
Let’s explore a couple more examples of multiple linear regression, each with a sample dataset and accompanying graphs.
Example 1: Predicting Home Prices
In this example, we’ll predict home prices based on the number of bedrooms, the square footage of the living area, and the age of the house. We’ll also create visualizations to better understand the relationships.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate synthetic data
np.random.seed(0)
n_samples = 100
bedrooms = np.random.randint(1, 6, size=n_samples)
sqft = np.random.randint(1000, 3000, size=n_samples)
age = np.random.randint(1, 21, size=n_samples)
price = 50000 + 10000 * bedrooms + 200 * sqft - 1000 * age + np.random.normal(0, 5000, size=n_samples)
# Create a DataFrame
data = pd.DataFrame({'Bedrooms': bedrooms, 'Sqft': sqft, 'Age': age, 'Price': price})
# Visualize the data
fig = plt.figure(figsize=(12, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data['Bedrooms'], data['Sqft'], data['Age'], c=data['Price'], cmap='viridis')
ax.set_xlabel('Bedrooms')
ax.set_ylabel('Sqft')
ax.set_zlabel('Age')
ax.set_title('Home Prices')
plt.show()
# Split data into features (X) and target (y)
X = data[['Bedrooms', 'Sqft', 'Age']]
y = data['Price']
# Create and train a linear regression model
model = LinearRegression()
model.fit(X, y)
# Predict prices
predicted_prices = model.predict(X)
# Calculate and print the Mean Squared Error (MSE)
mse = mean_squared_error(y, predicted_prices)
print(f"Mean Squared Error: {mse:.2f}")
Mean Squared Error: 24400599.32
In this example, we generate synthetic data and visualize it with a 3D scatter plot. Then we create a multiple linear regression model to predict home prices from the number of bedrooms, square footage, and age. The mean squared error (MSE) measures the average squared difference between predicted and actual prices, so lower values indicate a better fit.
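Because the MSE is in squared dollars, it is hard to read directly; taking its square root gives the root mean squared error (RMSE) in plain dollars. A small addition to the script above:
# RMSE is the square root of MSE, expressed in the same units as the target (dollars)
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:,.2f}")
For the output shown above, that works out to roughly $4,940, which is close to the $5,000 of random noise we injected into the synthetic prices.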
Example 2: Predicting Sales Revenue
In this example, we’ll predict sales revenue based on advertising spending on TV, radio, and newspaper, and create visualizations to understand the impact of each advertising channel. Download the dataset and save it as Advertising.csv in the same directory as your project.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# Load the advertising dataset (Advertising.csv saved in the project directory)
csv_path = 'Advertising.csv'
data = pd.read_csv(csv_path)
# Visualize the relationship between advertising channels and sales
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
axs[0].scatter(data['TV'], data['Sales'])
axs[0].set_title('TV vs. Sales')
axs[0].set_xlabel('TV Advertising Spending')
axs[0].set_ylabel('Sales')
axs[1].scatter(data['Radio'], data['Sales'])
axs[1].set_title('Radio vs. Sales')
axs[1].set_xlabel('Radio Advertising Spending')
axs[1].set_ylabel('Sales')
axs[2].scatter(data['Newspaper'], data['Sales'])
axs[2].set_title('Newspaper vs. Sales')
axs[2].set_xlabel('Newspaper Advertising Spending')
axs[2].set_ylabel('Sales')
plt.tight_layout()
plt.show()
# Split data into features (X) and target (y)
X = data[['TV', 'Radio', 'Newspaper']]
y = data['Sales']
# Create and train a linear regression model
model = LinearRegression()
model.fit(X, y)
# Predict sales
predicted_sales = model.predict(X)
# Calculate and print the R-squared score
r2 = r2_score(y, predicted_sales)
print(f"R-squared: {r2:.2f}")
R-squared: 0.90
In this example, we load the advertising dataset and create scatter plots to visualize the relationship between sales and advertising spending on TV, radio, and newspaper. We then build a multiple linear regression model to predict sales from advertising expenditure and calculate the R-squared score to assess the model’s performance.
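To see how such a model might be used for planning, here is a minimal sketch that predicts sales for a hypothetical advertising budget, reusing the fitted model from above; the spending figures are made up purely for illustration.
# Hypothetical budget split, using the same column names as the training features
new_budget = pd.DataFrame({'TV': [100], 'Radio': [25], 'Newspaper': [10]})
predicted = model.predict(new_budget)
print(f"Predicted sales for this budget: {predicted[0]:.2f}")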
These additional examples should provide you with a deeper understanding of multiple linear regression in Python and how it can be applied to different real-world scenarios. Feel free to experiment with your own datasets and explore various aspects of multiple linear regression.
Conclusion
Congratulations! You’ve embarked on a journey from simple linear regression to mastering multiple linear regression in Python 3. You’ve learned the foundations of linear regression, understood how to apply it to multiple variables, and even tackled a practical example.
As you continue your Python journey, keep exploring, experimenting, and building your skills. Machine learning is a vast field, and multiple linear regression is just the beginning. Stay curious, and soon you’ll become a Python pro, capable of tackling complex data science challenges.
Remember, the key to mastering Python and machine learning is practice and continuous learning. So, go ahead, dive into your own datasets, and apply what you’ve learned here to solve real-world problems. Happy coding!
Now, it’s your turn. Try implementing multiple linear regression in Python with your own datasets and see the magic unfold.
Also, check out our other playlists: Rasa Chatbot, Internet of Things, Docker, Python Programming, MQTT, Tech News, ESP-IDF, and more.
Become a member of our social family on YouTube here.
Stay tuned and Happy Learning. ✌🏻😃 ❤️🔥