Hi people, before we jump into implementing QQ plots in Python, let's quickly understand what a QQ plot is and what assumptions it makes.
What is a QQ plot?
A QQ plot is a graphical method to compare two probability distributions by plotting their quantiles against each other.
How does the QQ plot work?
The QQ plot pairs up the quantiles of two distributions and plots them against each other. If the points lie close to a straight line, the distributions are similar. If the points deviate from the straight line, it indicates a difference between the distributions.
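To make the pairing concrete, here is a minimal sketch that builds the quantile pairs by hand instead of calling a plotting helper. The plotting positions `(i - 0.5) / n` are an assumption chosen for simplicity; library functions may use slightly different positions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=200)

# Sample quantiles: the sorted observations
sample_q = np.sort(sample)

# Theoretical quantiles: normal quantiles at evenly spread probabilities
n = sample_q.size
probs = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
theoretical_q = stats.norm.ppf(probs)

# If the pairs lie near a straight line, their correlation is close to 1
corr = np.corrcoef(theoretical_q, sample_q)[0, 1]
print(f"correlation between quantile pairs: {corr:.4f}")
```

Plotting `theoretical_q` on the x-axis against `sample_q` on the y-axis gives the QQ plot itself.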
Why do we require a QQ plot?
It is most often used to check whether data follows a normal distribution, which is an assumption behind many statistical tests.
QQ Plot Assumptions
When interpreting a QQ plot, there are a few key assumptions to keep in mind:
- Theoretical Distribution: The QQ plot compares the data to a theoretical distribution such as normal, uniform, etc. For example, if we compare data to a normal distribution, the QQ plot will show how the data's quantiles match the expected quantiles of a normal distribution.
- Sample Size: A large sample size improves the accuracy of the QQ plot. Small datasets can produce misleading plots, because a few points can stray from the line by chance alone.
- Shape of distribution: A QQ plot primarily checks the shape of the distribution, such as normality, skewness, and kurtosis, rather than specific parameters such as mean and variance. A different mean or variance only shifts or tilts the line of points; it does not bend it. Note also that the plot is not a formal hypothesis test: it only suggests how plausible normality (or another distribution) is.
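The point about shape versus parameters can be checked numerically. The sketch below assumes we use the values that `scipy.stats.probplot` returns when no plot is requested (the fitted slope, intercept, and correlation `r`) and compares a sample with a shifted and rescaled copy of itself.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
data = rng.normal(size=500)
shifted = 5 * data + 10  # same shape, different mean and variance

# With plot=None (the default), probplot only returns the numbers
_, (slope_a, intercept_a, r_a) = stats.probplot(data, dist="norm")
_, (slope_b, intercept_b, r_b) = stats.probplot(shifted, dist="norm")

# Straightness (r) is unchanged; only the slope and intercept move
print(f"original: slope={slope_a:.3f}, intercept={intercept_a:.3f}, r={r_a:.4f}")
print(f"shifted:  slope={slope_b:.3f}, intercept={intercept_b:.3f}, r={r_b:.4f}")
```

The slope tracks the scale of the data and the intercept tracks its location, while `r` measures only how straight the points are.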
Creating a QQ Plot in Python
Python has multiple libraries, such as Statsmodels and Scipy, that help us create a QQ plot. Let us explore an example for each library.
Creating a QQ plot in Python Using Scipy
```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Generate random data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create QQ plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('QQ Plot with Scipy')
plt.show()
```

The stats.probplot function from Scipy is used to generate a QQ plot. The dist="norm" parameter specifies that we want to compare the data to a normal distribution. Passing plot=plt draws the points and a least-squares fit line with matplotlib, and plt.show() displays the figure.
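The `dist` parameter is not limited to the normal distribution. As a small sketch, assuming uniformly distributed data, we can compare the same sample against two candidate distributions and look at the correlation `r` that `probplot` returns for each.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
uniform_data = rng.uniform(0, 1, size=1000)

# Compare the same data against a normal and a uniform distribution
_, (_, _, r_norm) = stats.probplot(uniform_data, dist="norm")
_, (_, _, r_unif) = stats.probplot(uniform_data, dist="uniform")

print(f"r against normal:  {r_norm:.4f}")
print(f"r against uniform: {r_unif:.4f}")
```

The distribution that matches the data gives the straighter plot, which shows up as an `r` closer to 1.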
Creating a QQ plot in Python Using Statsmodels
```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate random data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create QQ plot
sm.qqplot(data, line='45')
plt.title('QQ Plot with Statsmodels')
plt.show()
```

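Statsmodels provides the qqplot function to generate a QQ plot. The line='45' parameter adds a 45-degree reference line, helping us visually assess how well the data fits the normal distribution. The 45-degree line is a fair reference here because the data was drawn from a standard normal distribution; data with a different mean or scale would not fall on it.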
Creating a QQ plot in Python Using Seaborn
Seaborn has no QQ plot function of its own, but we can use its themes to create aesthetically pleasing plots. Let me show you an example.

```python
import seaborn as sns
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Apply the seaborn style before plotting so that it takes effect
sns.set(style="darkgrid")

# Generate random data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create QQ plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('QQ Plot with Seaborn')
plt.show()
```

QQ Plot Between Two Variables
A QQ plot is generally used to compare a dataset to a theoretical distribution, but it can also be used to compare two datasets. In this case, the QQ plot displays how the quantiles of the two datasets align with each other.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate random data for two variables
x = np.random.normal(0, 1, 100)    # Variable 1
y = np.random.normal(0, 1.5, 100)  # Variable 2

# Create a Q-Q plot comparing x and y.
# qqplot_2samples opens its own figure unless we pass an axes, so create one first.
fig, ax = plt.subplots(figsize=(8, 8))
sm.qqplot_2samples(x, y, xlabel="Quantiles of X", ylabel="Quantiles of Y", line='45', ax=ax)
ax.set_title("Q-Q Plot Comparing Two Variables")
ax.grid()
plt.show()
```

The qqplot_2samples function from statsmodels.api compares the quantiles of x against the quantiles of y. If the points fall along the 45-degree line, the two datasets follow similar distributions. Here y has a larger standard deviation than x, so the points still form a roughly straight line, but its slope differs from the reference line.
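The same comparison can be sketched by hand with NumPy, which also works when the two samples have different sizes. The probability grid and the sample sizes below are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, size=500)
y = rng.normal(0, 1.5, size=800)  # different size and spread

# Evaluate both samples at the same set of probabilities
probs = np.linspace(0.01, 0.99, 99)
qx = np.quantile(x, probs)
qy = np.quantile(y, probs)

# A straight-line fit through the quantile pairs summarizes the relationship
slope, intercept = np.polyfit(qx, qy, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A slope near 1 and an intercept near 0 would mean the two samples are distributed alike. Here the slope should land near the ratio of the two standard deviations.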
QQ Plot for Multiple Variables
In some cases, you may want to examine multiple variables to see how they compare with a theoretical distribution or with each other. This can be done by plotting multiple QQ plots in a grid or side by side.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Generate random datasets
data1 = np.random.normal(loc=0, scale=1, size=1000)
data2 = np.random.normal(loc=0, scale=1.5, size=1000)
data3 = np.random.normal(loc=1, scale=1, size=1000)

datasets = [
    ('Data 1 (mean 0, sd 1)', data1),
    ('Data 2 (mean 0, sd 1.5)', data2),
    ('Data 3 (mean 1, sd 1)', data3),
]

# Draw one QQ plot per dataset, side by side
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (label, values) in zip(axes, datasets):
    stats.probplot(values, dist="norm", plot=ax)
    ax.set_title(label)
    ax.grid(True)

fig.suptitle('QQ Plots for Multiple Variables')
plt.tight_layout()
plt.show()
```

This code generates three QQ plots for data1, data2, and data3, each compared with a normal distribution.
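If you would rather overlay the datasets on a single set of axes with a proper legend, one option is to let `probplot` return the quantile pairs and draw them yourself. This is a sketch under that assumption; the dataset names are made up for the example.

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
datasets = {
    "mean 0, sd 1": rng.normal(loc=0, scale=1, size=1000),
    "mean 0, sd 1.5": rng.normal(loc=0, scale=1.5, size=1000),
    "mean 1, sd 1": rng.normal(loc=1, scale=1, size=1000),
}

fig, ax = plt.subplots(figsize=(8, 6))
for name, values in datasets.items():
    # Without plot=..., probplot returns the quantile pairs and the fit
    (osm, osr), (slope, intercept, r) = stats.probplot(values, dist="norm")
    ax.plot(osm, osr, "o", markersize=3, label=f"{name} (r={r:.3f})")

ax.set_xlabel("Theoretical quantiles")
ax.set_ylabel("Ordered values")
ax.set_title("Overlaid QQ plots")
ax.legend()
ax.grid(True)
plt.show()
```

Each dataset gets its own color and exactly one legend entry, so the labels always match the points they describe.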
What Does a QQ Plot Show?
A QQ plot provides a visual method for assessing whether a dataset follows a specific distribution. It compares the observed data to the theoretical quantiles. The following patterns can be observed:
- Straight Line: If the points lie along a straight line, it suggests that the data follows the specified distribution.
- Curvature: If the points deviate in a curved pattern, the shape of the data differs from the expected distribution. A single bow suggests skewness, while an S-shape suggests tails that are heavier or lighter than expected.
- Outliers: Points that deviate significantly from the line indicate potential outliers or extreme values in the data.
QQ Plot Interpretation
- Normal Distribution: If the points of the QQ plot follow a straight line, the data is likely normally distributed.
- Right-Skewed Distribution: If the points bend upwards on the right side, it indicates a right-skewed distribution, where the data has a long tail on the right side.
- Left-Skewed Distribution: If the points bend downward on the left side, it indicates a left-skewed distribution, where the data has a long tail on the left side.
- Heavy Tails: If the points move away from the straight line at both extremes, falling below it on the left and above it on the right, the data has heavy tails, meaning it may contain more extreme values than a normal distribution.
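These patterns can also be seen in numbers. The sketch below uses a hypothetical helper, `tail_gaps`, that measures how far the lowest and highest 1% of points sit above (positive) or below (negative) the fitted line. The 1% cutoff and the example distributions are assumptions for illustration.

```python
import numpy as np
import scipy.stats as stats

def tail_gaps(values, frac=0.01):
    """Mean gap (ordered value minus fitted line) in the lowest and highest tail."""
    (osm, osr), (slope, intercept, _) = stats.probplot(values, dist="norm")
    gaps = osr - (slope * osm + intercept)
    k = max(1, int(len(values) * frac))
    return gaps[:k].mean(), gaps[-k:].mean()

rng = np.random.default_rng(7)
samples = {
    "normal": rng.normal(size=2000),
    "right-skewed (exponential)": rng.exponential(size=2000),
    "left-skewed (negated exponential)": -rng.exponential(size=2000),
    "heavy-tailed (t, df=3)": rng.standard_t(3, size=2000),
}

for name, values in samples.items():
    left, right = tail_gaps(values)
    print(f"{name:34s} left gap {left:+.2f}, right gap {right:+.2f}")
```

A right-skewed sample sits above the line at both ends, a left-skewed sample sits below it at both ends, and a heavy-tailed sample sits below on the left and above on the right.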
QQ Plot of Residuals
When we do regression analysis, QQ plots can be used to assess the residuals of a model. If the residuals follow a normal distribution, the normality assumption behind the model's tests and confidence intervals is reasonable. Deviation from normality in the residuals might point to outliers, heteroscedasticity, or model misspecification.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate synthetic data
X = np.random.normal(0, 1, 100)
y = 3*X + np.random.normal(0, 1, 100)

# Fit linear regression model
X = sm.add_constant(X)  # Adds constant (intercept) term
model = sm.OLS(y, X).fit()

# Create QQ plot of residuals.
# fit=True standardizes the residuals, so the 45-degree line is a valid reference.
sm.qqplot(model.resid, fit=True, line='45')
plt.title('QQ Plot of Residuals')
plt.show()
```
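Since a QQ plot is a visual check rather than a formal test, it can help to pair it with a number. The sketch below is one possible approach, not part of the example above: it fits a straight line with `np.polyfit`, then reports the QQ correlation from `probplot` and the p-value of the Shapiro-Wilk test (`scipy.stats.shapiro`) on the residuals. The helper name `residual_normality` and the heavy-tailed noise are assumptions made for the comparison.

```python
import numpy as np
import scipy.stats as stats

def residual_normality(x, y):
    """Fit y = a*x + b and summarize how normal the residuals look."""
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (slope * x + intercept)
    _, (_, _, r) = stats.probplot(resid, dist="norm")
    _, p_value = stats.shapiro(resid)
    return r, p_value

rng = np.random.default_rng(3)
x = rng.normal(size=300)
y_normal = 3 * x + rng.normal(size=300)        # normal noise
y_heavy = 3 * x + rng.standard_t(2, size=300)  # heavy-tailed noise

r_normal, p_normal = residual_normality(x, y_normal)
r_heavy, p_heavy = residual_normality(x, y_heavy)

print(f"normal noise: r={r_normal:.4f}, Shapiro p={p_normal:.3f}")
print(f"heavy tails:  r={r_heavy:.4f}, Shapiro p={p_heavy:.3g}")
```

A lower `r` and a very small p-value for the heavy-tailed case match what the QQ plot would show as points leaving the line at both ends.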

How to Tell if a QQ Plot is Normally Distributed
Conclusion
QQ plots can be created in Python using libraries like Scipy and Statsmodels. Interpreting the plot helps us make informed decisions about the distribution of our data.
You can also read our other blogs on XGB feature importance and how to implement Jaccard similarity in Python.