
Python Matplotlib

Data Visualization

Data Visualization Techniques

This article covers the following data visualization techniques: scatter plots, line graphs, bar graphs, pie charts and histograms

Matplotlib

Installing Matplotlib

The following command installs Matplotlib:

pip install matplotlib 

Installing Pandas

Pandas makes it possible to carry out a complete data analysis workflow in Python without switching to a more domain-specific language such as R. It can be installed with the following command:

pip install pandas 

Creating different visualizations

Scatter Plot

Pyplot (matplotlib.pyplot) provides a state-machine interface to Matplotlib's plotting library

This means that figures and axes are created implicitly and automatically as plotting functions are called, so the desired plot can be built without setting them up by hand
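To make the state-machine idea concrete, here is a minimal sketch (not part of the original examples) that compares the implicit pyplot style with the explicit object-oriented style. It assumes the non-interactive Agg backend so it runs without a display, and the plotted numbers are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no display is required
import matplotlib.pyplot as pt

# Implicit (state-machine) style: no figure or axes are created by hand.
# pyplot creates them on the first plotting call and tracks them as "current".
pt.plot([1, 2, 3], [4, 5, 6])
implicit_fig = pt.gcf()  # the figure pyplot created behind the scenes
implicit_ax = pt.gca()   # the axes pyplot created behind the scenes
print(len(implicit_ax.lines))  # number of lines drawn on the current axes

# Explicit (object-oriented) style: the same plot, with figure and axes created directly.
explicit_fig, explicit_ax = pt.subplots()
explicit_ax.plot([1, 2, 3], [4, 5, 6])

# subplots() also makes the new figure pyplot's current figure
print(pt.gcf() is explicit_fig)
```

Both styles produce the same kind of plot. The implicit style is convenient for short scripts like the ones in this article, while the explicit style makes it clearer which axes each call draws on when a figure has several plots.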

The following example reads a CSV file containing the average June temperatures of a country over the years and displays them as a scatter plot

# import matplotlib's pyplot module with pt as its alias
import matplotlib.pyplot as pt
# import pandas with pd as its alias
import pandas as pd

# read_csv() reads the CSV file into a pandas DataFrame
data = pd.read_csv("data.csv")

# head(100) keeps only the first 100 rows of the DataFrame
data = data.head(100)

# scatter() draws one marker per (x, y) pair:
# the "year" column is plotted on the x-axis and "avg_temp" on the y-axis
# the markers are red and carry the label "scatter", which is shown if legend() is called
pt.scatter(data["year"], data["avg_temp"], color="red", label="scatter")

# xlabel() sets the x-axis label
pt.xlabel("Year", color="green")
# ylabel() sets the y-axis label
pt.ylabel("Avg Temperature for month of June", color="red")

# title() sets the title of the plot
pt.title("Average June temperature for years", color="green")

# displays the plot
pt.show()

Output

(Figure: scatter plot of average June temperature by year)
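The example above needs a data.csv file that is not included here. The sketch below is a self-contained variant that assumes the file has year and avg_temp columns. The temperature values are hypothetical placeholders loaded from an in-memory string through io.StringIO. It also assumes the Agg backend, so it saves the figure with savefig() instead of calling show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; remove these two lines for an interactive window
import matplotlib.pyplot as pt
import pandas as pd

# Hypothetical stand-in for data.csv: the column names match the example above,
# but the numbers are made up for illustration.
csv_text = """year,avg_temp
2008,21.4
2009,22.1
2010,21.8
2011,22.9
2012,23.3
2013,22.7
"""

# read_csv() accepts any file-like object, so a StringIO behaves like a file on disk
data = pd.read_csv(io.StringIO(csv_text))
data = data.head(100)  # keeps at most the first 100 rows (here, all 6)

pt.scatter(data["year"], data["avg_temp"], color="red", label="scatter")
pt.xlabel("Year", color="green")
pt.ylabel("Avg Temperature for month of June", color="red")
pt.title("Average June temperature for years", color="green")
pt.savefig("scatter.png")  # with the Agg backend, write the plot to a file
```

Replacing io.StringIO(csv_text) with the path "data.csv" gives the original behavior, as long as the real file uses the same column names.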

To draw a line graph along with the scatter plot, two lines are added to the code above: a call to plot() and a call to legend()

# import matplotlib's pyplot module with pt as its alias
import matplotlib.pyplot as pt
# import pandas with pd as its alias
import pandas as pd

# read_csv() reads the CSV file into a pandas DataFrame
data = pd.read_csv("data.csv")

# head(100) keeps only the first 100 rows of the DataFrame
data = data.head(100)

# scatter() draws one marker per (x, y) pair:
# the "year" column is plotted on the x-axis and "avg_temp" on the y-axis
# the markers are red and carry the label "scatter", which is shown by legend()
pt.scatter(data["year"], data["avg_temp"], color="red", label="scatter")

# xlabel() sets the x-axis label
pt.xlabel("Year", color="green")
# ylabel() sets the y-axis label
pt.ylabel("Avg Temperature for month of June", color="red")

# title() sets the title of the plot
pt.title("Average June temperature for years", color="green")

# plot() draws a line through the same points, producing a line graph
pt.plot(data["year"], data["avg_temp"], color="blue", label="line graph")

# Calling legend() with no arguments automatically fetches the legend handles and their associated labels
pt.legend()

# displays the plot
pt.show()

Output

(Figure: scatter plot combined with a line graph and a legend)
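legend() with no arguments builds its entries from the artists that were given a label. The minimal sketch below, using hypothetical sample values and the Agg backend, inspects the handles and labels that legend() will use through Axes.get_legend_handles_labels(). The order of the entries may differ between Matplotlib versions, so the sketch only relies on which labels are present.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as pt

# Hypothetical sample values standing in for data["year"] and data["avg_temp"]
years = [2010, 2011, 2012, 2013]
temps = [21.8, 22.9, 23.3, 22.7]

pt.scatter(years, temps, color="red", label="scatter")
pt.plot(years, temps, color="blue", label="line graph")

# get_legend_handles_labels() returns the (handles, labels) pair
# that legend() collects when it is called without arguments
handles, labels = pt.gca().get_legend_handles_labels()
print(sorted(labels))

legend = pt.legend()
print(sorted(text.get_text() for text in legend.get_texts()))
```

An artist plotted without a label argument is left out of the legend, which is why both calls in the example above pass label=.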

Bar Graph

The code for a bar graph is similar to the scatter plot code

The bar() method of pyplot is used to plot the bar graph

import matplotlib.pyplot as pt
import pandas as pd

data = pd.read_csv("data.csv")
data = data.head(30)

# bar() draws one bar per year, with a height equal to avg_temp
# the list of three colors is repeated across the bars: green, blue, red, green, ...
pt.bar(data["year"], data["avg_temp"], color=["green", "blue", "red"])

pt.xlabel("Year", color="green")
pt.ylabel("Average Temperature for month of June", color="blue")
pt.title("Avg June Temperature for years", color="green")
pt.show()

Output

(Figure: bar graph of average June temperature by year)
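In the bar graph, 30 bars share a list of only three colors, and Matplotlib repeats that list so the bars come out green, blue, red, green, blue, red and so on. The sketch below checks this on a few hypothetical values (Agg backend assumed) by reading each bar's face color back from the BarContainer that bar() returns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as pt
from matplotlib.colors import to_rgba

# Hypothetical values standing in for the first rows of data.csv
years = [2006, 2007, 2008, 2009, 2010, 2011, 2012]
temps = [21.0, 21.9, 21.4, 22.1, 21.8, 22.9, 23.3]

# bar() returns a BarContainer holding one Rectangle patch per bar
bars = pt.bar(years, temps, color=["green", "blue", "red"])

# With 7 bars and 3 colors, the list repeats:
# green, blue, red, green, blue, red, green
face_colors = [bar.get_facecolor() for bar in bars]
print(face_colors[3] == to_rgba("green"))
```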

Pie Charts

Pie charts can be drawn using the function pie() in the pyplot module

The following example shows how countries are distributed by average temperature for a month, in four groups: below 15, 15-25, 25-30 and 30 or above

import matplotlib.pyplot as pt
import pandas as pd

data = pd.read_csv("data2.csv")

x = len(data[data.avg_temp >= 30])  # countries with an average temperature of 30 or more
x1 = len(data[(data.avg_temp >= 25) & (data.avg_temp < 30)])  # countries with an average temperature from 25 up to 30
x2 = len(data[(data.avg_temp >= 15) & (data.avg_temp < 25)])  # countries with an average temperature from 15 up to 25
x3 = len(data[data.avg_temp < 15])  # countries with an average temperature below 15

pt.axis('equal')  # equal aspect ratio, so the pie is drawn as a circle rather than an ellipse

# the first argument is the list of values x, x1, x2 and x3
# colors specifies the color of each slice
# labels specifies the text shown next to each slice
pt.pie([x, x1, x2, x3], colors=['red', 'yellow', 'green', 'blue'], labels=['>30', '25-30', '15-25', '<15'])

pt.legend(title='Average Temperature of a month for countries')  # show the slice labels as a legend

pt.show()

Output

(Figure: pie chart of countries grouped by average temperature)
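The four len() calls above filter the DataFrame separately for each group. A more compact alternative, swapped in here for illustration and not part of the original example, is pandas' cut(), which assigns every temperature to a bin in one pass. The sketch assumes an avg_temp column like the one in data2.csv, but the temperatures are made up, and it uses the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as pt
import pandas as pd

# Hypothetical average temperatures standing in for data2.csv's avg_temp column
data = pd.DataFrame({"avg_temp": [12.5, 31.0, 27.4, 18.9, 33.2, 24.9, 25.0, 14.99, 29.8]})

# right=False makes each interval half-open, [left, right),
# matching the >= / < comparisons in the example above
labels = ["<15", "15-25", "25-30", ">30"]
groups = pd.cut(
    data["avg_temp"],
    bins=[float("-inf"), 15, 25, 30, float("inf")],
    right=False,
    labels=labels,
)

# sort=False keeps the groups in category order: <15, 15-25, 25-30, >30
counts = groups.value_counts(sort=False)
print(counts.tolist())

pt.pie(counts, labels=labels, colors=["blue", "green", "yellow", "red"])
pt.axis("equal")  # equal aspect ratio so the pie is drawn as a circle
pt.legend(title="Average Temperature of a month for countries")
```

Changing the group boundaries then only means editing the bins and labels lists, instead of rewriting four separate filters.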

Histogram

A histogram is a kind of bar graph that estimates the probability distribution of a continuous (quantitative) variable; it was first introduced by Karl Pearson

The following example shows the distribution of average temperatures across countries for June 2013

import matplotlib.pyplot as pt
import pandas as pd

avg_temp = pd.read_csv("data2.csv")["avg_temp"]
# bin edges -4, -3, ..., 36: forty bins, each one degree wide
bins = range(-4, 37)

# hist() is used to draw a histogram
# histtype='bar' draws ordinary bars; rwidth=0.8 makes each bar 80% of its bin width, leaving gaps between consecutive bars
pt.hist(avg_temp, bins, histtype='bar', rwidth=0.8)
pt.title('Temperature distribution')
pt.xlabel('Average Temperature of June 2013')
pt.ylabel('Countries')
pt.show()

Output

(Figure: histogram of average June 2013 temperatures across countries)
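In the histogram example, bins=range(-4, 37) supplies 41 bin edges, which gives 40 bins that are each one degree wide. The sketch below uses hypothetical temperatures and the Agg backend. It confirms the bin count from the values hist() returns, checks the effect of rwidth, and shows how density=True rescales the bar heights into a probability density estimate whose total area is 1.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend
import matplotlib.pyplot as pt
import numpy as np

# Hypothetical temperatures standing in for data2.csv's avg_temp column
avg_temp = np.array([-3.5, 2.0, 8.4, 14.9, 15.0, 21.7, 22.3, 26.8, 29.1, 35.6])

# 41 edges from -4 to 36 -> 40 one-degree bins; values outside [-4, 36] are not counted
bins = range(-4, 37)

# hist() returns the count per bin, the bin edges and the drawn bar patches
counts, edges, patches = pt.hist(avg_temp, bins, histtype="bar", rwidth=0.8)
print(len(counts), counts.sum())
print(patches[0].get_width())  # about 0.8, i.e. 80% of a one-degree bin

# density=True divides each count by (total count * bin width),
# so the bar areas add up to 1
pt.figure()
density, _, _ = pt.hist(avg_temp, bins, density=True)
print(np.sum(density * np.diff(edges)))
```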