Common data visualization techniques include scatter plots, line graphs, bar graphs, pie charts, and histograms.
The following command installs matplotlib:
pip install matplotlib
Pandas makes it possible to carry out a data analysis workflow in Python without having to switch to a more domain-specific language such as R. It can be installed with:
pip install pandas
Pyplot provides a state-machine interface to the plotting library of matplotlib.
This means that figures and axes are created implicitly and automatically to achieve the desired plot.
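To make the implicit behaviour concrete, the sketch below plots the same placeholder values twice: once through pyplot's state machine, and once on a figure and axes created by hand. The `Agg` backend line is an assumption added so the sketch runs without a display; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as pt

# Implicit (state-machine) style: the first plotting call creates
# a figure and an axes behind the scenes and makes them "current".
pt.plot([1, 2, 3], [2, 4, 6])
implicit_fig = pt.gcf()  # the figure pyplot is currently tracking
implicit_ax = pt.gca()   # the axes pyplot is currently tracking

# Explicit style: the figure and axes are created by hand
# and the plotting method is called on the axes object.
explicit_fig, explicit_ax = pt.subplots()
explicit_ax.plot([1, 2, 3], [2, 4, 6])

# Each axes now holds exactly one line.
print(len(implicit_ax.lines), len(explicit_ax.lines))
```

Both styles end up with a figure and an axes; pyplot simply creates and tracks them so that calls such as `pt.xlabel()` know where to apply.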
The following example reads a CSV file containing a country's average June temperatures over the years and displays them as a scatter plot.
# import matplotlib's pyplot module with pt as its alias
import matplotlib.pyplot as pt
# import pandas with pd as its alias
import pandas as pd
# read_csv() reads the dataset from a CSV file into a DataFrame
data = pd.read_csv("data.csv")
# head(100) keeps only the first 100 rows of the dataset
data = data.head(100)
# scatter() draws the scatter plot
# year is plotted on the x-axis and avg_temp on the y-axis
# the points are red and carry the legend label 'scatter'
pt.scatter(data["year"], data["avg_temp"], color="red", label="scatter")
# xlabel() assigns the label of x-axis
pt.xlabel("Year", color="green")
# ylabel() assigns label of y-axis
pt.ylabel("Avg Temperature for month of June", color="red")
# title() is used to assign title for the plot
pt.title("Average June temperature for years", color="green")
# displays the plot
pt.show()
Output
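The examples assume a CSV file with the columns `year` and `avg_temp`. For trying them without the original dataset, a placeholder file could be generated along these lines. The file name `sample_data.csv`, the helper `write_sample_csv`, and every value it writes are hypothetical and carry no real measurements.

```python
import csv

# Hypothetical file name; point read_csv() at it to try the examples.
SAMPLE_PATH = "sample_data.csv"

def write_sample_csv(path, first_year=1900, last_year=2012):
    """Write placeholder rows with the two columns the examples use."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "avg_temp"])
        for year in range(first_year, last_year + 1):
            # Made-up value: a slow upward drift plus a small repeating wiggle.
            avg_temp = 20.0 + 0.01 * (year - first_year) + 0.5 * ((year % 5) - 2)
            writer.writerow([year, round(avg_temp, 2)])

write_sample_csv(SAMPLE_PATH)
```

With this file in place, replacing `"data.csv"` by `"sample_data.csv"` in the `read_csv()` call lets the scatter plot code run end to end.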
To draw a line graph over the scatter plot, two calls, plot() and legend(), are added to the code above:
# import matplotlib's pyplot module with pt as its alias
import matplotlib.pyplot as pt
# import pandas with pd as its alias
import pandas as pd
# read_csv() reads the dataset from a CSV file into a DataFrame
data = pd.read_csv("data.csv")
# head(100) keeps only the first 100 rows of the dataset
data = data.head(100)
# scatter() draws the scatter plot
# year is plotted on the x-axis and avg_temp on the y-axis
# the points are red and carry the legend label 'scatter'
pt.scatter(data["year"], data["avg_temp"], color="red", label="scatter")
# xlabel() assigns the label of x-axis
pt.xlabel("Year", color="green")
# ylabel() assigns label of y-axis
pt.ylabel("Avg Temperature for month of June", color="red")
# title() is used to assign title for the plot
pt.title("Average June temperature for years", color="green")
# plot() is used to create line graph
pt.plot(data["year"], data["avg_temp"], color="blue", label="line graph")
# Calling legend() with no arguments automatically fetches the legend handles and their associated labels
pt.legend()
# displays the plot
pt.show()
Output
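How legend() finds its entries can be checked directly. The sketch below, using placeholder values rather than the dataset, asks the current axes for the handles and labels that a bare `legend()` call would pick up. The third, unlabeled line is an addition for illustration, showing that artists without a `label` are left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as pt

years = [2001, 2002, 2003]  # placeholder values
temps = [21.5, 22.1, 21.8]  # placeholder values

pt.figure()
pt.scatter(years, temps, color="red", label="scatter")
pt.plot(years, temps, color="blue", label="line graph")
pt.plot(years, temps, color="gray")  # no label, so it gets no legend entry

# The handles and labels that legend() collects when called with no arguments.
handles, labels = pt.gca().get_legend_handles_labels()

legend = pt.legend()
legend_texts = [text.get_text() for text in legend.get_texts()]
print(labels)
print(legend_texts)
```

The order of the entries can differ between matplotlib versions, so the sketch does not rely on it.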
The code for a bar graph is similar to that for the scatter plot. The bar() method of pyplot is used to plot a bar graph.
import matplotlib.pyplot as pt
import pandas as pd
data = pd.read_csv("data.csv")
data = data.head(30)
# bar() plots a bar graph
# the list of three colors is repeated across the bars in order
pt.bar(data["year"], data["avg_temp"], color=["green", "blue", "red"])
pt.xlabel("Year", color="green")
pt.ylabel("Average Temperature for month of June", color="blue")
pt.title("Avg June Temperature for years", color="green")
pt.show()
Output
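When the color list is shorter than the number of bars, as in the example above with three colors and 30 bars, matplotlib reuses the colors in order. A small sketch with placeholder heights reads the face color back from each bar to show this; `to_hex` is used only to make the colors easy to compare.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as pt
from matplotlib.colors import to_hex

heights = [3, 5, 2, 4, 6, 1]  # placeholder values

pt.figure()
# Six bars but only three colors: the list is cycled.
bars = pt.bar(range(len(heights)), heights, color=["green", "blue", "red"])

# bar() returns a container holding one rectangle per bar.
bar_colors = [to_hex(bar.get_facecolor()) for bar in bars]
print(bar_colors)
```

The fourth bar comes out in the same color as the first, the fifth as the second, and so on.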
Pie charts can be drawn using the pie() function in the pyplot module.
The following example shows the temperature distribution of countries for a month, in four groups: below 15, 15-25, 25-30, and 30 or above.
import matplotlib.pyplot as pt
import pandas as pd
data = pd.read_csv("data2.csv")
x = len(data[data.avg_temp >= 30])  # countries with an average temperature of 30 or more
x1 = len(data[(data.avg_temp >= 25) & (data.avg_temp < 30)])  # countries with an average temperature from 25 up to 30
x2 = len(data[(data.avg_temp >= 15) & (data.avg_temp < 25)])  # countries with an average temperature from 15 up to 25
x3 = len(data[data.avg_temp < 15])  # countries with an average temperature below 15
pt.axis('equal')  # equal axis scaling keeps the pie circular rather than elliptical
# pie() takes the list of values: x, x1, x2 and x3
# colors gives the color of each wedge
# labels gives the label of each wedge
pt.pie([x, x1, x2, x3], colors=['red', 'yellow', 'green', 'blue'], labels=['>=30', '25-30', '15-25', '<15'])
pt.legend(title='Average Temperature of a month for countries')  # show the labels in a legend
pt.show()
Output
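The four counts passed to pie() can also be computed in one step. The sketch below is an alternative, not the method used in the example: it groups the values with pandas' `cut()` using the same boundaries. The `avg_temp` values here are placeholders standing in for the column of data2.csv, and the names `edges`, `labels`, and `counts` are illustrative.

```python
import pandas as pd

# Placeholder values standing in for the avg_temp column of data2.csv.
avg_temp = pd.Series([10.0, 15.0, 24.9, 25.0, 29.0, 30.0, 35.0])

edges = [float("-inf"), 15, 25, 30, float("inf")]
labels = ["<15", "15-25", "25-30", ">=30"]

# right=False makes each group include its lower edge and exclude its
# upper edge, matching the >= and < comparisons in the pie chart example.
groups = pd.cut(avg_temp, bins=edges, labels=labels, right=False)

# Count how many values fall into each group, in label order.
counts = [int((groups == label).sum()) for label in labels]
print(dict(zip(labels, counts)))
```

The resulting `counts` and `labels` lists could be passed to `pt.pie()` directly, which keeps the group boundaries in a single place.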
A histogram is a kind of bar graph that gives an estimate of the probability distribution of a continuous (quantitative) variable. It was first introduced by Karl Pearson.
The following example shows the distribution of average temperatures across countries for June 2013.
import matplotlib.pyplot as pt
import pandas as pd
avg_temp = pd.read_csv("data2.csv")["avg_temp"]
bins = range(-4, 37)
# hist() is used to draw histogram
# histogram type is 'bar'; rwidth=0.8 draws each bar at 80% of its bin width, leaving gaps between consecutive bars
pt.hist(avg_temp, bins, histtype='bar', rwidth=0.8)
pt.title('Temperature distribution')
pt.xlabel('Average Temperature of June 2013')
pt.ylabel('Countries')
pt.show()
Output
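What hist() does with the bin edges can be seen from its return values. In this sketch, placeholder values are sorted into four unit-wide bins, and the counts and edges that hist() returns are printed; the names `values` and `edges_in` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as pt

values = [0.5, 1.2, 1.8, 3.0, 4.0]  # placeholder values
edges_in = list(range(0, 5))        # bin edges 0, 1, 2, 3, 4 give four bins

pt.figure()
# hist() returns the count per bin, the bin edges, and the drawn bars.
counts, edges, patches = pt.hist(values, edges_in, histtype="bar", rwidth=0.8)

print(list(counts))
print(list(edges))
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why 3.0 and 4.0 land in the same final bin here. In the example above, `range(-4, 37)` likewise produces 40 bins that are each one degree wide.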