Image Processing Object Detection with Raspberry Pi and Python


This is the second entry in the Raspberry Pi and Python image processing tutorial series. In Part I, the Raspberry Pi’s picamera was introduced along with its Python toolbox, and simple image manipulation and color recognition were explored using the picamera and Python’s numerical toolbox (NumPy). Note: for this tutorial, I migrated to the Raspberry Pi 3B+ for more processing power and shorter computation times. I originally started with the Raspberry Pi Zero W, but its computing power was slightly below what I needed.

In this entry, image processing-specific Python toolboxes are explored and applied to object detection to create algorithms that identify multiple objects and approximate their location in the frame using the picamera and Raspberry Pi. The methods used in this tutorial cover edge detection algorithms as well as some simple machine learning algorithms that allow us to identify individual objects in a frame.


Multidimensional Image Processing Toolbox - scipy.ndimage

Python’s ‘SciPy’ toolbox will be used for edge detection, which will help us determine the boundaries of multiple objects present in an image. In the Raspberry Pi terminal, SciPy can be installed with the following command:

pi@raspberrypi:~ $ sudo apt-get install python3-scipy

The multidimensional imaging toolbox, called ‘ndimage’, comes with SciPy and allows users to manipulate images using a wide array of algorithms. The full list of functions can be found in the ndimage reference guide. The ndimage toolbox is powerful and efficient for image manipulation, specifically with regard to edge detection and spectral methods.

As an example of ndimage functionality, we can test several of the most common edge detection methods and see which best suits our needs. Below is example code for four common edge detection algorithms, each of which uses a different derivative function and a different form of convolution. The four methods are:

  1. Gaussian Gradient

  2. Laplacian of Gaussian

  3. Canny Method

  4. Sobel Method

We can see the weaknesses and strengths of each below in the four-panel plot.

import time
from picamera import PiCamera
import scipy.ndimage as scimg
import numpy as np
import matplotlib.pyplot as plt

# picamera setup
h = 640 #largest resolution length
cam_res = (int(h),int(0.75*h)) # resizing to picamera's required ratios
cam_res = (int(32*np.floor(cam_res[0]/32)),int(16*np.floor(cam_res[1]/16)))
cam = PiCamera(resolution=cam_res)
# preallocating image variables
data = np.empty((cam_res[1],cam_res[0],3),dtype=np.uint8)
x,y = np.meshgrid(np.arange(cam_res[0]),np.arange(cam_res[1]))

# different edge detection methods
cam.capture(data,'rgb') # capture image
# Gaussian gradient magnitude (derivatives of a Gaussian, red channel only)
t0 = time.time()
grad_xy = scimg.gaussian_gradient_magnitude(data[:,:,0],sigma=1.5)
##grad_xy = np.mean(grad_xy,2)
t_grad_xy = time.time()-t0
# laplacian of gaussian
t0 = time.time()
lap = scimg.gaussian_laplace(data[:,:,0],sigma=0.7)
t_lap = time.time()-t0
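# Canny-style method without angle (no non-maximum suppression or hysteresis):
# Gaussian filter step followed by the Prewitt gradient magnitude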
t0 = time.time()
gaus = scimg.fourier_gaussian(data[:,:,0],sigma=0.05)
can_x = scimg.prewitt(gaus,axis=0)
can_y = scimg.prewitt(gaus,axis=1)
can = np.hypot(can_x,can_y)
##can = np.mean(can,2)
t_can = time.time()-t0
# Sobel method
t0 = time.time()
sob_x = scimg.sobel(data[:,:,0],axis=0)
sob_y = scimg.sobel(data[:,:,0],axis=1)
sob = np.hypot(sob_x,sob_y)
##sob = np.mean(sob,2)
t_sob = time.time()-t0

# plotting routines and labeling
fig,ax = plt.subplots(2,2,figsize=(12,6))
ax[0,0].pcolormesh(x,y,grad_xy,cmap='gray')
ax[0,0].set_title(r'Gaussian Gradient [$\sigma = 1.5$] (Computation Time: {0:2.2f}s)'.format(t_grad_xy))
ax[0,1].pcolormesh(x,y,lap,cmap='gray')
ax[0,1].set_title(r'Laplacian of Gaussian [$\sigma = 0.7$] (Computation Time: {0:2.2f}s)'.format(t_lap))
ax[1,0].pcolormesh(x,y,can,cmap='gray')
ax[1,0].set_title(r'Canny [$\sigma = 0.05$] (Computation Time: {0:2.2f}s)'.format(t_can))
ax[1,1].pcolormesh(x,y,sob,cmap='gray')
ax[1,1].set_title('Sobel (Computation Time: {0:2.2f}s)'.format(t_sob))
fig.tight_layout()
fig.savefig('edge_plots.png',dpi=150,facecolor=[252/255,252/255,252/255])
#analyzing histograms
fig2,ax2 = plt.subplots(2,2,figsize=(12,6))
ax2[0,0].hist(grad_xy.ravel(),bins=100)
ax2[0,1].hist(lap.ravel(),bins=100)
ax2[1,0].hist(can.ravel(),bins=100)
ax2[1,1].hist(sob.ravel(),bins=100)

# pulling out object edges
fig3,ax3 = plt.subplots(3,1,figsize=(12,6))
ax3[0].pcolormesh(x,y,can)
bin_size = 100
hist_vec = ax3[1].hist(can.ravel(),bins=bin_size)
hist_x,hist_y = hist_vec[0],hist_vec[1]
for ii in range(np.argmax(hist_x),bin_size):
    hist_max = hist_y[ii]
    if hist_x[ii]<0.01*np.max(hist_x):
        break
    
ax3[2].plot(x[can>hist_max],y[can>hist_max],marker='.',linestyle='')

plt.show()

[Figure: edge_plots.png, four-panel comparison of the edge detection methods]

The original image is also shown below, for reference.

[Figure: raw_image.png, the original image analyzed above]

We can see that each method does fairly well at pulling out the edges of each object, with some being more precise, some noisier, and some missing a few edges. As is often the case, the most accurate of the four, the Canny method, also requires the most computation time. However, it is accurate enough that I will use it as the method for detecting objects. The edge plots also use only one of the color channels in the image (red), so using all three (RGB) should help lower the error when detecting objects.
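
For readers without a camera attached, the gradient-based detectors can also be tried on a synthetic image. The following is a minimal sketch, not the routine used for the plots above: the test image, the helper name `gradient_magnitude`, and the cast to float are my own assumptions. The cast matters because ndimage filters return the input's data type by default, so the derivatives of an 8-bit image can wrap around instead of going negative.

```python
import numpy as np
import scipy.ndimage as scimg

# synthetic 8-bit test image: dark background with one bright rectangle
img = np.zeros((48, 64), dtype=np.uint8)
img[12:36, 16:48] = 200

# cast to float so negative derivatives are not wrapped into the 0-255 range
img_f = img.astype(float)

def gradient_magnitude(image, derivative=scimg.sobel):
    """Edge strength from a derivative filter applied along rows and columns."""
    d_rows = derivative(image, axis=0)
    d_cols = derivative(image, axis=1)
    return np.hypot(d_rows, d_cols)

sob = gradient_magnitude(img_f, scimg.sobel)      # Sobel edge strength
pre = gradient_magnitude(img_f, scimg.prewitt)    # Prewitt edge strength
grad = scimg.gaussian_gradient_magnitude(img_f, sigma=1.5)

# flat regions give no response; the rectangle border gives a strong one
print('Sobel inside the rectangle:', sob[24, 32])
print('Sobel on the left border:  ', sob[24, 16])
```

Swapping `img_f` for a single channel of a captured frame (for example `data[:,:,0].astype(float)`) should give the same kind of edge map for a real image.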


Image Histograms and Their Connection to Object Detection

The histogram is useful in image processing because an image may contain multiple artifacts that do not follow a standard single-peak Gaussian distribution. We will see that the histogram distributions can be not only multi-peaked, but also asymmetric. The plot below shows the distribution of edge-strength values for the image above, using 100 bins.

One method for approximating object edges is to find the value at which the edge strengths rise above the noise while still leaving enough points to delineate shapes. I have marked the 1% dropoff point: the first bin past the main histogram peak whose count falls below 1% of the peak count. At that point we are approximately above the noise floor but still include enough points to recreate the object edges. The scatter points above the 1% dropoff are shown below.

The code to approximate the scatter points is included below.

import time
from picamera import PiCamera
import scipy.ndimage as scimg
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# picamera setup
h = 640 #largest resolution length
cam_res = (int(h),int(0.75*h)) # resizing to picamera's required ratios
cam_res = (int(32*np.floor(cam_res[0]/32)),int(16*np.floor(cam_res[1]/16)))
cam = PiCamera(resolution=cam_res)
# preallocating image variables
data = np.empty((cam_res[1],cam_res[0],3),dtype=np.uint8)
x,y = np.meshgrid(np.arange(cam_res[0]),np.arange(cam_res[1]))

# different edge detection methods
cam.capture(data,'rgb') # capture image

# Canny method without angle
t0 = time.time()
gaus = scimg.fourier_gaussian(data[:,:,0],sigma=0.01)
can_x = scimg.prewitt(gaus,axis=0)
can_y = scimg.prewitt(gaus,axis=1)
can = np.hypot(can_x,can_y)

t_can = time.time()-t0

# pulling out object edges
fig3,ax3 = plt.subplots(2,1,figsize=(10,7))
ax3[0].pcolormesh(x,y,can,cmap='gray')
bin_size = 100 # total bins to show
percent_cutoff = 0.01 # cutoff once main peak tapers to 1% of max
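# np.histogram returns (counts, bin edges)
hist_vec = np.histogram(can.ravel(),bins=bin_size)
hist_x,hist_y = hist_vec[0],hist_vec[1] # hist_x: counts, hist_y: bin edges
# step right from the main peak until the count drops below the cutoff
for ii in range(np.argmax(hist_x),bin_size):
    hist_max = hist_y[ii] # edge-strength threshold
    if hist_x[ii]<percent_cutoff*np.max(hist_x):
        break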
    
# scatter points where objects exist
ax3[1].plot(x[can>hist_max],y[can>hist_max],marker='.',linestyle='',
            label='Scatter Above 1% Dropoff')
ax3[1].set_xlim(np.min(x),np.max(x))
ax3[1].set_ylim(np.min(y),np.max(y))
ax3[1].legend()
plt.show()

The code above (assuming the user is using the picamera and Python) goes through the following object detection routine:

  1. Take a snapshot using the picamera

  2. Calculate edges using the Canny method

  3. Establish the 1% dropoff region

  4. Include only points above the 1% dropoff region to approximate object regions
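
Steps 3 and 4 can be isolated and tested without the camera. The sketch below wraps the same loop in a function and applies it to synthetic data; the function name `dropoff_threshold` and the synthetic edge-strength array are assumptions made for illustration, not part of the original routine.

```python
import numpy as np

def dropoff_threshold(edge_strength, bins=100, percent_cutoff=0.01):
    """Return the bin edge where the histogram, walking right from its main
    peak, first falls below percent_cutoff times the peak count."""
    counts, bin_edges = np.histogram(edge_strength.ravel(), bins=bins)
    peak = np.argmax(counts)
    threshold = bin_edges[peak]
    for ii in range(peak, bins):
        threshold = bin_edges[ii]
        if counts[ii] < percent_cutoff * counts[peak]:
            break
    return threshold

# synthetic edge-strength map: weak background noise plus one strong edge line
rng = np.random.default_rng(0)
edge_strength = np.abs(rng.normal(0.0, 1.0, size=(100, 100)))
edge_strength[40, 25:75] = 20.0

thr = dropoff_threshold(edge_strength)
mask = edge_strength > thr  # True where a pixel is treated as an object edge
print('threshold:', thr, '| points kept:', mask.sum())
```

Raising `percent_cutoff` moves the threshold closer to the noise peak and keeps more points, which is the trade-off between noise rejection and having enough points to outline each object.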

In the next section, we can begin to draw boundaries around the objects using clustering methods and hopefully delineate multiple objects and approximate their color.


Multiple Object Detection and Clustering with Scikit-Learn

Now that we have identified the rough boundaries of the objects in our images, we can investigate methods for clustering the points and identifying each individual object.

First, the ‘scikit-learn’ module for Python needs to be installed. It contains the machine learning algorithms that will be essential for clustering data points and approximating each object’s bounding box. Scikit-learn can be installed using the standard Raspberry Pi Python install method:

pi@raspberrypi:~ $ sudo pip3 install -U scikit-learn

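Once the Scikit-learn module is installed, we can import it and investigate the DBSCAN method (Scikit-learn DBSCAN webpage). DBSCAN is a density-based clustering algorithm: it sifts through the points and separates ‘objects’ based on the proximity between points, grouping points that lie within a distance eps of one another, requiring at least min_samples neighbors to seed a cluster, and labeling isolated points as noise (label -1). An example of DBSCAN is shown below, which will return the groups of points associated with each object.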

import time
from picamera import PiCamera
import scipy.ndimage as scimg
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle
from sklearn.cluster import DBSCAN

# picamera setup
h = 640 #largest resolution length
cam_res = (int(h),int(0.75*h)) # resizing to picamera's required ratios
cam_res = (int(32*np.floor(cam_res[0]/32)),int(16*np.floor(cam_res[1]/16)))
cam = PiCamera(resolution=cam_res)
# preallocating image variables
data = np.empty((cam_res[1],cam_res[0],3),dtype=np.uint8)
x,y = np.meshgrid(np.arange(cam_res[0]),np.arange(cam_res[1]))

# different edge detection methods
cam.capture(data,'rgb') # capture image

# Canny method without angle
t1 = time.time()
gaus = scimg.fourier_gaussian(data[:,:,0],sigma=0.01)
can_x = scimg.prewitt(gaus,axis=0)
can_y = scimg.prewitt(gaus,axis=1)
can = np.hypot(can_x,can_y)

# pulling out object edges
fig3,ax3 = plt.subplots(2,1,figsize=(10,7))
ax3[0].pcolormesh(x,y,can,cmap='gray')
bin_size = 100 # total bins to show
percent_cutoff = 0.02 # cutoff once main peak tapers to 2% of max
hist_vec = np.histogram(can.ravel(),bins=bin_size)
hist_x,hist_y = hist_vec[0],hist_vec[1]
for ii in range(np.argmax(hist_x),bin_size):
    hist_max = hist_y[ii]
    if hist_x[ii]<percent_cutoff*np.max(hist_x):
        break

# sklearn section for clustering
x_cluster = x[can>hist_max]
y_cluster = y[can>hist_max]
# stack the coordinates into an (n_points, 2) array of (x, y) pairs
scat_pts = np.column_stack((x_cluster,y_cluster))
    
min_samps = 15
leaf_sz = 10
max_dxdy = 25
# clustering analysis for object detection
clustering = DBSCAN(eps=max_dxdy,min_samples=min_samps,
                    algorithm='kd_tree',
                    leaf_size=leaf_sz).fit(scat_pts)

color_txt = ['Red','Green','Blue']
fig4,ax4 = plt.subplots(1)
fig4.set_size_inches(9,7)
im_show = ax4.imshow(data,origin='lower')
# plotting the points that belong to each detected object
for ii in np.unique(clustering.labels_):
    if ii==-1: # label -1 marks points DBSCAN treats as noise
        continue
    clus_dat = np.where(clustering.labels_==ii)

    x_pts = x_cluster[clus_dat]
    y_pts = y_cluster[clus_dat]
    ax3[1].plot(x_pts,y_pts,marker='.',linestyle='',label='Object {0:2.0f}'.format(ii))

ax3[1].legend()
fig3.savefig('dbscan_demo.png',dpi=150,facecolor=[252/255,252/255,252/255])
plt.show()

The result of the code above is shown below in the scatter plot, with a different color assigned to each object:

[Figure: dbscan_demo.png, edge image and DBSCAN clusters colored by object]

The Scikit-learn DBSCAN method outputs the four objects, as we expect. As one might imagine, this is an incredibly useful tool for machine learning and computer vision applications. Unfortunately, in the case above, the total computation time is quite high: about 4-5 s on the Raspberry Pi. We can reduce this considerably by lowering the quality of the image and analyzing upscaled images. This will be a task for the next entry, along with rotation algorithms and color identification methods.
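
To try the clustering step away from the Pi, DBSCAN can be run on synthetic points. This is a minimal sketch under my own assumptions: the two blobs, the three stray points, and the `boxes` dictionary are hypothetical, and the bounding box is simply the minimum and maximum of each cluster's coordinates, stored as (x, y, width, height).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# two synthetic blobs of "edge pixels" plus three isolated stray points
blob_a = rng.uniform([50, 50], [120, 110], size=(200, 2))
blob_b = rng.uniform([400, 300], [480, 380], size=(200, 2))
stray = np.array([[250.0, 20.0], [300.0, 200.0], [10.0, 400.0]])
pts = np.vstack((blob_a, blob_b, stray))

# same eps and min_samples values as in the tutorial code above
clustering = DBSCAN(eps=25, min_samples=15).fit(pts)

# axis-aligned bounding box for each cluster: (x, y, width, height)
boxes = {}
for label in np.unique(clustering.labels_):
    if label == -1:  # skip points labeled as noise
        continue
    member = pts[clustering.labels_ == label]
    x_min, y_min = member.min(axis=0)
    x_max, y_max = member.max(axis=0)
    boxes[label] = (x_min, y_min, x_max - x_min, y_max - y_min)

for label, box in boxes.items():
    print('Object', label, 'box (x, y, w, h):', np.round(box, 1))
```

Each tuple has the layout expected by `matplotlib.patches.Rectangle((x, y), width, height)`, so the boxes could be drawn over the image in a later step.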


Conclusion and Continuation

In this entry in the image processing series, I introduced edge detection techniques and ultimately implemented the Canny algorithm to detect multiple objects. A histogram of the edge detection output was also introduced, which allowed us to approximate the region where object edges emerge and background noise is ignored. These methods extracted the shapes of multiple objects, which we were then able to separate using the machine learning tool DBSCAN. DBSCAN took the general scatter of points and identified each individual object. This series of steps allows machines to pull out multiple objects from an image and attach identifiers to each. In the next entry, I will explore the individual objects and apply transformations and approximations that include re-orienting the objects, approximating each object’s color, and decreasing the computation time by compressing or upscaling the image to increase the real-time rate of analysis achievable on the Raspberry Pi.

 
