Introduction
Plotting Shapefiles with Python
Clipping Satellite Data with a Shapefile
Mean LST for each Borough
Diurnal Land Surface Temperature
Conclusion and Continuation

This is the second entry in the satellite imagery analysis in Python. For part I, click here.

For part II, the focus shifts from the introduction of file formats and libraries to the geospatial analysis of satellite images. Python will again be used, along with many of its libraries. Land Surface Temperature will again be used as the data information, along with shapefiles used for geometric boundary setting, as well as information about buildings and land cover produced by local governments - all of which are used in meteorological and weather research and analyses.

Plotting Shapefiles with Python

A shapefile is a file format used to identify geographic boundaries and attributes of given geometries on earth [read more here]. Shapefiles are often used to quantify statistically significant information associated with rural, suburban, and urban areas. As an introduction to statistical analysis of satellite imagery, I will use a publicly-available shapefile of New York City (NYC). The shapefile can be downloaded at the following link:

https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=Shapefile

After downloading the file, unzip it and place it in the local Python script folder. The NYC shapefile can be plotted using similar Python Basemap methods introduced in the first entry in this satellite series. A code example is shown below, along with the sample output plot:

#!/usr/bin/python

import matplotlib as mpl
mpl.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

# create dummy figure for reading shapefile and setting boundaries
fig1 = plt.figure()
m1 = Basemap()
shp = m1.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d', 'nyc_shapefile')
plt.close(fig1) # close dummy figure
m1 = ()

# plotting actual shapefile
bbox = [shp[2][0],shp[2][1],shp[3][0],shp[3][1]]
fig,ax = plt.subplots(figsize=(14,8))
m = Basemap(llcrnrlon=bbox[0],llcrnrlat=bbox[1],urcrnrlon=bbox[2],
            urcrnrlat=bbox[3],resolution='i', projection='cyl')
m.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d',
                'nyc_shapefile',zorder=0)
parallels = np.linspace(bbox[1],bbox[3],5.) # latitudes
m.drawparallels(parallels,labels=[True,False,False,False],fontsize=12)
meridians = np.linspace(bbox[0],bbox[2],5.) # longitudes
m.drawmeridians(meridians,labels=[False,False,False,True],fontsize=12)

plt.savefig('nyc_shapefile_plain.png',dpi=200,facecolor=[252/255,252/255,252/255]) # uncomment to save figure
plt.show()

We will use this shapefile to quantify variation of land surface temperature over each borough of New York City. The shapefile contains information about each borough, which we can analyze using the following method:

m.nyc_shapefile_info

If we print out the first polygon in the shapefile it gives the following:

{'boro_code': 2.0, 'boro_name': 'Bronx', 'shape_area': 1186612476.97, 'shape_leng': 462958.186921, 'RINGNUM': 1, 'SHAPENUM': 1}

Where we can see the borough code and borough name, along with the area and length of the shape. We can use the borough codes to color each borough of the city:

#!/usr/bin/python

from netCDF4 import Dataset
import matplotlib as mpl
mpl.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from matplotlib.patches import PathPatch
import numpy as np

# create dummy figure for reading shapefile and setting boundaries
fig1 = plt.figure()
m1 = Basemap()
shp = m1.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d', 'nyc_shapefile')
plt.close(fig1) # close dummy figure
m1 = ()

# plotting actual shapefile
bbox = [shp[2][0],shp[2][1],shp[3][0],shp[3][1]]
fig,ax = plt.subplots(figsize=(14,8))
##ax.set_axis_off()
m = Basemap(llcrnrlon=bbox[0],llcrnrlat=bbox[1],urcrnrlon=bbox[2],
            urcrnrlat=bbox[3],resolution='i', projection='cyl')
m.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d', 'nyc_shapefile')
m.drawmapboundary(fill_color='#bdd5d5')
parallels = np.linspace(bbox[1],bbox[3],5.) # latitudes
m.drawparallels(parallels,labels=[True,False,False,False],fontsize=12)
meridians = np.linspace(bbox[0],bbox[2],5.) # longitudes
m.drawmeridians(meridians,labels=[False,False,False,True],fontsize=12)
plt.tight_layout()

boro_array = []
for ii in m.nyc_shapefile_info:
    boro_array.append(ii['boro_code'])

boro_unique = np.unique(boro_array)
colors = ['r','b','k','g','m']
for boro_iter in range(0,len(boro_unique)):
    patches   = []
    for ii,jj in zip(m.nyc_shapefile_info, m.nyc_shapefile):
        if ii['boro_code']==boro_unique[boro_iter]:
            patches.append(Polygon(np.array(jj),True))
        
    ax.add_collection(PatchCollection(patches, facecolor= colors[boro_iter], edgecolor='k', linewidths=1., zorder=2))

plt.savefig('nyc_shapefile_colors.png',dpi=200,facecolor=[252/255,252/255,252/255]) # uncomment to save figure
plt.show()

Clipping Satellite Data with a Shapefile

Since we are only interested in a small region for analysis (NYC), we can use the shapefile of the city to clip the boundaries of data and minimize the amount of processing needed. The code below is a continuation of the codes outline above, with the addition of corner calculations that find the indices of the lower-left, upper-left, lower-right, and upper-right corners of the LST file. This will allow us to clip the full 1500x2500 data file down to a much smaller size for more efficient processing.

#!/usr/bin/python

from netCDF4 import Dataset
import os,ogr
import matplotlib as mpl
mpl.use('TkAgg')
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from matplotlib.patches import PathPatch
import numpy as np

def lat_lon_reproj(nc_folder):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[0] # select .nc file
    print(nc_files[0]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # GOES-R projection info and retrieving relevant constants
    proj_info = g16nc.variables['goes_imager_projection']
    lon_origin = proj_info.longitude_of_projection_origin
    H = proj_info.perspective_point_height+proj_info.semi_major_axis
    r_eq = proj_info.semi_major_axis
    r_pol = proj_info.semi_minor_axis

    # grid info
    lat_rad_1d = g16nc.variables['x'][:]
    lon_rad_1d = g16nc.variables['y'][:]
    
    # close file when finished
    g16nc.close()
    g16nc = None

    # create meshgrid filled with radian angles
    lat_rad,lon_rad = np.meshgrid(lat_rad_1d,lon_rad_1d)

    # lat/lon calc routine from satellite radian angle vectors

    lambda_0 = (lon_origin*np.pi)/180.0

    a_var = np.power(np.sin(lat_rad),2.0) + (np.power(np.cos(lat_rad),2.0)*\
                                             (np.power(np.cos(lon_rad),2.0)+\
                                              (((r_eq*r_eq)/(r_pol*r_pol))*\
                                               np.power(np.sin(lon_rad),2.0))))
    b_var = -2.0*H*np.cos(lat_rad)*np.cos(lon_rad)
    c_var = (H**2.0)-(r_eq**2.0)

    r_s = (-1.0*b_var - np.sqrt((b_var**2)-(4.0*a_var*c_var)))/(2.0*a_var)

    s_x = r_s*np.cos(lat_rad)*np.cos(lon_rad)
    s_y = - r_s*np.sin(lat_rad)
    s_z = r_s*np.cos(lat_rad)*np.sin(lon_rad)

    # latitude and longitude projection for plotting data on traditional lat/lon maps
    lat = (180.0/np.pi)*(np.arctan(((r_eq*r_eq)/(r_pol*r_pol))*\
                                   ((s_z/np.sqrt(((H-s_x)*(H-s_x))+(s_y*s_y))))))
    lon = (lambda_0 - np.arctan(s_y/(H-s_x)))*(180.0/np.pi)

    os.chdir('../')

    return lon,lat

def data_grab(nc_folder,nc_indx):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[nc_indx] # select .nc file
    print(nc_files[nc_indx]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # data info    
    data = g16nc.variables[var_name][:]
    data_units = g16nc.variables[var_name].units
    data_time_grab = ((g16nc.time_coverage_end).replace('T',' ')).replace('Z','')
    data_long_name = g16nc.variables[var_name].long_name
    
    # close file when finished
    g16nc.close()
    g16nc = None

    os.chdir('../')
    # print test coordinates
    print('{} N, {} W'.format(lat[318,1849],abs(lon[318,1849])))

    return data,data_units,data_time_grab,data_long_name,var_name

##################
# create dummy figure for reading shapefile and setting boundaries
##################

fig1 = plt.figure()
m1 = Basemap()
shp = m1.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d',
                       'nyc_shapefile')
plt.close(fig1) # close dummy figure
m1 = ()

##################
# plotting actual shapefile
##################

bbox = [shp[2][0],shp[2][1],shp[3][0],shp[3][1]]
fig,ax = plt.subplots(figsize=(14,8))
m = Basemap(llcrnrlon=bbox[0],llcrnrlat=bbox[1],urcrnrlon=bbox[2],
            urcrnrlat=bbox[3],resolution='i', projection='cyl')
m.readshapefile('./Borough Boundaries/geo_export_23efe675-9411-42f6-812b-8775e2ba657d',
                'nyc_shapefile')
m.drawmapboundary(fill_color='#bdd5d5')
parallels = np.linspace(bbox[1],bbox[3],5.) # latitudes
m.drawparallels(parallels,labels=[True,False,False,False],fontsize=12)
meridians = np.linspace(bbox[0],bbox[2],5.) # longitudes
m.drawmeridians(meridians,labels=[False,False,False,True],fontsize=12)

##################
# finding boroughs and isolating them for shading in colors
##################

boro_array = []
for ii in m.nyc_shapefile_info:
    boro_array.append(ii['boro_code'])

boro_unique = np.unique(boro_array)
colors = ['r','b','k','g','m']
for boro_iter in range(0,len(boro_unique)):
    patches   = []
    for ii,jj in zip(m.nyc_shapefile_info, m.nyc_shapefile):
        if ii['boro_code']==boro_unique[boro_iter]:
            patches.append(Polygon(np.array(jj),True))
        
    ax.add_collection(PatchCollection(patches, facecolor= colors[boro_iter],
                                      edgecolor='k', linewidths=1., zorder=2))

##################
# GOES-16 LST section
##################

nc_folder = './GOES_files/' # define folder where .nc files are located
lon,lat = lat_lon_reproj(nc_folder)

file_indx = 10 # be sure to pick the correct file. Make sure the file is not too big either,
# some of the bands create large files (stick to band 7-16)

data,data_units,data_time_grab,data_long_name,var_name = data_grab(nc_folder,file_indx)

##################
# clipping lat/lon/data vectors to shapefile bounds
##################
# lower-left corner
llcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
llcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[llcrnr_loc_x]
# upper-left corner
ulcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
ulcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[ulcrnr_loc_x]
# lower-right corner
lrcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
lrcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[lrcrnr_loc_x]
# upper-right corner
urcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
urcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[urcrnr_loc_x]

x_bounds = [llcrnr_loc_x,ulcrnr_loc_x,lrcrnr_loc_x,urcrnr_loc_x]
y_bounds = [llcrnr_loc,ulcrnr_loc,lrcrnr_loc,urcrnr_loc]

# setting bounds for new clipped lat/lon/data vectors
plot_bounds = [np.min(x_bounds),np.min(y_bounds),np.max(x_bounds)+1,np.max(y_bounds)+1]

# new clipped data vectors
lat_clip = lat[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]
lon_clip = lon[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]
dat_clip = data[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]

# plot the new data only in the clipped region
m.pcolormesh(lon_clip, lat_clip, dat_clip, latlon=True,zorder=999,
             alpha=0.95,cmap='jet') # plotting actual LST data
cb = m.colorbar()

data_units = ((data_units.replace('-','^{-')).replace('1','1}')).replace('2','2}')
plt.rc('text', usetex=True)
cb.set_label(r'%s $ \left[ \mathrm \right] $'% (var_name,data_units),fontsize=18)
plt.title(' on '.format(data_long_name,data_time_grab),fontsize=14)

plt.show()

The section that clips the data finds the regions in the GOES-16 latitude/longitude points that most-closely align with the shapefile bounds. The resulting satellite image is plotted below atop the NYC shapefile:

In the figure above, we can see that around 1600 UTC time (11 a.m. NY time), there is quite a bit of warming. And upon checking the historical weather for that day, the air temperature around that time (not the same as LST) was roughly 280 K. LST is typically warmer than air temperature during the day time due to the solar heating of the surface, so we can assume the LST range of 280K-290K is fairly justified. In the next section, we will explore the average temperature of each borough by grouping each GOES-16 satellite LST pixels into its respective borough and calculating a few LST statistics for each borough.

Mean LST for each Borough

A somewhat interesting and applicable calculation can be carried-out using the borough boundaries to approximate the mean land surface temperature (LST) for each borough of New York City. One method for determining if a data point is within a shapefile geometry is to loop through each point and use the Geospatial Data Abstraction Library (GDAL) spatial filter. This filter method is shown below:

##################
# routine that checks which points are inside each shapefile borough
##################

drv    = ogr.GetDriverByName('ESRI Shapefile') # define shapefile driver
ds_in  = drv.Open('./Borough Boundaries/geo_export_315b4419-366b-4cae-8a5a-9b945ae9112c.shp') #open shapefile
lyr_in = ds_in.GetLayer(0) # grab layer

boro_vec =  np.zeros((len(boro_unique),)) # borough vector for holding LST values
boro_count = np.zeros((len(boro_unique),)) # count points in each borough

# looping through each lat/lon point and summing LST value if it falls within one of the boroughs
for lat_pt,lon_pt,dat_pt in zip(lat_clip.flatten(),lon_clip.flatten(),dat_clip.flatten()):
    pt = ogr.Geometry(ogr.wkbPoint) # define point geometry
    pt.SetPoint_2D(0, np.float(lon_pt), np.float(lat_pt)) # set points 
    lyr_in.SetSpatialFilter(pt) # filter points with shapefiles
    
    # loop through boroughs to find where the points lie
    for feat_in in lyr_in:    
        ply = feat_in.GetGeometryRef()
        if ply.Contains(pt):
            if np.ma.isMA(dat_pt):
                continue
            boro_count[int(feat_in.GetField(0))-1]+=1
            boro_vec[int(feat_in.GetField(0))-1]+=dat_pt

This generalized code can be added to the code in the previous section. It loops through the clipped latitude, longitude, and data file to compile statistics for each borough. Of course, this can be done for any geometry that has defined boundaries. In our case, we use the five boroughs of NYC to quantify the average LST for each borough. We can visualize the mean LST values for each borough by using a colormap that colors in each borough with a specific color that mirrors a given temperature. A plot of this is shown below:

For the given day and hour, it appears that the boroughs of the Bronx and Queens were warmer than Staten Island, Manhattan, and Brooklyn. Staten Island is the coolest borough during this peak hour. The full code to replicate this is shown below:

import matplotlib as mpl
mpl.use('TkAgg')
from mpl_toolkits.basemap import Basemap
from netCDF4 import Dataset
import os,ogr,time
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from matplotlib.patches import PathPatch
import numpy as np
from datetime import datetime

def lat_lon_reproj(nc_folder):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[0] # select .nc file
    print(nc_files[0]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # GOES-R projection info and retrieving relevant constants
    proj_info = g16nc.variables['goes_imager_projection']
    lon_origin = proj_info.longitude_of_projection_origin
    H = proj_info.perspective_point_height+proj_info.semi_major_axis
    r_eq = proj_info.semi_major_axis
    r_pol = proj_info.semi_minor_axis

    # grid info
    lat_rad_1d = g16nc.variables['x'][:]
    lon_rad_1d = g16nc.variables['y'][:]
    
    # close file when finished
    g16nc.close()
    g16nc = None

    # create meshgrid filled with radian angles
    lat_rad,lon_rad = np.meshgrid(lat_rad_1d,lon_rad_1d)

    # lat/lon calc routine from satellite radian angle vectors

    lambda_0 = (lon_origin*np.pi)/180.0

    a_var = np.power(np.sin(lat_rad),2.0) + (np.power(np.cos(lat_rad),2.0)*\
                                             (np.power(np.cos(lon_rad),2.0)+\
                                              (((r_eq*r_eq)/(r_pol*r_pol))*\
                                               np.power(np.sin(lon_rad),2.0))))
    b_var = -2.0*H*np.cos(lat_rad)*np.cos(lon_rad)
    c_var = (H**2.0)-(r_eq**2.0)

    r_s = (-1.0*b_var - np.sqrt((b_var**2)-(4.0*a_var*c_var)))/(2.0*a_var)

    s_x = r_s*np.cos(lat_rad)*np.cos(lon_rad)
    s_y = - r_s*np.sin(lat_rad)
    s_z = r_s*np.cos(lat_rad)*np.sin(lon_rad)

    # latitude and longitude projection for plotting data on traditional lat/lon maps
    lat = (180.0/np.pi)*(np.arctan(((r_eq*r_eq)/(r_pol*r_pol))*\
                                   ((s_z/np.sqrt(((H-s_x)*(H-s_x))+(s_y*s_y))))))
    lon = (lambda_0 - np.arctan(s_y/(H-s_x)))*(180.0/np.pi)

    os.chdir('../')

    return lon,lat

def data_grab(nc_folder,nc_indx):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[nc_indx] # select .nc file
    print(nc_files[nc_indx]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # data info    
    data = g16nc.variables[var_name][:]
    data_units = g16nc.variables[var_name].units
    data_time_grab = ((g16nc.time_coverage_end).replace('T',' ')).replace('Z','')
    data_long_name = g16nc.variables[var_name].long_name
    
    # close file when finished
    g16nc.close()
    g16nc = None

    os.chdir('../')
    # print test coordinates
    print('{} N, {} W'.format(lat[318,1849],abs(lon[318,1849])))

    return data,data_units,data_time_grab,data_long_name,var_name

##################
# create dummy figure for reading shapefile and setting boundaries
##################

fig1 = plt.figure()
m1 = Basemap()
shp = m1.readshapefile('./Borough Boundaries/geo_export_315b4419-366b-4cae-8a5a-9b945ae9112c',
                       'nyc_shapefile')
plt.close(fig1) # close dummy figure
m1 = ()

##################
# plotting actual shapefile
##################

bbox = [shp[2][0],shp[2][1],shp[3][0],shp[3][1]]
fig,ax = plt.subplots(figsize=(14,8))
m = Basemap(llcrnrlon=bbox[0],llcrnrlat=bbox[1],urcrnrlon=bbox[2],
            urcrnrlat=bbox[3],resolution='i', projection='cyl')
m.readshapefile('./Borough Boundaries/geo_export_315b4419-366b-4cae-8a5a-9b945ae9112c',
                'nyc_shapefile')
m.drawmapboundary(fill_color='#bdd5d5')
parallels = np.linspace(bbox[1],bbox[3],5.) # latitudes
m.drawparallels(parallels,labels=[True,False,False,False],fontsize=12)
meridians = np.linspace(bbox[0],bbox[2],5.) # longitudes
m.drawmeridians(meridians,labels=[False,False,False,True],fontsize=12)

##################
# finding boroughs
##################

boro_array = []
for ii in m.nyc_shapefile_info:
    boro_array.append(ii['boro_code'])

boro_unique = np.unique(boro_array)


##################
# GOES-16 LST section
##################

nc_folder = './GOES_files/' # define folder where .nc files are located
lon,lat = lat_lon_reproj(nc_folder)

nc_files = [ii for ii in os.listdir(nc_folder) if ii.endswith('.nc')]
time_vec = []
for time_nc in nc_files:
    time_grab = Dataset(nc_folder+time_nc, 'r')
    time_str = ((time_grab.time_coverage_end).replace('T',' ')).replace('Z','')
    time_vec.append(datetime.strptime(time_str,'%Y-%m-%d %H:%M:%S.%f'))

nc_file_arguments = np.argsort(time_vec)

file_indx = nc_file_arguments[90] # file index selection

data,data_units,data_time_grab,data_long_name,var_name = data_grab(nc_folder,file_indx)

##################
# clipping lat/lon/data vectors to shapefile bounds
##################

llcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
llcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[llcrnr_loc_x]

ulcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
ulcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[ulcrnr_loc_x]

lrcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
lrcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[lrcrnr_loc_x]

urcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
urcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[urcrnr_loc_x]

x_bounds = [llcrnr_loc_x,ulcrnr_loc_x,lrcrnr_loc_x,urcrnr_loc_x]
y_bounds = [llcrnr_loc,ulcrnr_loc,lrcrnr_loc,urcrnr_loc]

plot_bounds = [np.min(x_bounds),np.min(y_bounds),np.max(x_bounds)+1,np.max(y_bounds)+1]

lat_clip = lat[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]
lon_clip = lon[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]
dat_clip = data[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]

##################
# routine that checks which points are inside each shapefile borough
##################

drv    = ogr.GetDriverByName('ESRI Shapefile') # define shapefile driver
ds_in  = drv.Open('./Borough Boundaries/geo_export_315b4419-366b-4cae-8a5a-9b945ae9112c.shp') #open shapefile
lyr_in = ds_in.GetLayer(0) # grab layer

boro_vec =  np.zeros((len(boro_unique),)) # borough vector for holding LST values
boro_count = np.zeros((len(boro_unique),)) # count points in each borough

# looping through each lat/lon point and summing LST value if it falls within one of the boroughs
for lat_pt,lon_pt,dat_pt in zip(lat_clip.flatten(),lon_clip.flatten(),dat_clip.flatten()):
    pt = ogr.Geometry(ogr.wkbPoint) # define point geometry
    pt.SetPoint_2D(0, np.float(lon_pt), np.float(lat_pt)) # set points 
    lyr_in.SetSpatialFilter(pt) # filter points with shapefiles
    
    # loop through boroughs to find where the points lie
    for feat_in in lyr_in:    
        ply = feat_in.GetGeometryRef()
        if ply.Contains(pt):
            if np.ma.isMA(dat_pt):
                continue
            boro_count[int(feat_in.GetField(0))-1]+=1
            boro_vec[int(feat_in.GetField(0))-1]+=dat_pt
            
##################
# calculate each borough's mean LST and color each borough using a heat map
##################

boro_means = boro_vec/boro_count

color_bounds = [np.min(boro_means),np.max(boro_means)]
color_range = np.interp(boro_means,color_bounds,[0,255])

colors = [plt.cm.jet(int(ii)) for ii in color_range]

for boro_iter in range(0,len(boro_unique)):
    patches   = []
    for ii,jj in zip(m.nyc_shapefile_info, m.nyc_shapefile):
        if ii['boro_code']==boro_unique[boro_iter]:
            patches.append(Polygon(np.array(jj),True))
        
    ax.add_collection(PatchCollection(patches, facecolor= colors[boro_iter],
                                      edgecolor='k', linewidths=1., zorder=2))

img_cb = m.imshow([color_bounds],cmap='jet')
cb = m.colorbar()
data_units = ((data_units.replace('-','^{-')).replace('1','1}')).replace('2','2}')
plt.rc('text', usetex=True)
cb.set_label(r'%s $ \left[ \mathrm \right] $'% (var_name,data_units),fontsize=18)
plt.title(' on '.format(data_long_name,data_time_grab),fontsize=14)
img_cb.remove()
plt.tight_layout()
plt.savefig('nyc_borough_LST_mean.png',dpi=200,facecolor=[252/255,252/255,252/255]) # uncomment to save figure
plt.show()

Diurnal Land Surface Temperature

I decided to investigate five days from the summer of 2018 (June 29 - July 3) during a heatwave (read about heat waves here). We can use the heatwave days to explore each borough’s daily LST profile. We do this by looping through each file, taking each borough’s averaged LST, and then stringing the multi-day profile into plot. This type of calculation is lengthy and involved, however, I have list the general process flow below to elucidate the methods used in the code:

Read the city shapefile and establish bounds
Loop through shapefile to count boroughs
Clip latitude/longitude vectors down to shapefile bounds
Loop through each file
For each file, use borough boundaries to calculate mean LST for each file (hour)
After the loop, rearrange times to plot a mean 24-hour diurnal profile during the heat wave

The resulting code and plot should be five lines that represent the mean LST trend for the heat wave for each borough of NYC.

import matplotlib as mpl
mpl.use('TkAgg')
from mpl_toolkits.basemap import Basemap
from netCDF4 import Dataset
import os,ogr,time
from datetime import datetime
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from matplotlib.patches import PathPatch
import numpy as np

plt.style.use('ggplot')

def lat_lon_reproj(nc_folder):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[0] # select .nc file
    print(nc_files[0]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # GOES-R projection info and retrieving relevant constants
    proj_info = g16nc.variables['goes_imager_projection']
    lon_origin = proj_info.longitude_of_projection_origin
    H = proj_info.perspective_point_height+proj_info.semi_major_axis
    r_eq = proj_info.semi_major_axis
    r_pol = proj_info.semi_minor_axis

    # grid info
    lat_rad_1d = g16nc.variables['x'][:]
    lon_rad_1d = g16nc.variables['y'][:]
    
    # close file when finished
    g16nc.close()
    g16nc = None

    # create meshgrid filled with radian angles
    lat_rad,lon_rad = np.meshgrid(lat_rad_1d,lon_rad_1d)

    # lat/lon calc routine from satellite radian angle vectors

    lambda_0 = (lon_origin*np.pi)/180.0

    a_var = np.power(np.sin(lat_rad),2.0) + (np.power(np.cos(lat_rad),2.0)*\
                                             (np.power(np.cos(lon_rad),2.0)+\
                                              (((r_eq*r_eq)/(r_pol*r_pol))*\
                                               np.power(np.sin(lon_rad),2.0))))
    b_var = -2.0*H*np.cos(lat_rad)*np.cos(lon_rad)
    c_var = (H**2.0)-(r_eq**2.0)

    r_s = (-1.0*b_var - np.sqrt((b_var**2)-(4.0*a_var*c_var)))/(2.0*a_var)

    s_x = r_s*np.cos(lat_rad)*np.cos(lon_rad)
    s_y = - r_s*np.sin(lat_rad)
    s_z = r_s*np.cos(lat_rad)*np.sin(lon_rad)

    # latitude and longitude projection for plotting data on traditional lat/lon maps
    lat = (180.0/np.pi)*(np.arctan(((r_eq*r_eq)/(r_pol*r_pol))*\
                                   ((s_z/np.sqrt(((H-s_x)*(H-s_x))+(s_y*s_y))))))
    lon = (lambda_0 - np.arctan(s_y/(H-s_x)))*(180.0/np.pi)

    os.chdir('../')

    return lon,lat

def data_grab(nc_folder,nc_indx):
    os.chdir(nc_folder)
    full_direc = os.listdir()
    nc_files = [ii for ii in full_direc if ii.endswith('.nc')]
    g16_data_file = nc_files[nc_indx] # select .nc file
    print(nc_files[nc_indx]) # print file name

    # designate dataset
    g16nc = Dataset(g16_data_file, 'r')
    var_names = [ii for ii in g16nc.variables]
    var_name = var_names[0]

    # data info    
    data = g16nc.variables[var_name][:]
    data_units = g16nc.variables[var_name].units
    data_time_grab = ((g16nc.time_coverage_end).replace('T',' ')).replace('Z','')
    data_long_name = g16nc.variables[var_name].long_name
    
    # close file when finished
    g16nc.close()
    g16nc = None

    os.chdir('../')
    # print test coordinates
    print('{} N, {} W'.format(lat[318,1849],abs(lon[318,1849])))

    return data,data_units,data_time_grab,data_long_name,var_name

##################
# reading shapefile and setting boundaries
##################

drv    = ogr.GetDriverByName('ESRI Shapefile') # define shapefile driver
ds_in  = drv.Open('./Borough Boundaries/geo_export_315b4419-366b-4cae-8a5a-9b945ae9112c.shp') #open shapefile
lyr_in = ds_in.GetLayer(0) # grab layer
shp = lyr_in.GetExtent()

bbox = [shp[0],shp[2],shp[1],shp[3]]

##################
# finding boroughs
##################

boro_array = []
for ii in range(0,(lyr_in.GetFeatureCount())):
    boro_array.append((lyr_in.GetFeature(ii)).GetField(0))

boro_unique = np.unique(boro_array)
boro_str = ['']*len(boro_unique)

##################
# clipping lat/lon/data vectors to shapefile bounds
##################

nc_folder = './GOES_files/' # define folder where .nc files are located
lon,lat = lat_lon_reproj(nc_folder)

llcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
llcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[llcrnr_loc_x]

ulcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[0]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
ulcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[0]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[ulcrnr_loc_x]

lrcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[1])),0))
lrcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[1])),0))[lrcrnr_loc_x]

urcrnr_loc_x = np.ma.argmin(np.ma.min(np.abs(np.ma.subtract(lon,bbox[2]))+\
                                      np.abs(np.ma.subtract(lat,bbox[3])),0))
urcrnr_loc = (np.ma.argmin(np.abs(np.ma.subtract(lon,bbox[2]))+\
                           np.abs(np.ma.subtract(lat,bbox[3])),0))[urcrnr_loc_x]

x_bounds = [llcrnr_loc_x,ulcrnr_loc_x,lrcrnr_loc_x,urcrnr_loc_x]
y_bounds = [llcrnr_loc,ulcrnr_loc,lrcrnr_loc,urcrnr_loc]

plot_bounds = [np.min(x_bounds),np.min(y_bounds),np.max(x_bounds)+1,np.max(y_bounds)+1]

lat_clip = lat[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]
lon_clip = lon[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]

##################
# GOES-16 LST section
##################

time_hour_vec,data_hour_vec,total_time_vec = [],[],[]

nc_files = [ii for ii in os.listdir(nc_folder) if ii.endswith('.nc')]
for file_indx in range(0,len(nc_files)):

    data,data_units,data_time_grab,data_long_name,var_name = data_grab(nc_folder,file_indx)

    dat_clip = data[plot_bounds[1]:plot_bounds[3],plot_bounds[0]:plot_bounds[2]]

    ##################
    # routine that checks which points are inside each shapefile borough
    ##################

    boro_vec =  np.zeros((len(boro_unique),)) # borough vector for holding LST values
    boro_count = np.zeros((len(boro_unique),)) # count points in each borough

    # looping through each lat/lon point and summing LST value if it falls within one of the boroughs
    
    for lat_pt,lon_pt,dat_pt in zip(lat_clip.flatten(),lon_clip.flatten(),dat_clip.flatten()):
        pt = ogr.Geometry(ogr.wkbPoint) # define point geometry
        pt.SetPoint_2D(0, np.float(lon_pt), np.float(lat_pt)) # set points 
        lyr_in.SetSpatialFilter(pt) # filter points with shapefiles
        
        # loop through boroughs to find where the points lie
        for feat_in in lyr_in:    
            ply = feat_in.GetGeometryRef()
            if ply.Contains(pt):
                if np.ma.isMA(dat_pt):
                    continue
                if boro_str[int(feat_in.GetField(0))-1]=='':
                    boro_str[int(feat_in.GetField(0))-1] = feat_in.GetField(1)
                boro_count[int(feat_in.GetField(0))-1]+=1
                boro_vec[int(feat_in.GetField(0))-1]+=dat_pt

    ##################
    # Adding hour of day and mean LST to vectors for later analysis
    ##################
    
    t1 = datetime.strptime(data_time_grab,'%Y-%m-%d %H:%M:%S.%f')
    
    boro_means = boro_vec/boro_count
    total_time_vec.append(t1)
    time_hour_vec.append(t1.hour)
    data_hour_vec.append(boro_means)

time_hour_vec = np.array(time_hour_vec)
data_hour_vec = np.array(data_hour_vec)

##################
# Plotting diurnal LST profile for each borough
##################
daily_mean = []
fig,ax = plt.subplots(figsize=(14,9))
for jj in range(0,np.shape(data_hour_vec)[1]):
    plot_var = []
    curr_vars = np.array([data_hour_vec[ii][jj] for ii in range(0,np.shape(data_hour_vec)[0])])
    for hour in np.unique(time_hour_vec):
        plot_var.append(np.nanmean(curr_vars[np.where(time_hour_vec==hour)]))
    
    ax.plot(np.unique(time_hour_vec),plot_var,marker='o',linestyle='-',
            markersize=8,linewidth=3,markeredgecolor='k',
            label=boro_str[jj])
    daily_mean.append(np.nanmean(plot_var))
    
ax.set_xlabel('Hour of Day',fontsize=18)
ax.set_ylabel('LST [K]',fontsize=18)
ax.tick_params(axis='both',which='major',labelsize=14)
ax.legend(fontsize=16)
data_units = ((data_units.replace('-','^{-')).replace('1','1}')).replace('2','2}')
ax.set_title(' from  - '.format(\
    data_long_name,
    datetime.strftime(np.min(total_time_vec),'%b %d, %Y'),
    datetime.strftime(np.max(total_time_vec),'%b %d, %Y')),
             fontsize=18)
plt.tight_layout()
plt.savefig('nyc_borough_diurnal_plot.png',dpi=200,
            facecolor=[252/255,252/255,252/255]) # uncomment to save figure
plt.show()

From the heat wave data taken from June 29th - July 3rd, we see the general daily trend in land surface temperature. If we translate the data from UTC to local time, we can conclude that peak heating occurs between 12 p.m. - 3 p.m. If we take the mean of each diurnal profile, we can quantitatively cite the mean daily temperature for each borough during the heat wave:

Manhattan : 302.2 K
Bronx : 302.2 K
Brooklyn : 301.7 K
Queens : 301.6 K
Staten Island : 301.6 K

We can conclude that the mean temperatures for each borough are about the same during the heat wave, though we can observe certain behaviors for each borough. Manhattan appears to hold on to heat more than the other boroughs, and Staten Island appears to be the coolest at any given time. I imagine the particularities of each borough would emerge if more data was added to the analysis.

Furthermore, if we focus on one borough, we can observe the temperature’s standard deviation for each hour during the heat wave. This will give us an idea of the span of the mean temperature for each hour. The plot is shown below for Manhattan:

Conclusion and Continuation

This concludes the second entry into the Python satellite imagery analysis tutorial series. In the first entry, I explored netCDF files and Python’s Basemap toolkit for reading and plotting geographic data and satellite imagery. In this tutorial, I introduced shapefiles as tools for clipping satellite data to city boundaries. I also demonstrated some powerful Python techniques for calculating mean parameters of particular geographic regions, using shapefiles and LST data. We were able to investigate the mean LST during a heat wave for New York City, while also visualizing the standard deviation of the diurnal LST profile. In the upcoming entry, I will use the National Land Cover Database (NLCD) and digital elevation maps to understand the distribution of LST under varying surface parameters.

click here for part III

See More in Python and GIS:

Featured