Visualizing Data in GeoPySpark

Data is visualized in GeoPySpark by running a server which allows it to be viewed in an interactive way. Before putting the data on the server, however, it must first be formatted and colored. This guide seeks to go over the steps needed to create a visualization server in GeoPySpark.

Before begining, all examples in this guide need the following boilerplate code:

curl -o /tmp/cropped.tif https://s3.amazonaws.com/geopyspark-test/example-files/cropped.tif
import geopyspark as gps
import matplotlib.pyplot as plt

from colortools import Color
from pyspark import SparkContext

%matplotlib inline

conf = gps.geopyspark_conf(master="local[*]", appName="visualization")
pysc = SparkContext(conf=conf)

raster_layer = gps.geotiff.get(layer_type=gps.LayerType.SPATIAL, uri="/tmp/cropped.tif")
tiled_layer = raster_layer.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)

Pyramid

The Pyramid class represents a list of TiledRasterLayers that represent the same area where each layer is a level within the pyramid at a specific zoom level. Thus, as one moves up the pyramid (starting a level 0), the image will have its pixel resolution increased by a power of 2 for each level. It is this varying level of detail that allows an interactive tile server to be created from a Pyramid. This class is needed in order to create visualizations of the contents within its layers.

Creating a Pyramid

There are currently two different ways to create a Pyramid instance: Through the TiledRasterLayer.pyramid method or by constructing it by passing in a [TiledRasterLayer] or {zoom_level: TiledRasterLayer} to Pyramid.

Any TiledRasterLayer with a max_zoom can be pyramided. However, the resulting Pyramid may have limited functionality depending on the layout of the source TiledRasterLayer. In order to be used for visualization, the Pyramid must have been created from TiledRasterLayer that was tiled using a GlobalLayout and whose tiles have a spatial resolution of a power of 2.

Via the pyramid Method

When using the Pyramid method, a Pyramid instance will be created with levels from 0 to TiledRasterlayer.zoom_level. Thus, if a TiledRasterLayer has a zoom_level of 12 then the resulting Pyramid will have 13 levels that each correspond to a zoom from 0 to 12.

pyramided = tiled_layer.pyramid()

Contrusting a Pyramid Manually

gps.Pyramid([tiled_layer.tile_to_layout(gps.GlobalLayout(zoom=x)) for x in range(0, 13)])
gps.Pyramid({x: tiled_layer.tile_to_layout(gps.GlobalLayout(zoom=x)) for x in range(0, 13)})

Computing the Histogram of a Pyramid

One can produce a Histogram instance representing the bottom most layer within a Pyramid via the get_histogram() method.

hist = pyramided.get_histogram()
hist

RDD Methods

Pyramid contains methods for working with the RDDs contained within its TiledRasterLayers. A list of these can be found here RDD Methods. When used, all internal RDDs will be operated on.

Map Algebra

While not as versatile as TiledRasterLayer in terms of map algebra operations, Pyramids are still able to perform local operations between themselves, ints, and floats.

Note: Operations between two or more Pyramids will occur on a per Tile basis which depends on the tiles having the same key. It is therefore possible to do an operation between two Pyramids and getting a result where nothing has changed if neither of the Pyramids have matching keys.

pyramided + 1

(2 * (pyramided + 2)) / 3

When performing operations on two or more Pyramids, if the Pyamids involved have different number of levels, then the resulting Pyramid will only have as many levels as the source Pyramid with the smallest level count.

small_pyramid = gps.Pyramid({x: tiled_layer.tile_to_layout(gps.GlobalLayout(zoom=x)) for x in range(0, 5)})
result = pyramided + small_pyramid
result.levels

ColorMap

The ColorMap class in GeoPySpark acts as a wrapper for the GeoTrellis ColorMap class. It is used to colorize the data within a layer when it’s being visualized.

Constructing a Color Ramp

Before we can initialize ColorMap we must first create a list of colors (or a color ramp) to pass in. This can be created either through a function in the color module or manually.

Using Matplotlib

The get_colors_from_matplotlib function creates a color ramp using the name of on an existing in color ramp in Matplotlib and the number of colors.

Note: This function will not work if Matplotlib is not installed.

gps.get_colors_from_matplotlib(ramp_name="viridis")
gps.get_colors_from_matplotlib(ramp_name="hot", num_colors=150)

From ColorTools

The second helper function for constructing a color ramp is get_colors_from_colors. This uses the colortools package to build the ramp from [Color] instances.

Note: This function will not work if colortools is not installed.

colors = [Color('green'), Color('red'), Color('blue')]
colors
colors_color_ramp = gps.get_colors_from_colors(colors=colors)
colors_color_ramp

Creating a ColorMap

ColorMap has many different ways of being constructed depending on the inputs it’s given.

From a Histogram

gps.ColorMap.from_histogram(histogram=hist, color_list=colors_color_ramp)

From a List of Colors

# Creates a ColorMap instance that will have three colors for the values that are less than or equal to 0, 250, and
# 1000.
gps.ColorMap.from_colors(breaks=[0, 250, 1000], color_list=colors_color_ramp)

For NLCD Data

If the layers you are working with contain data from NLCD, then it is possible to construct a ColorMap without first making a color ramp and passing in a list of breaks.

gps.ColorMap.nlcd_colormap()

From a Break Map

If there aren’t many colors to work with in the layer, than it may be easier to construct a ColorMap using a break_map, a dict that maps tile values to colors.

# The three tile values are 1, 2, and 3 and they correspond to the colors 0x00000000, 0x00000001, and 0x00000002
# respectively.
break_map = {
    1: 0x00000000,
    2: 0x00000001,
    3: 0x00000002
}

gps.ColorMap.from_break_map(break_map=break_map)

More General Build Method

As mentioned above, ColorMap has a more general classmethod called build() which takes a wide range of types to construct a ColorMap. In the following example, build will be passed the same inputs used in the previous examples.

# build using a Histogram
gps.ColorMap.build(breaks=hist, colors=colors_color_ramp)

# It is also possible to pass in the name of Matplotlib color ramp instead of constructing it yourself
gps.ColorMap.build(breaks=hist, colors="viridis")

# build using Colors
gps.ColorMap.build(breaks=colors_color_ramp, colors=colors)

# buld using breaks
gps.ColorMap.build(breaks=break_map)

Additional Coloring Options

In addition to supplying breaks and color values to ColorMap, there are other ways of changing the coloring strategy of a layer.

The following additional parameters that can be changed:

  • no_data_color: The color of the no_data_value of the Tiles. The default is 0x00000000
  • fallback: The color to use when a Tile value has no color mapping. The default is 0x00000000
  • classification_strategy: How the colors should be assigned to the values based on the breaks. The default is ClassificationStrategy.LESS_THAN_OR_EQUAL_TO.