Because GeoPySpark is a binding of an existing project, GeoTrellis, some terminology and data representations have carried over. This section seeks to explain this jargon in addition to describing how GeoTrellis types are represented in GeoPySpark.
Before begining, all examples in this guide need the following boilerplate code:
import datetime import numpy as np import geopyspark as gps from shapely.geometry import box
GeoPySpark differs in how it represents rasters from other geo-spatial
Python libraries like rasterIO. In GeoPySpark, they are represented by
Tile class. This class contains a numpy array (refered to as
cells) that represents the cells of the raster in addition to other
information regarding the data. Along with
Tile can also
no_data_value of the raster.
Note: All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.
arr = np.array([[[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]], dtype=np.int16) # The resulting Tile will set -10 as the no_data_value for the raster gps.Tile.from_numpy_array(numpy_array=arr, no_data_value=-10)
# The resulting Tile will have no no_data_value gps.Tile.from_numpy_array(numpy_array=arr)
Describes the area on Earth a raster represents. This area is
represented by coordinates that are in some Coordinate Reference System.
Thus, depending on the system in use, the values that outline the
Extent can vary.
Extent can also be refered to as a bounding
Note: The values within the
Extent must be
floats and not
extent = gps.Extent(0.0, 0.0, 10.0, 10.0) extent
ProjectedExtent describes both the area on Earth a raster represents
in addition to its CRS. Either the EPSG code or a proj4 string can be
used to indicate the CRS of the
# Using an EPSG code gps.ProjectedExtent(extent=extent, epsg=3857)
# Using a Proj4 String proj4 = "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs " gps.ProjectedExtent(extent=extent, proj4=proj4)
the area on Earth the raster represents, its CRS, and the time the data
was represents. This point of time, called
instant, is an instance
time = datetime.datetime.now() gps.TemporalProjectedExtent(extent=extent, instant=time, epsg=3857)
TileLayout describes the grid which represents how rasters are
orginized and assorted in a layer.
detail how many columns and rows the grid itself has, respectively.
tileRows tell how many columns and rows each
individual raster has.
# Describes a layer where there are four rasters in a 2x2 grid. Each raster has 256 cols and rows. tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256) tile_layout
LayoutDefinition describes both how the rasters are orginized in a
layer as well as the area covered by the grid.
layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout) layout_definition
It is often the case that the exact layout of the layer is unknown. Rather than having to go through the effort of trying to figure out the optimal layout, there exists two different tiling strategies that will produce a layout based on the data they are given.
LocalLayout is the first tiling strategy that produces a layout
where the grid is constructed over all of the pixels within a layer of a
given tile size. The resulting layout will match the original resolution
of the cells within the rasters.
Note: This layout cannot be used for creating display layers. Rather, it is best used for layers where operations and analysis will be performed.
# Creates a LocalLayout where each tile within the grid will be 256x256 pixels. gps.LocalLayout()
# Creates a LocalLayout where each tile within the grid will be 512x512 pixels. gps.LocalLayout(tile_size=512)
# Creates a LocalLayout where each tile within the grid will be 256x512 pixels. gps.LocalLayout(tile_cols=256, tile_rows=512)
The other tiling strategy is
GlobalLayout which makes a layout where
the grid is constructed over the global extent CRS. The cell resolution
of the resulting layer be multiplied by a power of 2 for the CRS. Thus,
using this strategy will result in either up or down sampling of the
Note: This layout strategy should be used when the resulting layer is to be dispalyed in a TMS server.
# Creates a GobalLayout instance with the default values gps.GlobalLayout()
# Creates a GlobalLayout instance for a zoom of 12 gps.GlobalLayout(zoom=12)
You may have noticed from the above two examples that
does not create layout for a given zoom level by default. Rather, it
determines what the zoom should be based on the size of the cells within
the rasters. If you do want to create a layout for a specific zoom
level, then the
zoom parameter must be set.
SpatialKeys describe the positions of rasters within the grid of
the layout. This grid is a 2D plane where the location of a raster is
represented by a pair of coordinates,
As its name and attributes suggest,
SpatialKey deals solely with
SpaceTimeKeys describe the position of a
raster in a layout. However, the grid is a 3D plane where a location of
a raster is represented by a pair of coordinates,
as well as a z value that represents a point in time called,
instant. Like the
is also an instance of
deal with spatial-temporal data.
gps.SpaceTimeKey(col=0, row=0, instant=time)
Bounds represents the the extent of the layout grid in terms of
keys. It has both a
minKey and a
maxKey attributes. These can
either be a
SpatialKey or a
SpaceTimeKey depending on the type
of data within the layer. The
minKey is the left, uppermost cell in
the grid and the
maxKey is the right, bottommost cell.
# Creating a Bounds from SpatialKeys min_spatial_key = gps.SpatialKey(0, 0) max_spatial_key = gps.SpatialKey(10, 10) bounds = gps.Bounds(min_spatial_key, max_spatial_key) bounds
# Creating a Bounds from SpaceTimeKeys min_space_time_key = gps.SpaceTimeKey(0, 0, 1.0) max_space_time_key = gps.SpaceTimeKey(10, 10, 1.0) gps.Bounds(min_space_time_key, max_space_time_key)
Metadata contains information of the values within a layer. This
data pertains to the layout, projection, and extent of the data
contained within the layer.
The below example shows how to construct
Metadata by hand, however,
this is almost never required and
Metadata can be produced using
easier means. For
RasterLayer, one can call the method,
TiledRasterLayer has the attribute,
# Creates Metadata for a layer with rasters that have a cell type of int16 with the previously defined # bounds, crs, extent, and layout definition. gps.Metadata(bounds=bounds, crs=proj4, cell_type=gps.CellType.INT16.value, extent=extent, layout_definition=layout_definition)
Features are shapely geometries with some kind
of associated metadata. The main purpose of this type is to provide a means
of rasterizing multiple geometries that have different cell values and who
While in practice a
Feature can have any kind of metadata,
needs to be used with
Features that are to be rasterized.
CellValue has two parameters:
value is the
value of all the cells that intersect the
Feature determines what value a cell will be if more than one geometry
intersects it. The higher the
zindex, the more priority it has.
geom1 = box(0, 0, 15, 15) geom2 = box(100, 26, 109, 208) geom3 = box(610, 215, 1000, 500) cell_value1 = gps.CellValue(value=1, zindex=1) cell_value2 = gps.CellValue(value=2, zindex=2) cell_value3 = gps.CellValue(value=3, zindex=3) # Will not be selected if feature2 and/or feature3 also intersects target cell feature1 = gps.Feature(geometry=geom1, properties=cell_value1) # Will not be selected if feature3 also intersects target cell feature2 = gps.Feature(geometry=geom2, properties=cell_value2) # Will always be selected feature3 = gps.Feature(geometry=geom3, properties=cell_value3)