geopyspark.geotrellis package

This subpackage contains the code that reads, writes, and processes data using GeoTrellis.

class geopyspark.geotrellis.Bounds(minKey, maxKey)

Represents the grid that covers the area of the rasters in an RDD.

Parameters:
  • minKey – The minimum key of the grid.
  • maxKey – The maximum key of the grid.
Returns:

Bounds

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

maxKey

Alias for field number 1

minKey

Alias for field number 0
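Because Bounds is a namedtuple, its two keys can be read by name, by index, or by unpacking. The minimal sketch below assumes each key is a simple (col, row) pair and that both keys are inclusive. SpatialKey and tile_count are hypothetical stand-ins, not GeoPySpark's own classes.

```python
from collections import namedtuple

# Hypothetical stand-ins: Bounds mirrors the documented field order
# (minKey is field 0, maxKey is field 1); SpatialKey is assumed to be (col, row).
SpatialKey = namedtuple("SpatialKey", ["col", "row"])
Bounds = namedtuple("Bounds", ["minKey", "maxKey"])


def tile_count(bounds):
    """Number of grid cells covered by inclusive key bounds (assumption)."""
    min_key, max_key = bounds  # namedtuple unpacking follows field order
    cols = max_key.col - min_key.col + 1
    rows = max_key.row - min_key.row + 1
    return cols * rows


bounds = Bounds(SpatialKey(0, 0), SpatialKey(3, 1))
print(bounds.minKey is bounds[0])  # True: minKey is an alias for field 0
print(tile_count(bounds))          # 8 (4 cols x 2 rows)
```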

class geopyspark.geotrellis.Extent(xmin, ymin, xmax, ymax)

The “bounding box”, or geographic region, of the area on Earth that a raster represents.

Parameters:
  • xmin (float) – The minimum x coordinate.
  • ymin (float) – The minimum y coordinate.
  • xmax (float) – The maximum x coordinate.
  • ymax (float) – The maximum y coordinate.
xmin

float – The minimum x coordinate.

ymin

float – The minimum y coordinate.

xmax

float – The maximum x coordinate.

ymax

float – The maximum y coordinate.

count(value) → integer -- return number of occurrences of value
classmethod from_polygon(polygon)

Creates a new instance of Extent from a Shapely Polygon.

The new Extent will contain the min and max coordinates of the Polygon, regardless of the Polygon’s shape.

Parameters: polygon (shapely.geometry.Polygon) – A Shapely Polygon.
Returns: Extent
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

to_polygon

Converts this instance to a Shapely Polygon.

The resulting Polygon will be in the shape of a box.

Returns: shapely.geometry.Polygon
xmax

Alias for field number 2

xmin

Alias for field number 0

ymax

Alias for field number 3

ymin

Alias for field number 1
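from_polygon keeps only a polygon's bounding box, and to_polygon turns an Extent back into a box. The sketch below reproduces that round trip with plain coordinate lists so it runs without Shapely. Extent is redefined here as a namedtuple with the documented field order, and the two helper functions are hypothetical, not the methods' actual code.

```python
from collections import namedtuple

# Stand-in with the documented field order: xmin=0, ymin=1, xmax=2, ymax=3.
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])


def extent_from_coords(coords):
    """Bounding box of a ring of (x, y) points, whatever its shape."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return Extent(min(xs), min(ys), max(xs), max(ys))


def extent_to_box(extent):
    """Closed ring of box corners, counter-clockwise from (xmin, ymin)."""
    return [
        (extent.xmin, extent.ymin),
        (extent.xmax, extent.ymin),
        (extent.xmax, extent.ymax),
        (extent.xmin, extent.ymax),
        (extent.xmin, extent.ymin),
    ]


triangle = [(0.0, 0.0), (4.0, 1.0), (2.0, 3.0), (0.0, 0.0)]
ext = extent_from_coords(triangle)
print(ext)                    # Extent(xmin=0.0, ymin=0.0, xmax=4.0, ymax=3.0)
print(extent_to_box(ext)[2])  # (4.0, 3.0)
```

With Shapely installed, a polygon's `bounds` attribute gives the same (minx, miny, maxx, maxy) tuple, and `shapely.geometry.box` builds the box polygon from it.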

class geopyspark.geotrellis.LayoutDefinition(extent, tileLayout)

Describes the layout of the rasters within an RDD and how they are projected.

Parameters:
  • extent (Extent) – The Extent of the layout.
  • tileLayout (TileLayout) – The TileLayout describing how the rasters within the RDD are laid out.
Returns:

LayoutDefinition

count(value) → integer -- return number of occurrences of value
extent

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

tileLayout

Alias for field number 1

class geopyspark.geotrellis.Metadata(bounds, crs, cell_type, extent, layout_definition)

Information about the values within a RasterRDD or TiledRasterRDD. This data describes the layout and other attributes of the data within these classes.

Parameters:
  • bounds (Bounds) – The Bounds of the values in the class.
  • crs (str or int) – The CRS of the data. Can be an EPSG code, a well-known name, or a PROJ.4 projection string.
  • cell_type (str) – The data type of the cells of the rasters.
  • extent (Extent) – The Extent that covers all of the rasters.
  • layout_definition (LayoutDefinition) – The LayoutDefinition of all rasters.
bounds

Bounds – The Bounds of the values in the class.

crs

str or int – The CRS of the data. Can be an EPSG code, a well-known name, or a PROJ.4 projection string.

cell_type

str – The data type of the cells of the rasters.

extent

Extent – The Extent that covers all of the rasters.

tile_layout

TileLayout – The TileLayout that describes how the rasters are organized.

layout_definition

LayoutDefinition – The LayoutDefinition of all rasters.

classmethod from_dict(metadata_dict)

Creates Metadata from a dictionary.

Parameters: metadata_dict (dict) – The Metadata of a RasterRDD or TiledRasterRDD instance, in dict form.
Returns: Metadata
to_dict()

Converts this instance to a dict.

Returns: dict
class geopyspark.geotrellis.TileLayout(layoutCols, layoutRows, tileCols, tileRows)

Describes the grid in which the rasters within an RDD should be laid out.

Parameters:
  • layoutCols (int) – The number of columns of rasters, running east to west.
  • layoutRows (int) – The number of rows of rasters, running north to south.
  • tileCols (int) – The number of columns of pixels in each raster, running east to west.
  • tileRows (int) – The number of rows of pixels in each raster, running north to south.
Returns:

TileLayout

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

layoutCols

Alias for field number 0

layoutRows

Alias for field number 1

tileCols

Alias for field number 2

tileRows

Alias for field number 3
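Together, an Extent and a TileLayout fix the size of every tile on the map, which is what lets a (col, row) pair, such as the one read_value takes, be related to map coordinates. A sketch of that arithmetic follows. It assumes, as in GeoTrellis, that columns count eastward from xmin and rows count southward from ymax. The namedtuples mirror the documented field orders, and tile_size, map_to_grid, and grid_to_extent are hypothetical helpers.

```python
import math
from collections import namedtuple

# Stand-ins that mirror the documented field orders.
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])
TileLayout = namedtuple("TileLayout", ["layoutCols", "layoutRows", "tileCols", "tileRows"])
LayoutDefinition = namedtuple("LayoutDefinition", ["extent", "tileLayout"])


def tile_size(layout):
    """Width and height of one tile in map units."""
    ext, tl = layout
    width = (ext.xmax - ext.xmin) / tl.layoutCols
    height = (ext.ymax - ext.ymin) / tl.layoutRows
    return width, height


def map_to_grid(layout, x, y):
    """(col, row) of the tile containing map point (x, y); row 0 is the northmost."""
    width, height = tile_size(layout)
    col = math.floor((x - layout.extent.xmin) / width)
    row = math.floor((layout.extent.ymax - y) / height)
    return col, row


def grid_to_extent(layout, col, row):
    """Extent covered by the tile at (col, row)."""
    width, height = tile_size(layout)
    xmin = layout.extent.xmin + col * width
    ymax = layout.extent.ymax - row * height
    return Extent(xmin, ymax - height, xmin + width, ymax)


layout = LayoutDefinition(Extent(0.0, 0.0, 100.0, 50.0), TileLayout(4, 2, 256, 256))
print(tile_size(layout))                # (25.0, 25.0)
print(map_to_grid(layout, 60.0, 10.0))  # (2, 1)
print(grid_to_extent(layout, 2, 1))     # Extent(xmin=50.0, ymin=0.0, xmax=75.0, ymax=25.0)
```

Dividing a tile's width by tileCols (here 25.0 / 256) would give the pixel size in map units.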

geopyspark.geotrellis.catalog module

Methods for reading, querying, and saving tile layers to and from GeoTrellis Catalogs.

geopyspark.geotrellis.catalog.get_layer_ids(geopysc, uri, options=None, **kwargs)

Returns a list of all of the layer ids in the selected catalog as dicts that contain the name and zoom of a given layer.

Parameters:
  • geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns:

[layerIds]

Where layerIds is a dict with the following fields:
  • name (str): The name of the layer
  • zoom (int): The zoom level of the given layer.
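Every catalog function accepts backend options either as the options dict or as camel-case keyword arguments, and the options dict wins when both are given. A minimal sketch of that precedence rule follows. resolve_options is a hypothetical helper, not part of GeoPySpark, and exampleKey is a placeholder rather than a real Cassandra or HBase setting.

```python
def resolve_options(options=None, **kwargs):
    """Return the backend options actually used.

    Mirrors the documented rule: when an options dict is supplied,
    any keyword arguments are ignored; otherwise the keywords are used.
    """
    if options is not None:
        return dict(options)
    return dict(kwargs)


# exampleKey is a hypothetical camel-case key, for illustration only.
print(resolve_options({"exampleKey": 1}, exampleKey=2))  # {'exampleKey': 1}
print(resolve_options(exampleKey=2))                     # {'exampleKey': 2}
print(resolve_options())                                 # {}
```

The options are not merged: keywords passed alongside an options dict are dropped entirely.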

geopyspark.geotrellis.catalog.query(geopysc, rdd_type, uri, layer_name, layer_zoom, intersects, time_intervals=None, proj_query=None, options=None, numPartitions=None, **kwargs)

Queries a single zoom layer from a GeoTrellis catalog given spatial and/or time parameters. Unlike read, this method will only return the part of the layer that intersects the specified region.

Note

The whole layer could still be read in if intersects and/or time_intervals have not been set, or if the queried region contains the entire layer.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME. Note: All of the GeoTiffs must have the same spatial type.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the layer in the GeoTrellis catalog to be queried.
  • layer_zoom (int) – The zoom level of the layer that is to be queried.
  • layer_zoom (int) – The zoom level of the layer that is to be querried.
  • intersects (str or Polygon or Extent) –

    The desired spatial area to be returned. Can be a string, a Shapely Polygon, or an instance of Extent. If the value is a string, it must be a geometry in WKT format.

    The supported geometry types are:
    • Point
    • Polygon
    • MultiPolygon

    Note

    Only layers that were made from spatial, singleband GeoTiffs can query a Point. All other types are restricted to Polygon and MultiPolygon.

  • time_intervals (list, optional) – A list of strings representing the time intervals to query. The strings must be in a valid date-time format. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
  • options (dict, optional) – Additional parameters for querying the tile for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns:

TiledRasterRDD
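When intersects is given as a string, it must be WKT. The sketch below builds the WKT for a rectangular query region from its corner coordinates. extent_to_wkt is a hypothetical helper; passing an Extent or a Shapely Polygon directly, as described above, avoids the string step entirely.

```python
def extent_to_wkt(xmin, ymin, xmax, ymax):
    """WKT POLYGON for an axis-aligned box; the ring is closed by repeating the first vertex."""
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(extent_to_wkt(0, 0, 10, 5))
# POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0))
```

Shapely geometries expose an equivalent string through their `wkt` attribute, although the vertex order and number formatting may differ.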

geopyspark.geotrellis.catalog.read(geopysc, rdd_type, uri, layer_name, layer_zoom, options=None, numPartitions=None, **kwargs)

Reads a single zoom layer from a GeoTrellis catalog.

Note

This will read the entire layer. If only part of the layer is needed, use query() instead.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
  • layer_zoom (int) – The zoom level of the layer that is to be read.
  • options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns:

TiledRasterRDD

geopyspark.geotrellis.catalog.read_layer_metadata(geopysc, rdd_type, uri, layer_name, layer_zoom, options=None, **kwargs)

Reads the metadata from a saved layer without reading in the whole layer.

Parameters:
  • geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
  • layer_zoom (int) – The zoom level of the layer that is to be read.
  • options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns:

Metadata

geopyspark.geotrellis.catalog.read_value(geopysc, rdd_type, uri, layer_name, layer_zoom, col, row, zdt=None, options=None, **kwargs)

Reads a single tile from a GeoTrellis catalog. Unlike other functions in this module, this will not return a TiledRasterRDD, but rather a GeoPySpark formatted raster. This is the function to use when creating a tile server.

Note

When requesting a tile that does not exist, None will be returned.

Parameters:
  • geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
  • layer_zoom (int) – The zoom level of the layer that is to be read.
  • col (int) – The column number of the tile within the layout. Columns run east to west.
  • row (int) – The row number of the tile within the layout. Rows run north to south.
  • zdt (str, optional) – The Zone-Date-Time string of the tile. The string must be in a valid date-time format. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
  • options (dict, optional) – Additional parameters for reading the tile for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns:

Raster or None

geopyspark.geotrellis.catalog.write(uri, layer_name, tiled_raster_rdd, index_strategy='zorder', time_unit=None, options=None, **kwargs)

Writes a tile layer to a specified destination.

Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the new tile layer.
  • layer_zoom (int) – The zoom level the layer should be saved at.
  • tiled_raster_rdd (TiledRasterRDD) – The TiledRasterRDD to be saved.
  • index_strategy (str) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. The default method is ZORDER.
  • time_unit (str, optional) – Which time unit should be used when saving spatial-temporal data. While this is None by default, it must be set when saving spatial-temporal data. Depending on the indexing method chosen, different time units are used.
  • options (dict, optional) – Additional parameters for writing the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
  • **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.

geopyspark.geotrellis.constants module

Constants that are used by geopyspark.geotrellis classes, methods, and functions.

geopyspark.geotrellis.constants.ANNULUS = 'annulus'

Neighborhood type.

geopyspark.geotrellis.constants.ASPECT = 'Aspect'

Focal operation type.

geopyspark.geotrellis.constants.AVERAGE = 'Average'

A resampling method.

geopyspark.geotrellis.constants.BILINEAR = 'Bilinear'

A resampling method.

geopyspark.geotrellis.constants.BLUE_TO_ORANGE = 'BlueToOrange'

A ColorRamp.

geopyspark.geotrellis.constants.BLUE_TO_RED = 'BlueToRed'

A ColorRamp.

geopyspark.geotrellis.constants.BOOL = 'bool'

Represents Bit Cells.

geopyspark.geotrellis.constants.BOOLRAW = 'boolraw'

Represents Bit Cells.

geopyspark.geotrellis.constants.CELL_TYPES = ['boolraw', 'int8raw', 'uint8raw', 'int16raw', 'uint16raw', 'int32raw', 'float32raw', 'float64raw', 'bool', 'int8', 'uint8', 'int16', 'uint16', 'int32', 'float32', 'float64', 'int8ud', 'uint8ud', 'int16ud', 'uint16ud', 'int32ud', 'float32ud', 'float64ud']

The list of all supported cell types.

geopyspark.geotrellis.constants.CIRCLE = 'circle'

Neighborhood type.

geopyspark.geotrellis.constants.CLASSIFICATION_BOLD_LAND_USE = 'ClassificationBoldLandUse'

A ColorRamp.

geopyspark.geotrellis.constants.COOLWARM = 'coolwarm'

A ColorRamp.

geopyspark.geotrellis.constants.CUBICCONVOLUTION = 'CubicConvolution'

A resampling method.

geopyspark.geotrellis.constants.CUBICSPLINE = 'CubicSpline'

A resampling method.

geopyspark.geotrellis.constants.DAYS = 'days'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.EXACT = 'Exact'

A classification strategy.

geopyspark.geotrellis.constants.FLOAT = 'float'

Layout scheme to match resolution of source rasters.

geopyspark.geotrellis.constants.FLOAT32 = 'float32'

Represents Float Cells with constant NoData values.

geopyspark.geotrellis.constants.FLOAT32RAW = 'float32raw'

Represents Float Cells.

geopyspark.geotrellis.constants.FLOAT32UD = 'float32ud'

Represents Float Cells with user defined NoData values.

geopyspark.geotrellis.constants.FLOAT64 = 'float64'

Represents Double Cells with constant NoData values.

geopyspark.geotrellis.constants.FLOAT64RAW = 'float64raw'

Represents Double Cells.

geopyspark.geotrellis.constants.GREATERTHAN = 'GreaterThan'

A classification strategy.

geopyspark.geotrellis.constants.GREATERTHANOREQUALTO = 'GreaterThanOrEqualTo'

A classification strategy.

geopyspark.geotrellis.constants.GREEN_TO_RED_ORANGE = 'GreenToRedOrange'

A ColorRamp.

geopyspark.geotrellis.constants.HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM = 'HeatmapBlueToYellowToRedSpectrum'

A ColorRamp.

geopyspark.geotrellis.constants.HEATMAP_DARK_RED_TO_YELLOW_WHITE = 'HeatmapDarkRedToYellowWhite'

A ColorRamp.

geopyspark.geotrellis.constants.HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE = 'HeatmapLightPurpleToDarkPurpleToWhite'

A ColorRamp.

geopyspark.geotrellis.constants.HEATMAP_YELLOW_TO_RED = 'HeatmapYellowToRed'

A ColorRamp.

geopyspark.geotrellis.constants.HILBERT = 'hilbert'

A key indexing method. Works for RDDs that contain both SpatialKey and SpaceTimeKey. Note that indexes are determined by the x, y, and, if SPACETIME, the temporal resolutions of a point. These resolutions are expressed in bits, and their sum has a maximum value of 62. Thus, if the sum of those resolutions is greater than 62, then the indexing will fail.
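The Hilbert index has to fit the x, y, and (for SPACETIME) temporal resolutions into at most 62 bits. A rough pre-flight check is sketched below. It assumes each resolution is the number of bits needed to number the columns, rows, or time steps, which only approximates how GeoTrellis sizes the index. bits_needed and fits_hilbert are hypothetical helpers.

```python
def bits_needed(n):
    """Bits required to give n distinct values the indices 0..n-1."""
    return max(n - 1, 0).bit_length()


def fits_hilbert(cols, rows, time_steps=None, limit=62):
    """Rough check that x, y, and optional temporal resolutions fit in `limit` bits."""
    total = bits_needed(cols) + bits_needed(rows)
    if time_steps is not None:  # SPACETIME layers also spend bits on time
        total += bits_needed(time_steps)
    return total <= limit, total


print(fits_hilbert(1024, 1024))           # (True, 20)
print(fits_hilbert(2**25, 2**25, 2**13))  # (False, 63)
```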

geopyspark.geotrellis.constants.HOT = 'hot'

A ColorRamp.

geopyspark.geotrellis.constants.HOURS = 'hours'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.INFERNO = 'inferno'

A ColorRamp.

geopyspark.geotrellis.constants.INT16 = 'int16'

Represents Short Cells with constant NoData values.

geopyspark.geotrellis.constants.INT16RAW = 'int16raw'

Represents Short Cells.

geopyspark.geotrellis.constants.INT16UD = 'int16ud'

Represents Short Cells with user defined NoData values.

geopyspark.geotrellis.constants.INT32 = 'int32'

Represents Int Cells with constant NoData values.

geopyspark.geotrellis.constants.INT32RAW = 'int32raw'

Represents Int Cells.

geopyspark.geotrellis.constants.INT32UD = 'int32ud'

Represents Int Cells with user defined NoData values.

geopyspark.geotrellis.constants.INT8 = 'int8'

Represents Byte Cells with constant NoData values.

geopyspark.geotrellis.constants.INT8RAW = 'int8raw'

Represents Byte Cells.

geopyspark.geotrellis.constants.INT8UD = 'int8ud'

Represents Byte Cells with user defined NoData values.

geopyspark.geotrellis.constants.LANCZOS = 'Lanczos'

A resampling method.

geopyspark.geotrellis.constants.LESSTHAN = 'LessThan'

A classification strategy.

geopyspark.geotrellis.constants.LESSTHANOREQUALTO = 'LessThanOrEqualTo'

A classification strategy.

geopyspark.geotrellis.constants.LIGHT_TO_DARK_GREEN = 'LightToDarkGreen'

A ColorRamp.

geopyspark.geotrellis.constants.LIGHT_TO_DARK_SUNSET = 'LightToDarkSunset'

A ColorRamp.

geopyspark.geotrellis.constants.LIGHT_YELLOW_TO_ORANGE = 'LightYellowToOrange'

A ColorRamp.

geopyspark.geotrellis.constants.MAGMA = 'magma'

A ColorRamp.

geopyspark.geotrellis.constants.MAX = 'Max'

A resampling method.

geopyspark.geotrellis.constants.MEAN = 'Mean'

Focal operation type.

geopyspark.geotrellis.constants.MEDIAN = 'Median'

A resampling method.

geopyspark.geotrellis.constants.MILLISECONDS = 'millis'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.MINUTES = 'minutes'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.MODE = 'Mode'

A resampling method.

geopyspark.geotrellis.constants.MONTHS = 'months'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.NEARESTNEIGHBOR = 'NearestNeighbor'

A resampling method.

geopyspark.geotrellis.constants.NEIGHBORHOODS = ['annulus', 'nesw', 'square', 'wedge', 'circle']

The list of all supported neighborhood types.

geopyspark.geotrellis.constants.NESW = 'nesw'

Neighborhood type.

geopyspark.geotrellis.constants.NODATAINT = -2147483648

The NoData value for ints in GeoTrellis.

geopyspark.geotrellis.constants.PLASMA = 'plasma'

A ColorRamp.

geopyspark.geotrellis.constants.RESAMPLE_METHODS = ['NearestNeighbor', 'Bilinear', 'CubicConvolution', 'Lanczos', 'Average', 'Mode', 'Median', 'Max', 'Min']

The list of resampling methods.

geopyspark.geotrellis.constants.ROWMAJOR = 'rowmajor'

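A key indexing method. Works only for RDDs that contain SpatialKey. This method provides the fastest lookup of all the key indexing methods; however, it does not give good locality guarantees. It is therefore recommended only when locality is not important for your analysis.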

geopyspark.geotrellis.constants.SECONDS = 'seconds'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.SLOPE = 'Slope'

Focal operation type.

geopyspark.geotrellis.constants.SPACETIME = 'spacetime'

Indicates that the RDD contains (K, V) pairs, where the K has a spatial and time attribute. Both TemporalProjectedExtent and SpaceTimeKey are examples of this type of K.

geopyspark.geotrellis.constants.SPATIAL = 'spatial'

Indicates that the RDD contains (K, V) pairs, where the K has a spatial attribute but no time attribute. Both ProjectedExtent and SpatialKey are examples of this type of K.

geopyspark.geotrellis.constants.SQUARE = 'square'

Neighborhood type.

geopyspark.geotrellis.constants.SUM = 'Sum'

Focal operation type.

geopyspark.geotrellis.constants.TILE = 'Tile'

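Indicates the type of value that needs to be serialized/deserialized. Both singleband and multiband GeoTiffs are referred to as this.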

geopyspark.geotrellis.constants.UINT16 = 'uint16'

Represents UShort Cells with constant NoData values.

geopyspark.geotrellis.constants.UINT16RAW = 'uint16raw'

Represents UShort Cells.

geopyspark.geotrellis.constants.UINT16UD = 'uint16ud'

Represents UShort Cells with user defined NoData values.

geopyspark.geotrellis.constants.UINT8 = 'uint8'

Represents UByte Cells with constant NoData values.

geopyspark.geotrellis.constants.UINT8RAW = 'uint8raw'

Represents UByte Cells.

geopyspark.geotrellis.constants.UINT8UD = 'uint8ud'

Represents UByte Cells with user defined NoData values.

geopyspark.geotrellis.constants.VIRIDIS = 'viridis'

A ColorRamp.

geopyspark.geotrellis.constants.WEDGE = 'wedge'

Neighborhood type.

geopyspark.geotrellis.constants.YEARS = 'years'

A time unit used with ZORDER.

geopyspark.geotrellis.constants.ZOOM = 'zoom'

Layout scheme to match resolution of the closest level of TMS pyramid.

geopyspark.geotrellis.constants.ZORDER = 'zorder'

A key indexing method. Works for RDDs that contain both SpatialKey and SpaceTimeKey.

geopyspark.geotrellis.geotiff_rdd module

This module contains functions that create RasterRDD from files.

geopyspark.geotrellis.geotiff_rdd.get(geopysc, rdd_type, uri, options=None, **kwargs)

Creates a RasterRDD from GeoTiffs that are located on the local file system, HDFS, or S3.

Parameters:
  • geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) –

    The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.

    Note

    All of the GeoTiffs must have the same spatial type.

  • uri (str) – The path to a given file/directory.
  • options (dict, optional) –

    A dictionary of different options that are used when creating the RDD. This defaults to None. If None, then the RDD will be created using the default options for the given backend in GeoTrellis.

    Note

    Key values in the dict should be in camel case, as this is the style that is used in Scala.

    These are the options when using the local file system or HDFS:
    • crs (str, optional): The CRS that the output tiles should be in. The CRS must be in the well-known name format. If None, then the CRS that the tiles were originally in will be used.
    • timeTag (str, optional): The name of the tiff tag that contains the time stamp for the tile. If None, then the default value is: TIFFTAG_DATETIME.
    • timeFormat (str, optional): The pattern of the time stamp for java.time.format.DateTimeFormatter to parse. If None, then the default value is: yyyy:MM:dd HH:mm:ss.
    • maxTileSize (int, optional): The max size of each tile in the resulting RDD. If the size is smaller than a read-in tile, then that tile will be broken into tiles of the specified size. If None, then the whole tile will be read in.
    • numPartitions (int, optional): The number of partitions Spark will make when the data is repartitioned. If None, then the data will not be repartitioned.
    • chunkSize (int, optional): How many bytes of the file should be read in at a time. If None, then files will be read in 65536-byte chunks.
    S3 has the above options in addition to this:
    • s3Client (str, optional): Which S3Client to use when reading GeoTiffs. There are currently two options: default and mock. If None, default is used.
      Note:
      mock should only be used in unit tests and debugging.
  • **kwargs – Optional parameters can also be entered as keyword arguments.

Note

Defining both options and kwargs will cause the kwargs to be ignored in favor of options.

Returns: RasterRDD
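The default timeFormat, yyyy:MM:dd HH:mm:ss, is a java.time.format.DateTimeFormatter pattern. To check what a TIFFTAG_DATETIME value will parse to before building the RDD, the equivalent Python strptime pattern can be used. The translation below is our own, covers only this default pattern, and uses an example stamp rather than one read from a real file.

```python
from datetime import datetime

# Python equivalent (our translation) of the Java pattern "yyyy:MM:dd HH:mm:ss".
TIFF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"

stamp = "2017:06:01 13:45:00"  # example TIFFTAG_DATETIME value
parsed = datetime.strptime(stamp, TIFF_DATETIME_FORMAT)
print(parsed.isoformat())  # 2017-06-01T13:45:00
```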

geopyspark.geotrellis.neighborhoods module

Classes that represent the various neighborhoods used in focal functions.

Note

Once a parameter has been entered for any one of these classes, it gets converted to a float if it was originally an int.

class geopyspark.geotrellis.neighborhoods.Annulus(inner_radius, outer_radius)

An Annulus neighborhood.

Parameters:
  • inner_radius (int or float) – The radius of the inner circle.
  • outer_radius (int or float) – The radius of the outer circle.
inner_radius

int or float – The radius of the inner circle.

outer_radius

int or float – The radius of the outer circle.

param_1

float – Same as inner_radius.

param_2

float – Same as outer_radius.

param_3

float – Unused param for Annulus. Is 0.0.

name

str – The name of the neighborhood, which is “annulus”.
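Each neighborhood class stores its arguments in three generic float slots, param_1 through param_3, and fills unused slots with 0.0. The hypothetical stand-in class below (not GeoPySpark's implementation) shows that mapping for Annulus, including the int-to-float conversion described in the module note.

```python
class AnnulusSketch:
    """Hypothetical stand-in mirroring the documented Annulus attributes."""

    name = "annulus"

    def __init__(self, inner_radius, outer_radius):
        # Parameters are stored as floats even when ints are passed in.
        self.inner_radius = float(inner_radius)
        self.outer_radius = float(outer_radius)
        self.param_1 = self.inner_radius
        self.param_2 = self.outer_radius
        self.param_3 = 0.0  # unused by Annulus


a = AnnulusSketch(1, 3)
print(a.param_1, a.param_2, a.param_3)  # 1.0 3.0 0.0
```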

class geopyspark.geotrellis.neighborhoods.Circle(radius)

A circle neighborhood.

Parameters: radius (int or float) – The radius of the circle that determines which cells fall within the bounding box.
radius

int or float – The radius of the circle that determines which cells fall within the bounding box.

param_1

float – Same as radius.

param_2

float – Unused param for Circle. Is 0.0.

param_3

float – Unused param for Circle. Is 0.0.

name

str – The name of the neighborhood, which is “circle”.

Note

Cells that lie exactly on the radius of the circle are a part of the neighborhood.
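The note above says cells lying exactly on the radius belong to the circle. The sketch below lists the cell offsets, relative to the focus, that a circle of a given radius would cover under that rule. It assumes distance is measured between cell centers; circle_offsets is a hypothetical helper, not GeoTrellis code.

```python
import math


def circle_offsets(radius):
    """(dx, dy) offsets whose center distance from the focus is <= radius."""
    r = math.floor(radius)  # no cell beyond this offset can be inside
    return [
        (dx, dy)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dx * dx + dy * dy <= radius * radius  # "<=" keeps cells on the edge
    ]


cells = circle_offsets(1)
print(len(cells))       # 5: the focus plus its four edge-adjacent neighbors
print((1, 1) in cells)  # False: sqrt(2) > 1
```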

class geopyspark.geotrellis.neighborhoods.Nesw(extent)

A neighborhood that includes a column and row intersection for the focus.

Parameters: extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
extent

int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.

param_1

float – Same as extent.

param_2

float – Unused param for Nesw. Is 0.0.

param_3

float – Unused param for Nesw. Is 0.0.

name

str – The name of the neighborhood, which is “nesw”.
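Nesw covers the focus's own row and column, out to extent cells in each direction. The sketch below lists the resulting offsets, assuming a fractional extent is rounded down to whole cells. nesw_offsets is a hypothetical helper, not GeoTrellis code.

```python
import math


def nesw_offsets(extent):
    """Offsets in the focus's row and column, up to `extent` cells away."""
    e = math.floor(extent)
    offsets = {(0, 0)}
    for d in range(1, e + 1):
        offsets.update({(d, 0), (-d, 0), (0, d), (0, -d)})
    return sorted(offsets)


print(len(nesw_offsets(2)))  # 9: the focus plus 2 cells in each of 4 directions
```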

class geopyspark.geotrellis.neighborhoods.Wedge(radius, start_angle, end_angle)

A wedge neighborhood.

Parameters:
  • radius (int or float) – The radius of the wedge.
  • start_angle (int or float) – The starting angle of the wedge in degrees.
  • end_angle (int or float) – The ending angle of the wedge in degrees.
radius

int or float – The radius of the wedge.

start_angle

int or float – The starting angle of the wedge in degrees.

end_angle

int or float – The ending angle of the wedge in degrees.

param_1

float – Same as radius.

param_2

float – Same as start_angle.

param_3

float – Same as end_angle.

name

str – The name of the neighborhood, which is “wedge”.

geopyspark.geotrellis.rdd module

This module contains the RasterRDD and the TiledRasterRDD classes. Both of these classes are wrappers of their Scala counterparts. These will be used in lieu of actual PySpark RDDs when performing operations.

class geopyspark.geotrellis.rdd.CachableRDD

Base class for classes that wrap a Scala RDD instance through a py4j reference.

geopysc

GeoPyContext – The GeoPyContext being used this session.

srdd

py4j.java_gateway.JavaObject – The corresponding Scala RDD class.

cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

wrapped_rdds()

Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
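The persist() default, StorageLevel(False, True, False, False, 1), is PySpark's MEMORY_ONLY. PySpark's StorageLevel takes its arguments in the order useDisk, useMemory, useOffHeap, deserialized, replication. The namedtuple below is only a stand-in for pyspark.StorageLevel that decodes the positional form without needing a Spark installation.

```python
from collections import namedtuple

# Stand-in for pyspark.StorageLevel, with the same positional argument order.
StorageLevel = namedtuple(
    "StorageLevel", ["useDisk", "useMemory", "useOffHeap", "deserialized", "replication"]
)

default = StorageLevel(False, True, False, False, 1)
print(default.useMemory and not default.useDisk)  # True: memory only
print(default.replication)                        # 1: a single copy
```

With PySpark installed, a different level such as pyspark.StorageLevel.MEMORY_AND_DISK can be passed to persist(), as long as the RDD has not already been given a storage level.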

class geopyspark.geotrellis.rdd.RasterRDD(geopysc, rdd_type, srdd)

A wrapper of an RDD that contains GeoTrellis rasters.

Represents an RDD that contains (K, V) pairs, where K is either ProjectedExtent or TemporalProjectedExtent, depending on the rdd_type of the RDD, and V is a Raster.

The data held within the RDD has not been tiled, meaning the data has yet to be modified to fit a certain layout. See TiledRasterRDD for more information.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows RasterRDD to access the various Scala methods.
geopysc

GeoPyContext – The GeoPyContext being used this session.

rdd_type

str – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.

srdd

py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterRDD to access the various Scala methods.

cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

collect_metadata(extent=None, layout=None, crs=None, tile_size=256)

Iterates over RDD records and generates layer metadata describing the contained rasters.

Parameters:
  • extent (Extent, optional) – Specify layout extent, must also specify layout.
  • layout (TileLayout, optional) – Specify tile layout, must also specify extent.
  • crs (str or int, optional) – Ignore CRS from records and use given one instead.
  • tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.

Note

extent and layout must both be defined if they are to be used.

Returns: Metadata
Raises: TypeError – If only one of extent and layout is defined.
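The rule that extent and layout must be given together can be expressed as a small guard. The sketch below mirrors the documented check; check_layout_args is a hypothetical helper, not the method's actual code, and the tuples passed to it are placeholder values.

```python
def check_layout_args(extent=None, layout=None):
    """Raise TypeError when exactly one of extent and layout is given."""
    if (extent is None) != (layout is None):
        raise TypeError("extent and layout must be specified together")


check_layout_args()  # fine: neither given, so tile_size would be used
check_layout_args(extent=(0, 0, 1, 1), layout=(1, 1, 256, 256))  # fine: both given
try:
    check_layout_args(extent=(0, 0, 1, 1))
except TypeError as err:
    print(err)  # extent and layout must be specified together
```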
convert_data_type(new_type)

Converts the underlying raster values to a new CellType.

Parameters: new_type (str) – The string representation of the CellType to convert to. It is represented by a constant such as INT16, FLOAT64UD, etc.
Returns: RasterRDD
Raises: ValueError – When an unsupported cell type is entered.
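Because unsupported cell types raise ValueError, the target type can be checked against the CELL_TYPES constant first. The list below is copied from geopyspark.geotrellis.constants.CELL_TYPES so the sketch runs on its own; validate_cell_type is a hypothetical helper.

```python
# Copied from geopyspark.geotrellis.constants.CELL_TYPES.
CELL_TYPES = [
    'boolraw', 'int8raw', 'uint8raw', 'int16raw', 'uint16raw', 'int32raw',
    'float32raw', 'float64raw', 'bool', 'int8', 'uint8', 'int16', 'uint16',
    'int32', 'float32', 'float64', 'int8ud', 'uint8ud', 'int16ud', 'uint16ud',
    'int32ud', 'float32ud', 'float64ud',
]


def validate_cell_type(new_type):
    """Return new_type unchanged if it is a supported cell type, else raise ValueError."""
    if new_type not in CELL_TYPES:
        raise ValueError(f"{new_type!r} is not a supported cell type")
    return new_type


print(validate_cell_type('float64ud'))  # float64ud
```

In a GeoPySpark session, the list itself can be imported with `from geopyspark.geotrellis.constants import CELL_TYPES` instead of being copied.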
cut_tiles(layer_metadata, resample_method='NearestNeighbor')

Cut tiles to layout. May result in duplicate keys.

Parameters:
  • layer_metadata (Metadata) – The Metadata of the RasterRDD instance.
  • resample_method (str, optional) – The resample method to use when cutting the tiles. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns:

TiledRasterRDD

classmethod from_numpy_rdd(geopysc, rdd_type, numpy_rdd)

Creates a RasterRDD from a numpy RDD.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either ProjectedExtents or TemporalProjectedExtents and rasters that are represented by a numpy array.
Returns:

RasterRDD

get_min_max()

Returns the minimum and maximum values of all of the rasters in the RDD.

Returns: (float, float)
persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

reclassify(value_map, data_type, boundary_strategy='LessThanOrEqualTo', replace_nodata_with=None)

Changes the cell values of the rasters based on a set of breaks.

Parameters:
  • value_map (dict) – A dict whose keys are the values at which breaks occur and whose values are the new values that cells within each break should become.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
  • boundary_strategy (str, optional) – How the cells should be classified along the breaks. This is represented by the following constants: GREATERTHAN, GREATERTHANOREQUALTO, LESSTHAN, LESSTHANOREQUALTO, and EXACT. If unspecified, then LESSTHANOREQUALTO will be used.
  • replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.

Note

NoData symbolizes a different value depending on whether data_type is int or float. For int, the constant NODATAINT can be used, which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.

Returns:RasterRDD
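The break-based remapping that reclassify describes can be sketched for a single float tile in plain NumPy, with NaN standing in for NoData. This is an illustration of the documented semantics, not GeoPySpark's implementation: the strategy strings are assumed to mirror the values of the boundary_strategy constants, and mapping a cell that matches no break to NoData is an assumption of this sketch.

```python
import math

import numpy as np


def reclassify_cell(v, value_map, strategy):
    """Map one cell value to its new value; NaN when no break applies (assumed)."""
    breaks = sorted(value_map)
    if strategy == "LessThanOrEqualTo":       # smallest break >= v
        matches = [b for b in breaks if v <= b]
        key = matches[0] if matches else None
    elif strategy == "LessThan":              # smallest break > v
        matches = [b for b in breaks if v < b]
        key = matches[0] if matches else None
    elif strategy == "GreaterThanOrEqualTo":  # largest break <= v
        matches = [b for b in breaks if v >= b]
        key = matches[-1] if matches else None
    elif strategy == "GreaterThan":           # largest break < v
        matches = [b for b in breaks if v > b]
        key = matches[-1] if matches else None
    elif strategy == "Exact":                 # only exact matches are remapped
        key = v if v in value_map else None
    else:
        raise ValueError("unknown boundary strategy: " + strategy)
    return float("nan") if key is None else value_map[key]


def reclassify_tile(tile, value_map, strategy="LessThanOrEqualTo",
                    replace_nodata_with=None):
    """Reclassify a float tile in which NaN marks NoData."""
    out = np.empty(tile.shape, dtype=float)
    for idx, v in np.ndenumerate(tile):
        if math.isnan(v):
            # NoData is treated separately: kept unless a replacement is given.
            out[idx] = float("nan") if replace_nodata_with is None else replace_nodata_with
        else:
            out[idx] = reclassify_cell(v, value_map, strategy)
    return out
```

For example, with value_map={5: 100, 10: 200} and the default strategy, cells holding 1 and 5 become 100 and a cell holding 10 becomes 200.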
reproject(target_crs, resample_method='NearestNeighbor')

Reprojects every individual raster to target_crs. Does not sample past tile boundaries.

Parameters:
  • target_crs (str or int) – The CRS to reproject to. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
  • resample_method (str, optional) – The resample method to use for the reprojection. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns:

RasterRDD

tile_to_layout(layer_metadata, resample_method='NearestNeighbor')

Cut tiles to layout and merge overlapping tiles. This will produce unique keys.

Parameters:
  • layer_metadata (Metadata) – The Metadata of the RasterRDD instance.
  • resample_method (str, optional) – The resample method to use when cutting the tiles to the layout. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns:

TiledRasterRDD
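Unlike cut_tiles, tile_to_layout merges tiles that share a key. The NumPy sketch below shows one plausible merge rule, assumed here for illustration: tiles with the same key are combined in arrival order, and NoData (NaN) cells of the accumulated tile are filled from later tiles. GeoTrellis's actual merge rule and ordering may differ; merge_duplicate_keys is a hypothetical helper.

```python
from collections import defaultdict

import numpy as np


def merge_duplicate_keys(keyed_tiles):
    """Merge an iterable of (key, 2-D float array) pairs into {key: array}."""
    grouped = defaultdict(list)
    for key, tile in keyed_tiles:
        grouped[key].append(tile)
    merged = {}
    for key, tiles in grouped.items():
        # Copy so the caller's first tile is not modified in place.
        out = np.array(tiles[0], dtype=float, copy=True)
        for tile in tiles[1:]:
            holes = np.isnan(out)      # cells still holding NoData
            out[holes] = tile[holes]   # fill them from the next tile
        merged[key] = out
    return merged
```

The result has exactly one tile per key, which is what makes the output keys unique.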

to_numpy_rdd()

Converts a RasterRDD to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:pyspark.RDD
to_tiled_layer(extent=None, layout=None, crs=None, tile_size=256, resample_method='NearestNeighbor')

Converts this RasterRDD to a TiledRasterRDD.

This method combines collect_metadata() and tile_to_layout() into one step.

Parameters:
  • extent (Extent, optional) – Specify layout extent, must also specify layout.
  • layout (TileLayout, optional) – Specify tile layout, must also specify extent.
  • crs (str or int, optional) – Ignore CRS from records and use given one instead.
  • tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.
  • resample_method (str, optional) – The resample method to use when tiling. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.

Note

extent and layout must both be defined if they are to be used.

Returns:TiledRasterRDD
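When extent and layout are not given, to_tiled_layer has to derive a layout, and tile_size sets the pixel dimensions of each tile. The sketch below shows one way such a layout could be computed, assuming a GeoTrellis-style floating layout that keeps the upper-left corner of the data fixed and grows the extent to a whole number of tiles. floating_layout is a hypothetical helper, not part of GeoPySpark.

```python
import math


def floating_layout(extent, cell_width, cell_height, tile_size=256):
    """Return (layout_cols, layout_rows, layout_extent) for square tiles of
    tile_size pixels covering extent at the given cell size."""
    xmin, ymin, xmax, ymax = extent
    # Size of the source data in pixels.
    width_px = int(round((xmax - xmin) / cell_width))
    height_px = int(round((ymax - ymin) / cell_height))
    # Enough whole tiles to cover every pixel.
    layout_cols = math.ceil(width_px / tile_size)
    layout_rows = math.ceil(height_px / tile_size)
    # Grow the extent right and down to a whole number of tiles,
    # keeping the upper-left corner (xmin, ymax) fixed.
    layout_extent = (
        xmin,
        ymax - layout_rows * tile_size * cell_height,
        xmin + layout_cols * tile_size * cell_width,
        ymax,
    )
    return layout_cols, layout_rows, layout_extent
```

A 1000 x 600 pixel raster with tile_size=256 needs a 4 x 3 grid of tiles, so the layout extent is padded out to 1024 x 768 pixels.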
unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

wrapped_rdds()

Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

class geopyspark.geotrellis.rdd.TiledRasterRDD(geopysc, rdd_type, srdd)

Wraps an RDD of tiled GeoTrellis rasters.

Represents an RDD that contains (K, V) pairs, where K is either SpatialKey or SpaceTimeKey depending on the rdd_type of the RDD, and V is a Raster.

The data held within the RDD is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see TiledRasterRDD.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the geotiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows TiledRasterRDD to access the various Scala methods.
geopysc

GeoPyContext – The GeoPyContext being used this session.

rdd_type

str – The spatial type of the geotiffs. This is represented by the constants: SPATIAL and SPACETIME.

srdd

py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterRDD to access the various Scala methods.

cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

convert_data_type(new_type)

Converts the underlying raster values to a new CellType.

Parameters:new_type (str) – The string representation of the CellType to convert to. It is represented by a constant such as INT16, FLOAT64UD, etc.
Returns:TiledRasterRDD
cost_distance(geometries, max_distance)

Performs a cost-distance operation on a TileLayer.

Parameters:
  • geometries (list) –

    A list of shapely geometries to be used as a starting point.

    Note

    All geometries must be in the same CRS as the TileLayer.

  • max_distance (int, float) – The maximum cost that a path may reach before the operation stops. This value can be an int or float.
Returns:

TiledRasterRDD
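Cost distance is commonly computed as a shortest-path search over a friction grid. The sketch below uses Dijkstra's algorithm with 8-neighbour moves, where a move costs the average friction of the two cells times the step length (1 orthogonally, √2 diagonally). That weighting is a common formulation assumed here, not necessarily GeoTrellis's; starting cells are given as (row, col) indices instead of geometries, and cost_distance_grid is a hypothetical name.

```python
import heapq
import math

import numpy as np


def cost_distance_grid(friction, sources, max_distance=math.inf):
    """Accumulated cost from the nearest source cell; inf where unreachable."""
    rows, cols = friction.shape
    cost = np.full((rows, cols), math.inf)
    heap = []
    for r, c in sources:
        cost[r, c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r, c]:
            continue  # stale queue entry; a cheaper path was already found
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = math.sqrt(2) if dr and dc else 1.0
                nd = d + step * (friction[r, c] + friction[nr, nc]) / 2.0
                # Stop expanding once the path would exceed max_distance.
                if nd <= max_distance and nd < cost[nr, nc]:
                    cost[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return cost
```

On a uniform friction grid of ones this reduces to 8-connected path length; cells whose cheapest path would exceed max_distance stay at infinity.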

classmethod euclidean_distance(geopysc, geometry, source_crs, zoom, cellType='float64')

Calculates the Euclidean distance from a Shapely geometry.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
  • source_crs (str or int) – The CRS of the input geometry.
  • zoom (int) – The zoom level of the output raster.
  • cellType (str, optional) – The CellType of the output raster. Defaults to 'float64'.

Note

This function may run very slowly for polygonal inputs if they cover many cells of the output raster.

Returns:RDD
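The quantity that euclidean_distance produces can be illustrated for point inputs: each output cell holds the distance from its center to the nearest input point. The NumPy sketch below is restricted to points (lines and polygons are not handled) and assumes row 0 is the top of the extent; euclidean_distance_grid is a hypothetical helper.

```python
import numpy as np


def euclidean_distance_grid(points, extent, cols, rows):
    """Distance from each cell center to the nearest of the given points."""
    xmin, ymin, xmax, ymax = extent
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    # Cell-center coordinates; row 0 is the top of the extent.
    xs = xmin + (np.arange(cols) + 0.5) * cell_w
    ys = ymax - (np.arange(rows) + 0.5) * cell_h
    gx, gy = np.meshgrid(xs, ys)            # both shaped (rows, cols)
    pts = np.asarray(points, dtype=float)   # shaped (n, 2)
    # Broadcast to (rows, cols, n) and keep the nearest point per cell.
    dx = gx[..., None] - pts[:, 0]
    dy = gy[..., None] - pts[:, 1]
    return np.sqrt(dx ** 2 + dy ** 2).min(axis=-1)
```

The brute-force broadcast compares every cell with every point, which hints at why the documented operation can be slow when the input covers many output cells.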
focal(operation, neighborhood=None, param_1=None, param_2=None, param_3=None)

Performs the given focal operation on the layers contained in the RDD.

Parameters:
  • operation (str) – The focal operation. Represented by constants: SUM, MIN, MAX, MEAN, MEDIAN, MODE, STANDARDDEVIATION, ASPECT, and SLOPE.
  • neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood, or by the constants: ANNULUS, NEWS, SQUARE, WEDGE, and CIRCLE. Defaults to None.
  • param_1 (int or float, optional) – If using SLOPE, then this is the zFactor, else it is the first argument of neighborhood.
  • param_2 (int or float, optional) – The second argument of the neighborhood.
  • param_3 (int or float, optional) – The third argument of the neighborhood.

Note

The param arguments only need to be set if neighborhood is not an instance of Neighborhood or if neighborhood is None.

Any param that is not set will default to 0.0.

If neighborhood is None then operation must be either SLOPE or ASPECT.

Returns:

TiledRasterRDD

Raises:
  • ValueError – If operation is not a known operation.
  • ValueError – If neighborhood is not a known neighborhood.
  • ValueError – If neighborhood was not set, and operation is not SLOPE or ASPECT.
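As an illustration of what a focal operation computes, the NumPy sketch below performs a focal SUM over a square window on a single array. It assumes that param_1 for SQUARE acts as the window's radius, giving a (2 * radius + 1)-cell square, and it treats cells outside the array as 0 and NoData (NaN) cells as 0. GeoTrellis also reads neighbouring tiles at tile edges, which a single-array sketch cannot show; focal_sum_square is a hypothetical name.

```python
import numpy as np


def focal_sum_square(tile, radius):
    """Focal SUM over a (2*radius + 1) x (2*radius + 1) square window."""
    r = int(radius)
    filled = np.where(np.isnan(tile), 0.0, tile)   # NoData contributes nothing
    padded = np.pad(filled, r, mode="constant", constant_values=0.0)
    out = np.zeros_like(filled, dtype=float)
    rows, cols = tile.shape
    # Add every shifted copy of the array that falls inside the window.
    for dr in range(2 * r + 1):
        for dc in range(2 * r + 1):
            out += padded[dr:dr + rows, dc:dc + cols]
    return out
```

For a 3 x 3 tile of ones and a radius of 1, the center sums 9 cells, an edge cell 6, and a corner cell 4.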
classmethod from_numpy_rdd(geopysc, rdd_type, numpy_rdd, metadata)

Create a TiledRasterRDD from a numpy RDD.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the geotiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
  • metadata (Metadata) – The Metadata of the TiledRasterRDD instance.
Returns:

TiledRasterRDD

get_histogram()

Returns an array of Java histogram objects, one for each band of the raster.

Parameters:None
Returns:An array of Java objects containing the histograms of each band
get_min_max()

Returns the maximum and minimum values of all of the rasters in the RDD.

Returns:(float, float)
get_quantile_breaks(num_breaks)

Returns quantile breaks for this RDD.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[float]
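Quantile breaks split the layer's values into classes holding roughly equal numbers of cells. The sketch below computes them exactly with numpy.quantile over the pooled non-NoData cells of a list of tiles. This is a swapped-in technique for illustration: GeoTrellis derives breaks from a histogram, so its values can differ. Returning num_breaks ascending values with the maximum last is an assumption of this sketch, and quantile_breaks is a hypothetical name.

```python
import numpy as np


def quantile_breaks(tiles, num_breaks):
    """Return num_breaks ascending class boundaries; the last is the maximum."""
    # Pool every non-NoData (non-NaN) cell from all tiles.
    values = np.concatenate([t[~np.isnan(t)] for t in tiles])
    # Evenly spaced quantile levels 1/n, 2/n, ..., 1.
    levels = np.linspace(0.0, 1.0, num_breaks + 1)[1:]
    return [float(q) for q in np.quantile(values, levels)]
```

Breaks like these are a natural input for the keys of the value_map passed to reclassify.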
get_quantile_breaks_exact_int(num_breaks)

Returns quantile breaks for this RDD. This version uses the FastMapHistogram, which counts exact integer values. If the RDD has too many distinct values, this can cause memory errors.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[int]
is_floating_point_layer()

Determines whether the content of the TiledRasterRDD is of floating point type.

Parameters:None
Returns:bool
layer_metadata

Layer metadata associated with this layer.

lookup(col, row)

Returns the value(s) in the image of a particular SpatialKey (given by col and row).

Parameters:
  • col (int) – The SpatialKey column.
  • row (int) – The SpatialKey row.
Returns:

A list of numpy arrays (the tiles)

Raises:
  • ValueError – If using lookup on a non SPATIAL TiledRasterRDD.
  • IndexError – If col and row are not within the TiledRasterRDD’s bounds.
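A SpatialKey only names a grid position; the area it covers follows from the layout. The sketch below maps (col, row) to a tile extent, assuming columns count east from the layout's xmin and rows count south from its ymax, and raises IndexError for keys outside the grid, mirroring the documented lookup error. It checks the layout grid rather than the RDD's actual bounds, and key_extent is a hypothetical helper.

```python
def key_extent(layout_extent, layout_cols, layout_rows, col, row):
    """Return (xmin, ymin, xmax, ymax) of the tile at SpatialKey(col, row)."""
    if not (0 <= col < layout_cols and 0 <= row < layout_rows):
        raise IndexError("key ({}, {}) is outside the layout".format(col, row))
    xmin, ymin, xmax, ymax = layout_extent
    tile_w = (xmax - xmin) / layout_cols
    tile_h = (ymax - ymin) / layout_rows
    # Columns count east from xmin; rows count south from ymax.
    tile_xmin = xmin + col * tile_w
    tile_ymax = ymax - row * tile_h
    return (tile_xmin, tile_ymax - tile_h, tile_xmin + tile_w, tile_ymax)
```

In a 4 x 4 layout over (0, 0, 4, 4), key (1, 0) covers (1, 3, 2, 4): the second tile from the left in the top row.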
mask(geometries)

Masks the TiledRasterRDD so that only values that intersect the geometries will be available.

Parameters:geometries (list) –

A list of shapely geometries to use as masks.

Note

All geometries must be in the same CRS as the TileLayer.

Returns:TiledRasterRDD
persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

polygonal_max(geometry, data_type)

Finds the max value that is contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

int or float depending on data_type.

Raises:

TypeError – If data_type is not an int or float.

polygonal_mean(geometry)

Finds the mean of all of the values that are contained within the given geometry.

Parameters:geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
Returns:float
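A polygonal summary amounts to masking the raster by the geometry and then reducing the selected cells. The sketch below does this for a single tile with a polygon given as a list of (x, y) vertices, using an even-odd ray-casting test. Counting a cell when its center falls inside the polygon is one common convention and an assumption here; point_in_polygon and polygonal_mean_sketch are hypothetical names.

```python
import numpy as np


def point_in_polygon(x, y, ring):
    """Even-odd ray casting: is (x, y) inside the polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):   # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygonal_mean_sketch(tile, extent, ring):
    """Mean of the non-NaN cells whose centers lie inside ring."""
    rows, cols = tile.shape
    xmin, ymin, xmax, ymax = extent
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    values = []
    for r in range(rows):
        for c in range(cols):
            cx = xmin + (c + 0.5) * cell_w   # cell center; row 0 is the top
            cy = ymax - (r + 0.5) * cell_h
            v = tile[r, c]
            if not np.isnan(v) and point_in_polygon(cx, cy, ring):
                values.append(v)
    return float(np.mean(values)) if values else float("nan")
```

Replacing the mean with max, min, or sum over the same selected cells gives the other polygonal summaries.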
polygonal_min(geometry, data_type)

Finds the min value that is contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

int or float depending on data_type.

Raises:

TypeError – If data_type is not an int or float.

polygonal_sum(geometry, data_type)

Finds the sum of all of the values that are contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

int or float depending on data_type.

Raises:

TypeError – If data_type is not an int or float.

pyramid(start_zoom, end_zoom, resample_method='NearestNeighbor')

Creates a pyramid of GeoTrellis layers where each layer represents a given zoom.

Parameters:
  • start_zoom (int) – The zoom level where pyramiding should begin. Represents the level that is most zoomed in.
  • end_zoom (int) – The zoom level where pyramiding should end. Represents the level that is most zoomed out.
  • resample_method (str, optional) – The resample method to use when creating the zoomed-out layers. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns:

[TiledRasterRDDs].

Raises:
  • ValueError – If the given resample_method is not known.
  • ValueError – If the col and row count is not a power of 2.
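The power-of-2 requirement follows from each zoom-out step combining 2 x 2 blocks of tiles, which halves the tile grid in each dimension. The sketch below lists the grid size at each level from start_zoom down to end_zoom under that assumption and raises ValueError otherwise, mirroring the documented error. pyramid_layouts is a hypothetical helper, not GeoPySpark's pyramiding code.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def pyramid_layouts(layout_cols, layout_rows, start_zoom, end_zoom):
    """List (zoom, layout_cols, layout_rows) from start_zoom down to end_zoom."""
    if not (is_power_of_two(layout_cols) and is_power_of_two(layout_rows)):
        raise ValueError("layout col and row counts must be powers of 2")
    levels = []
    cols, rows = layout_cols, layout_rows
    for zoom in range(start_zoom, end_zoom - 1, -1):
        levels.append((zoom, cols, rows))
        # Zooming out one level merges each 2x2 block of tiles into one tile.
        cols, rows = max(cols // 2, 1), max(rows // 2, 1)
    return levels
```

An 8 x 8 grid at zoom 3 thus shrinks to 4 x 4, 2 x 2, and finally a single tile at zoom 0.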
classmethod rasterize(geopysc, rdd_type, geometry, extent, crs, cols, rows, fill_value, instant=None)

Creates a TiledRasterRDD from a Shapely geometry.

Parameters:
  • geopysc (GeoPyContext) – The GeoPyContext being used this session.
  • rdd_type (str) – The spatial type of the geotiffs. This is represented by the constants: SPATIAL and SPACETIME.
  • geometry (str or shapely.geometry.Polygon) – The geometry to be turned into a raster. Can either be a string or a Polygon. If it is a string, it must be in WKT format.
  • extent (Extent) – The extent of the new raster.
  • crs (str or int) – The CRS the new raster should be in.
  • cols (int) – The number of cols the new raster should have.
  • rows (int) – The number of rows the new raster should have.
  • fill_value (int) –

    The value to fill the raster with.

    Note

    Only the area where the geometry intersects the extent will have this value. Any other area will be filled with GeoTrellis’ NoData value for int, which is represented in GeoPySpark as the constant NODATAINT.

  • instant (int, optional) – Optional if the data has no time component (i.e. is SPATIAL). Otherwise, it is required and represents the timestamp of the data.
Returns:

TiledRasterRDD

Raises:

TypeError – If geometry is not a str or a Polygon, or if there is a mismatch in inputs, such as setting the rdd_type as SPATIAL but also setting instant.

reclassify(value_map, data_type, boundary_strategy='LessThanOrEqualTo', replace_nodata_with=None)

Changes the cell values of the rasters based on a set of breaks.

Parameters:
  • value_map (dict) – A dict whose keys are the values at which breaks occur and whose values are the new values that cells within each break should become.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
  • boundary_strategy (str, optional) – How the cells should be classified along the breaks. This is represented by the following constants: GREATERTHAN, GREATERTHANOREQUALTO, LESSTHAN, LESSTHANOREQUALTO, and EXACT. If unspecified, then LESSTHANOREQUALTO will be used.
  • replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.

Note

NoData symbolizes a different value depending on whether data_type is int or float. For int, the constant NODATAINT can be used, which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.

Returns:TiledRasterRDD
reproject(target_crs, extent=None, layout=None, scheme='float', tile_size=256, resolution_threshold=0.1, resample_method='NearestNeighbor')

Reprojects the RDD as a tiled raster layer, sampling from surrounding tiles.

Parameters:
  • target_crs (str or int) – The CRS to reproject to. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
  • extent (Extent, optional) – Specify the layout extent, must also specify layout.
  • layout (TileLayout, optional) – Specify the tile layout, must also specify extent.
  • scheme (str, optional) – Which LayoutScheme should be used. Represented by the constants: FLOAT and ZOOM. If not specified, then FLOAT is used.
  • tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.
  • resolution_threshold (float, optional) – The percent difference between a cell size and a zoom level's resolution, relative to the resolution difference between that zoom level and the next, that is tolerated in order to snap to the lower-resolution zoom.
  • resample_method (str, optional) – The resample method to use for the reprojection. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.

Note

extent and layout must both be defined if they are to be used.

Returns:TiledRasterRDD
Raises:TypeError – If either extent or layout is defined but the other is not.
stitch()

Stitch all of the rasters within the RDD into one raster.

Note

This can only be used on SPATIAL TiledRasterRDDs.

Returns:Raster
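Stitching places each tile at the offset given by its SpatialKey. The NumPy sketch below assembles a dict of equally shaped tiles keyed by (col, row) into one array. It assumes keys start at (0, 0) and fills grid positions that have no tile with NaN; stitch_tiles is a hypothetical helper.

```python
import numpy as np


def stitch_tiles(tiles):
    """Assemble {(col, row): 2-D array} into one array.

    Assumes keys start at (0, 0) and every tile has the same shape;
    grid positions with no tile are filled with NaN.
    """
    layout_cols = max(col for col, _ in tiles) + 1
    layout_rows = max(row for _, row in tiles) + 1
    tile_rows, tile_cols = next(iter(tiles.values())).shape
    out = np.full((layout_rows * tile_rows, layout_cols * tile_cols), np.nan)
    for (col, row), tile in tiles.items():
        # Row index selects the vertical block, col index the horizontal one.
        out[row * tile_rows:(row + 1) * tile_rows,
            col * tile_cols:(col + 1) * tile_cols] = tile
    return out
```

Because the output holds the whole layer in one array, stitching is only practical for layers small enough to fit in the driver's memory.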
tile_to_layout(layout, resample_method='NearestNeighbor')

Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.

Parameters:
  • layout (TileLayout) – Specify the TileLayout to cut to.
  • resample_method (str, optional) – The resample method to use when cutting the tiles to the layout. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns:

TiledRasterRDD

to_numpy_rdd()

Converts a TiledRasterRDD to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:pyspark.RDD
unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

wrapped_rdds()

Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

zoom_level

The zoom level of the RDD. Can be None.