geopyspark package

geopyspark.geopyspark_conf(master=None, appName=None, additional_jar_dirs=[])

Construct the base SparkConf for use with GeoPySpark. This configuration object may be used as is, or adjusted according to the user’s needs.

Note

The GEOPYSPARK_JARS_PATH environment variable may contain a colon-separated list of directories to search for JAR files to make available via the SparkConf.

Parameters:
  • master (string) – The master URL to connect to, such as “local” to run locally with one thread, “local[4]” to run locally with 4 cores, or “spark://master:7077” to run on a Spark standalone cluster.
  • appName (string) – The name of the application, as seen in the Spark console
  • additional_jar_dirs (list, optional) – A list of directory locations that might contain JAR files needed by the current script. Already includes $(pwd)/jars.
Returns:

SparkConf
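
For example, a SparkContext can be built from this configuration (a minimal sketch; the master URL and application name below are placeholders):

import geopyspark as gps
from pyspark import SparkContext

# Build the base configuration, then create the SparkContext from it.
conf = gps.geopyspark_conf(master="local[*]", appName="geopyspark-example")
sc = SparkContext(conf=conf)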

class geopyspark.Tile

Represents a raster in GeoPySpark.

Note

All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.

Parameters:
  • cells (nd.array) – The raster data itself. It is contained within a NumPy array.
  • data_type (str) – The data type the values within cells would have in Scala.
  • no_data_value – The value that represents NoData in the raster. This can be represented by a variety of types depending on the value type of the raster.
cells

nd.array – The raster data itself. It is contained within a NumPy array.

data_type

str – The data type the values within cells would have in Scala.

no_data_value

The value that represents NoData in the raster. This can be represented by a variety of types depending on the value type of the raster.

cell_type

Alias for field number 1

cells

Alias for field number 0

count(value) → integer -- return number of occurrences of value
static dtype_to_cell_type(dtype)

Converts a np.dtype to the corresponding GeoPySpark cell_type.

Note

bool, complex64, complex128, and complex256 are currently not supported np.dtypes.

Parameters:dtype (np.dtype) – The dtype of the numpy array.
Returns:str. The GeoPySpark cell_type equivalent of the dtype.
Raises:TypeError – If the given dtype is not a supported data type.
classmethod from_numpy_array(numpy_array, no_data_value=None)

Creates an instance of Tile from a numpy array.

Parameters:
  • numpy_array (np.array) –

    The numpy array to be used to represent the cell values of the Tile.

    Note

    GeoPySpark does not support arrays with the following data types: bool, complex64, complex128, and complex256.

  • no_data_value (optional) – The value that represents NoData in the raster. This can be represented by a variety of types depending on the value type of the raster. If not given, then the value will be None.
Returns:

Tile
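
A minimal sketch of building a Tile from a NumPy array (the array contents below are arbitrary):

import numpy as np
import geopyspark as gps

# A single-band, 3x3 raster; -1 is treated as the NoData value.
cells = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]]], dtype=np.int32)
tile = gps.Tile.from_numpy_array(cells, no_data_value=-1)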

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

no_data_value

Alias for field number 2

class geopyspark.Extent

The “bounding box” or geographic region of an area on Earth a raster represents.

Parameters:
  • xmin (float) – The minimum x coordinate.
  • ymin (float) – The minimum y coordinate.
  • xmax (float) – The maximum x coordinate.
  • ymax (float) – The maximum y coordinate.
xmin

float – The minimum x coordinate.

ymin

float – The minimum y coordinate.

xmax

float – The maximum x coordinate.

ymax

float – The maximum y coordinate.

count(value) → integer -- return number of occurrences of value
classmethod from_polygon(polygon)

Creates a new instance of Extent from a Shapely Polygon.

The new Extent will contain the min and max coordinates of the Polygon, regardless of the Polygon’s shape.

Parameters:polygon (shapely.geometry.Polygon) – A Shapely Polygon.
Returns:Extent
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

to_polygon

Converts this instance to a Shapely Polygon.

The resulting Polygon will be in the shape of a box.

Returns:shapely.geometry.Polygon
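
A short sketch of round-tripping between an Extent and a Shapely Polygon (the coordinates are arbitrary):

import geopyspark as gps

extent = gps.Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0)
polygon = extent.to_polygon               # a box-shaped Shapely Polygon
same_extent = gps.Extent.from_polygon(polygon)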
xmax

Alias for field number 2

xmin

Alias for field number 0

ymax

Alias for field number 3

ymin

Alias for field number 1

class geopyspark.ProjectedExtent

Describes both the area on Earth a raster represents in addition to its CRS.

Parameters:
  • extent (Extent) – The area the raster represents.
  • epsg (int, optional) – The EPSG code of the CRS.
  • proj4 (str, optional) – The Proj.4 string representation of the CRS.
extent

Extent – The area the raster represents.

epsg

int, optional – The EPSG code of the CRS.

proj4

str, optional – The Proj.4 string representation of the CRS.

Note

Either epsg or proj4 must be defined.
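
For example, an Extent can be paired with an EPSG code (a sketch; 4326 is the EPSG code for WGS 84):

import geopyspark as gps

extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
# Either epsg or proj4 must be given; here the CRS is identified by its EPSG code.
projected_extent = gps.ProjectedExtent(extent=extent, epsg=4326)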

count(value) → integer -- return number of occurrences of value
epsg

Alias for field number 1

extent

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

proj4

Alias for field number 2

class geopyspark.TemporalProjectedExtent

Describes the area on Earth the raster represents, its CRS, and the time the data was collected.

Parameters:
  • extent (Extent) – The area the raster represents.
  • instant (datetime.datetime) – The time stamp of the raster.
  • epsg (int, optional) – The EPSG code of the CRS.
  • proj4 (str, optional) – The Proj.4 string representation of the CRS.
extent

Extent – The area the raster represents.

instant

datetime.datetime – The time stamp of the raster.

epsg

int, optional – The EPSG code of the CRS.

proj4

str, optional – The Proj.4 string representation of the CRS.

Note

Either epsg or proj4 must be defined.

count(value) → integer -- return number of occurrences of value
epsg

Alias for field number 2

extent

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

instant

Alias for field number 1

proj4

Alias for field number 3

class geopyspark.SpatialKey

Represents the position of a raster within a grid. This grid is a 2D plane where raster positions are represented by a pair of coordinates.

Parameters:
  • col (int) – The column of the grid; column numbers run east to west.
  • row (int) – The row of the grid; row numbers run north to south.
col

int – The column of the grid; column numbers run east to west.

row

int – The row of the grid; row numbers run north to south.

col

Alias for field number 0

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

row

Alias for field number 1

class geopyspark.SpaceTimeKey

Represents the position of a raster within a grid. This grid is a 3D plane where raster positions are represented by a pair of coordinates as well as a z value that represents time.

Parameters:
  • col (int) – The column of the grid; column numbers run east to west.
  • row (int) – The row of the grid; row numbers run north to south.
  • instant (datetime.datetime) – The time stamp of the raster.
col

int – The column of the grid; column numbers run east to west.

row

int – The row of the grid; row numbers run north to south.

instant

datetime.datetime – The time stamp of the raster.

col

Alias for field number 0

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

instant

Alias for field number 2

row

Alias for field number 1

class geopyspark.Metadata(bounds, crs, cell_type, extent, layout_definition)

Information about the values within a RasterLayer or TiledRasterLayer. This data pertains to the layout and other attributes of the data within the classes.

Parameters:
  • bounds (Bounds) – The Bounds of the values in the class.
  • crs (str or int) – The CRS of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
  • cell_type (str or CellType) – The data type of the cells of the rasters.
  • extent (Extent) – The Extent that covers all of the rasters.
  • layout_definition (LayoutDefinition) – The LayoutDefinition of all rasters.
bounds

Bounds – The Bounds of the values in the class.

crs

str or int – The CRS of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.

cell_type

str – The data type of the cells of the rasters.

no_data_value

int or float or None – The noData value of the rasters within the layer. This can either be None, an int, or a float depending on the cell_type.

extent

Extent – The Extent that covers all of the rasters.

tile_layout

TileLayout – The TileLayout that describes how the rasters are organized.

layout_definition

LayoutDefinition – The LayoutDefinition of all rasters.

classmethod from_dict(metadata_dict)

Creates Metadata from a dictionary.

Parameters:metadata_dict (dict) – The Metadata of a RasterLayer or TiledRasterLayer instance that is in dict form.
Returns:Metadata
to_dict()

Converts this instance to a dict.

Returns:dict
class geopyspark.TileLayout

Describes the grid in which the rasters within a Layer should be laid out.

Parameters:
  • layoutCols (int) – The number of columns of rasters that runs east to west.
  • layoutRows (int) – The number of rows of rasters that runs north to south.
  • tileCols (int) – The number of columns of pixels in each raster that runs east to west.
  • tileRows (int) – The number of rows of pixels in each raster that runs north to south.
layoutCols

int – The number of columns of rasters that runs east to west.

layoutRows

int – The number of rows of rasters that runs north to south.

tileCols

int – The number of columns of pixels in each raster that runs east to west.

tileRows

int – The number of rows of pixels in each raster that runs north to south.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

layoutCols

Alias for field number 0

layoutRows

Alias for field number 1

tileCols

Alias for field number 2

tileRows

Alias for field number 3

class geopyspark.GlobalLayout

TileLayout type that spans global CRS extent.

When passed in place of LayoutDefinition it signifies that a LayoutDefinition instance should be constructed such that it fits the global CRS extent. The cell resolution of the resulting layout will be one of the resolutions implied by the power-of-2 pyramid for that CRS. Tiling to this layout will likely result in either up-sampling or down-sampling the source raster.

Parameters:
  • tile_size (int) – The number of columns and row pixels in each tile.
  • zoom (int, optional) – Override the zoom level in power of 2 pyramid.
  • threshold (float, optional) – The percentage difference between a cell size and a zoom level and the resolution difference between that zoom level and the next that is tolerated to snap to the lower-resolution zoom level. For example, if this parameter is 0.1, that means we’re willing to downsample rasters with a higher resolution in order to fit them to some zoom level Z, if the difference in resolution is less than or equal to 10% of the difference between the resolutions of zoom level Z and zoom level Z+1.
tile_size

int – The number of columns and row pixels in each tile.

zoom

int – The desired zoom level of the layout.

threshold

float, optional – The percentage difference between a cell size and a zoom level and the resolution difference between that zoom level and the next that is tolerated to snap to the lower-resolution zoom level.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

threshold

Alias for field number 2

tile_size

Alias for field number 0

zoom

Alias for field number 1

class geopyspark.LocalLayout

TileLayout type that snaps the layer extent.

When passed in place of LayoutDefinition it signifies that a LayoutDefinition instance should be constructed over the envelope of the layer pixels with the given tile size. The resulting TileLayout will match the cell resolution of the source rasters.

Parameters:
  • tile_size (int, optional) – The number of columns and row pixels in each tile. If this is None, then the sizes of each tile will be set using tile_cols and tile_rows.
  • tile_cols (int, optional) – The number of column pixels in each tile. This supersedes tile_size, meaning that if this and tile_size are both set, then this will be used for the number of column pixels. If None, then the number of column pixels will default to 256.
  • tile_rows (int, optional) – The number of row pixels in each tile. This supersedes tile_size, meaning that if this and tile_size are both set, then this will be used for the number of row pixels. If None, then the number of row pixels will default to 256.
tile_cols

int – The number of column pixels in each tile.

tile_rows

int – The number of row pixels in each tile. This supersedes tile_size.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

tile_cols

Alias for field number 0

tile_rows

Alias for field number 1
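
A sketch contrasting the two layout types when tiling (raster_layer is assumed to be an existing RasterLayer):

import geopyspark as gps

# Snap to the layer's own extent and cell resolution, using 256x256 tiles.
local_tiles = raster_layer.tile_to_layout(layout=gps.LocalLayout(tile_size=256))

# Fit the global CRS extent of a power-of-2 pyramid instead; this may resample.
global_tiles = raster_layer.tile_to_layout(layout=gps.GlobalLayout(tile_size=256))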

class geopyspark.LayoutDefinition

Describes the layout of the rasters within a Layer and how they are projected.

Parameters:
  • extent (Extent) – The Extent of the layout.
  • tileLayout (TileLayout) – The TileLayout describing how the rasters are laid out within the Layer.
extent

Extent – The Extent of the layout.

tileLayout

TileLayout – The TileLayout describing how the rasters are laid out within the Layer.

count(value) → integer -- return number of occurrences of value
extent

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

tileLayout

Alias for field number 1
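
A sketch constructing a LayoutDefinition directly from an Extent and a TileLayout (the numbers are arbitrary):

import geopyspark as gps

extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
# A 2x2 grid of tiles, each 256x256 pixels.
tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)
layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout)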

class geopyspark.Bounds

Represents the grid that covers the area of the rasters in a Layer.

Parameters:
  • minKey (SpatialKey or SpaceTimeKey) – The smallest SpatialKey or SpaceTimeKey.
  • maxKey (SpatialKey or SpaceTimeKey) – The largest SpatialKey or SpaceTimeKey.
minKey

SpatialKey or SpaceTimeKey – The smallest SpatialKey or SpaceTimeKey.

maxKey

SpatialKey or SpaceTimeKey – The largest SpatialKey or SpaceTimeKey.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

maxKey

Alias for field number 1

minKey

Alias for field number 0

class geopyspark.RasterizerOptions

Represents options available to the geometry rasterizer.

Parameters:
  • includePartial (bool, optional) – Include partial pixel intersection (default: True)
  • sampleType (str, optional) – ‘PixelIsArea’ or ‘PixelIsPoint’ (default: ‘PixelIsPoint’)
includePartial

bool – Include partial pixel intersection.

sampleType

str – How the sampling should be performed during rasterization.

count(value) → integer -- return number of occurrences of value
includePartial

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

sampleType

Alias for field number 1

geopyspark.zfactor_lat_lng_calculator(unit)

Produces the Scala class, ZFactorCalculator as a JavaObject.

The resulting ZFactorCalculator produced using this method assumes that the Tiles it will be deriving zfactors from are in LatLng (aka epsg:4326). This calculator can still be used on Tiles with different projections; however, the resulting Slope calculations may be off.

Parameters:unit (str or Unit) – The unit of elevation in the target layer.
Returns:py4j.JavaObject
geopyspark.zfactor_calculator(mapped_zfactors)

Produces the Scala class, ZFactorCalculator as a JavaObject.

Unlike the ZFactorCalculator produced in zfactor_lat_lng_calculator(), the resulting ZFactorCalculator can be used on Tiles in a different projection. However, it cannot be used between different types of projections. For example, a ZFactorCalculator produced for a Layer that is in WebMercator will not create an accurate ZFactor for a Layer that is in LatLng.

Parameters:mapped_zfactors (dict) – A dict that maps latitudes to ZFactors. It is not required to supply a mapping for every latitude intersected in the layer. Rather, based on the latitudes given, a linear interpolation will be performed and any latitude not mapped will have its ZFactor derived from that interpolation.
Returns:py4j.JavaObject
class geopyspark.HashPartitionStrategy

Represents a partitioning strategy for a layer that uses Spark’s HashPartitioner with a set number of partitions.

Parameters:num_partitions (int, optional) – The number of partitions that should be used during partitioning. Default is None. If None, the resulting layer will have a HashPartitioner with the number of partitions being either the same as the input layer’s or a number computed by the method.
count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

num_partitions

Alias for field number 0

class geopyspark.SpatialPartitionStrategy

Represents a partitioning strategy for a layer that uses GeoPySpark’s SpatialPartitioner with a set number of partitions.

This partitioner will try to group Tiles that are spatially near each other into the same partition. In order to do this, each Tile has its Key Index calculated using the Z-Curve space filling curve index.

Parameters:
  • num_partitions (int, optional) – The number of partitions that should be used during partitioning. Default is None. If None, the resulting layer will have a HashPartitioner with the number of partitions being either the same as the input layer’s or a number computed by the method.
  • bits (int, optional) –

    Helps determine how much data should be placed in each partition. Default is 8.

    GeoPySpark uses a Z-order curve to determine how values within the layer should be grouped. This is done by first finding the Key Index of a value and then performing a bitwise right shift on the resulting index. From the remaining bits, a partition is selected such that indexes with the same remaining bits will be in the same partition. Therefore, as the number of bits shifted to the right increases, so too do the group sizes.

num_partitions

int – The number of partitions that should be used during partitioning.

bits

int – Helps determine how much data should be placed in each partition.

bits

Alias for field number 1

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

num_partitions

Alias for field number 0
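
A sketch of using this strategy to repartition a layer (layer is assumed to be an existing spatial layer):

import geopyspark as gps

# Group spatially nearby tiles into the same partitions, across 100 partitions.
strategy = gps.SpatialPartitionStrategy(num_partitions=100, bits=8)
partitioned_layer = layer.partitionBy(partition_strategy=strategy)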

class geopyspark.SpaceTimePartitionStrategy

Represents a partitioning strategy for a layer that uses GeoPySpark’s SpaceTimePartitioner with a set number of partitions, units of time, and temporal resolution.

This partitioner will try to group Tiles that are spatially and temporally near each other into the same partition. In order to do this, each Tile has its Key Index calculated using the Z-Curve space filling curve index.

Note

This partitioning strategy will only work on SPACETIME layers, and will fail if given a SPATIAL one. For SPATIAL layers, please see SpatialPartitionStrategy.

Parameters:
  • time_unit (str or TimeUnit) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index, meaning what time intervals are used to separate each record.
  • num_partitions (int, optional) – The number of partitions that should be used during partitioning. Default is None. If None, the resulting layer will have a HashPartitioner with the number of partitions being either the same as the input layer’s or a number computed by the method.
  • bits (int, optional) –

    Helps determine how much data should be placed in each partition. Default is 8.

    GeoPySpark uses a Z-order curve to determine how values within the layer should be grouped. This is done by first finding the Key Index of a value and then performing a bitwise right shift on the resulting index. From the remaining bits, a partition is selected such that indexes with the same remaining bits will be in the same partition. Therefore, as the number of bits shifted to the right increases, so too do the group sizes.

  • time_resolution (str or int, optional) –

    Determines how data for each time_unit should be grouped together. By default, no grouping will occur.

    As an example, having a time_unit of WEEKS and a time_resolution of 5 will cause the data to be grouped and stored together in units of 5 weeks. If however time_resolution is not specified, then the data will be grouped and stored in units of single weeks.

    This value can either be an int or a string representation of an int.

time_unit

str or TimeUnit – Which time unit should be used when saving spatial-temporal data.

num_partitions

int – The number of partitions that should be used during partitioning.

bits

int – Helps determine how much data should be placed in each partition.

time_resolution

str or int – Determines how data for each time_unit should be grouped together.

bits

Alias for field number 2

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

num_partitions

Alias for field number 1

time_resolution

Alias for field number 3

time_unit

Alias for field number 0

class geopyspark.Feature

Represents a geometry that is derived from an OSM Element with that Element’s associated metadata.

Parameters:
  • geometry (shapely.geometry) – The geometry of the feature.
  • properties (CellValue) – The metadata associated with the paired geometry.
geometry

shapely.geometry – The geometry of the feature.

properties

CellValue – The metadata associated with the paired geometry.

count(value) → integer -- return number of occurrences of value
geometry

Alias for field number 0

index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

properties

Alias for field number 1

class geopyspark.CellValue

Represents the value and zindex of a geometry.

This object is one of two types that can be used to represent the properties of a Feature.

Parameters:
  • value (int or float) – The value of all cells that intersects the associated geometry.
  • zindex (int) – The Z-Index of each cell that intersects the associated geometry. Z-Index determines which value a cell should be if multiple geometries intersect it. A high Z-Index will always be in front of a Z-Index of a lower value.
value

int or float – The value of all cells that intersects the associated geometry.

zindex

int – The Z-Index of each cell that intersects the associated geometry. Z-Index determines which value a cell should be if multiple geometries intersect it. A high Z-Index will always be in front of a Z-Index of a lower value.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

value

Alias for field number 0

zindex

Alias for field number 1
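
A sketch pairing a Shapely geometry with a CellValue to form a Feature:

import geopyspark as gps
from shapely.geometry import Point

geometry = Point(1.0, 2.0)
# Cells intersecting this geometry get a value of 5; zindex decides which value
# wins when multiple geometries cover the same cell.
feature = gps.Feature(geometry=geometry, properties=gps.CellValue(value=5, zindex=1))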

geopyspark.read_layer_metadata(uri, layer_name, layer_zoom)

Reads the metadata from a saved layer without reading in the whole layer.

Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the GeoTrellis catalog to be read from.
  • layer_zoom (int) – The zoom level of the layer that is to be read.
Returns:

Metadata

geopyspark.read_value(uri, layer_name, layer_zoom, col, row, zdt=None)

Reads a single Tile from a GeoTrellis catalog. Unlike other functions in this module, this will not return a TiledRasterLayer, but rather a GeoPySpark formatted raster.

Note

When requesting a tile that does not exist, None will be returned.

Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the GeoTrellis catalog to be read from.
  • layer_zoom (int) – The zoom level of the layer that is to be read.
  • col (int) – The col number of the tile within the layout. Cols run east to west.
  • row (int) – The row number of the tile within the layout. Rows run north to south.
  • zdt (datetime.datetime) – The time stamp of the tile if the data is spatial-temporal. This is represented as a datetime.datetime instance. The default value is None. If None, then only the spatial area will be queried.
Returns:

Tile

geopyspark.query(uri, layer_name, layer_zoom=None, query_geom=None, time_intervals=None, query_proj=None, num_partitions=None)

Queries a single, zoom layer from a GeoTrellis catalog given spatial and/or time parameters.

Note

The whole layer could still be read in if query_geom and/or time_intervals have not been set, or if the queried region contains the entire layer.

Parameters:
  • layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
  • uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the GeoTrellis catalog to be queried.
  • layer_zoom (int, optional) – The zoom level of the layer that is to be queried. If None, then the layer_zoom will be set to 0.
  • query_geom (bytes or shapely.geometry or Extent, optional) –

    The desired spatial area to be returned. Can either be a Shapely geometry, an instance of Extent, or a WKB version of the geometry as bytes.

    Note

    Not all Shapely geometries are supported. The following types are supported: Point, Polygon, and MultiPolygon.

    Note

    Only layers that were made from spatial, singleband GeoTiffs can query a Point. All other types are restricted to Polygon and MultiPolygon.

    Note

    If the queried region does not intersect the layer, then an empty layer will be returned.

    If not specified, then the entire layer will be read.

  • time_intervals ([datetime.datetime], optional) – A list of the time intervals to query. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
  • query_proj (int or str, optional) – The CRS of the query geometry if it is different from that of the layer it is being filtered against. If they are different and this is not set, then the returned TiledRasterLayer could contain incorrect values. If None, then the geometry and layer are assumed to be in the same projection.
  • num_partitions (int, optional) – Sets RDD partition count when reading from catalog.
Returns:

TiledRasterLayer
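
A sketch querying a spatial subset of a saved layer (the URI, layer name, and bounding box below are placeholders):

import geopyspark as gps
from shapely.geometry import box

area_of_interest = box(-80.0, 35.0, -75.0, 40.0)
layer = gps.query(uri="s3://example-bucket/catalog",
                  layer_name="example-layer",
                  layer_zoom=11,
                  query_geom=area_of_interest,
                  query_proj=4326)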

geopyspark.write(uri, layer_name, tiled_raster_layer, index_strategy=<IndexingMethod.ZORDER: 'zorder'>, time_unit=None, time_resolution=None, store=None, use_cogs=False)

Writes a tile layer to a specified destination.

Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the new tile layer.
  • tiled_raster_layer (TiledRasterLayer) – The TiledRasterLayer to be saved.
  • index_strategy (str or IndexingMethod, optional) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. Can either be a string or an IndexingMethod attribute. The default method used is IndexingMethod.ZORDER.
  • time_unit (str or TimeUnit, optional) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index, meaning what time intervals are used to separate each record. While this is set to None by default, it must be set if saving spatial-temporal data. Depending on the indexing method chosen, different time units are used.
  • time_resolution (str or int, optional) –

    Determines how data for each time_unit should be grouped together. By default, no grouping will occur.

    As an example, having a time_unit of WEEKS and a time_resolution of 5 will cause the data to be grouped and stored together in units of 5 weeks. If however time_resolution is not specified, then the data will be grouped and stored in units of single weeks.

    This value can either be an int or a string representation of an int.

  • store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
  • use_cogs (bool, optional) –

    Whether the layer should be written as a GeoTrellis Avro layer or a COG layer. By default, an Avro layer will be written.

    Note

    While a GeoTrellis COG layer will be saved as a series of COGs, they still have an associated file structure and metadata that must be preserved in order to access a given layer.
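
A sketch saving a tiled layer to a catalog (the URI and layer name are placeholders; tiled_raster_layer is assumed to exist):

import geopyspark as gps

gps.write(uri="s3://example-bucket/catalog",
          layer_name="example-layer",
          tiled_raster_layer=tiled_raster_layer,
          index_strategy=gps.IndexingMethod.ZORDER)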

geopyspark.update_layer(uri, layer_name, tiled_raster_layer, store=None)

Updates a pre-existing layer with a new one by merging the values of the two layers together.

Note: This function will throw an error if one of the following conditions is met:
  • The specified layer does not exist
  • The two layers have different types (cell type, layer type, etc.)
  • The two layers’ Bounds do not intersect
Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
  • layer_name (str) – The name of the new tile layer.
  • tiled_raster_layer (TiledRasterLayer) – The TiledRasterLayer to be saved.
  • store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
class geopyspark.AttributeStore(uri)

AttributeStore provides a way to read and write GeoTrellis layer attributes.

Internally all attribute values are stored as JSON; here they are exposed as dictionaries. Classes that are often stored have .from_dict and .to_dict methods to bridge the gap:

import geopyspark as gps
store = gps.AttributeStore("s3://azavea-datahub/catalog")
hist = store.layer("us-nlcd2011-30m-epsg3857", zoom=7).read("histogram")
hist = gps.Histogram.from_dict(hist)
class Attributes(store, layer_name, layer_zoom)

Accessor class for all attributes for a given layer

delete(name)

Delete attribute by name

Parameters:name (str) – Attribute name
layer_metadata()
read(name)

Read layer attribute by name as a dict

Parameters:name (str) –
Returns:Attribute value
Return type:dict
write(name, value)

Write layer attribute value as a dict

Parameters:
  • name (str) – Attribute name
  • value (dict) – Attribute value
classmethod build(store)

Builds AttributeStore from URI or passes an instance through.

Parameters:store (str or AttributeStore) – URI for AttributeStore object or instance.
Returns:AttributeStore
classmethod cached(uri)

Returns cached version of AttributeStore for URI or creates one

contains(name, zoom=None)

Checks if this store contains metadata for a layer.

Parameters:
  • name (str) – Layer name
  • zoom (int, optional) – Layer zoom
Returns:

bool

delete(name, zoom=None)

Delete layer and all its attributes

Parameters:
  • name (str) – Layer name
  • zoom (int, optional) – Layer zoom
layer(name, zoom=None)

Layer Attributes object for a given layer.

Parameters:
  • name (str) – Layer name
  • zoom (int, optional) – Layer zoom

Returns:Attributes
layers()

List all layers Attributes objects

Returns:[Attributes]
geopyspark.get_colors_from_colors(colors)

Returns a list of integer colors from a list of Color objects from the colortools package.

Parameters:colors ([colortools.Color]) – A list of color stops using colortools.Color
Returns:[int]
geopyspark.get_colors_from_matplotlib(ramp_name, num_colors=256)

Returns a list of color breaks from the color ramps defined by Matplotlib.

Parameters:
  • ramp_name (str) – The name of a matplotlib color ramp. See the matplotlib documentation for a list of names and details on each color ramp.
  • num_colors (int, optional) – The number of color breaks to derive from the named map.
Returns:

[int]

class geopyspark.ColorMap(cmap)

A class that wraps a GeoTrellis ColorMap class.

Parameters:cmap (py4j.java_gateway.JavaObject) – The JavaObject that represents the GeoTrellis ColorMap.
cmap

py4j.java_gateway.JavaObject – The JavaObject that represents the GeoTrellis ColorMap.

classmethod build(breaks, colors=None, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)

Given breaks and colors, build a ColorMap object.

Parameters:
  • breaks (dict or list or np.ndarray or Histogram) – If a dict then a mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity. If a list then tile values that specify breaks in the color mapping. If a Histogram then a histogram from which breaks can be derived.
  • colors (str or list, optional) – If a str then the name of a matplotlib color ramp. If a list then either a list of colortools Color objects or a list of integers containing packed RGBA values. If None, then the ColorMap will be created from the breaks given.
  • no_data_color (int, optional) – A color to replace NODATA values with
  • fallback (int, optional) – A color to replace cells that have no value in the mapping
  • classification_strategy (str or ClassificationStrategy, optional) – A string giving the strategy for converting tile values to colors. e.g., if ClassificationStrategy.LESS_THAN_OR_EQUAL_TO is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns:

ColorMap
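
A sketch building a ColorMap from a layer’s histogram and a Matplotlib ramp name (layer is assumed to be an existing layer):

import geopyspark as gps

histogram = layer.get_histogram()
# 'viridis' is a Matplotlib color ramp name; breaks are derived from the histogram.
color_map = gps.ColorMap.build(breaks=histogram, colors='viridis')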

classmethod from_break_map(break_map, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)

Converts a dictionary mapping from tile values to colors to a ColorMap.

Parameters:
  • break_map (dict) – A mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity.
  • no_data_color (int, optional) – A color to replace NODATA values with
  • fallback (int, optional) – A color to replace cells that have no value in the mapping
  • classification_strategy (str or ClassificationStrategy, optional) – A string giving the strategy for converting tile values to colors. e.g., if ClassificationStrategy.LESS_THAN_OR_EQUAL_TO is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns:

ColorMap

classmethod from_colors(breaks, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)

Converts lists of values and colors to a ColorMap.

Parameters:
  • breaks (list) – The tile values that specify breaks in the color mapping.
  • color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers—e.g., 0xff000080 is red at half opacity.
  • no_data_color (int, optional) – A color to replace NODATA values with
  • fallback (int, optional) – A color to replace cells that have no value in the mapping
  • classification_strategy (str or ClassificationStrategy, optional) – A string giving the strategy for converting tile values to colors. e.g., if ClassificationStrategy.LESS_THAN_OR_EQUAL_TO is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns:

ColorMap

classmethod from_histogram(histogram, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)

Converts a wrapped GeoTrellis histogram into a ColorMap.

Parameters:
  • histogram (Histogram) – A Histogram instance; specifies breaks
  • color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers e.g., 0xff000080 is red at half opacity.
  • no_data_color (int, optional) – A color to replace NODATA values with
  • fallback (int, optional) – A color to replace cells that have no value in the mapping
  • classification_strategy (str or ClassificationStrategy, optional) – A string giving the strategy for converting tile values to colors. e.g., if ClassificationStrategy.LESS_THAN_OR_EQUAL_TO is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns:

ColorMap

static nlcd_colormap()

Returns a color map for NLCD tiles.

Returns:ColorMap
class geopyspark.LayerType

The type of the key within the tuple of the wrapped RDD.

SPACETIME = 'spacetime'

Indicates that the RDD contains (K, V) pairs, where the K has a spatial and time attribute. Both TemporalProjectedExtent and SpaceTimeKey are examples of this type of K.

SPATIAL = 'spatial'

class geopyspark.IndexingMethod

How the wrapped RDD should be indexed when saved.

HILBERT = 'hilbert'

A key indexing method. Works only for RDDs that contain SpatialKey. This method provides the fastest lookup of all the key indexing methods; however, it does not give good locality guarantees. It is recommended that this method only be used when locality is not important for your analysis.

ROWMAJOR = 'rowmajor'
ZORDER = 'zorder'

A key indexing method. Works for RDDs that contain both SpatialKey and SpaceTimeKey. Note, indexes are determined by the x, y, and if SPACETIME, the temporal resolutions of a point. This is expressed in bits, and has a max value of 62. Thus, if the sum of those resolutions is greater than 62, then the indexing will fail.

class geopyspark.ResampleMethod

Resampling Methods.

AVERAGE = 'Average'
BILINEAR = 'Bilinear'
CUBIC_CONVOLUTION = 'CubicConvolution'
CUBIC_SPLINE = 'CubicSpline'
LANCZOS = 'Lanczos'
MAX = 'Max'
MEDIAN = 'Median'
MIN = 'Min'
MODE = 'Mode'
NEAREST_NEIGHBOR = 'NearestNeighbor'
class geopyspark.TimeUnit

ZORDER time units.

DAYS = 'days'
HOURS = 'hours'
MILLIS = 'millis'
MINUTES = 'minutes'
MONTHS = 'months'
SECONDS = 'seconds'
WEEKS = 'weeks'
YEARS = 'years'
class geopyspark.Operation

Focal operations.

ASPECT = 'Aspect'
MAX = 'Max'
MEAN = 'Mean'
MEDIAN = 'Median'
MIN = 'Min'
MODE = 'Mode'
STANDARD_DEVIATION = 'StandardDeviation'
SUM = 'Sum'
VARIANCE = 'Variance'
class geopyspark.Neighborhood

Neighborhood types.

ANNULUS = 'Annulus'
CIRCLE = 'Circle'
NESW = 'Nesw'
SQUARE = 'Square'
WEDGE = 'Wedge'
class geopyspark.ClassificationStrategy

Classification strategies for color mapping.

EXACT = 'Exact'
GREATER_THAN = 'GreaterThan'
GREATER_THAN_OR_EQUAL_TO = 'GreaterThanOrEqualTo'
LESS_THAN = 'LessThan'
LESS_THAN_OR_EQUAL_TO = 'LessThanOrEqualTo'
class geopyspark.CellType

Cell types.

BOOL = 'bool'
BOOLRAW = 'boolraw'
FLOAT32 = 'float32'
FLOAT32RAW = 'float32raw'
FLOAT64 = 'float64'
FLOAT64RAW = 'float64raw'
INT16 = 'int16'
INT16RAW = 'int16raw'
INT32 = 'int32'
INT32RAW = 'int32raw'
INT8 = 'int8'
INT8RAW = 'int8raw'
UINT16 = 'uint16'
UINT16RAW = 'uint16raw'
UINT8 = 'uint8'
UINT8RAW = 'uint8raw'
class geopyspark.ColorRamp

ColorRamp names.

BLUE_TO_ORANGE = 'BlueToOrange'
BLUE_TO_RED = 'BlueToRed'
CLASSIFICATION_BOLD_LAND_USE = 'ClassificationBoldLandUse'
CLASSIFICATION_MUTED_TERRAIN = 'ClassificationMutedTerrain'
COOLWARM = 'CoolWarm'
GREEN_TO_RED_ORANGE = 'GreenToRedOrange'
HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM = 'HeatmapBlueToYellowToRedSpectrum'
HEATMAP_DARK_RED_TO_YELLOW_WHITE = 'HeatmapDarkRedToYellowWhite'
HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE = 'HeatmapLightPurpleToDarkPurpleToWhite'
HEATMAP_YELLOW_TO_RED = 'HeatmapYellowToRed'
Hot = 'Hot'
INFERNO = 'Inferno'
LIGHT_TO_DARK_GREEN = 'LightToDarkGreen'
LIGHT_TO_DARK_SUNSET = 'LightToDarkSunset'
LIGHT_YELLOW_TO_ORANGE = 'LightYellowToOrange'
MAGMA = 'Magma'
PLASMA = 'Plasma'
VIRIDIS = 'Viridis'
class geopyspark.StorageMethod

Internal storage methods for GeoTiffs.

STRIPED = 'Striped'
TILED = 'Tiled'
class geopyspark.ColorSpace

Color space types for GeoTiffs.

BLACK_IS_ZERO = 1
CFA = 32803
CIE_LAB = 8
CMYK = 5
ICC_LAB = 9
ITU_LAB = 10
LINEAR_RAW = 34892
LOG_L = 32844
LOG_LUV = 32845
PALETTE = 3
RGB = 2
TRANSPARENCY_MASK = 4
WHITE_IS_ZERO = 0
Y_CB_CR = 6
class geopyspark.Compression

Compression methods for GeoTiffs.

DEFLATE_COMPRESSION = 'DeflateCompression'
NO_COMPRESSION = 'NoCompression'
class geopyspark.Unit

Represents the units of elevation.

FEET = 'Feet'
METERS = 'Meters'
class geopyspark.ReadMethod

An enumeration.

GDAL = 'GDAL'
GEOTRELLIS = 'GeoTrellis'
geopyspark.cost_distance(friction_layer, geometries, max_distance)

Performs cost distance of a TileLayer.

Parameters:
  • friction_layer (TiledRasterLayer) – TiledRasterLayer of a friction surface to traverse.
  • geometries (list) –

    A list of shapely geometries to be used as a starting point.

    Note

    All geometries must be in the same CRS as the TileLayer.

  • max_distance (int or float) – The maximum cost that a path may reach before the operation stops. This value can be an int or float.
Returns:

TiledRasterLayer

geopyspark.euclidean_distance(geometry, source_crs, zoom, cell_type=<CellType.FLOAT64: 'float64'>)

Calculates the Euclidean distance of a Shapely geometry.

Parameters:
  • geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
  • source_crs (str or int) – The CRS of the input geometry.
  • zoom (int) – The zoom level of the output raster.
  • cell_type (str or CellType, optional) – The data type of the cells for the new layer. If not specified, then CellType.FLOAT64 is used.

Note

This function may run very slowly for polygonal inputs if they cover many cells of the output raster.

Returns:TiledRasterLayer
geopyspark.hillshade(tiled_raster_layer, zfactor_calculator, band=0, azimuth=315.0, altitude=45.0)

Computes Hillshade (shaded relief) from a raster.

The resulting raster will be a shaded relief map (a hill shading) based on the sun altitude, azimuth, and the zfactor. The zfactor is a conversion factor from map units to elevation units.

The hillshade operation will be carried out in a SQUARE neighborhood with an extent of 1. The zfactor will be derived from the zfactor_calculator for each Tile in the Layer. The resulting Layer will have a cell_type of INT16 regardless of the input Layer’s cell_type, as well as a single band that represents the calculated hillshade.

Returns a raster of ShortConstantNoDataCellType.

For descriptions of parameters, please see Esri Desktop’s description of Hillshade.

Parameters:
  • tiled_raster_layer (TiledRasterLayer) – The base layer that contains the rasters used to compute the hillshade.
  • zfactor_calculator (py4j.JavaObject) – A JavaObject that represents the Scala ZFactorCalculator class. This can be created using either the zfactor_lat_lng_calculator() or the zfactor_calculator() methods.
  • band (int, optional) – The band of the raster to base the hillshade calculation on. Default is 0.
  • azimuth (float, optional) – The azimuth angle of the source of light. Default value is 315.0.
  • altitude (float, optional) – The angle of the altitude of the light above the horizon. Default is 45.0.
Returns:

TiledRasterLayer
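
A sketch combining zfactor_lat_lng_calculator() with hillshade() (elevation_layer is assumed to be an existing TiledRasterLayer in LatLng whose elevations are in meters):

import geopyspark as gps

calculator = gps.zfactor_lat_lng_calculator(gps.Unit.METERS)
shaded = gps.hillshade(tiled_raster_layer=elevation_layer,
                       zfactor_calculator=calculator,
                       azimuth=315.0,
                       altitude=45.0)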

class geopyspark.Histogram(scala_histogram)

A wrapper class for a GeoTrellis Histogram.

The underlying histogram is produced from the values within a TiledRasterLayer. These values represented by the histogram can either be Int or Float depending on the data type of the cells in the layer.

Parameters:scala_histogram (py4j.JavaObject) – An instance of the GeoTrellis histogram.
scala_histogram

py4j.JavaObject – An instance of the GeoTrellis histogram.

bin_counts()

Returns a list of tuples where the key is the bin label value and the value is the label’s respective count.

Returns:[(int, int)] or [(float, int)]
bucket_count()

Returns the number of buckets within the histogram.

Returns:int
cdf()

Returns the cdf of the distribution of the histogram.

Returns:[(float, float)]
classmethod from_dict(value)

Creates a Histogram from a dictionary.

item_count(item)

Returns the total number of times a given item appears in the histogram.

Parameters:item (int or float) – The value whose occurrences should be counted.
Returns:The total count of the occurrences of item in the histogram.
Return type:int
max()

The largest value of the histogram.

This will return either an int or float depending on the type of values within the histogram.

Returns:int or float
mean()

Determines the mean of the histogram.

Returns:float
median()

Determines the median of the histogram.

Returns:float
merge(other_histogram)

Merges this instance of Histogram with another. The resulting Histogram will contain values from both Histograms.

Parameters:other_histogram (Histogram) – The Histogram that should be merged with this instance.
Returns:Histogram
min()

The smallest value of the histogram.

This will return either an int or float depending on the type of values within the histogram.

Returns:int or float
min_max()

The largest and smallest values of the histogram.

This will return either an int or float depending on the type of values within the histogram.

Returns:(int, int) or (float, float)
mode()

Determines the mode of the histogram.

This will return either an int or float depending on the type of values within the histogram.

Returns:int or float
quantile_breaks(num_breaks)

Returns quantile breaks for this Layer.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[int]
to_dict()

Encodes histogram as a dictionary

Returns:dict
values()

Lists each individual value within the histogram.

This will return a list of either ints or floats depending on the type of values within the histogram.

Returns:[int] or [float]
class geopyspark.RasterLayer(layer_type, srdd)

A wrapper of an RDD that contains GeoTrellis rasters.

Represents a layer that wraps an RDD of (K, V) pairs, where K is either ProjectedExtent or TemporalProjectedExtent depending on the layer_type of the RDD, and V is a Tile.

The data held within this layer has not been tiled, meaning the data has yet to be modified to fit a certain layout. See raster_rdd for more information.

Parameters:
  • layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
  • srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.
pysc

pyspark.SparkContext – The SparkContext being used for this session.

layer_type

LayerType – What the layer type of the geotiffs are.

srdd

py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.

bands(band)

Select a subsection of bands from the Tiles within the layer.

Note

There could be a high performance cost if operations are performed between two sub-bands of a large data set.

Note

Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.

Parameters:band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can either be a single int, or a collection of ints.
Returns:RasterLayer with the selected bands.
cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

collect_keys()

Returns a list of all of the keys in the layer.

Note

This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.

Returns:[SpatialKey] or [SpaceTimeKey]
collect_metadata(layout=LocalLayout(tile_cols=256, tile_rows=256))

Iterates over the RDD records and generates layer metadata describing the contained rasters.

Parameters:layout (LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
Returns:Metadata
convert_data_type(new_type, no_data_value=None)

Converts the underlying, raster values to a new CellType.

Parameters:
  • new_type (str or CellType) – The data type the cells should be converted to.
  • no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:

RasterLayer

Raises:
  • ValueError – If no_data_value is set and the new_type contains raw values.
  • ValueError – If no_data_value is set and new_type is a boolean.
count()

Returns how many elements are within the wrapped RDD.

Returns:The number of elements in the RDD.
Return type:Int
filter_by_times(time_intervals)

Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).

Parameters:time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If just a single element, then only exact matches with that given time will be kept. If there are multiple times given, then they are each paired together so that they form ranges of time. In the case where there are an odd number of elements, then the remaining time will be treated as a single query and not a range.

Note

If nothing intersects the given time_intervals, then the returned RasterLayer will be empty.

Returns:RasterLayer
classmethod from_numpy_rdd(layer_type, numpy_rdd)

Create a RasterLayer from a numpy RDD.

Parameters:
  • layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
  • numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either ProjectedExtents or TemporalProjectedExtents and rasters that are represented by a numpy array.
Returns:

RasterLayer

getNumPartitions()

Returns the number of partitions set for the wrapped RDD.

Returns:The number of partitions.
Return type:Int
get_class_histogram()

Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.

Returns:Histogram or [Histogram]
get_histogram()

Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.

Returns:Histogram or [Histogram]
get_min_max()

Returns the maximum and minimum values of all of the rasters in the layer.

Returns:(float, float)
get_partition_strategy()

Returns the partitioning strategy if the layer has one.

Returns:HashPartitioner or SpatialPartitioner or SpaceTimePartitionStrategy or None
get_quantile_breaks(num_breaks)

Returns quantile breaks for this Layer.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[float]
get_quantile_breaks_exact_int(num_breaks)

Returns quantile breaks for this Layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[int]
isEmpty()

Returns a bool that is True if the layer is empty and False if it is not.

Returns:Are there elements within the layer
Return type:bool
layer_type
map_cells(func)

Maps over the cells of each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.

Parameters:func (cells, nd => cells) – A function that takes two arguments: cells and nd, where cells is the NumPy array and nd is the no_data_value of the Tile. It returns the new cell values of the Tile represented as a NumPy array.
Returns:RasterLayer
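
A sketch that clamps negative cell values to zero with map_cells (raster_layer is assumed to be an existing RasterLayer):

import numpy as np

def clamp_negatives(cells, nd):
    # Leave NoData cells untouched; set every other negative cell to 0.
    mask = cells < 0
    if nd is not None:
        mask = mask & (cells != nd)
    return np.where(mask, 0, cells)

clamped_layer = raster_layer.map_cells(clamp_negatives)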
map_tiles(func)

Maps over each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a RasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.

Parameters:func (Tile => Tile) – A function that takes a Tile and returns a Tile.
Returns:RasterLayer
merge(partition_strategy=None)

Merges the Tile of each K together to produce a single Tile.

This method will reduce each value by its key within the layer to produce a single (K, V) for every K. In order to achieve this, each Tile that shares a K is merged together to form a single Tile. This is done by replacing one Tile’s cells with another’s. Not all cells, if any, may be replaced, however. The following steps are taken to determine if a cell’s value should be replaced:

  1. If the cell contains a NoData value, then it will be replaced.
  2. If no NoData value is set, then a cell with a value of 0 will be replaced.
  3. If neither of the above is true, then the cell retains its value.
Parameters:
  • num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then the num_partitions will be the number of partitions the layer currently has.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

RasterLayer

partitionBy(partition_strategy=None)

Repartitions the layer using the given partitioning strategy.

Parameters:partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will be the same as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:RasterLayer
persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, defaults to MEMORY_ONLY.

pysc
classmethod read(paths, layer_type=<LayerType.SPATIAL: 'spatial'>, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, read_method=<ReadMethod.GEOTRELLIS: 'GeoTrellis'>)

Creates a RasterLayer from a list of data sources.

Note

This feature is still a WIP, so not all features are currently supported.

Parameters:
  • paths (str or [str]) – A path or a list of paths that point to geo-spatial data. These strings can be in either a URI format or a relative path.
  • layer_type (str or LayerType, optional) –

    What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.

    Note

    Only SPATIAL layer types are currently supported.

  • target_crs (str or int, optional) – The CRS that the output tiles should be in. If None, then the CRS that the tiles were originally in will be used.
  • resample_method (str or ResampleMethod, optional) – The resample method to use when building internal overviews. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • read_method (str or ReadMethod, optional) –

    The method that should be used to read in the data. The GEOTRELLIS method can only read GeoTiffs, but requires no additional setup. The GDAL method can read other data sources, but requires that GDAL be set up locally with the required drivers. Default is ReadMethod.GEOTRELLIS.

    Note

    Only the GEOTRELLIS method is currently supported.

Returns:

RasterLayer

reclassify(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)

Changes the cell values of a raster based on how the data is broken up in the given value_map.

Parameters:
  • value_map (dict) – A dict whose keys represent values where a break should occur and its values are the new value the cells within the break should become.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
  • classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
  • replace_nodata_with (int or float, optional) –

    When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.

    Note

    Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.

  • fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
  • strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.
Returns:

RasterLayer
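
A hedged sketch of a reclassification, assuming a RasterLayer named raster_layer whose cells are ints; the break values are illustrative only:

    # With the default LESS_THAN_OR_EQUAL_TO strategy, cells <= 100 become 1,
    # cells <= 200 become 2, and cells <= 255 become 3.
    value_map = {100: 1, 200: 2, 255: 3}
    classified = raster_layer.reclassify(value_map, int)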

repartition(num_partitions=None)

Repartitions the layer to have a different number of partitions.

Parameters:num_partitions (int, optional) – Desired number of partitions. Default is None. If None, then the existing number of partitions will be used.
Returns:RasterLayer
reproject(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)

Reproject rasters to target_crs. The reprojection does not sample past the tile boundary.

Parameters:
  • target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
  • resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethods.NEAREST_NEIGHBOR is used.
Returns:

RasterLayer

srdd
tile_to_layout(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)

Cut tiles to layout and merge overlapping tiles. This will produce unique keys.

Parameters:
  • layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
  • target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
  • resample_method (str or ResampleMethod, optional) – The cell resample method to use during the tiling operation. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer
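
A minimal sketch of tiling, assuming geopyspark is imported as gps and a RasterLayer named raster_layer already exists (both assumptions):

    # Cut the layer into 256x256 tiles in its native CRS and extent.
    tiled = raster_layer.tile_to_layout(gps.LocalLayout(256, 256))

    # Or tile to a global, power-of-two layout while reprojecting to web mercator.
    tiled_web = raster_layer.tile_to_layout(gps.GlobalLayout(), target_crs=3857)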

to_geotiff_rdd(storage_method=<StorageMethod.TILED: 'Tiled'>, rows_per_strip=None, tile_dimensions=(256, 256), resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, decimations=[], compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)

Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either ProjectedExtent or TemporalProjectedExtent.

Parameters:
  • storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
  • rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
  • tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
  • resample_method (str or ResampleMethod, optional) – The resample method to use when building internal overviews. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • decimations ([int], optional) – The decimation factors to use when building the internal overviews of the GeoTiff. Default is [], meaning no overview factors are used.
  • compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
  • color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
  • color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
  • head_tags (dict, optional) – A dict where each key and value is a str.
  • band_tags (list, optional) – A list of dicts where each key and value is a str.

Note

For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns:

RDD[(K, bytes)]

to_numpy_rdd()

Converts a RasterLayer to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:RDD
to_png_rdd(color_map)

Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as an RDD[(K, bytes)].

Parameters:color_map (ColorMap) – A ColorMap instance used to color the PNGs.
Returns:RDD[(K, bytes)]
to_spatial_layer(target_time=None)

Converts a RasterLayer with a layout_type of LayoutType.SPACETIME to a RasterLayer with a layout_type of LayoutType.SPATIAL.

Parameters:target_time (datetime.datetime, optional) – The instance of interest. If set, the resulting RasterLayer will only contain keys that contained the given instance. If None, then all values within the layer will be kept.
Returns:RasterLayer
Raises:ValueError – If the layer already has a layout_type of LayoutType.SPATIAL.
unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

with_no_data(no_data_value)

Changes the NoData value of the layer with the new given value.

It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.

If the target layer has a bool CellType, then the no_data_value will be ignored and the resulting layer will be the same as the original. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.

Parameters:no_data_value (int or float) – The new NoData value of the layer.
Returns:RasterLayer
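
A sketch of the float32raw example above, assuming raster_layer is a RasterLayer whose CellType is float32raw (an assumption):

    # Produces a layer whose CellType is float32ud-10.0.
    layer_with_nodata = raster_layer.with_no_data(-10)
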
wrapped_rdds()

Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

class geopyspark.TiledRasterLayer(layer_type, srdd)

Wraps an RDD of tiled, GeoTrellis rasters.

Represents an RDD that contains (K, V) pairs, where K is either SpatialKey or SpaceTimeKey depending on the layer_type of the RDD, and V is a Tile.

The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.

Parameters:
  • layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
  • srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.
pysc

pyspark.SparkContext – The SparkContext being used for this session.

layer_type

LayerType – What the layer type of the geotiffs are.

srdd

py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.

is_floating_point_layer

bool – Whether the data within the TiledRasterLayer is floating point or not.

layer_metadata

Metadata – The layer metadata associated with this layer.

zoom_level

int – The zoom level of the layer. Can be None.

aggregate_by_cell(operation)

Computes an aggregate summary for each cell of all of the values for each key.

The operation given is a local map algebra function that will be applied to all values that share the same key. If there are multiple copies of the same key in the layer, then this method will reduce all instances of the (K, Tile) pairs into a single element. This resulting (K, Tile)’s Tile will contain the aggregate summaries of each cell of the reduced Tiles that had the same K.

Note

Not all Operations are supported. Only SUM, MIN, MAX, MEAN, VARIANCE, and STANDARD_DEVIATION can be used.

Note

If calculating VARIANCE or STANDARD_DEVIATION, then any K that is a single copy will have a resulting Tile that is filled with NoData values. This is because the variance of a single element is undefined.

Parameters:operation (str or Operation) – The aggregate operation to be performed.
Returns:TiledRasterLayer
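
A minimal sketch, assuming geopyspark is imported as gps and tiled_layer is a TiledRasterLayer that contains duplicate keys (both names are assumptions):

    # Sum the cell values of all Tiles that share the same key.
    summed = tiled_layer.aggregate_by_cell(gps.Operation.SUM)
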
bands(band)

Select a subsection of bands from the Tiles within the layer.

Note

There can be a high performance cost if operations are performed between two sub-bands of a large data set.

Note

Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.

Parameters:band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can either be a single int, or a collection of ints.
Returns:TiledRasterLayer with the selected bands.
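
A sketch of band selection, assuming a multiband TiledRasterLayer named tiled_layer (an assumption):

    first_band = tiled_layer.bands(0)     # select a single band
    subset = tiled_layer.bands([0, 2])    # select a subset of bands
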
cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

collect_keys()

Returns a list of all of the keys in the layer.

Note

This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.

Returns:[SpatialKey] or [SpaceTimeKey]
convert_data_type(new_type, no_data_value=None)

Converts the underlying, raster values to a new CellType.

Parameters:
  • new_type (str or CellType) – The data type the cells should be converted to.
  • no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:

TiledRasterLayer

Raises:
  • ValueError – If no_data_value is set and the new_type contains raw values.
  • ValueError – If no_data_value is set and new_type is a boolean.
count()

Returns how many elements are within the wrapped RDD.

Returns:The number of elements in the RDD.
Return type:Int
filter_by_times(time_intervals)

Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).

Parameters:time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If just a single element, then only exact matches with that given time will be kept. If there are multiple times given, then they are each paired together so that they form ranges of time. In the case where there are an odd number of elements, then the remaining time will be treated as a single query and not a range.

Note

If nothing intersects the given time_intervals, then the returned TiledRasterLayer will be empty.

Returns:TiledRasterLayer
focal(operation, neighborhood=None, param_1=None, param_2=None, param_3=None, partition_strategy=None)

Performs the given focal operation on the layers contained in the Layer.

Parameters:
  • operation (str or Operation) – The focal operation to be performed.
  • neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood, or by a constant.
  • param_1 (int or float, optional) – The first argument of neighborhood.
  • param_2 (int or float, optional) – The second argument of the neighborhood.
  • param_3 (int or float, optional) – The third argument of the neighborhood.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Note

The param_ arguments only need to be set if neighborhood is not an instance of Neighborhood, or if neighborhood is None.

Any param that is not set will default to 0.0.

If neighborhood is None then operation must be Operation.ASPECT.

Returns:

TiledRasterLayer

Raises:
  • ValueError – If operation is not a known operation.
  • ValueError – If neighborhood is not a known neighborhood.
  • ValueError – If neighborhood was not set, and operation is not Operation.ASPECT.
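
A hedged sketch of two focal calls, assuming geopyspark is imported as gps and tiled_layer is an existing TiledRasterLayer (both assumptions):

    # Focal mean over a 3x3 window: a Square neighborhood with an extent of 1.
    smoothed = tiled_layer.focal(gps.Operation.MEAN,
                                 neighborhood=gps.Square(extent=1))

    # ASPECT is the one operation that is performed without a neighborhood.
    aspect = tiled_layer.focal(gps.Operation.ASPECT)
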
classmethod from_numpy_rdd(layer_type, numpy_rdd, metadata, zoom_level=None)

Creates a TiledRasterLayer from a numpy RDD.

Parameters:
  • layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
  • numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
  • metadata (Metadata) – The Metadata of the TiledRasterLayer instance.
  • zoom_level (int, optional) – The zoom_level the resulting TiledRasterLayer should have. If None, then the returned layer’s zoom_level will be None.
Returns:

TiledRasterLayer

classmethod from_rasterframe(rasterframe, zoom_level=None)

Creates a TiledRasterLayer from a pyrasterframes.RasterFrame.

Note

pyrasterframes needs to be initialized via the .withRasterFrames() extension method on the active SparkSession object in order to use this method.

Parameters:
  • rasterframe (pyrasterframes.RasterFrame) – The target RasterFrame that will be converted into a TiledRasterLayer.
  • zoom_level (int, optional) – The zoom_level the resulting TiledRasterLayer should have. If None, then the returned layer’s zoom_level will be None.
Returns:

TiledRasterLayer

getNumPartitions()

Returns the number of partitions set for the wrapped RDD.

Returns:The number of partitions.
Return type:Int
get_cell_value_counts(area_of_interest=None, target_band=0)

Returns a dictionary that contains the cell values and their respective counts in the given area_of_interest.

Note

This method will always return the cell values as ints, regardless of the cell type of the source layer. If the values are not ints, then they will be converted.

Parameters:
  • area_of_interest (Extent or shapely.geometry, optional) – The area where the counting should be done. Default is None. If None, then the whole layer will be used.
  • target_band (int, optional) – Which band should be used to produce the counts. Default is 0.
Returns:

Dict that contains the cell values and their counts

get_class_histogram()

Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.

Returns:Histogram or [Histogram]
get_histogram()

Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.

Returns:Histogram or [Histogram]
get_min_max()

Returns the maximum and minimum values of all of the rasters in the layer.

Returns:(float, float)
get_partition_strategy()

Returns the partitioning strategy if the layer has one.

Returns:HashPartitioner or SpatialPartitioner or SpaceTimePartitionStrategy or None
get_point_values(points, resample_method=None)

Returns the values of the layer at given points.

Note

Only points that are contained within a layer will be sampled. This means that if a point lies on the southern or eastern boundary of a cell, it will not be sampled.

Parameters:
  • points ([shapely.geometry.Point] or {k: shapely.geometry.Point}) – Either a list of, or a dictionary whose values are, shapely.geometry.Points. If a dictionary, then the type of its keys does not matter. These points must be in the same projection as the tiles within the layer.
  • resample_method (str or ResampleMethod, optional) –

    The resampling method to use before obtaining the point values. If not specified, then None is used.

    Note

Not all ResampleMethods can be used to resample point values. ResampleMethod.NEAREST_NEIGHBOR, ResampleMethod.BILINEAR, ResampleMethod.CUBIC_CONVOLUTION, and ResampleMethod.CUBIC_SPLINE are the only ones that can be used.

Returns:

The return type will vary depending on the type of points and the layer_type of the sampled layer.

If points is a list and the layer_type is SPATIAL:

[(shapely.geometry.Point, [float])]

If points is a list and the layer_type is SPACETIME:

[(shapely.geometry.Point, [(datetime.datetime, [float])])]

If points is a dict and the layer_type is SPATIAL:

{k: (shapely.geometry.Point, [float])}

If points is a dict and the layer_type is SPACETIME:

{k: (shapely.geometry.Point, [(datetime.datetime, [float])])}

The shapely.geometry.Point in all of these returns is the original sampled point given. The [float] are the sampled values, one for each band. If the layer_type was SPACETIME, then the timestamp will also be included in the results represented by a datetime.datetime instance. These times and their associated values will be given as a list of tuples for each point.

Note

The sampled values will always be returned as floats, regardless of the cell type of the layer.

If points was given as a dict then the keys of that dictionary will be the keys in the returned dict.
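
A minimal sketch for a SPATIAL layer, assuming tiled_layer exists and that the point coordinates (placeholders here) are in the layer's CRS:

    from shapely.geometry import Point

    points = {"site_a": Point(-8237494.0, 4970354.0)}
    values = tiled_layer.get_point_values(points)
    # values -> {"site_a": (Point(...), [band_0_value, band_1_value, ...])}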

get_quantile_breaks(num_breaks)

Returns quantile breaks for this Layer.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[float]
get_quantile_breaks_exact_int(num_breaks)

Returns quantile breaks for this Layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.

Parameters:num_breaks (int) – The number of breaks to return.
Returns:[int]
histogram_series(geometries)
isEmpty()

Returns a bool that is True if the layer is empty and False if it is not.

Returns:Are there elements within the layer
Return type:bool
layer_type
local_max(value)

Determines the maximum value for each cell of each Tile in the layer.

This method compares the given constant, value, to each cell in the layer. If value is larger, then the resulting cell value will be value. Otherwise, the cell will retain its original value.

Note

NoData values are handled such that taking the max between a normal value and NoData value will always result in NoData.

Parameters:value (int or float or TiledRasterLayer) – The constant value that will be compared to each cell. If this is a TiledRasterLayer, then Tiles who share a key will have each of their cell values compared.
Returns:TiledRasterLayer
lookup(col, row)

Return the value(s) in the image of a particular SpatialKey (given by col and row).

Parameters:
  • col (int) – The SpatialKey column.
  • row (int) – The SpatialKey row.
Returns:

[Tile]

Raises:
  • ValueError – If using lookup on a non LayerType.SPATIAL TiledRasterLayer.
  • IndexError – If col and row are not within the TiledRasterLayer’s bounds.
map_cells(func)

Maps over the cells of each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.

Parameters:func (cells, nd => cells) – A function that takes two arguments: cells and nd, where cells is the numpy array and nd is the no_data_value of the tile. It returns cells, the new cell values of the tile represented as a numpy array.
Returns:TiledRasterLayer
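
A sketch of a map_cells callback, assuming an existing TiledRasterLayer named tiled_layer (an assumption):

    def clamp(cells, nd):
        # cells is the Tile's numpy array; nd is the Tile's NoData value.
        cells[cells > 100] = 100
        return cells

    clamped = tiled_layer.map_cells(clamp)
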
map_tiles(func)

Maps over each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.

Parameters:func (Tile => Tile) – A function that takes a Tile and returns a Tile.
Returns:TiledRasterLayer
mask(geometries, partition_strategy=None, options=RasterizerOptions(includePartial=True, sampleType='PixelIsPoint'))

Masks the TiledRasterLayer so that only values that intersect the geometries will be available.

Parameters:
  • geometries (shapely.geometry or [shapely.geometry] or pyspark.RDD[shapely.geometry]) –

Either a single shapely geometry, a list of geometries, or a Python RDD of geometries to mask the layer with.

    Note

    All geometries must be in the same CRS as the TileLayer.

  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will be the same as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

    Note

    This parameter will only be used if geometries is a pyspark.RDD.

  • options (RasterizerOptions, optional) –

    During the mask operation, rasterization occurs. These options will change the pixel rasterization behavior. Default behavior is to include partial pixel intersection and to treat pixels as points.

    Note

    This parameter will only be used if geometries is a pyspark.RDD.

Returns:

TiledRasterLayer
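
A minimal sketch of masking with a single geometry, assuming tiled_layer exists and that the box coordinates (placeholders) are in the layer's CRS:

    from shapely.geometry import box

    area_of_interest = box(0.0, 0.0, 10.0, 10.0)
    masked = tiled_layer.mask(area_of_interest)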

max_series(geometries)
mean_series(geometries)
merge(partition_strategy=None)

Merges the Tile of each K together to produce a single Tile.

This method reduces the values by key within the layer to produce a single (K, V) for every K. To achieve this, each Tile that shares a K is merged into a single Tile by replacing one Tile’s cells with another’s. However, not every cell will necessarily be replaced. The following steps determine whether a cell’s value is replaced:

  1. If the cell contains a NoData value, then it will be replaced.
  2. If no NoData value is set, then a cell with a value of 0 will be replaced.
  3. If neither of the above is true, then the cell retains its value.
Parameters:
  • num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then num_partitions will be the number of partitions the layer currently has.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer

min_series(geometries)
normalize(new_min, new_max, old_min=None, old_max=None)

Normalizes the cell values of the layer from the old range (old_min, old_max) to the given new range (new_min, new_max).

Note

If old_max - old_min <= 0 or new_max - new_min <= 0, then the normalization will fail.

Parameters:
  • old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used.
  • old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used.
  • new_min (int or float) – New minimum to normalize to.
  • new_max (int or float) – New maximum to normalize to.
Returns:

TiledRasterLayer

partitionBy(partition_strategy=None)

Repartitions the layer using the given partitioning strategy.

Parameters:partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will be the same as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:TiledRasterLayer
persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, defaults to MEMORY_ONLY.

polygonal_max(geometry, data_type)

Finds the max value for each band that is contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

[int] or [float] depending on data_type.

Raises:

TypeError – If data_type is not an int or float.
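
A sketch of a polygonal summary, assuming tiled_layer exists and that the polygon (a placeholder) is in the layer's CRS:

    from shapely.geometry import Polygon

    aoi = Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])
    band_maxes = tiled_layer.polygonal_max(aoi, float)  # one max per band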

polygonal_mean(geometry)

Finds the mean of all of the values for each band that are contained within the given geometry.

Parameters:geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
Returns:[float]
polygonal_min(geometry, data_type)

Finds the min value for each band that is contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

[int] or [float] depending on data_type.

Raises:

TypeError – If data_type is not an int or float.

polygonal_sum(geometry, data_type)

Finds the sum of all of the values in each band that are contained within the given geometry.

Parameters:
  • geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:

[int] or [float] depending on data_type.

Raises:

TypeError – If data_type is not an int or float.

pyramid(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)

Creates a layer Pyramid where the resolution is halved per level.

Parameters:
  • resample_method (str or ResampleMethod, optional) – The resample method to use when building the pyramid. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

Pyramid.

Raises:

ValueError – If this layer’s layout is not of GlobalLayout type.

pysc
classmethod read(paths, layout_type, layer_type=<LayerType.SPATIAL: 'spatial'>, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, read_method=<ReadMethod.GEOTRELLIS: 'GeoTrellis'>)

Creates a TiledRasterLayer from a list of data sources.

Note

This feature is still a work in progress, so not all features are currently supported.

Parameters:
  • paths (str or [str]) – A path or a list of paths that point to geo-spatial data. These strings can be in either a URI format or a relative path.
  • layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
  • layer_type (str or LayerType, optional) –

    What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.

    Note

    Only SPATIAL layer types are currently supported.

  • target_crs (str or int, optional) – The CRS that the output tiles should be in. If None, then the CRS that the tiles were originally in will be used.
  • resample_method (str or ResampleMethod, optional) – The resample method to use when building internal overviews. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • read_method (str or ReadMethod, optional) –

    The method that should be used to read in the data. The GEOTRELLIS method can only read GeoTiffs, but requires no additional setup. The GDAL method can read other data sources, but requires that GDAL be set up locally with the required drivers. Default is ReadMethod.GEOTRELLIS.

    Note

    Only the GEOTRELLIS method is currently supported.

Returns:

TiledRasterLayer

reclassify(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)

Changes the cell values of a raster based on how the data is broken up in the given value_map.

Parameters:
  • value_map (dict) – A dict whose keys represent values where a break should occur and its values are the new value the cells within the break should become.
  • data_type (type) – The type of the values within the rasters. Can either be int or float.
  • classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
  • replace_nodata_with (int or float, optional) –

    When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.

    Note

    Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.

  • fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
  • strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.
Returns:

TiledRasterLayer

repartition(num_partitions=None)

Repartitions the layer to have a different number of partitions.

Parameters:num_partitions (int, optional) – Desired number of partitions. Default is None. If None, then the existing number of partitions will be used.
Returns:TiledRasterLayer
reproject(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)

Reproject rasters to target_crs. The reprojection does not sample past the tile boundary.

Parameters:
  • target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
  • resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethods.NEAREST_NEIGHBOR is used.
Returns:

TiledRasterLayer

save_stitched(path, crop_bounds=None, crop_dimensions=None)

Stitches all of the rasters within the Layer into one raster and then saves it to a given path.

Parameters:
  • path (str) – The path of the geotiff to save. The path must be on the local file system.
  • crop_bounds (Extent, optional) – The sub Extent with which to crop the raster before saving. If None, then the whole raster will be saved.
  • crop_dimensions (tuple(int) or list(int), optional) – The cols and rows of the image to save, represented as either a tuple or list. If None, then all cols and rows of the raster will be saved.

Note

This can only be used on LayerType.SPATIAL TiledRasterLayers.

Note

If crop_dimensions is set then crop_bounds must also be set.

slope(zfactor_calculator)

Performs the Slope, focal operation on the first band of each Tile in the Layer.

The Slope operation will be carried out in a SQUARE neighborhood with an extent of 1. A zfactor will be derived from the zfactor_calculator for each Tile in the Layer. The resulting Layer will have a cell_type of FLOAT64 regardless of the input Layer’s cell_type, as well as a single band that represents the calculated slope.

Parameters:zfactor_calculator (py4j.JavaObject) – A JavaObject that represents the Scala ZFactorCalculator class. This can be created using either the zfactor_lat_lng_calculator() or the zfactor_calculator() methods.
Returns:TiledRasterLayer
srdd
star_series(geometries, fn)
stitch()

Stitch all of the rasters within the Layer into one raster.

Note

This can only be used on LayerType.SPATIAL TiledRasterLayers.

Returns:Tile
sum_series(geometries)
tile_to_layout(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)

Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.

Parameters:
  • layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
  • target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
  • resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethods.NEAREST_NEIGHBOR is used.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer

to_geotiff_rdd(storage_method=<StorageMethod.TILED: 'Tiled'>, rows_per_strip=None, tile_dimensions=(256, 256), resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, decimations=[], compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)

Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either SpatialKey or SpaceTimeKey.

Parameters:
  • storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
  • rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
  • tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
  • resample_method (str or ResampleMethod, optional) – The resample method to use when building internal overviews. Default is ResampleMethods.NEAREST_NEIGHBOR.
  • decimations ([int], optional) – The decimation factors to use when building the internal overviews of the GeoTiff. Default is [], meaning no overview factors are used.
  • compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
  • color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
  • color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
  • head_tags (dict, optional) – A dict where each key and value is a str.
  • band_tags (list, optional) – A list of dicts where each key and value is a str.

Note

For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns:

RDD[(K, bytes)]

to_numpy_rdd()

Converts a TiledRasterLayer to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:RDD
to_png_rdd(color_map)

Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as an RDD[(K, bytes)].

Parameters:color_map (ColorMap) – A ColorMap instance used to color the PNGs.
Returns:RDD[(K, bytes)]
to_rasterframe(num_bands)

Converts a TiledRasterLayer to a pyrasterframes.RasterFrame.

Note

pyrasterframes needs to be initialized via the .withRasterFrames() extension method on the active SparkSession object in order to use this method.

Parameters:num_bands (int) – The number of bands the TiledRasterLayer has.
Returns:pyrasterframes.RasterFrame
to_spatial_layer(target_time=None)

Converts a TiledRasterLayer with a layout_type of LayoutType.SPACETIME to a TiledRasterLayer with a layout_type of LayoutType.SPATIAL.

Parameters:target_time (datetime.datetime, optional) – The instance of interest. If set, the resulting TiledRasterLayer will only contain keys that contained the given instance. If None, then all values within the layer will be kept.
Returns:TiledRasterLayer
Raises:ValueError – If the layer already has a layout_type of LayoutType.SPATIAL.
tobler()

Generates a Tobler walking speed layer from an elevation layer.

Note

This method has a known issue where the Tobler calculation is direction agnostic. Thus, all slopes are assumed to be uphill. This can result in incorrect results. A fix is currently being worked on.

Returns:TiledRasterLayer
unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

with_no_data(no_data_value)

Changes the NoData value of the layer with the new given value.

It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.

If the target layer has a bool CellType, then the no_data_value will be ignored and the result layer will be the same as the origin. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.

Parameters:no_data_value (int or float) – The new NoData value of the layer.
Returns:TiledRasterLayer
wrapped_rdds()

Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

class geopyspark.Pyramid(levels)

Contains a list of TiledRasterLayers that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server.

Map algebra can be performed on instances of this class.

Parameters:levels (list or dict) – A list of TiledRasterLayers or a dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.
pysc

pyspark.SparkContext – The SparkContext being used for this session.

layer_type

LayerType – What the layer type of the geotiffs are.

levels

dict – A dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.

max_zoom

int – The highest zoom level of the pyramid.

is_cached

bool – Signals whether or not the internal RDDs are cached. Default is False.

histogram

Histogram – The Histogram that represents the layer with the max zoom. Will not be calculated unless the get_histogram() method is used. Otherwise, its value is None.

Raises:TypeError – If levels is neither a list or dict.
cache()

Persist this RDD with the default storage level (MEMORY_ONLY).

count()

Returns how many elements are within the wrapped RDD.

Returns:The number of elements in the RDD.
Return type:Int
getNumPartitions()

Returns the number of partitions set for the wrapped RDD.

Returns:The number of partitions.
Return type:Int
get_histogram()

Calculates the Histogram for the layer with the max zoom.

Returns:Histogram
get_partition_strategy()

Returns the partitioning strategy if the layer has one.

Returns:HashPartitioner or SpatialPartitioner or SpaceTimePartitionStrategy or None
histogram
isEmpty()

Returns a bool that is True if the layer is empty and False if it is not.

Returns:Are there elements within the layer
Return type:bool
is_cached
layer_type
levels
max_zoom
persist(storageLevel=StorageLevel(False, True, False, False, 1))

Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, defaults to MEMORY_ONLY.

pysc
unpersist()

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

wrapped_rdds()

Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.

Returns:[org.apache.spark.rdd.RDD]
write(uri, layer_name, index_strategy=<IndexingMethod.ZORDER: 'zorder'>, time_unit=None, time_resolution=None, store=None)

Writes each tiled layer of the pyramid to a specified destination.

Parameters:
  • uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on the backend.
  • layer_name (str) – The name of the new tile layer.
  • index_strategy (str or IndexingMethod) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. Can either be a string or an IndexingMethod attribute. The default method is IndexingMethod.ZORDER.
  • time_unit (str or TimeUnit, optional) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index; that is, what time intervals are used to separate each record. While this is set to None by default, it must be set if saving spatial-temporal data. Depending on the indexing method chosen, different time units are used.
  • time_resolution (str or int, optional) –

    Determines how data for each time_unit should be grouped together. By default, no grouping will occur.

    As an example, having a time_unit of WEEKS and a time_resolution of 5 will cause the data to be grouped and stored together in units of 5 weeks. If however time_resolution is not specified, then the data will be grouped and stored in units of single weeks.

    This value can either be an int or a string representation of an int.

  • store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
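
A hedged sketch of building and saving a pyramid, assuming tiled_layer is a TiledRasterLayer with a GlobalLayout and that a local file catalog is an acceptable destination (both assumptions):

    pyramid = tiled_layer.pyramid()
    pyramid.write("file:///tmp/catalog", "example-layer")
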
class geopyspark.Square(extent)

A square neighborhood.

Parameters:extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box extends.
class geopyspark.Circle(radius)

A circle neighborhood.

Parameters:radius (int or float) – The radius of the circle that determines which cells fall within the bounding box.
radius

int or float – The radius of the circle that determines which cells fall within the bounding box.

param_1

float – Same as radius.

param_2

float – Unused param for Circle. Is 0.0.

param_3

float – Unused param for Circle. Is 0.0.

name

str – The name of the neighborhood which is, “circle”.

Note

Cells that lie exactly on the radius of the circle are a part of the neighborhood.

class geopyspark.Wedge(radius, start_angle, end_angle)

A wedge neighborhood.

Parameters:
  • radius (int or float) – The radius of the wedge.
  • start_angle (int or float) – The starting angle of the wedge in degrees.
  • end_angle (int or float) – The ending angle of the wedge in degrees.
radius

int or float – The radius of the wedge.

start_angle

int or float – The starting angle of the wedge in degrees.

end_angle

int or float – The ending angle of the wedge in degrees.

param_1

float – Same as radius.

param_2

float – Same as start_angle.

param_3

float – Same as end_angle.

name

str – The name of the neighborhood which is, “wedge”.

class geopyspark.Nesw(extent)

A neighborhood that includes a column and row intersection for the focus.

Parameters:extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box extends.
extent

int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box extends.

param_1

float – Same as extent.

param_2

float – Unused param for Nesw. Is 0.0.

param_3

float – Unused param for Nesw. Is 0.0.

name

str – The name of the neighborhood which is, “nesw”.

class geopyspark.Annulus(inner_radius, outer_radius)

An Annulus neighborhood.

Parameters:
  • inner_radius (int or float) – The radius of the inner circle.
  • outer_radius (int or float) – The radius of the outer circle.
inner_radius

int or float – The radius of the inner circle.

outer_radius

int or float – The radius of the outer circle.

param_1

float – Same as inner_radius.

param_2

float – Same as outer_radius.

param_3

float – Unused param for Annulus. Is 0.0.

name

str – The name of the neighborhood which is, “annulus”.

geopyspark.rasterize(geoms, crs, zoom, fill_value, cell_type=<CellType.FLOAT64: 'float64'>, options=None, partition_strategy=None)

Rasterizes Shapely geometries.

Parameters:
  • geoms ([shapely.geometry] or (shapely.geometry) or pyspark.RDD[shapely.geometry]) – Either a list, tuple, or a Python RDD of shapely geometries to rasterize.
  • crs (str or int) – The CRS of the input geometry.
  • zoom (int) – The zoom level of the output raster.
  • fill_value (int or float) – The value to burn into pixels intersecting the geometry.
  • cell_type (str or CellType) – Which data type the cells should be when created. Defaults to CellType.FLOAT64.
  • options (RasterizerOptions, optional) – Pixel intersection options.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the default Partitioner and a number of partitions determined by the method.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer
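
A minimal sketch, assuming geopyspark is imported as gps and that the polygon coordinates (placeholders) are in EPSG:4326:

    from shapely.geometry import Polygon

    poly = Polygon([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)])
    burned = gps.rasterize([poly], crs=4326, zoom=10, fill_value=1)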

geopyspark.rasterize_features(features, crs, zoom, cell_type=<CellType.FLOAT64: 'float64'>, options=None, zindex_cell_type=<CellType.INT8: 'int8'>, partition_strategy=None)

Rasterizes a collection of Features.

Parameters:
  • features (pyspark.RDD[Feature]) – A Python RDD that contains Features.
  • crs (str or int) – The CRS of the input geometry.
  • zoom (int) –

    The zoom level of the output raster.

    Note

    Not all rasterized Features may be present in the resulting layer if the zoom is not high enough.

  • cell_type (str or CellType) – Which data type the cells should be when created. Defaults to CellType.FLOAT64.
  • options (RasterizerOptions, optional) – Pixel intersection options.
  • zindex_cell_type (str or CellType) – Which data type the Z-Index cells are. Defaults to CellType.INT8.
  • partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy, optional) –

    Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

    If None, then the output layer will have the default Partitioner and a number of partitions determined by the method.

    If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

    If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer

class geopyspark.Credentials

Credentials for Amazon S3 buckets.

access_key

str – The access key for the S3 bucket.

secret_key

str – The secret key for the S3 bucket.

access_key

Alias for field number 0

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

secret_key

Alias for field number 1

class geopyspark.TileRender(render_function)

A Python implementation of the Scala geopyspark.geotrellis.tms.TileRender interface. Permits a callback from Scala to Python to allow for custom rendering functions.

Parameters:render_function (Tile => PIL.Image.Image) – A function to convert geopyspark.geotrellis.Tile to a PIL Image.
render_function

Tile => PIL.Image.Image – A function to convert geopyspark.geotrellis.Tile to a PIL Image.

class Java
implements = ['geopyspark.geotrellis.tms.TileRender']
renderEncoded(scala_array)

A function to convert an array to an image.

Parameters:scala_array – A linear array of bytes representing the protobuf-encoded contents of a tile
Returns:bytes representing an image
requiresEncoding()
class geopyspark.TMS(server)

Provides a TMS server for raster data.

In order to display raster data on a variety of different map interfaces (e.g., leaflet maps, geojson.io, GeoNotebook, and others), we provide the TMS class.

Parameters:server (JavaObject) – The Java TMSServer instance
pysc

pyspark.SparkContext – The SparkContext being used for this session.

server

JavaObject – The Java TMSServer instance

host

str – The IP address of the host, if bound, else None

port

int – The port number of the TMS server, if bound, else None

url_pattern

string – The URI pattern for the current TMS service, with {z}, {x}, {y} tokens. Can be copied directly to services such as geojson.io.

bind(host=None, requested_port=None)

Starts up a TMS server.

Parameters:
  • host (str, optional) – The target host. Typically “localhost”, “127.0.0.1”, or “0.0.0.0”. The latter will make the TMS service accessible to the world. If omitted, defaults to localhost.
  • requested_port (int, optional) – A port number to bind the service to. If omitted, a random available port is used.
classmethod build(source, display, allow_overzooming=True)

Builds a TMS server from one or more layers.

This function takes a SparkContext, a source or list of sources, and a display method and creates a TMS server to display the desired content. The display method is supplied as a ColorMap (only available when there is a single source), or a callable object which takes either a single tile input (when there is a single source) or a list of tiles (for multiple sources) and returns the bytes representing an image file for that tile.

Parameters:
  • source (tuple or list or Pyramid) – The tile sources to render. Tuple inputs are (str, str) pairs where the first component is the URI of a catalog and the second is the layer name. A list input may be any combination of tuples and Pyramids.
  • display (ColorMap, callable) – Method for mapping tiles to images. ColorMap may only be applied to single input source. Callable will take a single numpy array for a single source, or a list of numpy arrays for multiple sources. In the case of multiple inputs, resampling may be required if the tile sources have different tile sizes. Returns bytes representing the resulting image.
  • allow_overzooming (bool) – If set, viewing at zoom levels above the highest available zoom level will produce tiles that are resampled from the highest zoom level present in the data set.
host

Returns the IP string of the server’s host if bound, else None.

Returns:(str)
port

Returns the port number for the current TMS server if bound, else None.

Returns:(int)
set_handshake(handshake)
unbind()

Shuts down the TMS service, freeing the assigned port.

url_pattern

Returns the URI for the tiles served by the present server. Contains {z}, {x}, and {y} tokens to be substituted for the desired zoom and x/y tile position.

Returns:(str)
geopyspark.union(layers)

Unions together two or more RasterLayers or TiledRasterLayers.

All layers must have the same layer_type. If the layers are TiledRasterLayers, then all of the layers must also have the same TileLayout and CRS.

Note

If the layers to be unioned share one or more keys, then the resulting layer will contain duplicates of that key. One copy for each instance of the key.

Parameters:layers ([RasterLayer] or [TiledRasterLayer] or (RasterLayer) or (TiledRasterLayer)) – A collection of two or more RasterLayers or TiledRasterLayers to be unioned together.
Returns:RasterLayer or TiledRasterLayer
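
A sketch, assuming geopyspark is imported as gps and that layer_a and layer_b are TiledRasterLayers sharing the same layer_type, TileLayout, and CRS (all assumptions):

    combined = gps.union([layer_a, layer_b])
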
geopyspark.combine_bands(layers)

Combines the bands of values that share the same key in two or more TiledRasterLayers.

This method will concatenate the bands of two or more values with the same key. For example, layer a has values that have 2 bands and layer b has values with 1 band. When combine_bands is used on both of these layers, then the resulting layer will have values with 3 bands, 2 from layer a and 1 from layer b.

Note

All layers must have the same layer_type. If the layers are TiledRasterLayers, then all of the layers must also have the same TileLayout and CRS.

Parameters:layers ([RasterLayer] or [TiledRasterLayer] or (RasterLayer) or (TiledRasterLayer)) –

A collection of two or more RasterLayers or TiledRasterLayers. The order of the layers determines the order in which the bands are concatenated, with the bands being ordered based on the position of their respective layer.

For example, the first layer in layers is layer a which contains 2 bands and the second layer is layer b whose values have 1 band. The resulting layer will have values with 3 bands: the first 2 are from layer a and the third from layer b. If the positions of layer a and layer b are reversed, then the resulting values’ first band will be from layer b and the last 2 will be from layer a.

Returns:RasterLayer or TiledRasterLayer
class geopyspark.KeyTransform(layout, crs=None, extent=None, cellsize=None, dimensions=None)

Provides functions to move from keys to geometry and vice-versa.

Tile Layers have an underlying RDD which is keyed by either SpatialKey or SpaceTimeKey. Each key represents a region in space, depending on a choice of layout. In order to enable the conversion of keys to regions, and of geometry to keys, the KeyTransform class is provided. This class is constructed with a layout, which is either GlobalLayout, LocalLayout, or a LayoutDefinition. Global layouts use power-of-two pyramids over the world extent, while local layouts operate over a defined extent and cellsize.

NOTE: LocalLayouts will encompass the requested extent, but the final layout may include SpatialKeys which only partially cover the requested extent. The upper-left corner of the resulting layout will match the requested extent, but the right and bottom edges may be beyond the boundaries of the requested extent.

NOTE: GlobalLayouts require pyproj to be installed.

Parameters:
  • layout (GlobalLayout or LocalLayout or LayoutDefinition) – A definition of the layout scheme defining the key structure.
  • crs (str or int, optional) – Used only when layout is a GlobalLayout. Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
  • extent (Extent, optional) – Used only for LocalLayouts. The area of interest.
  • cellsize (tuple of (float, float), optional) – Used only for LocalLayouts. The (width, height) in extent units of a pixel. Cannot be specified simultaneously with dimensions.
  • dimensions (tuple of (int, int), optional) – Used only for LocalLayouts. The number of (columns, rows) of pixels over the entire extent. Cannot be specified simultaneously with cellsize.
extent_to_keys(extent)

Returns the keys in the layout intersecting/covered by a given extent.

Parameters:extent (Extent) – The extent to find the matching keys for.
Returns:[SpatialKey]
geometry_to_keys(geom)

Returns the keys corresponding to grid cells that intersect/are covered by a given Shapely geometry.

Parameters:geom (Geometry) – The geometry to find the matching keys for.
Returns:[SpatialKey]
key_to_extent(key, *args)

Returns the Extent corresponding to a given key.

Parameters:key (SpatialKey or SpaceTimeKey or int) – The key to find the extent for. If of type int, then this parameter is the column of the key, and the call must provide a single additional int value in the args parameter to serve as the row of the key.
Returns:Extent
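
A hedged sketch of moving between keys and extents, assuming geopyspark is imported as gps, pyproj is installed (required for GlobalLayout), and that an EPSG code is an acceptable crs value here:

    # A global, power-of-two layout at zoom 11 in web mercator (EPSG:3857).
    transform = gps.KeyTransform(gps.GlobalLayout(zoom=11), crs=3857)

    # Placeholder bounds given in the layout's CRS.
    aoi = gps.Extent(0.0, 0.0, 2000000.0, 2000000.0)
    keys = transform.extent_to_keys(aoi)                  # [SpatialKey, ...]
    first_tile_extent = transform.key_to_extent(keys[0])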