geopyspark.geotrellis.layer module¶

This module contains the RasterLayer and the TiledRasterLayer classes. Both of these classes are wrappers of their Scala counterparts. They are used in lieu of actual PySpark RDDs when performing operations.

class geopyspark.geotrellis.layer.RasterLayer(layer_type, srdd)¶

A wrapper of an RDD that contains GeoTrellis rasters.

Represents a layer that wraps an RDD of (K, V) pairs, where K is either ProjectedExtent or TemporalProjectedExtent depending on the layer_type of the RDD, and V is a Tile.

The data held within this layer has not been tiled, meaning it has not yet been modified to fit a certain layout. See raster_rdd for more information.

Parameters:	layer_type (str or `LayerType`) – The layer type of the GeoTiffs. This is represented either by constants within `LayerType` or by a string. srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows `RasterLayer` to access the various Scala methods.

pysc¶: pyspark.SparkContext – The SparkContext being used in this session.

layer_type¶: LayerType – The layer type of the GeoTiffs.

srdd¶: py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.

bands(band)¶

Select a subsection of bands from the Tiles within the layer.

Note

There could be potential high performance cost if operations are performed between two sub-bands of a large data set.

Note

Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.

Parameters:	band (int or tuple or list or range) – The band(s) to be selected from the `Tile`s. Can either be a single int, or a collection of ints.
Returns:	`RasterLayer` with the selected bands.

cache()¶: Persist this RDD with the default storage level (MEMORY_ONLY).

collect_keys()¶

Returns a list of all of the keys in the layer.

Note

This method should only be called on layers with a small number of keys, as a large number could cause memory issues.

Returns:	[`ProjectedExtent`] or [`TemporalProjectedExtent`]

collect_metadata(layout=LocalLayout(tile_cols=256, tile_rows=256))¶

Iterates over the RDD records and generates layer metadata describing the contained rasters.

Parameters:	layout (`LayoutDefinition` or `GlobalLayout` or `LocalLayout`, optional) – Target raster layout for the tiling operation.

Returns:	`Metadata`

convert_data_type(new_type, no_data_value=None)¶

Converts the underlying raster values to a new CellType.

Parameters:	new_type (str or `CellType`) – The data type the cells should be converted to. no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:	`RasterLayer`
Raises:	`ValueError` – If `no_data_value` is set and the `new_type` contains raw values. `ValueError` – If `no_data_value` is set and `new_type` is a boolean.

count()¶

Returns how many elements are within the wrapped RDD.

Returns:	The number of elements in the RDD.
Return type:	Int

filter_by_times(time_intervals)¶

Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).

Parameters: time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If just a single element, then only exact matches with that given time will be kept. If there are multiple times given, then they are each paired together so that they form ranges of time. In the case where there are an odd number of elements, then the remaining time will be treated as a single query and not a range.

Note

If nothing intersects the given time_intervals, then the returned RasterLayer will be empty.

Returns:	`RasterLayer`
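The pairing rule for time_intervals can be hard to picture. The following pure-Python sketch mimics the documented behavior on plain datetime values. The helper names `build_time_queries` and `matches` are hypothetical and are not part of GeoPySpark, and treating range ends as inclusive is an assumption.

```python
from datetime import datetime

def build_time_queries(time_intervals):
    """Pair consecutive instants into (start, end) ranges.

    A leftover instant (odd-length input) becomes an exact-match query,
    represented here as a range whose start and end are equal.
    """
    queries = []
    for i in range(0, len(time_intervals), 2):
        start = time_intervals[i]
        # No partner for this instant: treat it as an exact match.
        end = time_intervals[i + 1] if i + 1 < len(time_intervals) else start
        queries.append((start, end))
    return queries

def matches(instant, queries):
    # Inclusive bounds are an assumption of this sketch.
    return any(start <= instant <= end for start, end in queries)

times = [datetime(2017, 1, 1), datetime(2017, 1, 31), datetime(2017, 3, 1)]
queries = build_time_queries(times)
print(matches(datetime(2017, 1, 15), queries))  # inside the January range
print(matches(datetime(2017, 2, 15), queries))  # outside every query
print(matches(datetime(2017, 3, 1), queries))   # exact match on the leftover
```

On a real layer, the same list would be passed directly, as in `layer.filter_by_times(times)`.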

classmethod from_numpy_rdd(layer_type, numpy_rdd)¶

Create a RasterLayer from a numpy RDD.

Parameters:	layer_type (str or `LayerType`) – What the layer type of the geotiffs are. This is represented by either constants within `LayerType` or by a string. numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either `ProjectedExtent`s or `TemporalProjectedExtent`s and rasters that are represented by a numpy array.
Returns:	`RasterLayer`

getNumPartitions()¶

Returns the number of partitions set for the wrapped RDD.

Returns:	The number of partitions.
Return type:	Int

get_class_histogram()¶

Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.

Returns:	`Histogram` or [`Histogram`]

get_histogram()¶

Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.

Returns:	`Histogram` or [`Histogram`]

get_min_max()¶

Returns the minimum and maximum values of all of the rasters in the layer.

Returns:	(float, float) – The (min, max) pair.

get_partition_strategy()¶

Returns the partitioning strategy if the layer has one.

Returns:	`HashPartitionStrategy` or `SpatialPartitionStrategy` or `SpaceTimePartitionStrategy` or `None`

get_quantile_breaks(num_breaks)¶

Returns quantile breaks for this Layer.

Parameters:	num_breaks (int) – The number of breaks to return.
Returns:	`[float]`

get_quantile_breaks_exact_int(num_breaks)¶

Returns quantile breaks for this Layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.

Parameters:	num_breaks (int) – The number of breaks to return.
Returns:	`[int]`

isEmpty()¶

Returns a bool that is True if the layer is empty and False if it is not.

Returns:	Whether the layer contains no elements.
Return type:	bool

map_cells(func)¶

Maps over the cells of each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a RasterLayer once the mapping is done. Thus, it is advised to combine multiple operations into a single call to reduce the performance cost.

Parameters:	func (cells, nd => cells) – A function that takes two arguments: `cells` and `nd`, where `cells` is the numpy array and `nd` is the `no_data_value` of the `Tile`. It returns `cells`, the new cell values of the `Tile` represented as a numpy array.
Returns:	`RasterLayer`
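For illustration, a function suitable for passing to map_cells might look like the sketch below. The function name is hypothetical, and leaving cells equal to `nd` untouched is a design choice of this example rather than something map_cells does for you. The final call is shown only as a comment because it needs a running Spark context.

```python
import numpy as np

def double_valid_cells(cells, nd):
    """Double every cell except those equal to the NoData value.

    `cells` is the tile's numpy array and `nd` its no_data_value (may be None).
    """
    if nd is None:
        return cells * 2
    # np.where keeps NoData cells untouched and doubles the rest.
    return np.where(cells == nd, cells, cells * 2)

print(double_valid_cells(np.array([[1, -1], [3, 4]]), -1))

# On a real layer (requires Spark): doubled = layer.map_cells(double_valid_cells)
```

Folding several per-cell steps into one function like this keeps the number of Python round trips down.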

map_tiles(func)¶

Maps over each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a RasterLayer once the mapping is done. Thus, it is advised to combine multiple operations into a single call to reduce the performance cost.

Parameters:	func (`Tile` => `Tile`) – A function that takes a `Tile` and returns a `Tile`.
Returns:	`RasterLayer`
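Unlike map_cells, the function given to map_tiles receives the whole Tile, so it can consult the cell type and NoData value alongside the cells. The sketch below uses a stand-in namedtuple with the fields `cells`, `cell_type`, and `no_data_value`; that GeoPySpark's own Tile has the same fields is an assumption here, and `clamp_negative` is a hypothetical function.

```python
from collections import namedtuple
import numpy as np

# Stand-in with the three fields GeoPySpark's Tile is assumed to have.
Tile = namedtuple("Tile", "cells cell_type no_data_value")

def clamp_negative(tile):
    """Return a new Tile whose negative, non-NoData cells are set to 0."""
    cells = tile.cells
    if tile.no_data_value is None:
        valid = np.ones(cells.shape, dtype=bool)
    else:
        valid = cells != tile.no_data_value
    new_cells = np.where(valid & (cells < 0), 0, cells)
    # _replace keeps cell_type and no_data_value unchanged.
    return tile._replace(cells=new_cells)

tile = Tile(np.array([[-5, -1], [3, 0]]), "int32", -1)
print(clamp_negative(tile).cells)

# On a real layer (requires Spark): clamped = layer.map_tiles(clamp_negative)
```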

merge(partition_strategy=None)¶

Merges the Tiles of each K together to produce a single Tile.

This method will reduce each value by its key within the layer to produce a single (K, V) for every K. In order to achieve this, each Tile that shares a K is merged together to form a single Tile. This is done by replacing one Tile’s cells with another’s. However, not every cell is necessarily replaced. The following rules determine whether a cell’s value should be replaced:

If the cell contains a NoData value, then it will be replaced.

If no NoData value is set, then a cell with a value of 0 will be replaced.

If neither of the above is true, then the cell retains its value.

Parameters:

num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then num_partitions will be the number of partitions the layer currently has.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

RasterLayer
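The replacement rules listed above for merge can be sketched in pure Python on flat lists of cell values from two tiles that share a key. The function name and the choice of which tile acts as the "target" are assumptions; the real merge runs in Scala over whole tiles. Because this sketch uses equality checks, a NaN NoData value would not be detected.

```python
def merge_cells(target, source, no_data=None):
    """Fill cells of `target` from `source` following the documented rules.

    A target cell is replaced when it equals the NoData value; if no NoData
    value is set, a target cell of 0 is replaced instead. Otherwise it is kept.
    """
    empty = 0 if no_data is None else no_data
    return [s if t == empty else t for t, s in zip(target, source)]

print(merge_cells([1, -9, 3], [7, 8, 9], no_data=-9))
print(merge_cells([0, 2, 0], [5, 6, 7]))
```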

partitionBy(partition_strategy=None)¶

Repartitions the layer using the given partitioning strategy.

Parameters:

partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will be the same as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns: RasterLayer
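The three partitioning rules above can be summarized as a small decision function. In this sketch, `Strategy` and `Partitioning` are hypothetical stand-ins that only carry a partitioner name and a partition count; they are not GeoPySpark classes.

```python
from collections import namedtuple

# Hypothetical stand-ins: a strategy names a partitioner and optionally a count.
Strategy = namedtuple("Strategy", "partitioner num_partitions")
Partitioning = namedtuple("Partitioning", "partitioner num_partitions")

def resolve_partitioning(source, strategy=None):
    """Return the Partitioning the output layer ends up with."""
    if strategy is None:
        # No strategy: keep the source layer's partitioner and count.
        return source
    if strategy.num_partitions is None:
        # Strategy without a count: new partitioner, source layer's count.
        return Partitioning(strategy.partitioner, source.num_partitions)
    # Strategy with a count: both values come from the strategy.
    return Partitioning(strategy.partitioner, strategy.num_partitions)

src = Partitioning("HashPartitioner", 8)
print(resolve_partitioning(src, Strategy("SpatialPartitioner", None)))
```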

persist(storageLevel=StorageLevel(False, True, False, False, 1))¶: Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

reclassify(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)¶

Changes the cell values of a raster based on how the data is broken up in the given value_map.

Parameters:

value_map (dict) – A dict whose keys represent values where a break should occur and its values are the new value the cells within the break should become.
data_type (type) – The type of the values within the rasters. Can either be int or float.
classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
replace_nodata_with (int or float, optional) –
When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.

Note

Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.
fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.

Returns:

RasterLayer
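To make the interaction of value_map, replace_nodata_with, fallback_value, and strict concrete, here is a per-cell sketch of the default LESS_THAN_OR_EQUAL_TO strategy. Only that one strategy is modeled, the function name is hypothetical, and the reading that a cell takes the value of the smallest break it does not exceed is an assumption based on the strategy's name.

```python
import bisect

def reclassify_cell(value, value_map, no_data, replace_nodata_with=None,
                    fallback_value=None, strict=False):
    """Classify one cell using LESS_THAN_OR_EQUAL_TO semantics."""
    if value == no_data:
        # NoData cells are preserved unless a replacement is requested.
        return no_data if replace_nodata_with is None else replace_nodata_with
    breaks = sorted(value_map)
    # Index of the smallest break that is >= value.
    i = bisect.bisect_left(breaks, value)
    if i < len(breaks):
        return value_map[breaks[i]]
    if strict:
        raise ValueError("value %r is above every break" % (value,))
    # Default fallback is the layer's NoData value.
    return no_data if fallback_value is None else fallback_value

value_map = {10: 1, 20: 2}
print([reclassify_cell(v, value_map, -1) for v in (5, 10, 15, 25)])
```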

repartition(num_partitions=None)¶

Repartitions the layer to have a different number of partitions.

Parameters:	num_partitions (int, optional) – Desired number of partitions. Default is `None`. If `None`, then the existing number of partitions will be used.
Returns:	`RasterLayer`

reproject(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶

Reprojects rasters to target_crs. The reprojection does not sample past tile boundaries.

Parameters:	target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. resample_method (str or `ResampleMethod`, optional) – The resample method to use for the reprojection. If none is specified, then `ResampleMethod.NEAREST_NEIGHBOR` is used.
Returns:	`RasterLayer`

tile_to_layout(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)¶

Cut tiles to layout and merge overlapping tiles. This will produce unique keys.

Parameters:

layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
resample_method (str or ResampleMethod, optional) – The cell resample method to use during the tiling operation. Default is ResampleMethod.NEAREST_NEIGHBOR.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer
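As a rough feel for what the default LocalLayout(tile_cols=256, tile_rows=256) means, the sketch below counts how many layout tiles are needed to cover a raster of a given size in cells. It assumes a single raster whose extent is the whole layer and whose origin is aligned with the layout; the helper is hypothetical and ignores reprojection.

```python
import math

def layout_grid_size(raster_cols, raster_rows, tile_cols=256, tile_rows=256):
    """Number of (layout columns, layout rows) needed to cover a raster."""
    return (math.ceil(raster_cols / tile_cols),
            math.ceil(raster_rows / tile_rows))

print(layout_grid_size(1000, 600))  # a 1000 x 600 cell raster
```

Partially covered edge tiles are still full 256 x 256 tiles after tiling; their uncovered cells would hold NoData.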

to_geotiff_rdd(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶

Converts the rasters within this layer to GeoTiffs, which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either ProjectedExtent or TemporalProjectedExtent.

Parameters:

storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
head_tags (dict, optional) – A dict where each key and value is a str.
band_tags (list, optional) – A list of dicts where each key and value is a str.
Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html

Returns:

RDD[(K, bytes)]

to_numpy_rdd()¶

Converts a RasterLayer to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:	RDD

to_png_rdd(color_map)¶

Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].

Parameters:	color_map (`ColorMap`) – A `ColorMap` instance used to color the PNGs.
Returns:	RDD[(K, bytes)]

to_spatial_layer(target_time=None)¶

Converts a RasterLayer with a layer_type of LayerType.SPACETIME to a RasterLayer with a layer_type of LayerType.SPATIAL.

Parameters:	target_time (`datetime.datetime`, optional) – The instant of interest. If set, the resulting `RasterLayer` will only contain values whose keys have the given instant. If `None`, then all values within the layer will be kept.
Returns:	`RasterLayer`
Raises:	`ValueError` – If the layer already has a `layer_type` of `LayerType.SPATIAL`.
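Conceptually, the conversion drops the time component of every key. The sketch below models keys as simplified namedtuples and tiles as opaque values; the field layout of these stand-ins and the function name are assumptions, not GeoPySpark's actual classes.

```python
from collections import namedtuple
from datetime import datetime

# Simplified stand-ins for the key types (field layout is an assumption).
ProjectedExtent = namedtuple("ProjectedExtent", "extent epsg")
TemporalProjectedExtent = namedtuple("TemporalProjectedExtent", "extent instant epsg")

def to_spatial_records(records, target_time=None):
    """Drop the time component of each key, optionally keeping one instant."""
    out = []
    for key, tile in records:
        if target_time is not None and key.instant != target_time:
            continue  # Only keys at target_time survive when it is given.
        out.append((ProjectedExtent(key.extent, key.epsg), tile))
    return out

t1, t2 = datetime(2017, 1, 1), datetime(2017, 2, 1)
records = [(TemporalProjectedExtent("e1", t1, 4326), "a"),
           (TemporalProjectedExtent("e1", t2, 4326), "b")]
print(to_spatial_records(records))
print(to_spatial_records(records, target_time=t2))
```

Without target_time, values from different instants may end up sharing the same spatial key, as in the first call above.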

unpersist()¶: Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

with_no_data(no_data_value)¶

Changes the NoData value of the layer with the new given value.

It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.

If the target layer has a bool CellType, then the no_data_value will be ignored and the resulting layer will be the same as the original. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.

Parameters:	no_data_value (int or float) – The new `NoData` value of the layer.
Returns:	`RasterLayer`
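The CellType naming described above can be sketched as a string transformation. The float case follows the float32raw example given in the text; applying the same pattern to integer types (for example int16raw becoming int16ud-10) is an assumption, and the function name is hypothetical.

```python
def with_no_data_cell_type(cell_type, no_data_value):
    """Return the CellType name produced by assigning a NoData value."""
    if cell_type.startswith("bool"):
        return cell_type  # bool layers ignore the new NoData value.
    if cell_type.endswith("raw"):
        base = cell_type[: -len("raw")]
    else:
        # Strip any existing user-defined NoData suffix such as "ud-10.0".
        base = cell_type.split("ud")[0]
    # Float types render the value as a float; integer types as an int.
    value = float(no_data_value) if base.startswith("float") else int(no_data_value)
    return "%sud%s" % (base, value)

print(with_no_data_cell_type("float32raw", -10))
```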

wrapped_rdds()¶: Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

class geopyspark.geotrellis.layer.TiledRasterLayer(layer_type, srdd)¶

Wraps a RDD of tiled, GeoTrellis rasters.

Represents an RDD of (K, V) pairs, where K is either SpatialKey or SpaceTimeKey depending on the layer_type of the RDD, and V is a Tile.

The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.

Parameters:	layer_type (str or `LayerType`) – The layer type of the GeoTiffs. This is represented either by constants within `LayerType` or by a string. srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows `TiledRasterLayer` to access the various Scala methods.

pysc¶: pyspark.SparkContext – The SparkContext being used in this session.

layer_type¶: LayerType – The layer type of the GeoTiffs.

srdd¶: py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.

is_floating_point_layer¶: bool – Whether the data within the TiledRasterLayer is floating point or not.

layer_metadata¶: Metadata – The layer metadata associated with this layer.

zoom_level¶: int – The zoom level of the layer. Can be None.

aggregate_by_cell(operation)¶

Computes an aggregate summary for each cell of all of the values for each key.

The operation given is a local map algebra function that will be applied to all values that share the same key. If there are multiple copies of the same key in the layer, then this method will reduce all instances of the (K, Tile) pairs into a single element. This resulting (K, Tile)’s Tile will contain the aggregate summaries of each cell of the reduced Tiles that had the same K.

Note

Not all Operations are supported. Only SUM, MIN, MAX, MEAN, VARIANCE, and STANDARD_DEVIATION can be used.

Note

If calculating VARIANCE or STANDARD_DEVIATION, then any K that is a single copy will have a resulting Tile that is filled with NoData values. This is because the variance of a single element is undefined.

Parameters:	operation (str or `Operation`) – The aggregate operation to be performed.
Returns:	`TiledRasterLayer`
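The per-cell aggregation can be sketched in pure Python over tiles represented as equally sized flat lists of cell values. Operation names are plain strings standing in for the Operation constants, NoData cells are not treated specially, and whether GeoTrellis computes sample or population variance is an assumption here (sample statistics are used, which matches the note that a single value has no defined variance).

```python
import statistics

def aggregate_cells(tiles, operation, no_data=None):
    """Aggregate same-position cells across tiles that share one key."""
    funcs = {
        "SUM": sum,
        "MIN": min,
        "MAX": max,
        "MEAN": statistics.mean,
        # Sample statistics: undefined for a single value.
        "VARIANCE": statistics.variance,
        "STANDARD_DEVIATION": statistics.stdev,
    }
    func = funcs[operation]
    if operation in ("VARIANCE", "STANDARD_DEVIATION") and len(tiles) < 2:
        # A key with a single copy yields a tile filled with NoData.
        return [no_data] * len(tiles[0])
    # zip(*tiles) groups the values found at each cell position.
    return [func(list(cells)) for cells in zip(*tiles)]

tiles = [[1, 2], [3, 6]]
print(aggregate_cells(tiles, "SUM"))
print(aggregate_cells([[5, 7]], "VARIANCE", no_data=-1))
```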

bands(band)¶

Select a subsection of bands from the Tiles within the layer.

Note

There could be potential high performance cost if operations are performed between two sub-bands of a large data set.

Note

Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.

Parameters:	band (int or tuple or list or range) – The band(s) to be selected from the `Tile`s. Can either be a single int, or a collection of ints.
Returns:	`TiledRasterLayer` with the selected bands.

cache()¶: Persist this RDD with the default storage level (MEMORY_ONLY).

collect_keys()¶

Returns a list of all of the keys in the layer.

Note

This method should only be called on layers with a small number of keys, as a large number could cause memory issues.

Returns:	[`SpatialKey`] or [`SpaceTimeKey`]

convert_data_type(new_type, no_data_value=None)¶

Converts the underlying raster values to a new CellType.

Parameters:	new_type (str or `CellType`) – The data type the cells should be converted to. no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:	`TiledRasterLayer`
Raises:	`ValueError` – If `no_data_value` is set and the `new_type` contains raw values. `ValueError` – If `no_data_value` is set and `new_type` is a boolean.

count()¶

Returns how many elements are within the wrapped RDD.

Returns:	The number of elements in the RDD.
Return type:	Int

filter_by_times(time_intervals)¶

Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).

Parameters: time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If just a single element, then only exact matches with that given time will be kept. If there are multiple times given, then they are each paired together so that they form ranges of time. In the case where there are an odd number of elements, then the remaining time will be treated as a single query and not a range.

Note

If nothing intersects the given time_intervals, then the returned TiledRasterLayer will be empty.

Returns:	`TiledRasterLayer`

focal(operation, neighborhood=None, param_1=None, param_2=None, param_3=None, partition_strategy=None)¶

Performs the given focal operation on the Tiles contained in the layer.

Parameters:

operation (str or Operation) – The focal operation to be performed.
neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood, or by a constant.
param_1 (int or float, optional) – The first argument of the neighborhood.
param_2 (int or float, optional) – The second argument of the neighborhood.
param_3 (int or float, optional) – The third argument of the neighborhood.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Note

The param arguments only need to be set if neighborhood is not an instance of Neighborhood or if neighborhood is None.

Any param that is not set will default to 0.0.

If neighborhood is None then operation must be Operation.ASPECT.

Returns:	`TiledRasterLayer`
Raises:	`ValueError` – If `operation` is not a known operation. `ValueError` – If `neighborhood` is not a known neighborhood. `ValueError` – If `neighborhood` was not set, and `operation` is not `Operation.ASPECT`.
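To make the roles of operation and neighborhood concrete, here is a pure-Python focal mean over a square neighborhood whose size is given by a single number, in the way param_1 supplies a neighborhood's first argument. This is an illustration of the focal technique, not GeoTrellis's implementation; skipping out-of-grid and NoData cells when averaging is an assumption of this sketch.

```python
def focal_mean_square(grid, radius, no_data=None):
    """Focal mean over a (2 * radius + 1)-cell square neighborhood.

    Cells outside the grid and NoData cells are skipped when averaging.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            vals = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if grid[rr][cc] != no_data
            ]
            # A neighborhood with no valid cells produces NoData.
            row_out.append(sum(vals) / len(vals) if vals else no_data)
        out.append(row_out)
    return out

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(focal_mean_square(grid, 1))
```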

classmethod from_numpy_rdd(layer_type, numpy_rdd, metadata, zoom_level=None)¶

Create a TiledRasterLayer from a numpy RDD.

Parameters:

layer_type (str or LayerType) – The layer type of the GeoTiffs. This is represented either by constants within LayerType or by a string.
numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
metadata (Metadata) – The Metadata of the TiledRasterLayer instance.
zoom_level (int, optional) – The zoom_level the resulting TiledRasterLayer should have. If None, then the returned layer’s zoom_level will be None.

Returns:

TiledRasterLayer

getNumPartitions()¶

Returns the number of partitions set for the wrapped RDD.

Returns:	The number of partitions.
Return type:	Int

get_class_histogram()¶

Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.

Returns:	`Histogram` or [`Histogram`]

get_histogram()¶

Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.

Returns:	`Histogram` or [`Histogram`]

get_min_max()¶

Returns the minimum and maximum values of all of the rasters in the layer.

Returns:	(float, float) – The (min, max) pair.

get_partition_strategy()¶

Returns the partitioning strategy if the layer has one.

Returns:	`HashPartitionStrategy` or `SpatialPartitionStrategy` or `SpaceTimePartitionStrategy` or `None`

get_point_values(points, resample_method=None)¶

Returns the values of the layer at given points.

Note

Only points that are contained within a layer will be sampled. This means that if a point lies on the southern or eastern boundary of a cell, it will not be sampled.

Parameters:

points ([shapely.geometry.Point] or {k: shapely.geometry.Point}) – Either a list of shapely.geometry.Points, or a dictionary whose values are shapely.geometry.Points. If a dictionary, then the type of its keys does not matter. These points must be in the same projection as the tiles within the layer.
resample_method (str or ResampleMethod, optional) –
The resampling method to use before obtaining the point values. If not specified, then None is used.

Note

Not all ResampleMethods can be used to resample point values. ResampleMethod.NEAREST_NEIGHBOR, ResampleMethod.BILINEAR, ResampleMethod.CUBIC_CONVOLUTION, and ResampleMethod.CUBIC_SPLINE are the only ones that can be used.

Returns:

The return type will vary depending on the type of points and the layer_type of the sampled layer.

If points is a list and the layer_type is SPATIAL:: [(shapely.geometry.Point, [float])]
If points is a list and the layer_type is SPACETIME:: [(shapely.geometry.Point, [(datetime.datetime, [float])])]
If points is a dict and the layer_type is SPATIAL:: {k: (shapely.geometry.Point, [float])}
If points is a dict and the layer_type is SPACETIME:: {k: (shapely.geometry.Point, [(datetime.datetime, [float])])}

The shapely.geometry.Point in all of these returns is the original sampled point given. The [float] are the sampled values, one for each band. If the layer_type was SPACETIME, then the timestamp will also be included in the results represented by a datetime.datetime instance. These times and their associated values will be given as a list of tuples for each point.

Note

The sampled values will always be returned as floats, regardless of the CellType of the layer.

If points was given as a dict then the keys of that dictionary will be the keys in the returned dict.
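One way to read the boundary note above is the usual half-open grid convention: each cell includes its western and northern edges but not its eastern and southern ones. Under that interpretation (an assumption, as is every name below), a point on the layer's eastern or southern edge maps to a cell index just outside the grid and is therefore not sampled.

```python
import math

def cell_index(x, y, xmin, ymax, cell_width, cell_height, cols, rows):
    """Return (col, row) of the cell containing (x, y), or None if outside."""
    col = math.floor((x - xmin) / cell_width)
    row = math.floor((ymax - y) / cell_height)
    if 0 <= col < cols and 0 <= row < rows:
        return col, row
    return None

# A 10 x 10 grid of 1-unit cells covering x in [0, 10] and y in [0, 10].
print(cell_index(0, 10, 0, 10, 1, 1, 10, 10))  # north-west corner is inside
print(cell_index(10, 5, 0, 10, 1, 1, 10, 10))  # eastern edge falls outside
```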

get_quantile_breaks(num_breaks)¶

Returns quantile breaks for this Layer.

Parameters:	num_breaks (int) – The number of breaks to return.
Returns:	`[float]`

get_quantile_breaks_exact_int(num_breaks)¶

Returns quantile breaks for this Layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.

Parameters:	num_breaks (int) – The number of breaks to return.
Returns:	`[int]`

isEmpty()¶

Returns a bool that is True if the layer is empty and False if it is not.

Returns:	Whether the layer contains no elements.
Return type:	bool

local_max(value)¶

Determines the maximum value for each cell of each Tile in the layer.

This method takes a value that is compared to each cell in the layer. If value is larger, then the resulting cell value will be that value. Otherwise, the cell will retain its original value.

Note

NoData values are handled such that taking the max between a normal value and NoData value will always result in NoData.

Parameters:	value (int or float or `TiledRasterLayer`) – The constant value that will be compared to each cell. If this is a `TiledRasterLayer`, then `Tile`s who share a key will have each of their cell values compared.
Returns:	`TiledRasterLayer`
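The cell-wise rule, including the NoData handling described in the note, can be sketched on flat lists of cells. Both function names are hypothetical; the second covers the case where value is another TiledRasterLayer and two tiles sharing a key are compared cell by cell.

```python
def local_max_cells(cells, value, no_data):
    """Cell-wise max against a constant; NoData cells stay NoData."""
    return [no_data if c == no_data else max(c, value) for c in cells]

def local_max_tiles(a, b, no_data):
    """Cell-wise max of two tiles; NoData in either input yields NoData."""
    return [no_data if x == no_data or y == no_data else max(x, y)
            for x, y in zip(a, b)]

print(local_max_cells([1, 5, -9], 3, -9))
print(local_max_tiles([1, -9, 4], [2, 2, -9], -9))
```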

lookup(col, row)¶

Returns the value(s) stored at a particular SpatialKey (given by col and row).

Parameters:	col (int) – The `SpatialKey` column. row (int) – The `SpatialKey` row.
Returns:	[`Tile`]
Raises:	`ValueError` – If using lookup on a non `LayerType.SPATIAL` `TiledRasterLayer`. `IndexError` – If col and row are not within the `TiledRasterLayer`’s bounds.

map_cells(func)¶

Maps over the cells of each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterLayer once the mapping is done. Thus, it is advised to combine multiple operations into a single call to reduce the performance cost.

Parameters:	func (cells, nd => cells) – A function that takes two arguments: `cells` and `nd`, where `cells` is the numpy array and `nd` is the `no_data_value` of the tile. It returns `cells`, the new cell values of the tile represented as a numpy array.
Returns:	`TiledRasterLayer`

map_tiles(func)¶

Maps over each Tile within the layer with a given function.

Note

This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterLayer once the mapping is done. Thus, it is advised to combine multiple operations into a single call to reduce the performance cost.

Parameters:	func (`Tile` => `Tile`) – A function that takes a `Tile` and returns a `Tile`.
Returns:	`TiledRasterLayer`

mask(geometries, partition_strategy=None, options=RasterizerOptions(includePartial=True, sampleType='PixelIsPoint'))¶

Masks the TiledRasterLayer so that only values that intersect the geometries will be available.

Parameters:

geometries (shapely.geometry or [shapely.geometry] or pyspark.RDD[shapely.geometry]) –
Either a single, list, or Python RDD of shapely geometry/ies to mask the layer.

Note

All geometries must be in the same CRS as the TiledRasterLayer.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will be the same as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Note

This parameter will only be used if geometries is a pyspark.RDD.
options (RasterizerOptions, optional) –
During the mask operation, rasterization occurs. These options will change the pixel rasterization behavior. Default behavior is to include partial pixel intersection and to treat pixels as points.

Note

This parameter will only be used if geometries is a pyspark.RDD.

Returns:

TiledRasterLayer

merge(partition_strategy=None)¶

Merges the Tiles of each K together to produce a single Tile.

This method will reduce each value by its key within the layer to produce a single (K, V) for every K. In order to achieve this, each Tile that shares a K is merged together to form a single Tile. This is done by replacing one Tile’s cells with another’s. However, not every cell is necessarily replaced. The following rules determine whether a cell’s value should be replaced:

If the cell contains a NoData value, then it will be replaced.

If no NoData value is set, then a cell with a value of 0 will be replaced.

If neither of the above is true, then the cell retains its value.

Parameters:

num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then num_partitions will be the number of partitions the layer currently has.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer

normalize(new_min, new_max, old_min=None, old_max=None)¶

Normalizes the values within the layer: values in the range [old_min, old_max] are linearly rescaled to the range [new_min, new_max].

Note

If old_max - old_min <= 0 or new_max - new_min <= 0, then the normalization will fail.

Parameters:	old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used. old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used. new_min (int or float) – New minimum to normalize to. new_max (int or float) – New maximum to normalize to.
Returns:	`TiledRasterLayer`
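A minimal sketch of the arithmetic, assuming the standard linear rescaling `(v - old_min) * (new_max - new_min) / (old_max - old_min) + new_min`. It works on a plain list of values rather than a layer, and `normalize_values` is a hypothetical helper; only the fallback to the data's own min/max and the failure condition come from the description above.

```python
from typing import List, Optional, Sequence


def normalize_values(
    values: Sequence[float],
    new_min: float,
    new_max: float,
    old_min: Optional[float] = None,
    old_max: Optional[float] = None,
) -> List[float]:
    """Linearly rescale `values` from [old_min, old_max] to [new_min, new_max].

    Hypothetical illustration; missing bounds fall back to the min/max
    of `values`, mirroring the documented defaults.
    """
    old_min = min(values) if old_min is None else old_min
    old_max = max(values) if old_max is None else old_max
    if old_max - old_min <= 0 or new_max - new_min <= 0:
        # Same failure condition as the note above.
        raise ValueError("old and new ranges must both be positive")
    scale = (new_max - new_min) / (old_max - old_min)
    return [(v - old_min) * scale + new_min for v in values]


print(normalize_values([0, 5, 10], 0, 255))  # [0.0, 127.5, 255.0]
```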

partitionBy(partition_strategy=None)¶

Repartitions the layer using the given partitioning strategy.

Parameters:

partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –

Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns: TiledRasterLayer
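The three partition_strategy rules, which also apply to merge(), pyramid(), and tile_to_layout(), amount to a small decision function. The sketch below is hypothetical: `Strategy` only mimics the documented num_partitions attribute of a strategy object, and partitioners are represented by plain labels rather than real Partitioner instances.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Strategy:
    """Hypothetical stand-in for a GeoPySpark partition strategy object."""
    partitioner: str
    num_partitions: Optional[int] = None


def resolve_partitioning(
    source: Tuple[str, int], strategy: Optional[Strategy]
) -> Tuple[str, int]:
    """Return the (partitioner, num_partitions) pair the output layer gets.

    `source` is the (partitioner, num_partitions) pair of the input layer.
    """
    if strategy is None:
        # No strategy: keep the source layer's partitioning unchanged.
        return source
    if strategy.num_partitions is None:
        # Strategy without a count: new partitioner, old partition count.
        return (strategy.partitioner, source[1])
    # Strategy with a count: both values come from the strategy.
    return (strategy.partitioner, strategy.num_partitions)


source = ("hash", 8)
print(resolve_partitioning(source, None))                    # ('hash', 8)
print(resolve_partitioning(source, Strategy("spatial")))     # ('spatial', 8)
print(resolve_partitioning(source, Strategy("spatial", 16)))  # ('spatial', 16)
```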

persist(storageLevel=StorageLevel(False, True, False, False, 1))¶: Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

polygonal_max(geometry, data_type)¶

Finds the max value for each band that is contained within the given geometry.

Parameters:	geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely `Polygon` or `MultiPolygon` that represents the area where the summary should be computed; or a WKB representation of the geometry. data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:	[int] or [float] depending on `data_type`.
Raises:	`TypeError` – If `data_type` is not an int or float.

polygonal_mean(geometry)¶

Finds the mean of all of the values for each band that are contained within the given geometry.

Parameters:	geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely `Polygon` or `MultiPolygon` that represents the area where the summary should be computed; or a WKB representation of the geometry.
Returns:	[float]

polygonal_min(geometry, data_type)¶

Finds the min value for each band that is contained within the given geometry.

Parameters:	geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely `Polygon` or `MultiPolygon` that represents the area where the summary should be computed; or a WKB representation of the geometry. data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:	[int] or [float] depending on `data_type`.
Raises:	`TypeError` – If `data_type` is not an int or float.

polygonal_sum(geometry, data_type)¶

Finds the sum of all of the values in each band that are contained within the given geometry.

Parameters:	geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely `Polygon` or `MultiPolygon` that represents the area where the summary should be computed; or a WKB representation of the geometry. data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns:	[int] or [float] depending on `data_type`.
Raises:	`TypeError` – If `data_type` is not an int or float.
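To make the per-band semantics of these summaries concrete, here is a pure-Python sketch that models a multiband Tile as a list of 2-D bands and the geometry as a precomputed boolean mask. `polygonal_summary` and the mask are hypothetical; the real methods take a Shapely geometry (or WKB bytes) directly and compute the summaries on the Scala side. The sketch also ignores NoData cells.

```python
from typing import Callable, List, Sequence

Band = Sequence[Sequence[float]]


def polygonal_summary(
    bands: Sequence[Band],
    mask: Sequence[Sequence[bool]],
    op: Callable[[List[float]], float],
    data_type: type,
) -> List[float]:
    """Apply `op` to the cells of each band that fall inside `mask`.

    Hypothetical sketch: `mask` stands in for the rasterized geometry,
    with True marking cells inside the Polygon or MultiPolygon.
    """
    if data_type not in (int, float):
        # Mirrors the documented TypeError.
        raise TypeError("data_type must be int or float")
    results = []
    for band in bands:
        inside = [
            data_type(value)
            for row, mask_row in zip(band, mask)
            for value, covered in zip(row, mask_row)
            if covered
        ]
        results.append(op(inside))
    return results


bands = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
mask = [[True, False], [True, True]]
print(polygonal_summary(bands, mask, max, int))  # [4, 40]
```

Passing `min`, `sum`, or a mean function as `op` gives the analogues of polygonal_min, polygonal_sum, and polygonal_mean.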

pyramid(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)¶

Creates a layer Pyramid where the resolution is halved per level.

Parameters:	resample_method (str or `ResampleMethod`, optional) – The resample method to use when building the pyramid. Default is `ResampleMethod.NEAREST_NEIGHBOR`. partition_strategy (`HashPartitionStrategy` or `SpatialPartitionStrategy` or `SpaceTimePartitionStrategy`, optional) – Sets the `Partitioner` for the resulting layer and how many partitions it has. Default is `None`. If `None`, then the output layer will have the same `Partitioner` and number of partitions as the source layer. If `partition_strategy` is set but has no `num_partitions`, then the resulting layer will have the `Partitioner` specified in the strategy with the same number of partitions the source layer had. If `partition_strategy` is set and has a `num_partitions`, then the resulting layer will have the `Partitioner` and number of partitions specified in the strategy.
Returns:	`Pyramid`.
Raises:	`ValueError` – If this layer layout is not of `GlobalLayout` type.
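Because each level halves the resolution of the one above it, the cell size doubles at every step down the pyramid. The sketch below only does that arithmetic. `pyramid_cell_sizes` is a hypothetical helper, and it assumes the pyramid runs from the layer's zoom level down to zoom 0, which this page does not state explicitly.

```python
from typing import Dict


def pyramid_cell_sizes(max_zoom: int, cell_size_at_max: float) -> Dict[int, float]:
    """Cell size for each level from `max_zoom` down to 0.

    Halving the resolution at each lower zoom level doubles the
    ground distance covered by a single cell.
    """
    return {
        zoom: cell_size_at_max * 2 ** (max_zoom - zoom)
        for zoom in range(max_zoom, -1, -1)
    }


print(pyramid_cell_sizes(3, 10.0))  # {3: 10.0, 2: 20.0, 1: 40.0, 0: 80.0}
```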

reclassify(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)¶

Changes the cell values of a raster based on how the data is broken up in the given value_map.

Parameters:

value_map (dict) – A dict whose keys are the values at which breaks occur and whose values are the new values that the cells within each break should become.
data_type (type) – The type of the values within the rasters. Can either be int or float.
classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
replace_nodata_with (int or float, optional) –
When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.

Note

Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.
fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.

Returns:

TiledRasterLayer
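A single-cell sketch of the default LESS_THAN_OR_EQUAL_TO strategy: a value is assigned the new value of the smallest break it is less than or equal to. The handling of NoData, fallback_value, and strict follows the parameter descriptions above, but `reclassify_value` is hypothetical, and raising `ValueError` in strict mode is only a stand-in for whatever error the Scala backend actually reports.

```python
from typing import Dict, Optional, Union

Number = Union[int, float]


def reclassify_value(
    value: Number,
    value_map: Dict[Number, Number],
    no_data: Number,
    replace_nodata_with: Optional[Number] = None,
    fallback_value: Optional[Number] = None,
    strict: bool = False,
) -> Number:
    """Reclassify one cell using LESS_THAN_OR_EQUAL_TO semantics (sketch)."""
    if value == no_data:
        # NoData cells are handled separately from the breaks.
        return no_data if replace_nodata_with is None else replace_nodata_with
    for break_value in sorted(value_map):
        # The first (smallest) break that the value does not exceed wins.
        if value <= break_value:
            return value_map[break_value]
    if strict:
        raise ValueError(f"{value} does not fall within any break")
    # Otherwise use the fallback, defaulting to the layer's NoData value.
    return no_data if fallback_value is None else fallback_value


breaks = {10: 1, 20: 2}
print([reclassify_value(v, breaks, no_data=-1) for v in (5, 10, 15, 25, -1)])  # [1, 1, 2, -1, -1]
```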

repartition(num_partitions=None)¶

Repartitions the layer to have a different number of partitions.

Parameters:	num_partitions (int, optional) – Desired number of partitions. Default is `None`. If `None`, then the existing number of partitions will be used.
Returns:	`TiledRasterLayer`

reproject(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶

Reprojects rasters to target_crs. The reprojection does not sample past tile boundaries.

Parameters:	target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. resample_method (str or `ResampleMethod`, optional) – The resample method to use for the reprojection. If none is specified, then `ResampleMethod.NEAREST_NEIGHBOR` is used.
Returns:	`TiledRasterLayer`

save_stitched(path, crop_bounds=None, crop_dimensions=None)¶

Stitches all of the rasters within the Layer into one raster and then saves it to the given path.

Parameters:

path (str) – The path of the geotiff to save. The path must be on the local file system.
crop_bounds (Extent, optional) – The sub Extent with which to crop the raster before saving. If None, then the whole raster will be saved.
crop_dimensions (tuple(int) or list(int), optional) – cols and rows of the image to save, represented as either a tuple or list. If None, then all cols and rows of the raster will be saved.

Note

This can only be used on LayerType.SPATIAL TiledRasterLayers.

Note

If crop_dimensions is set then crop_bounds must also be set.

slope(zfactor_calculator)¶

Performs the Slope focal operation on the first band of each Tile in the Layer.

The Slope operation will be carried out in a SQUARE neighborhood with an extent of 1. A zfactor will be derived from the zfactor_calculator for each Tile in the Layer. The resulting Layer will have a cell_type of FLOAT64 regardless of the input Layer’s cell_type, and it will have a single band that represents the calculated slope.

Parameters:	zfactor_calculator (py4j.JavaObject) – A `JavaObject` that represents the Scala `ZFactorCalculator` class. This can be created using either the `zfactor_lat_lng_calculator()` or the `zfactor_calculator()` methods.
Returns:	`TiledRasterLayer`
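To show what a slope computation over a 3x3 (extent 1) square neighborhood looks like, the sketch below uses Horn's method, a common formulation that yields slope in degrees. That GeoTrellis uses exactly this formula, and returns degrees, is an assumption here; the zfactor is simply passed in as a number rather than derived from a zfactor_calculator, and `horn_slope_degrees` is a hypothetical name.

```python
import math
from typing import Sequence


def horn_slope_degrees(
    window: Sequence[Sequence[float]],
    cell_width: float,
    cell_height: float,
    zfactor: float = 1.0,
) -> float:
    """Slope in degrees at the center of a 3x3 window (Horn's method)."""
    (a, b, c), (d, e, f), (g, h, i) = window  # e is the center cell (unused)
    # Weighted differences across the window in the x and y directions.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_width)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_height)
    # The zfactor scales elevation units to match the horizontal units.
    return math.degrees(math.atan(zfactor * math.hypot(dz_dx, dz_dy)))


ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # rises one unit per cell in x
print(round(horn_slope_degrees(ramp, 1.0, 1.0), 6))  # 45.0
```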

stitch()¶

Stitches all of the rasters within the Layer into one raster.

Note

This can only be used on LayerType.SPATIAL TiledRasterLayers.

Returns:	`Tile`

tile_to_layout(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)¶

Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.

Parameters:

layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethod.NEAREST_NEIGHBOR is used.
partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) –
Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.

If None, then the output layer will have the same Partitioner and number of partitions as the source layer.

If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.

If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.

Returns:

TiledRasterLayer
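Tiling assigns each piece of input data to a cell of the target layout's tile grid; pieces that land in the same grid cell share a key and are merged, which is why the resulting keys are unique. The sketch below shows only the key lookup for a single map coordinate, assuming the layout is described by its extent and tile-grid dimensions with rows counted down from the top edge. `key_for_point` and its parameters are hypothetical.

```python
import math
from typing import Tuple


def key_for_point(
    x: float,
    y: float,
    extent: Tuple[float, float, float, float],
    layout_cols: int,
    layout_rows: int,
) -> Tuple[int, int]:
    """Return the (col, row) of the layout tile containing the point (x, y).

    `extent` is (xmin, ymin, xmax, ymax); rows are counted from the top edge.
    """
    xmin, ymin, xmax, ymax = extent
    tile_width = (xmax - xmin) / layout_cols
    tile_height = (ymax - ymin) / layout_rows
    col = math.floor((x - xmin) / tile_width)
    row = math.floor((ymax - y) / tile_height)
    return col, row


# A 100 x 100 extent cut into a 4 x 4 grid of 25-unit tiles.
print(key_for_point(30, 90, (0, 0, 100, 100), 4, 4))  # (1, 0)
```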

to_geotiff_rdd(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶

Converts the rasters within this layer to GeoTiffs, which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either SpatialKey or SpaceTimeKey.

Parameters:

storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
head_tags (dict, optional) – A dict where each key and value is a str.
band_tags (list, optional) – A list of dicts where each key and value is a str.
Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html

Returns:

RDD[(K, bytes)]
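The rows_per_strip default is described only as a strip size of "8K or less". One plausible reading, sketched below, is a budget of 8192 bytes per strip, which is the strip size the TIFF 6.0 specification recommends. Both this interpretation and the helper name `default_rows_per_strip` are assumptions, not GeoPySpark's code.

```python
def default_rows_per_strip(
    cols: int, bands: int, bytes_per_sample: int, budget: int = 8192
) -> int:
    """Largest number of rows whose strip fits within `budget` bytes (at least 1)."""
    bytes_per_row = cols * bands * bytes_per_sample
    # Very wide rows still need at least one row per strip.
    return max(1, budget // bytes_per_row)


# 256 columns of single-band float32 data: 1024 bytes per row.
print(default_rows_per_strip(256, 1, 4))  # 8
```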

to_numpy_rdd()¶

Converts a TiledRasterLayer to a numpy RDD.

Note

Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.

Returns:	RDD

to_png_rdd(color_map)¶

Converts the rasters within this layer to PNGs, which are then converted to bytes. This is returned as an RDD[(K, bytes)].

Parameters:	color_map (`ColorMap`) – A `ColorMap` instance used to color the PNGs.
Returns:	RDD[(K, bytes)]

to_spatial_layer(target_time=None)¶

Converts a TiledRasterLayer with a layer_type of LayerType.SPACETIME to a TiledRasterLayer with a layer_type of LayerType.SPATIAL.

Parameters:	target_time (`datetime.datetime`, optional) – The instant of interest. If set, the resulting `TiledRasterLayer` will only contain keys that contained the given instant. If `None`, then all values within the layer will be kept.
Returns:	`TiledRasterLayer`
Raises:	`ValueError` – If the layer already has a `layer_type` of `LayerType.SPATIAL`.
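Conceptually, the conversion drops the time component of each SpaceTimeKey and, when target_time is given, keeps only the entries at that instant. The sketch below models keys as plain (col, row, instant) tuples. It is a hypothetical illustration (`to_spatial_entries` is not a GeoPySpark function) and does not show how GeoPySpark handles several instants that share a spatial key when target_time is None.

```python
from datetime import datetime
from typing import Any, List, Optional, Tuple

SpaceTimeEntry = Tuple[Tuple[int, int, datetime], Any]
SpatialEntry = Tuple[Tuple[int, int], Any]


def to_spatial_entries(
    entries: List[SpaceTimeEntry], target_time: Optional[datetime] = None
) -> List[SpatialEntry]:
    """Drop the instant from each key, optionally keeping only `target_time`."""
    return [
        ((col, row), value)
        for (col, row, instant), value in entries
        if target_time is None or instant == target_time
    ]


t1, t2 = datetime(2017, 1, 1), datetime(2017, 2, 1)
entries = [((0, 0, t1), "a"), ((0, 0, t2), "b"), ((1, 0, t1), "c")]
print(to_spatial_entries(entries, t1))  # [((0, 0), 'a'), ((1, 0), 'c')]
```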

tobler()¶

Generates a Tobler walking speed layer from an elevation layer.

Note

This method has a known issue where the Tobler calculation is direction agnostic. Thus, all slopes are assumed to be uphill, which can produce incorrect results. A fix is currently being worked on.

Returns:	`TiledRasterLayer`
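Tobler's hiking function estimates walking speed as W = 6 * exp(-3.5 * |dh/dx + 0.05|) km/h, where dh/dx is the terrain gradient. The sketch below applies it to a slope given in degrees and, to mirror the known issue in the note, treats every slope as uphill by using a non-negative gradient. `tobler_speed_kmh` is hypothetical, and the assumption that the layer method derives its gradient from a degree-valued slope is not confirmed by this page.

```python
import math


def tobler_speed_kmh(slope_degrees: float) -> float:
    """Walking speed in km/h from Tobler's hiking function.

    The gradient is taken as tan(|slope|), so downhill and uphill slopes of
    the same steepness give the same speed (direction agnostic).
    """
    gradient = math.tan(math.radians(abs(slope_degrees)))
    return 6.0 * math.exp(-3.5 * abs(gradient + 0.05))


print(round(tobler_speed_kmh(0), 2))  # 5.04
```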

unpersist()¶: Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

with_no_data(no_data_value)¶

Changes the NoData value of the layer with the new given value.

It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.

If the target layer has a bool CellType, then the no_data_value will be ignored and the resulting layer will be the same as the original. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.

Parameters:	no_data_value (int or float) – The new `NoData` value of the layer.
Returns:	`TiledRasterLayer`
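The CellType naming in the example above (float32raw with a no_data_value of -10 becoming float32ud-10.0) can be reproduced with a small string helper. This is a hypothetical sketch that only recognizes the common numeric and bool type prefixes, and it formats NoData values as floats for float types and as integers otherwise.

```python
import re


def with_no_data_cell_type(cell_type: str, no_data_value: float) -> str:
    """Name of the CellType that results from assigning `no_data_value`."""
    match = re.match(r"(bool|uint8|int8|uint16|int16|int32|float32|float64)", cell_type)
    if match is None:
        raise ValueError(f"unrecognized cell type: {cell_type}")
    base = match.group(1)
    if base == "bool":
        # bool layers are returned unchanged; convert_data_type() is needed instead.
        return cell_type
    # Float types show the value as a float, integer types as an int.
    value = float(no_data_value) if base.startswith("float") else int(no_data_value)
    return f"{base}ud{value}"


print(with_no_data_cell_type("float32raw", -10))  # float32ud-10.0
```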

wrapped_rdds()¶: Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.

class geopyspark.geotrellis.layer.Pyramid(levels)¶

Contains a list of TiledRasterLayers that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server.

Map algebra can be performed on instances of this class.

Parameters:	levels (list or dict) – A list of `TiledRasterLayer`s or a dict of `TiledRasterLayer`s where the value is the layer itself and the key is its given zoom level.

pysc¶: pyspark.SparkContext – The SparkContext being used this session.

layer_type¶: LayerType – What the layer type of the geotiffs are.

levels¶: dict – A dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.

max_zoom¶: int – The highest zoom level of the pyramid.

is_cached¶: bool – Signals whether or not the internal RDDs are cached. Default is False.

histogram¶: Histogram – The Histogram that represents the layer with the max zoom. It is not calculated until the get_histogram() method is called; until then, its value is None.

Raises:	`TypeError` – If `levels` is neither a list nor a dict.
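The levels argument accepts either of the forms described above. The sketch below shows one way a list could be turned into the documented levels dict and how max_zoom then follows. `FakeLayer` and its zoom_level attribute are hypothetical stand-ins for TiledRasterLayer, and `build_levels` is not GeoPySpark's constructor.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class FakeLayer:
    """Hypothetical stand-in for a TiledRasterLayer with a known zoom level."""
    zoom_level: int


def build_levels(
    levels: Union[List[FakeLayer], Dict[int, FakeLayer]]
) -> Dict[int, FakeLayer]:
    """Normalize `levels` into a {zoom_level: layer} dict."""
    if isinstance(levels, dict):
        return dict(levels)
    if isinstance(levels, list):
        return {layer.zoom_level: layer for layer in levels}
    # Mirrors the documented TypeError for any other input.
    raise TypeError("levels must be a list or a dict")


levels = build_levels([FakeLayer(0), FakeLayer(1), FakeLayer(2)])
print(max(levels))  # 2, i.e. the max_zoom of this pyramid
```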

cache()¶: Persist this RDD with the default storage level (MEMORY_ONLY).

count()¶

Returns how many elements are within the wrapped RDD.

Returns:	The number of elements in the RDD.
Return type:	Int

getNumPartitions()¶

Returns the number of partitions set for the wrapped RDD.

Returns:	The number of partitions.
Return type:	Int

get_histogram()¶

Calculates the Histogram for the layer with the max zoom.

Returns:	`Histogram`

get_partition_strategy()¶

Returns the partitioning strategy if the layer has one.

Returns:	`HashPartitionStrategy` or `SpatialPartitionStrategy` or `SpaceTimePartitionStrategy` or `None`

isEmpty()¶

Returns a bool that is True if the layer is empty and False if it is not.

Returns:	Are there elements within the layer
Return type:	bool

persist(storageLevel=StorageLevel(False, True, False, False, 1))¶: Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.

unpersist()¶: Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.

wrapped_rdds()¶

Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.

Returns:	[org.apache.spark.rdd.RDD]