geopyspark.geotrellis.layer module
This module contains the RasterLayer and the TiledRasterLayer classes. Both classes are wrappers of their Scala counterparts. They are used in lieu of actual PySpark RDDs when performing operations.
-
class
geopyspark.geotrellis.layer.
RasterLayer
(layer_type, srdd)
A wrapper of an RDD that contains GeoTrellis rasters.
Represents a layer that wraps an RDD containing (K, V) pairs, where K is either ProjectedExtent or TemporalProjectedExtent, depending on the layer_type of the RDD, and V is a Tile.
The data held within this layer has not been tiled, meaning the data has yet to be modified to fit a certain layout. See raster_rdd for more information.
Parameters:
- layer_type (str or LayerType) – The layer type of the GeoTiffs. This is represented either by a constant within LayerType or by a string.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.
-
pysc
pyspark.SparkContext – The SparkContext being used in this session.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.
-
bands
(band)
Selects a subset of bands from the Tiles within the layer.
Note
There could be a high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, selecting a band that is out of bounds raises a py4j.protocol.Py4JJavaError and not a normal Python error.
Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can be either a single int or a collection of ints.
Returns: RasterLayer with the selected bands.
-
cache
()
Persists this RDD with the default storage level (MEMORY_ONLY).
-
collect_keys
()
Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a small number of keys, as a large number could cause memory issues.
Returns: [ProjectedExtent] or [TemporalProjectedExtent]
-
collect_metadata
(layout=LocalLayout(tile_cols=256, tile_rows=256))
Iterates over the RDD records and generates layer metadata describing the contained rasters.
Parameters: layout (LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
Returns: Metadata
-
convert_data_type
(new_type, no_data_value=None)
Converts the underlying raster values to a new CellType.
Parameters:
- new_type (str or CellType) – The data type the cells should be converted to.
- no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:
Raises:
- ValueError – If no_data_value is set and new_type contains raw values.
- ValueError – If no_data_value is set and new_type is a boolean.
-
count
()
Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD.
Return type: int
-
filter_by_times
(time_intervals)
Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).
Parameters: time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If there is just a single element, then only exact matches with that given time will be kept. If multiple times are given, then they are paired together so that they form ranges of time. If there is an odd number of elements, then the remaining time is treated as a single query and not a range.
Note
If nothing intersects the given time_intervals, then the returned RasterLayer will be empty.
Returns: RasterLayer
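The pairing rule for time_intervals can be sketched in plain Python. This is only an illustration, not GeoPySpark code: the helper names are hypothetical, and it assumes that consecutive elements are paired and that range endpoints are inclusive, neither of which the text above states outright.

```python
from datetime import datetime


def pair_time_intervals(times):
    """Group a flat list of datetimes into (start, end) query ranges.

    Assumption: consecutive elements are paired. A leftover element
    (odd-length list) becomes an exact-match query, written as (t, t).
    """
    pairs = [(times[i], times[i + 1]) for i in range(0, len(times) - 1, 2)]
    if len(times) % 2 == 1:
        pairs.append((times[-1], times[-1]))
    return pairs


def keep_instant(instant, times):
    """Return True if `instant` falls within any paired range (inclusive)."""
    return any(start <= instant <= end
               for start, end in pair_time_intervals(times))
```

With three datetimes, the first two form a range and the third is matched exactly.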
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd)
Creates a RasterLayer from a numpy RDD.
Parameters:
- layer_type (str or LayerType) – The layer type of the GeoTiffs. This is represented either by a constant within LayerType or by a string.
- numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either ProjectedExtents or TemporalProjectedExtents and rasters that are represented by a numpy array.
Returns:
-
getNumPartitions
()
Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions.
Return type: int
-
get_class_histogram
()
Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.
Returns: Histogram or [Histogram]
-
get_histogram
()
Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.
Returns: Histogram or [Histogram]
-
get_min_max
()
Returns the minimum and maximum values of all of the rasters in the layer.
Returns: (float, float)
-
get_partition_strategy
()
Returns the partitioning strategy if the layer has one.
Returns: HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy or None
-
get_quantile_breaks
(num_breaks)
Returns quantile breaks for this layer.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)
Returns quantile breaks for this layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [int]
-
isEmpty
()
Returns a bool that is True if the layer is empty and False if it is not.
Returns: Whether the layer is empty.
Return type: bool
-
map_cells
(func)
Maps over the cells of each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a RasterLayer once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (cells, nd => cells) – A function that takes two arguments: cells and nd, where cells is the numpy array and nd is the no_data_value of the Tile. It returns cells, which are the new cell values of the Tile represented as a numpy array.
Returns: RasterLayer
-
map_tiles
(func)
Maps over each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a RasterLayer once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (Tile => Tile) – A function that takes a Tile and returns a Tile.
Returns: RasterLayer
-
merge
(partition_strategy=None)
Merges the Tile of each K together to produce a single Tile.
This method will reduce each value by its key within the layer to produce a single (K, V) for every K. In order to achieve this, each Tile that shares a K is merged together to form a single Tile. This is done by replacing one Tile’s cells with another’s. Not all cells, if any, may be replaced, however. The following steps are taken to determine if a cell’s value should be replaced:
- If the cell contains a NoData value, then it will be replaced.
- If no NoData value is set, then a cell with a value of 0 will be replaced.
- If neither of the above is true, then the cell retains its value.
Parameters:
- num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then num_partitions will be the number of partitions the layer currently has.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns:
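The cell replacement rules for merge can be sketched on plain Python lists. This is a rough illustration rather than the actual implementation: merge_cells is a hypothetical helper, a missing NoData value is modeled as None, and NaN-based NoData values are not handled.

```python
def merge_cells(base, other, no_data=None):
    """Merge two equally sized rows of cells.

    A cell of `base` is replaced by the matching cell of `other` when it
    holds the NoData value or, if no NoData value is set (None here),
    when it is 0. Otherwise the cell keeps its value.
    """
    replaceable = 0 if no_data is None else no_data
    return [o if b == replaceable else b for b, o in zip(base, other)]
```

Note that when a NoData value is set, cells equal to 0 are ordinary data and are kept.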
-
partitionBy
(partition_strategy=None)
Repartitions the layer using the given partitioning strategy.
Parameters: partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will be the same as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns: RasterLayer
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))
Sets this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)
Changes the cell values of a raster based on how the data is broken up in the given value_map.
Parameters:
- value_map (dict) – A dict whose keys represent values where a break should occur and whose values are the new values that the cells within the break should become.
- data_type (type) – The type of the values within the rasters. Can be either int or float.
- classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
- replace_nodata_with (int or float, optional) – When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.
Note
Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.
- fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
- strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.
Returns:
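As a rough sketch of how a value_map is applied under ClassificationStrategy.LESS_THAN_OR_EQUAL_TO, the hypothetical helper below classifies a single cell. It assumes that a cell is assigned to the smallest break it is less than or equal to, and it uses the fallback argument as a stand-in for the layer’s NoData value. It is not GeoPySpark code.

```python
from bisect import bisect_left


def reclassify_cell(value, value_map, fallback=None):
    """Classify one cell with a "less than or equal to" rule.

    The cell takes the new value of the smallest break that is >= value.
    Cells above every break get `fallback`.
    """
    breaks = sorted(value_map)
    i = bisect_left(breaks, value)  # index of the first break >= value
    return value_map[breaks[i]] if i < len(breaks) else fallback
```

For the map {10: 1, 20: 2, 30: 3}, a cell of 10 falls in the first class, a cell of 11 in the second, and a cell of 31 falls outside every break.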
-
repartition
(num_partitions=None)
Repartitions the layer to have a different number of partitions.
Parameters: num_partitions (int, optional) – Desired number of partitions. Default is None. If None, then the existing number of partitions will be used.
Returns: RasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)
Reprojects rasters to target_crs. The reprojection does not sample past tile boundaries.
Parameters:
- target_crs (str or int) – Target CRS of reprojection. Either an EPSG code, a well-known name, or a PROJ.4 string.
- resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethod.NEAREST_NEIGHBOR is used.
Returns:
-
tile_to_layout
(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)
Cuts tiles to layout and merges overlapping tiles. This will produce unique keys.
Parameters:
- layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either an EPSG code, a well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
- resample_method (str or ResampleMethod, optional) – The cell resample method to use during the tiling operation. Default is ResampleMethod.NEAREST_NEIGHBOR.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns:
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)
Converts the rasters within this layer to GeoTiffs, which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either ProjectedExtent or TemporalProjectedExtent.
Parameters:
- storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
- rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
- tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None, then the default size is (256, 256).
- compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
- color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
- color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
- head_tags (dict, optional) – A dict where each key and value is a str.
- band_tags (list, optional) – A list of dicts where each key and value is a str.
Note
For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
-
to_numpy_rdd
()
Converts a RasterLayer to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)
Converts the rasters within this layer to PNGs, which are then converted to bytes. This is returned as an RDD[(K, bytes)].
Parameters: color_map (ColorMap) – A ColorMap instance used to color the PNGs.
Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)
Converts a RasterLayer with a layer_type of LayerType.SPACETIME to a RasterLayer with a layer_type of LayerType.SPATIAL.
Parameters: target_time (datetime.datetime, optional) – The instant of interest. If set, the resulting RasterLayer will only contain keys that contained the given instant. If None, then all values within the layer will be kept.
Returns: RasterLayer
Raises: ValueError – If the layer already has a layer_type of LayerType.SPATIAL.
-
unpersist
()
Marks the RDD as non-persistent and removes all blocks for it from memory and disk.
-
with_no_data
(no_data_value)
Changes the NoData value of the layer to the new given value.
It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user-defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.
If the target layer has a bool CellType, then the no_data_value will be ignored and the resulting layer will be the same as the original. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.
Parameters: no_data_value (int or float) – The new NoData value of the layer.
Returns: RasterLayer
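The CellType naming described for with_no_data can be mimicked with a small string helper. This is a hypothetical sketch based only on the float32raw example and the bool rule in the text. The naming for other cell types is an assumption, and cell types that already carry a user-defined NoData value are not handled.

```python
def with_user_no_data(cell_type, no_data_value):
    """Return the CellType name after assigning a user-defined NoData value."""
    if cell_type.startswith("bool"):
        # bool layers ignore no_data_value and stay as they are
        return cell_type
    # Drop a trailing "raw" marker, if present
    base = cell_type[:-3] if cell_type.endswith("raw") else cell_type
    if base.startswith("float"):
        # Floating point cell types carry a float NoData value
        no_data_value = float(no_data_value)
    return "{}ud{}".format(base, no_data_value)
```

Calling with_user_no_data("float32raw", -10) reproduces the float32ud-10.0 name from the example in the text.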
-
wrapped_rdds
()
Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.geotrellis.layer.
TiledRasterLayer
(layer_type, srdd)
Wraps an RDD of tiled GeoTrellis rasters.
Represents an RDD that contains (K, V) pairs, where K is either SpatialKey or SpaceTimeKey, depending on the layer_type of the RDD, and V is a Tile.
The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.
Parameters:
- layer_type (str or LayerType) – The layer type of the GeoTiffs. This is represented either by a constant within LayerType or by a string.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.
-
pysc
pyspark.SparkContext – The SparkContext being used in this session.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.
-
is_floating_point_layer
bool – Whether the data within the TiledRasterLayer is floating point or not.
-
zoom_level
int – The zoom level of the layer. Can be None.
-
aggregate_by_cell
(operation)
Computes an aggregate summary for each cell of all of the values for each key.
The operation given is a local map algebra function that will be applied to all values that share the same key. If there are multiple copies of the same key in the layer, then this method will reduce all instances of the (K, Tile) pairs into a single element. The Tile of the resulting (K, Tile) will contain the aggregate summaries of each cell of the reduced Tiles that had the same K.
Note
Not all Operations are supported. Only SUM, MIN, MAX, MEAN, VARIANCE, and STANDARD_DEVIATION can be used.
Note
If calculating VARIANCE or STANDARD_DEVIATION, then any K that is a single copy will have a resulting Tile that is filled with NoData values. This is because the variance of a single element is undefined.
Parameters: operation (str or Operation) – The aggregate operation to be performed.
Returns: TiledRasterLayer
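The per-cell reduction described for aggregate_by_cell can be sketched with tiles modeled as flat Python lists that share one key. The helper and the NO_DATA stand-in are hypothetical. The use of sample variance is an assumption, chosen because it is undefined for a single value, which is consistent with the second note.

```python
from statistics import mean, variance

NO_DATA = None  # stand-in for the layer's NoData value


def aggregate_cells(tiles, operation):
    """Aggregate same-position cells across tiles that share a key."""
    ops = {"SUM": sum, "MIN": min, "MAX": max, "MEAN": mean}
    result = []
    for cells in zip(*tiles):  # cells at the same position in every tile
        if operation == "VARIANCE":
            # Sample variance needs at least two values
            result.append(variance(cells) if len(cells) > 1 else NO_DATA)
        else:
            result.append(ops[operation](cells))
    return result
```

A key that appears only once yields a row of NO_DATA for VARIANCE, while SUM, MIN, MAX, and MEAN simply return the tile’s own values.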
-
bands
(band)
Selects a subset of bands from the Tiles within the layer.
Note
There could be a high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, selecting a band that is out of bounds raises a py4j.protocol.Py4JJavaError and not a normal Python error.
Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can be either a single int or a collection of ints.
Returns: TiledRasterLayer with the selected bands.
-
cache
()
Persists this RDD with the default storage level (MEMORY_ONLY).
-
collect_keys
()
Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a small number of keys, as a large number could cause memory issues.
Returns: [SpatialKey] or [SpaceTimeKey]
-
convert_data_type
(new_type, no_data_value=None)
Converts the underlying raster values to a new CellType.
Parameters:
- new_type (str or CellType) – The data type the cells should be converted to.
- no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:
Raises:
- ValueError – If no_data_value is set and new_type contains raw values.
- ValueError – If no_data_value is set and new_type is a boolean.
-
count
()
Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD.
Return type: int
-
filter_by_times
(time_intervals)
Filters a SPACETIME layer by keeping only the values whose keys fall within the given time interval(s).
Parameters: time_intervals ([datetime.datetime]) – A list of the time intervals to query. This list can have one or multiple elements. If there is just a single element, then only exact matches with that given time will be kept. If multiple times are given, then they are paired together so that they form ranges of time. If there is an odd number of elements, then the remaining time is treated as a single query and not a range.
Note
If nothing intersects the given time_intervals, then the returned TiledRasterLayer will be empty.
Returns: TiledRasterLayer
-
focal
(operation, neighborhood=None, param_1=None, param_2=None, param_3=None, partition_strategy=None)
Performs the given focal operation on the Tiles contained in the layer.
Parameters:
- operation (str or Operation) – The focal operation to be performed.
- neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood or by a constant.
- param_1 (int or float, optional) – The first argument of the neighborhood.
- param_2 (int or float, optional) – The second argument of the neighborhood.
- param_3 (int or float, optional) – The third argument of the neighborhood.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Note
The param arguments only need to be set if neighborhood is not an instance of Neighborhood or if neighborhood is None.
Any param that is not set will default to 0.0.
If neighborhood is None, then operation must be Operation.ASPECT.
Returns:
Raises:
- ValueError – If operation is not a known operation.
- ValueError – If neighborhood is not a known neighborhood.
- ValueError – If neighborhood was not set and operation is not Operation.ASPECT.
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd, metadata, zoom_level=None)
Creates a TiledRasterLayer from a numpy RDD.
Parameters:
- layer_type (str or LayerType) – The layer type of the GeoTiffs. This is represented either by a constant within LayerType or by a string.
- numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
- metadata (Metadata) – The Metadata of the TiledRasterLayer instance.
- zoom_level (int, optional) – The zoom_level the resulting TiledRasterLayer should have. If None, then the returned layer’s zoom_level will be None.
Returns:
-
getNumPartitions
()
Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions.
Return type: int
-
get_class_histogram
()
Creates a Histogram of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.
Returns: Histogram or [Histogram]
-
get_histogram
()
Creates a Histogram for each band in the layer. If only a single band is present, the histogram is returned directly.
Returns: Histogram or [Histogram]
-
get_min_max
()
Returns the minimum and maximum values of all of the rasters in the layer.
Returns: (float, float)
-
get_partition_strategy
()
Returns the partitioning strategy if the layer has one.
Returns: HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy or None
-
get_point_values
(points, resample_method=None)
Returns the values of the layer at given points.
Note
Only points that are contained within a layer will be sampled. This means that if a point lies on the southern or eastern boundary of a cell, it will not be sampled.
Parameters:
- points ([shapely.geometry.Point] or {k: shapely.geometry.Point}) – Either a list of shapely.geometry.Points or a dictionary whose values are shapely.geometry.Points. If a dictionary, then the type of its keys does not matter. These points must be in the same projection as the tiles within the layer.
- resample_method (str or ResampleMethod, optional) – The resampling method to use before obtaining the point values. If not specified, then None is used.
Note
Not all ResampleMethods can be used to resample point values. ResampleMethod.NEAREST_NEIGHBOR, ResampleMethod.BILINEAR, ResampleMethod.CUBIC_CONVOLUTION, and ResampleMethod.CUBIC_SPLINE are the only ones that can be used.
Returns: The return type will vary depending on the type of points and the layer_type of the sampled layer.
- If points is a list and the layer_type is SPATIAL: [(shapely.geometry.Point, [float])]
- If points is a list and the layer_type is SPACETIME: [(shapely.geometry.Point, [(datetime.datetime, [float])])]
- If points is a dict and the layer_type is SPATIAL: {k: (shapely.geometry.Point, [float])}
- If points is a dict and the layer_type is SPACETIME: {k: (shapely.geometry.Point, [(datetime.datetime, [float])])}
The shapely.geometry.Point in all of these returns is the original sampled point given. The [float] are the sampled values, one for each band. If the layer_type was SPACETIME, then the timestamp will also be included in the results, represented by a datetime.datetime instance. These times and their associated values will be given as a list of tuples for each point.
Note
The sampled values will always be returned as floats, regardless of the cellType of the layer.
If points was given as a dict, then the keys of that dictionary will be the keys in the returned dict.
-
get_quantile_breaks
(num_breaks)
Returns quantile breaks for this layer.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)
Returns quantile breaks for this layer. This version uses the FastMapHistogram, which counts exact integer values. If your layer has too many values, this can cause memory errors.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [int]
-
isEmpty
()
Returns a bool that is True if the layer is empty and False if it is not.
Returns: Whether the layer is empty.
Return type: bool
-
local_max
(value)
Determines the maximum value for each cell of each Tile in the layer.
This method takes a value that is compared to each cell in the layer. If value is larger than the cell, then the resulting cell will take that value. Otherwise, the cell will retain its original value.
Note
NoData values are handled such that taking the max between a normal value and a NoData value will always result in NoData.
Parameters: value (int or float or TiledRasterLayer) – The constant value that will be compared to each cell. If this is a TiledRasterLayer, then Tiles that share a key will have each of their cell values compared.
Returns: TiledRasterLayer
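The constant case of local_max, including the NoData rule from the note, can be sketched on a flat list of cells. The helper below is hypothetical and only models the behavior described above. It does not cover the case where value is another TiledRasterLayer.

```python
def local_max_cells(cells, constant, no_data=None):
    """Take the per-cell maximum against a constant.

    A NoData cell stays NoData. Every other cell becomes the larger of
    its own value and `constant`.
    """
    return [no_data if c == no_data else max(c, constant) for c in cells]
```

For example, with a NoData value of -999 and a constant of 4, the cells [1, 5, -999, 3] become [4, 5, -999, 4].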
-
lookup
(col, row)
Returns the value(s) in the image of a particular SpatialKey (given by col and row).
Parameters:
- col (int) – The SpatialKey column.
- row (int) – The SpatialKey row.
Returns: [Tile]
Raises:
- ValueError – If using lookup on a non-LayerType.SPATIAL TiledRasterLayer.
- IndexError – If col and row are not within the TiledRasterLayer’s bounds.
-
map_cells
(func)
Maps over the cells of each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterLayer once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (cells, nd => cells) – A function that takes two arguments: cells and nd, where cells is the numpy array and nd is the no_data_value of the tile. It returns cells, which are the new cell values of the tile represented as a numpy array.
Returns: TiledRasterLayer
-
map_tiles
(func)
Maps over each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterLayer once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (Tile => Tile) – A function that takes a Tile and returns a Tile.
Returns: TiledRasterLayer
-
mask
(geometries, partition_strategy=None, options=RasterizerOptions(includePartial=True, sampleType='PixelIsPoint'))
Masks the TiledRasterLayer so that only values that intersect the geometries will be available.
Parameters:
- geometries (shapely.geometry or [shapely.geometry] or pyspark.RDD[shapely.geometry]) – Either a single shapely geometry, a list of them, or a Python RDD of them, used to mask the layer.
Note
All geometries must be in the same CRS as the TiledRasterLayer.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will be the same as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Note
This parameter will only be used if geometries is a pyspark.RDD.
- options (RasterizerOptions, optional) – During the mask operation, rasterization occurs. These options will change the pixel rasterization behavior. Default behavior is to include partial pixel intersection and to treat pixels as points.
Note
This parameter will only be used if geometries is a pyspark.RDD.
Returns:
-
merge
(partition_strategy=None)¶ Merges the
Tile
of eachK
together to produce a singleTile
.This method will reduce each value by its key within the layer to produce a single
(K, V)
for everyK
. In order to achieve this, eachTile
that shares aK
is merged together to form a singleTile
. This is done by replacing oneTile
’s cells with another’s. Not all cells, if any, may be replaced, however. The following steps are taken to determine if a cell’s value should be replaced:- If the cell contains a
NoData
value, then it will be replaced. - If no
NoData
value is set, then a cell with a value of 0 will be replaced. - If neither of the above are true, then the cell retain its value.
Parameters:
- num_partitions (int, optional) – The number of partitions that the resulting layer should be partitioned with. If None, then the num_partitions will be the number of partitions the layer currently has.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns: TiledRasterLayer
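The cell replacement rules used by merge can be sketched in plain Python. This is only an illustration of the three rules listed in the description, not the GeoTrellis implementation: the function name merge_tiles and the list-of-rows tile representation are assumptions made for the example.

```python
def merge_tiles(base, other, no_data=None):
    """Merge two equally sized tiles given as lists of rows.

    A cell of `base` is replaced by the matching cell of `other` when it
    holds the NoData value, or when it holds 0 and no NoData value is set.
    Every other cell keeps its value.
    """
    # Hypothetical helper: with no NoData value, 0 marks a replaceable cell.
    replaceable = 0 if no_data is None else no_data
    return [
        [o if b == replaceable else b for b, o in zip(base_row, other_row)]
        for base_row, other_row in zip(base, other)
    ]


first = [[-1, 5], [0, 7]]
second = [[9, 9], [9, 9]]

with_no_data = merge_tiles(first, second, no_data=-1)
without_no_data = merge_tiles(first, second)
```

With a NoData value of -1 only the -1 cell is taken from the second tile; without one, the 0 cell is replaced instead.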
-
normalize
(new_min, new_max, old_min=None, old_max=None)¶ Rescales the values of the layer from the range [old_min, old_max] to the range [new_min, new_max].
Note
If old_max - old_min <= 0 or new_max - new_min <= 0, then the normalization will fail.
Parameters:
- old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used.
- old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used.
- new_min (int or float) – New minimum to normalize to.
- new_max (int or float) – New maximum to normalize to.
Returns: TiledRasterLayer
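As a rough illustration of what normalizing a single cell value involves, the sketch below applies a linear rescale between the two ranges. The linear formula and the function name normalize_value are assumptions for this example; the documentation above only states the parameters and the failure condition.

```python
def normalize_value(value, old_min, old_max, new_min, new_max):
    """Linearly rescale `value` from [old_min, old_max] to [new_min, new_max]."""
    old_range = old_max - old_min
    new_range = new_max - new_min
    # Mirrors the note above: an empty or inverted range cannot be normalized.
    if old_range <= 0 or new_range <= 0:
        raise ValueError("both ranges must be strictly positive")
    return (value - old_min) * new_range / old_range + new_min


midpoint = normalize_value(5, 0, 10, 0, 100)
lower_bound = normalize_value(0, 0, 10, 1, 2)
```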
-
partitionBy
(partition_strategy=None)¶ Repartitions the layer using the given partitioning strategy.
Parameters: partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will be the same as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns: TiledRasterLayer
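The three cases described for partition_strategy amount to a small decision rule. The sketch below restates it in plain Python; Strategy and resolve_partitioning are hypothetical stand-ins written for this example and are not GeoPySpark classes or functions.

```python
from collections import namedtuple

# Hypothetical stand-in for a partition strategy: a partitioner name plus an
# optional partition count. Not the GeoPySpark class.
Strategy = namedtuple("Strategy", ["partitioner", "num_partitions"])


def resolve_partitioning(source_partitioner, source_num_partitions,
                         partition_strategy=None):
    """Return the (partitioner, num_partitions) pair of the resulting layer."""
    if partition_strategy is None:
        # Case 1: keep whatever the source layer had.
        return source_partitioner, source_num_partitions
    if partition_strategy.num_partitions is None:
        # Case 2: new partitioner, partition count inherited from the source.
        return partition_strategy.partitioner, source_num_partitions
    # Case 3: both come from the strategy.
    return partition_strategy.partitioner, partition_strategy.num_partitions


unchanged = resolve_partitioning("hash", 8)
inherited_count = resolve_partitioning("hash", 8, Strategy("spatial", None))
fully_specified = resolve_partitioning("hash", 8, Strategy("spatial", 32))
```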
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Sets this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
polygonal_max
(geometry, data_type)¶ Finds the max value for each band that is contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: [int] or [float] depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_mean
(geometry)¶ Finds the mean of all of the values for each band that are contained within the given geometry.
Parameters: geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
Returns: [float]
-
polygonal_min
(geometry, data_type)¶ Finds the min value for each band that is contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: [int] or [float] depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_sum
(geometry, data_type)¶ Finds the sum of all of the values in each band that are contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: [int] or [float] depending on data_type.
Raises: TypeError – If data_type is not an int or float.
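The polygonal summary methods share one shape: select the cells covered by the geometry, then reduce them per band. The following is a simplified, pure-Python sketch of that idea. It assumes a single polygon ring given in pixel coordinates and counts a cell only when its center lies inside the ring; the real methods take Shapely or WKB geometries in map coordinates and may treat partially covered cells differently. The names point_in_ring and polygonal_summary are made up for the example.

```python
def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test; ring is a list of (x, y) vertices."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygonal_summary(bands, ring, data_type, reducer):
    """Apply `reducer` (e.g. max, min, sum) per band to the covered cells."""
    if data_type not in (int, float):
        raise TypeError("data_type must be int or float")
    results = []
    for band in bands:
        values = [
            data_type(value)
            for row_idx, row in enumerate(band)
            for col_idx, value in enumerate(row)
            # Assumption: a cell is covered when its center is inside the ring.
            if point_in_ring(col_idx + 0.5, row_idx + 0.5, ring)
        ]
        results.append(reducer(values))
    return results


band = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # covers the top-left 2x2 cells

band_max = polygonal_summary([band], square, int, max)
band_sum = polygonal_summary([band], square, float, sum)
```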
-
pyramid
(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)¶ Creates a layer Pyramid where the resolution is halved per level.
Parameters:
- resample_method (str or ResampleMethod, optional) – The resample method to use when building the pyramid. Default is ResampleMethod.NEAREST_NEIGHBOR.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns: Pyramid
Raises: ValueError – If this layer’s layout is not of GlobalLayout type.
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None, fallback_value=None, strict=False)¶ Changes the cell values of a raster based on how the data is broken up in the given value_map.
Parameters:
- value_map (dict) – A dict whose keys represent values where a break should occur and whose values are the new values the cells within each break should become.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
- replace_nodata_with (int or float, optional) – When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.
Note
Specifying replace_nodata_with will change the value of given cells, but the NoData value of the layer will remain unchanged.
- fallback_value (int or float, optional) – Represents the value that should be used when a cell’s value does not fall within the classification_strategy. Default is to use the layer’s NoData value.
- strict (bool, optional) – Determines whether or not an error should be thrown if a cell’s value does not fall within the classification_strategy. Default is False.
Returns: TiledRasterLayer
-
repartition
(num_partitions=None)¶ Repartitions the layer to have a different number of partitions.
Parameters: num_partitions (int, optional) – Desired number of partitions. Default is None. If None, then the existing number of partitions will be used.
Returns: TiledRasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reprojects rasters to target_crs. The reprojection does not sample past tile boundaries.
Parameters:
- target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethod.NEAREST_NEIGHBOR is used.
Returns: TiledRasterLayer
-
save_stitched
(path, crop_bounds=None, crop_dimensions=None)¶ Stitches all of the rasters within the Layer into one raster and then saves it to a given path.
Parameters:
- path (str) – The path of the geotiff to save. The path must be on the local file system.
- crop_bounds (Extent, optional) – The sub Extent with which to crop the raster before saving. If None, then the whole raster will be saved.
- crop_dimensions (tuple(int) or list(int), optional) – cols and rows of the image to save represented as either a tuple or list. If None then all cols and rows of the raster will be saved.
Note
This can only be used on LayerType.SPATIAL TiledRasterLayers.
Note
If crop_dimensions is set then crop_bounds must also be set.
-
slope
(zfactor_calculator)¶ Performs the Slope focal operation on the first band of each Tile in the Layer.
The Slope operation will be carried out in a SQUARE neighborhood with an extent of 1. A zfactor will be derived from the zfactor_calculator for each Tile in the Layer. The resulting Layer will have a cell_type of FLOAT64 regardless of the input Layer’s cell_type, and it will have a single band that represents the calculated slope.
Parameters: zfactor_calculator (py4j.JavaObject) – A JavaObject that represents the Scala ZFactorCalculator class. This can be created using either the zfactor_lat_lng_calculator() or the zfactor_calculator() methods.
Returns: TiledRasterLayer
-
stitch
()¶ Stitches all of the rasters within the Layer into one raster.
Note
This can only be used on LayerType.SPATIAL TiledRasterLayers.
Returns: Tile
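Conceptually, stitching places every tile at the position given by its spatial key and concatenates the cells into one grid. A minimal sketch of that idea follows, assuming a dense grid of equally sized tiles keyed by (col, row); the dict-of-lists representation and the function name stitch_tiles are assumptions for the example.

```python
def stitch_tiles(tiles):
    """Combine tiles keyed by (col, row) into a single raster (list of rows)."""
    cols = sorted({col for col, _ in tiles})
    rows = sorted({row for _, row in tiles})
    stitched = []
    for row in rows:
        tile_height = len(tiles[(cols[0], row)])
        for y in range(tile_height):
            line = []
            # Concatenate the y-th line of every tile in this layout row.
            for col in cols:
                line.extend(tiles[(col, row)][y])
            stitched.append(line)
    return stitched


tiles = {
    (0, 0): [[1, 2], [3, 4]],
    (1, 0): [[5, 6], [7, 8]],
    (0, 1): [[9, 9], [9, 9]],
    (1, 1): [[0, 0], [0, 0]],
}
raster = stitch_tiles(tiles)
```

Because the whole raster ends up in a single in-memory grid, this kind of operation only suits layers that are small enough to hold in one place.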
-
tile_to_layout
(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>, partition_strategy=None)¶ Cuts tiles to a given layout and merges overlapping tiles. This will produce unique keys.
Parameters:
- layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reprojection will be performed.
- resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethod.NEAREST_NEIGHBOR is used.
- partition_strategy (HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy, optional) – Sets the Partitioner for the resulting layer and how many partitions it has. Default is None.
If None, then the output layer will have the same Partitioner and number of partitions as the source layer.
If partition_strategy is set but has no num_partitions, then the resulting layer will have the Partitioner specified in the strategy with the same number of partitions the source layer had.
If partition_strategy is set and has a num_partitions, then the resulting layer will have the Partitioner and number of partitions specified in the strategy.
Returns: TiledRasterLayer
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a RDD[(K, bytes)], where K is either SpatialKey or SpaceTimeKey.
Parameters:
- storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
- rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
- tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
- compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
- color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
- color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
- head_tags (dict, optional) – A dict where each key and value is a str.
- band_tags (list, optional) – A list of dicts where each key and value is a str.
Note
For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
-
to_numpy_rdd
()¶ Converts a TiledRasterLayer to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map (ColorMap) – A ColorMap instance used to color the PNGs.
Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a TiledRasterLayer with a layer_type of LayerType.SPACETIME to a TiledRasterLayer with a layer_type of LayerType.SPATIAL.
Parameters: target_time (datetime.datetime, optional) – The instant of interest. If set, the resulting TiledRasterLayer will only contain keys that contained the given instant. If None, then all values within the layer will be kept.
Returns: TiledRasterLayer
Raises: ValueError – If the layer already has a layer_type of LayerType.SPATIAL.
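The effect of target_time can be pictured as a filter on the time component of each key, followed by dropping that component. The sketch below works on plain tuples rather than on the layer's real key types; to_spatial and the ((col, row, instant), tile) record shape are assumptions for the example, and how duplicate spatial keys are combined afterwards is not covered.

```python
from datetime import datetime


def to_spatial(records, target_time=None):
    """Drop the time component of each key, optionally keeping one instant.

    `records` is a list of ((col, row, instant), tile) pairs.
    """
    return [
        ((col, row), tile)
        for (col, row, instant), tile in records
        if target_time is None or instant == target_time
    ]


january = datetime(2017, 1, 1)
february = datetime(2017, 2, 1)
records = [
    ((0, 0, january), "tile-a"),
    ((0, 1, january), "tile-b"),
    ((0, 0, february), "tile-c"),
]

only_january = to_spatial(records, target_time=january)
everything = to_spatial(records)
```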
-
tobler
()¶ Generates a Tobler walking speed layer from an elevation layer.
Note
This method has a known issue where the Tobler calculation is direction agnostic. Thus, all slopes are assumed to be uphill. This can produce incorrect results. A fix is currently being worked on.
Returns: TiledRasterLayer
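For context on the known issue, the sketch below evaluates Tobler's hiking function, 6 * exp(-3.5 * |slope + 0.05|) km/h, on a single slope value and compares it with a direction-agnostic variant that treats every slope as uphill. That this is how the layer method computes its values is an assumption; the function names are made up for the example.

```python
import math


def tobler_speed(slope):
    """Walking speed in km/h for a signed slope (rise over run)."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def tobler_speed_direction_agnostic(slope):
    """Variant that ignores direction by treating every slope as uphill."""
    return tobler_speed(abs(slope))


gentle_downhill = tobler_speed(-0.05)
same_slope_as_uphill = tobler_speed_direction_agnostic(-0.05)
```

The signed function peaks on a gentle downhill slope of -0.05, while the direction-agnostic variant reports the slower uphill speed for the same terrain, which is the kind of discrepancy the note warns about.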
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
with_no_data
(no_data_value)¶ Changes the NoData value of the layer to the new given value.
It is possible to specify a NoData value for layers with raw values. The resulting layer will be of the same CellType but with a user defined NoData value. For example, if a layer has a CellType of float32raw and a no_data_value of -10 is given, then the produced layer will have a CellType of float32ud-10.0.
If the target layer has a bool CellType, then the no_data_value will be ignored and the result layer will be the same as the origin. In order to assign a NoData value to a bool layer, the convert_data_type() method must be used.
Parameters: no_data_value (int or float) – The new NoData value of the layer.
Returns: TiledRasterLayer
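The float32raw example above suggests a simple naming rule for the resulting CellType. The sketch below reproduces that one documented case and the bool exception; the handling of integer cell types and of cell types that already carry a user defined NoData value is an assumption, and with_no_data_cell_type is a hypothetical helper.

```python
def with_no_data_cell_type(cell_type, no_data_value):
    """Name of the cell type after assigning a user defined NoData value."""
    if cell_type == "bool":
        # bool layers ignore the requested NoData value.
        return cell_type
    if cell_type.endswith("raw"):
        base = cell_type[:-len("raw")]
    else:
        # Assumption: drop any existing "ud<value>" suffix.
        base = cell_type.split("ud")[0]
    if base.startswith("float"):
        return "{}ud{}".format(base, float(no_data_value))
    # Assumption: integer cell types format the value without a decimal part.
    return "{}ud{}".format(base, int(no_data_value))


documented_case = with_no_data_cell_type("float32raw", -10)
bool_case = with_no_data_cell_type("bool", -10)
```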
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.geotrellis.layer.
Pyramid
(levels)¶ Contains a list of TiledRasterLayers that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server.
Map algebra can be performed on instances of this class.
Parameters: levels (list or dict) – A list of TiledRasterLayers or a dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
layer_type
¶ LayerType – What the layer type of the geotiffs is.
-
levels
¶ dict – A dict of
TiledRasterLayer
s where the value is the layer itself and the key is its given zoom level.
-
max_zoom
¶ int – The highest zoom level of the pyramid.
-
is_cached
¶ bool – Signals whether or not the internal RDDs are cached. Default is
False
.
-
histogram
¶ Histogram – The Histogram that represents the layer with the max zoom. Will not be calculated unless the get_histogram() method is used. Otherwise, its value is None.
Raises: TypeError – If levels is neither a list nor a dict.
-
cache
()¶ Persist this RDD with the default storage level (MEMORY_ONLY).
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_partition_strategy
()¶ Returns the partitioning strategy if the layer has one.
Returns: HashPartitionStrategy or SpatialPartitionStrategy or SpaceTimePartitionStrategy or None
-
isEmpty
()¶ Returns a bool that is True if the layer is empty and False if it is not.
Returns: Are there elements within the layer Return type: bool
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Sets this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.
Returns: [org.apache.spark.rdd.RDD]
-