geopyspark.geotrellis package
This subpackage contains the code that reads, writes, and processes data using GeoTrellis.
-
class geopyspark.geotrellis.Bounds(minKey, maxKey)
Represents the grid of keys that covers the area of the rasters in an RDD.
Parameters:
- minKey (SpatialKey or SpaceTimeKey) – The smallest SpatialKey or SpaceTimeKey.
- maxKey (SpatialKey or SpaceTimeKey) – The largest SpatialKey or SpaceTimeKey.
Returns: Bounds
-
count(value) → integer
Returns the number of occurrences of value.
-
index(value[, start[, stop]]) → integer
Returns the first index of value. Raises ValueError if the value is not present.
-
maxKey
Alias for field number 1.
-
minKey
Alias for field number 0.
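The count, index, and "Alias for field number" entries above are exactly what Python's collections.namedtuple generates, which suggests Bounds behaves like a namedtuple. The sketch below uses plain-Python stand-ins to show that behavior and how many keys a Bounds spans; the SpatialKey(col, row) layout and the key_count helper are hypothetical, not geopyspark's own definitions.

```python
from collections import namedtuple

# Hypothetical stand-ins that mirror the documented field order; SpatialKey(col, row)
# is an assumption made for illustration, not geopyspark's own definition.
SpatialKey = namedtuple("SpatialKey", ["col", "row"])
Bounds = namedtuple("Bounds", ["minKey", "maxKey"])


def key_count(bounds):
    """Hypothetical helper: number of grid cells from minKey to maxKey, inclusive."""
    cols = bounds.maxKey.col - bounds.minKey.col + 1
    rows = bounds.maxKey.row - bounds.minKey.row + 1
    return cols * rows


bounds = Bounds(SpatialKey(0, 0), SpatialKey(3, 1))

# minKey is field 0 and maxKey is field 1, so indexing and attribute access agree.
assert bounds[0] is bounds.minKey and bounds[1] is bounds.maxKey
# count() and index() come from tuple, which is why they appear in the listing above.
print(bounds.index(SpatialKey(3, 1)))  # 1
print(key_count(bounds))  # 8
```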
-
class geopyspark.geotrellis.Extent(xmin, ymin, xmax, ymax)
The “bounding box”, or geographic region, of the area on Earth that a raster represents.
Parameters:
- xmin (float) – The minimum x coordinate.
- ymin (float) – The minimum y coordinate.
- xmax (float) – The maximum x coordinate.
- ymax (float) – The maximum y coordinate.
-
xmin
float – The minimum x coordinate. Alias for field number 0.
-
ymin
float – The minimum y coordinate. Alias for field number 1.
-
xmax
float – The maximum x coordinate. Alias for field number 2.
-
ymax
float – The maximum y coordinate. Alias for field number 3.
-
count(value) → integer
Returns the number of occurrences of value.
-
classmethod from_polygon(polygon)
Creates a new instance of Extent from a Shapely Polygon. The new Extent will contain the min and max coordinates of the Polygon, regardless of the Polygon’s shape.
Parameters: polygon (shapely.geometry.Polygon) – A Shapely Polygon.
Returns: Extent
-
index(value[, start[, stop]]) → integer
Returns the first index of value. Raises ValueError if the value is not present.
-
to_polygon
Converts this instance to a Shapely Polygon. The resulting Polygon will be in the shape of a box.
Returns: shapely.geometry.Polygon
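from_polygon and to_polygon describe a round trip between an arbitrary polygon and its axis-aligned box. The sketch below reproduces that idea in plain Python with coordinate lists instead of Shapely objects (with Shapely, Polygon.bounds gives the same (minx, miny, maxx, maxy) tuple and shapely.geometry.box builds the box). The Extent namedtuple and both helper functions are illustrative stand-ins, not geopyspark's implementation.

```python
from collections import namedtuple

# Plain-Python stand-in for Extent; the real class takes and returns Shapely geometries.
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])


def extent_from_coords(coords):
    """Bounding box of a polygon's vertices, regardless of the polygon's shape."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return Extent(min(xs), min(ys), max(xs), max(ys))


def extent_to_box(extent):
    """Corner coordinates of the box polygon, closed back onto the first vertex."""
    return [
        (extent.xmin, extent.ymin),
        (extent.xmax, extent.ymin),
        (extent.xmax, extent.ymax),
        (extent.xmin, extent.ymax),
        (extent.xmin, extent.ymin),
    ]


triangle = [(0.0, 0.0), (4.0, 1.0), (1.0, 3.0)]
ext = extent_from_coords(triangle)
print(ext)  # Extent(xmin=0.0, ymin=0.0, xmax=4.0, ymax=3.0)
```

Note that the triangle's box is larger than the triangle itself, which is why a query by Extent can return more data than a query by the original polygon.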
-
class geopyspark.geotrellis.LayoutDefinition(extent, tileLayout)
Describes the layout of the rasters within an RDD and how they are projected.
Parameters:
- extent (Extent) – The Extent of the layout.
- tileLayout (TileLayout) – The TileLayout describing how the rasters within the RDD are laid out.
Returns: LayoutDefinition
-
count(value) → integer
Returns the number of occurrences of value.
-
extent
Alias for field number 0.
-
index(value[, start[, stop]]) → integer
Returns the first index of value. Raises ValueError if the value is not present.
-
tileLayout
Alias for field number 1.
-
class geopyspark.geotrellis.Metadata(bounds, crs, cell_type, extent, layout_definition)
Information about the values within a RasterRDD or TiledRasterRDD. This data pertains to the layout and other attributes of the data within those classes.
Parameters:
- bounds (Bounds) – The Bounds of the values in the class.
- crs (str or int) – The CRS of the data. Can be an EPSG code, a well-known name, or a PROJ.4 projection string.
- cell_type (str) – The data type of the cells of the rasters.
- extent (Extent) – The Extent that covers all of the rasters.
- layout_definition (LayoutDefinition) – The LayoutDefinition of all rasters.
-
crs
str or int – The CRS of the data. Can be an EPSG code, a well-known name, or a PROJ.4 projection string.
-
cell_type
str – The data type of the cells of the rasters.
-
tile_layout
TileLayout – The TileLayout that describes how the rasters are organized.
-
layout_definition
LayoutDefinition – The LayoutDefinition of all rasters.
-
classmethod from_dict(metadata_dict)
Creates Metadata from a dictionary.
Parameters: metadata_dict (dict) – The Metadata of a RasterRDD or TiledRasterRDD instance in dict form.
Returns: Metadata
-
to_dict()
Converts this instance to a dict.
Returns: dict
-
class geopyspark.geotrellis.TileLayout(layoutCols, layoutRows, tileCols, tileRows)
Describes the grid in which the rasters within an RDD should be laid out.
Parameters:
- layoutCols (int) – The number of columns of rasters that run east to west.
- layoutRows (int) – The number of rows of rasters that run north to south.
- tileCols (int) – The number of columns of pixels in each raster that run east to west.
- tileRows (int) – The number of rows of pixels in each raster that run north to south.
Returns: TileLayout
-
count(value) → integer
Returns the number of occurrences of value.
-
index(value[, start[, stop]]) → integer
Returns the first index of value. Raises ValueError if the value is not present.
-
layoutCols
Alias for field number 0.
-
layoutRows
Alias for field number 1.
-
tileCols
Alias for field number 2.
-
tileRows
Alias for field number 3.
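A LayoutDefinition pairs an Extent with a TileLayout, so the size of one pixel and the tile that contains a given point follow from simple arithmetic. The sketch below uses namedtuple stand-ins and hypothetical helpers; it assumes column 0 sits at the western edge (xmin) and row 0 at the northern edge (ymax), consistent with rows running north to south as described for read_value.

```python
import math
from collections import namedtuple

# Stand-ins mirroring the documented field order (not geopyspark's classes).
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])
TileLayout = namedtuple("TileLayout", ["layoutCols", "layoutRows", "tileCols", "tileRows"])
LayoutDefinition = namedtuple("LayoutDefinition", ["extent", "tileLayout"])


def cell_size(layout):
    """Width and height of one pixel in map units."""
    ext, tl = layout.extent, layout.tileLayout
    width = (ext.xmax - ext.xmin) / (tl.layoutCols * tl.tileCols)
    height = (ext.ymax - ext.ymin) / (tl.layoutRows * tl.tileRows)
    return width, height


def point_to_key(layout, x, y):
    """(col, row) of the tile containing (x, y); col 0 is west, row 0 is north."""
    ext, tl = layout.extent, layout.tileLayout
    tile_width = (ext.xmax - ext.xmin) / tl.layoutCols
    tile_height = (ext.ymax - ext.ymin) / tl.layoutRows
    col = math.floor((x - ext.xmin) / tile_width)
    row = math.floor((ext.ymax - y) / tile_height)
    return col, row


# A 4 x 2 grid of 256 x 256 pixel tiles over a 100 x 50 extent.
layout = LayoutDefinition(Extent(0.0, 0.0, 100.0, 50.0), TileLayout(4, 2, 256, 256))
print(cell_size(layout))                 # (0.09765625, 0.09765625)
print(point_to_key(layout, 60.0, 10.0))  # (2, 1)
```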
geopyspark.geotrellis.catalog module
Methods for reading, querying, and saving tile layers to and from GeoTrellis Catalogs.
-
geopyspark.geotrellis.catalog.get_layer_ids(geopysc, uri, options=None, **kwargs)
Returns a list of all of the layer ids in the selected catalog as dicts that contain the name and zoom of a given layer.
Parameters:
- geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns: [layerIds], where each layerIds is a dict with the following fields:
- name (str): The name of the layer.
- zoom (int): The zoom level of the given layer.
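Because get_layer_ids returns plain dicts with name and zoom fields, ordinary Python is enough to post-process the result, for example to find the deepest zoom level stored for each layer before calling read(). The list below is made-up sample data shaped like the documented return value, and max_zoom_per_layer is a hypothetical helper.

```python
# Made-up sample data shaped like the documented return value of get_layer_ids.
layer_ids = [
    {"name": "elevation", "zoom": 10},
    {"name": "elevation", "zoom": 12},
    {"name": "landcover", "zoom": 8},
]


def max_zoom_per_layer(ids):
    """Map each layer name to the highest zoom level stored for it."""
    result = {}
    for layer in ids:
        name, zoom = layer["name"], layer["zoom"]
        result[name] = max(zoom, result.get(name, zoom))
    return result


print(max_zoom_per_layer(layer_ids))  # {'elevation': 12, 'landcover': 8}
```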
-
geopyspark.geotrellis.catalog.query(geopysc, rdd_type, uri, layer_name, layer_zoom, intersects, time_intervals=None, proj_query=None, options=None, numPartitions=None, **kwargs)
Queries a single zoom layer from a GeoTrellis catalog given spatial and/or time parameters. Unlike read(), this method will only return the part of the layer that intersects the specified region.
Note
The whole layer could still be read in if intersects and/or time_intervals have not been set, or if the queried region contains the entire layer.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME. Note: All of the GeoTiffs must have the same spatial type.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the layer in the GeoTrellis catalog to be queried.
- layer_zoom (int) – The zoom level of the layer that is to be queried.
- intersects (str or Polygon or Extent) – The desired spatial area to be returned. Can be a string, a Shapely Polygon, or an instance of Extent. If the value is a string, it must be a geometry in WKT format. The supported geometry types are:
  - Point
  - Polygon
  - MultiPolygon
  Note: Only layers that were made from spatial, singleband GeoTiffs can query a Point. All other layers are restricted to Polygon and MultiPolygon.
- time_intervals (list, optional) – A list of strings representing the time intervals to query. The strings must be in a valid date-time format. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
- options (dict, optional) – Additional parameters for querying the tile for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns: TiledRasterRDD
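When intersects is given as a string, it must be WKT. An Extent can be passed directly, so a WKT string is only needed when the geometry comes from elsewhere; a Shapely geometry's .wkt property also produces one. The sketch below, a hypothetical helper written in plain Python, shows what the WKT for a box-shaped query region looks like.

```python
from collections import namedtuple

# Stand-in for Extent (not geopyspark's class).
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])


def extent_to_wkt(extent):
    """WKT POLYGON for the box covered by an extent; the ring is closed on its first vertex."""
    ring = [
        (extent.xmin, extent.ymin),
        (extent.xmax, extent.ymin),
        (extent.xmax, extent.ymax),
        (extent.xmin, extent.ymax),
        (extent.xmin, extent.ymin),
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(extent_to_wkt(Extent(0, 0, 10, 5)))
# POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0))
```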
-
geopyspark.geotrellis.catalog.read(geopysc, rdd_type, uri, layer_name, layer_zoom, options=None, numPartitions=None, **kwargs)
Reads a single zoom layer from a GeoTrellis catalog.
Note
This will read the entire layer. If only part of the layer is needed, use query() instead.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns: TiledRasterRDD
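Every catalog function accepts backend options either as an options dict or as camel-case keyword arguments, and the documented rule is that the dict wins when both are supplied. The hypothetical helper below mirrors that rule so its effect is easy to see; "exampleKey" is a made-up option name, and treating an empty dict as "set" is an assumption.

```python
def resolve_backend_options(options=None, **kwargs):
    """Hypothetical helper mirroring the documented precedence rule: when an options
    dict is given, it is used as-is and any keyword arguments are ignored."""
    if options is not None:
        return dict(options)
    return dict(kwargs)


# "exampleKey" is a made-up, camel-case option name used only for illustration.
print(resolve_backend_options({"exampleKey": "a"}, exampleKey="b"))  # {'exampleKey': 'a'}
print(resolve_backend_options(exampleKey="b"))  # {'exampleKey': 'b'}
```

In practice this means that mixing the two styles silently drops the keyword arguments, so it is safer to pick one.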
-
geopyspark.geotrellis.catalog.read_layer_metadata(geopysc, rdd_type, uri, layer_name, layer_zoom, options=None, **kwargs)
Reads the metadata from a saved layer without reading in the whole layer.
Parameters:
- geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- options (dict, optional) – Additional parameters for reading the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- numPartitions (int, optional) – Sets the RDD partition count when reading from the catalog.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns: Metadata
-
geopyspark.geotrellis.catalog.read_value(geopysc, rdd_type, uri, layer_name, layer_zoom, col, row, zdt=None, options=None, **kwargs)
Reads a single tile from a GeoTrellis catalog. Unlike other functions in this module, this will not return a TiledRasterRDD, but rather a GeoPySpark formatted raster. This is the function to use when creating a tile server.
Note
When requesting a tile that does not exist, None will be returned.
Parameters:
- geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the layer in the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- col (int) – The column number of the tile within the layout. Columns run east to west.
- row (int) – The row number of the tile within the layout. Rows run north to south.
- zdt (str, optional) – The Zone-Date-Time string of the tile. The string must be in a valid date-time format. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
- options (dict, optional) – Additional parameters for reading the tile for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
Returns: Raster or None
-
geopyspark.geotrellis.catalog.write(uri, layer_name, tiled_raster_rdd, index_strategy='zorder', time_unit=None, options=None, **kwargs)
Writes a tile layer to a specified destination.
Parameters:
- uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
- layer_name (str) – The name of the new tile layer.
- layer_zoom (int) – The zoom level the layer should be saved at.
- tiled_raster_rdd (TiledRasterRDD) – The TiledRasterRDD to be saved.
- index_strategy (str) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. The default method is ZORDER.
- time_unit (str, optional) – The time unit to use when saving spatial-temporal data. While this is None by default, it must be set when saving spatial-temporal data. Depending on the indexing method chosen, different time units are used.
- options (dict, optional) – Additional parameters for writing the layer for specific backends. The dictionary is only used for Cassandra and HBase; no other backend requires this to be set.
- **kwargs – The optional parameters can also be set as keyword arguments. The keywords must be in camel case. If both options and keywords are set, then the options will be used.
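The time_unit requirement is easy to miss: it defaults to None, yet spatial-temporal layers need it. The hypothetical pre-flight check below mirrors that rule and validates against the string values of the time-unit constants listed in the constants module; it is a sketch for calling code, not part of geopyspark.

```python
SPATIAL, SPACETIME = "spatial", "spacetime"
# String values of the time-unit constants listed in the constants module.
TIME_UNITS = {"millis", "seconds", "minutes", "hours", "days", "months", "years"}


def check_write_args(rdd_type, time_unit=None):
    """Hypothetical pre-flight check: spatial-temporal layers need a time unit,
    and that unit should be one of the documented constants."""
    if rdd_type == SPACETIME:
        if time_unit is None:
            raise ValueError("time_unit must be set when saving spatial-temporal data")
        if time_unit not in TIME_UNITS:
            raise ValueError(f"unsupported time unit: {time_unit!r}")
    return True


print(check_write_args(SPATIAL))            # True
print(check_write_args(SPACETIME, "days"))  # True
```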
geopyspark.geotrellis.constants module
Constants that are used by geopyspark.geotrellis classes, methods, and functions.
-
geopyspark.geotrellis.constants.ANNULUS = 'annulus'
Neighborhood type.
-
geopyspark.geotrellis.constants.ASPECT = 'Aspect'
Focal operation type.
-
geopyspark.geotrellis.constants.AVERAGE = 'Average'
A resampling method.
-
geopyspark.geotrellis.constants.BILINEAR = 'Bilinear'
A resampling method.
-
geopyspark.geotrellis.constants.BLUE_TO_ORANGE = 'BlueToOrange'
A ColorRamp.
-
geopyspark.geotrellis.constants.BLUE_TO_RED = 'BlueToRed'
A ColorRamp.
-
geopyspark.geotrellis.constants.BOOL = 'bool'
Represents Bit Cells.
-
geopyspark.geotrellis.constants.BOOLRAW = 'boolraw'
Represents Bit Cells.
-
geopyspark.geotrellis.constants.CELL_TYPES = ['boolraw', 'int8raw', 'uint8raw', 'int16raw', 'uint16raw', 'int32raw', 'float32raw', 'float64raw', 'bool', 'int8', 'uint8', 'int16', 'uint16', 'int32', 'float32', 'float64', 'int8ud', 'uint8ud', 'int16ud', 'uint16ud', 'int32ud', 'float32ud', 'float64ud']
A list of all of the supported cell types.
-
geopyspark.geotrellis.constants.CIRCLE = 'circle'
Neighborhood type.
-
geopyspark.geotrellis.constants.CLASSIFICATION_BOLD_LAND_USE = 'ClassificationBoldLandUse'
A ColorRamp.
-
geopyspark.geotrellis.constants.COOLWARM = 'coolwarm'
A ColorRamp.
-
geopyspark.geotrellis.constants.CUBICCONVOLUTION = 'CubicConvolution'
A resampling method.
-
geopyspark.geotrellis.constants.CUBICSPLINE = 'CubicSpline'
A resampling method.
-
geopyspark.geotrellis.constants.DAYS = 'days'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.EXACT = 'Exact'
A classification strategy.
-
geopyspark.geotrellis.constants.FLOAT = 'float'
Layout scheme to match resolution of source rasters.
-
geopyspark.geotrellis.constants.FLOAT32 = 'float32'
Represents Float Cells with constant NoData values.
-
geopyspark.geotrellis.constants.FLOAT32RAW = 'float32raw'
Represents Float Cells.
-
geopyspark.geotrellis.constants.FLOAT32UD = 'float32ud'
Represents Float Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.FLOAT64 = 'float64'
Represents Double Cells with constant NoData values.
-
geopyspark.geotrellis.constants.FLOAT64RAW = 'float64raw'
Represents Double Cells.
-
geopyspark.geotrellis.constants.GREATERTHAN = 'GreaterThan'
A classification strategy.
-
geopyspark.geotrellis.constants.GREATERTHANOREQUALTO = 'GreaterThanOrEqualTo'
A classification strategy.
-
geopyspark.geotrellis.constants.GREEN_TO_RED_ORANGE = 'GreenToRedOrange'
A ColorRamp.
-
geopyspark.geotrellis.constants.HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM = 'HeatmapBlueToYellowToRedSpectrum'
A ColorRamp.
-
geopyspark.geotrellis.constants.HEATMAP_DARK_RED_TO_YELLOW_WHITE = 'HeatmapDarkRedToYellowWhite'
A ColorRamp.
-
geopyspark.geotrellis.constants.HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE = 'HeatmapLightPurpleToDarkPurpleToWhite'
A ColorRamp.
-
geopyspark.geotrellis.constants.HEATMAP_YELLOW_TO_RED = 'HeatmapYellowToRed'
A ColorRamp.
-
geopyspark.geotrellis.constants.HILBERT = 'hilbert'
A key indexing method. Works for RDDs that contain both SpatialKey and SpaceTimeKey.
-
geopyspark.geotrellis.constants.HOT = 'hot'
A ColorRamp.
-
geopyspark.geotrellis.constants.HOURS = 'hours'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.INFERNO = 'inferno'
A ColorRamp.
-
geopyspark.geotrellis.constants.INT16 = 'int16'
Represents Short Cells with constant NoData values.
-
geopyspark.geotrellis.constants.INT16RAW = 'int16raw'
Represents Short Cells.
-
geopyspark.geotrellis.constants.INT16UD = 'int16ud'
Represents Short Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.INT32 = 'int32'
Represents Int Cells with constant NoData values.
-
geopyspark.geotrellis.constants.INT32RAW = 'int32raw'
Represents Int Cells.
-
geopyspark.geotrellis.constants.INT32UD = 'int32ud'
Represents Int Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.INT8 = 'int8'
Represents Byte Cells with constant NoData values.
-
geopyspark.geotrellis.constants.INT8RAW = 'int8raw'
Represents Byte Cells.
-
geopyspark.geotrellis.constants.INT8UD = 'int8ud'
Represents Byte Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.LANCZOS = 'Lanczos'
A resampling method.
-
geopyspark.geotrellis.constants.LESSTHAN = 'LessThan'
A classification strategy.
-
geopyspark.geotrellis.constants.LESSTHANOREQUALTO = 'LessThanOrEqualTo'
A classification strategy.
-
geopyspark.geotrellis.constants.LIGHT_TO_DARK_GREEN = 'LightToDarkGreen'
A ColorRamp.
-
geopyspark.geotrellis.constants.LIGHT_TO_DARK_SUNSET = 'LightToDarkSunset'
A ColorRamp.
-
geopyspark.geotrellis.constants.LIGHT_YELLOW_TO_ORANGE = 'LightYellowToOrange'
A ColorRamp.
-
geopyspark.geotrellis.constants.MAGMA = 'magma'
A ColorRamp.
-
geopyspark.geotrellis.constants.MAX = 'Max'
A resampling method.
-
geopyspark.geotrellis.constants.MEAN = 'Mean'
Focal operation type.
-
geopyspark.geotrellis.constants.MEDIAN = 'Median'
A resampling method.
-
geopyspark.geotrellis.constants.MILLISECONDS = 'millis'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.MINUTES = 'minutes'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.MODE = 'Mode'
A resampling method.
-
geopyspark.geotrellis.constants.MONTHS = 'months'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.NEARESTNEIGHBOR = 'NearestNeighbor'
A resampling method.
-
geopyspark.geotrellis.constants.NEIGHBORHOODS = ['annulus', 'nesw', 'square', 'wedge', 'circle']
A list of all of the neighborhood types.
-
geopyspark.geotrellis.constants.NESW = 'nesw'
Neighborhood type.
-
geopyspark.geotrellis.constants.NODATAINT = -2147483648
The NoData value for ints in GeoTrellis.
-
geopyspark.geotrellis.constants.PLASMA = 'plasma'
A ColorRamp.
-
geopyspark.geotrellis.constants.RESAMPLE_METHODS = ['NearestNeighbor', 'Bilinear', 'CubicConvolution', 'Lanczos', 'Average', 'Mode', 'Median', 'Max', 'Min']
A list of all of the resampling methods.
-
geopyspark.geotrellis.constants.ROWMAJOR = 'rowmajor'
A key indexing method. Works only for RDDs that contain SpatialKey. This method provides the fastest lookup of all the key indexing methods; however, it does not give good locality guarantees. It is therefore recommended only when locality is not important for your analysis.
-
geopyspark.geotrellis.constants.SECONDS = 'seconds'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.SLOPE = 'Slope'
Focal operation type.
-
geopyspark.geotrellis.constants.SPACETIME = 'spacetime'
Indicates that the RDD contains (K, V) pairs, where the K has a spatial and time attribute. Both TemporalProjectedExtent and SpaceTimeKey are examples of this type of K.
-
geopyspark.geotrellis.constants.SPATIAL = 'spatial'
Indicates that the RDD contains (K, V) pairs, where the K has a spatial attribute but no time attribute. Both ProjectedExtent and SpatialKey are examples of this type of K.
-
geopyspark.geotrellis.constants.SQUARE = 'square'
Neighborhood type.
-
geopyspark.geotrellis.constants.SUM = 'Sum'
Focal operation type.
-
geopyspark.geotrellis.constants.TILE = 'Tile'
Indicates the type of value that needs to be serialized/deserialized. Both singleband and multiband GeoTiffs are referred to as this.
-
geopyspark.geotrellis.constants.UINT16 = 'uint16'
Represents UShort Cells with constant NoData values.
-
geopyspark.geotrellis.constants.UINT16RAW = 'uint16raw'
Represents UShort Cells.
-
geopyspark.geotrellis.constants.UINT16UD = 'uint16ud'
Represents UShort Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.UINT8 = 'uint8'
Represents UByte Cells with constant NoData values.
-
geopyspark.geotrellis.constants.UINT8RAW = 'uint8raw'
Represents UByte Cells.
-
geopyspark.geotrellis.constants.UINT8UD = 'uint8ud'
Represents UByte Cells with user defined NoData values.
-
geopyspark.geotrellis.constants.VIRIDIS = 'viridis'
A ColorRamp.
-
geopyspark.geotrellis.constants.WEDGE = 'wedge'
Neighborhood type.
-
geopyspark.geotrellis.constants.YEARS = 'years'
A time unit used with ZORDER.
-
geopyspark.geotrellis.constants.ZOOM = 'zoom'
Layout scheme to match resolution of the closest level of TMS pyramid.
-
geopyspark.geotrellis.constants.ZORDER = 'zorder'
A key indexing method. Works for RDDs that contain both SpatialKey and SpaceTimeKey. Note: indexes are determined by the x, y, and, if SPACETIME, the temporal resolutions of a point. This is expressed in bits and has a max value of 62. Thus, if the sum of those resolutions is greater than 62, then the indexing will fail.
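The 62-bit limit in the ZORDER note can be made concrete. The sketch below uses hypothetical helpers, not GeoTrellis code: it counts the bits needed to index a dimension, checks the documented 62-bit budget, and shows the textbook 2-D Z-order (Morton) interleaving that gives the index its name. GeoTrellis's own Z-order key indexes may allocate bits differently.

```python
def bits_needed(n_values):
    """Bits required to represent indexes 0..n_values-1 (at least 1)."""
    return max(1, (n_values - 1).bit_length())


def fits_zorder_budget(x_bits, y_bits, t_bits=0, limit=62):
    """The documented constraint: x, y (and temporal) resolutions must total <= 62 bits."""
    return x_bits + y_bits + t_bits <= limit


def morton_2d(x, y, bits=16):
    """Interleave the bits of x (even positions) and y (odd positions) into a Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Visiting a 2 x 2 block row by row yields consecutive Z-order indexes.
print([morton_2d(x, y) for y in range(2) for x in range(2)])  # [0, 1, 2, 3]
# 2**20 columns and rows need 20 bits each; 20 + 20 + 21 = 61 bits fits the budget.
print(fits_zorder_budget(bits_needed(2**20), bits_needed(2**20), 21))  # True
```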
geopyspark.geotrellis.geotiff_rdd module
This module contains functions that create a RasterRDD from files.
-
geopyspark.geotrellis.geotiff_rdd.get(geopysc, rdd_type, uri, options=None, **kwargs)
Creates a RasterRDD from GeoTiffs that are located on the local file system, HDFS, or S3.
Parameters:
- geopysc (geopyspark.GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
  Note: All of the GeoTiffs must have the same spatial type.
- uri (str) – The path to a given file or directory.
- options (dict, optional) – A dictionary of options that are used when creating the RDD. This defaults to None. If None, then the RDD will be created using the default options for the given backend in GeoTrellis.
  Note: Keys in the dict should be in camel case, as this is the style used in Scala.
  These are the options when using the local file system or HDFS:
  - crs (str, optional): The CRS that the output tiles should be in. The CRS must be in the well-known name format. If None, then the CRS that the tiles were originally in will be used.
  - timeTag (str, optional): The name of the tiff tag that contains the time stamp for the tile. If None, then the default value is TIFFTAG_DATETIME.
  - timeFormat (str, optional): The pattern of the time stamp for java.time.format.DateTimeFormatter to parse. If None, then the default value is yyyy:MM:dd HH:mm:ss.
  - maxTileSize (int, optional): The max size of each tile in the resulting RDD. If this size is smaller than a read-in tile, then that tile will be broken into tiles of the specified size. If None, then the whole tile will be read in.
  - numPartitions (int, optional): The number of partitions Spark will make when the data is repartitioned. If None, then the data will not be repartitioned.
  - chunkSize (int, optional): How many bytes of the file should be read in at a time. If None, then files will be read in 65536-byte chunks.
  S3 has the above options in addition to this one:
  - s3Client (str, optional): Which S3Client to use when reading GeoTiffs. There are currently two options: default and mock. If None, default is used.
    Note: mock should only be used in unit tests and debugging.
- **kwargs – Option parameters can also be entered as keyword arguments.
  Note: Defining both options and kwargs will cause the kwargs to be ignored in favor of options.
Returns: RasterRDD
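The options above are passed to get() as a dict with camel-case keys (or as keyword arguments). A minimal sketch of such a dict follows; the values, including the "EPSG:3857" CRS name, are illustrative assumptions. The hypothetical split_count helper estimates how many tiles maxTileSize produces from one read-in tile, assuming edge tiles may be smaller than maxTileSize.

```python
import math

# Example options using the documented camel-case keys; the values are illustrative.
options = {
    "crs": "EPSG:3857",
    "maxTileSize": 256,
    "numPartitions": 64,
    "chunkSize": 65536,
}


def split_count(width, height, max_tile_size):
    """Assumed tiling: cut a width x height tile into max_tile_size squares,
    letting the tiles along the right and bottom edges be smaller."""
    return math.ceil(width / max_tile_size) * math.ceil(height / max_tile_size)


# A 1000 x 600 tile becomes 4 columns x 3 rows of tiles.
print(split_count(1000, 600, options["maxTileSize"]))  # 12
```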
geopyspark.geotrellis.neighborhoods module
Classes that represent the various neighborhoods used in focal functions.
Note
Once a parameter has been entered for any one of these classes, it gets converted to a float if it was originally an int.
-
class geopyspark.geotrellis.neighborhoods.Annulus(inner_radius, outer_radius)
An Annulus neighborhood.
Parameters:
- inner_radius (int or float) – The radius of the inner circle.
- outer_radius (int or float) – The radius of the outer circle.
-
inner_radius
int or float – The radius of the inner circle.
-
outer_radius
int or float – The radius of the outer circle.
-
param_1
float – Same as inner_radius.
-
param_2
float – Same as outer_radius.
-
param_3
float – Unused param for Annulus. Is 0.0.
-
name
str – The name of the neighborhood, “annulus”.
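The param_1/param_2/param_3 attributes are a generic three-float encoding shared by every neighborhood class. The plain-Python stand-in below mirrors the documented Annulus mapping and the int-to-float conversion from the module note. Its contains method is a hypothetical addition that assumes a cell is in the ring when its distance from the focus lies between the two radii, inclusive; the library's exact boundary rule is not stated here.

```python
import math


class AnnulusSketch:
    """Plain-Python stand-in mirroring the documented Annulus attributes:
    param_1 = inner_radius, param_2 = outer_radius, param_3 unused (0.0),
    with ints converted to float as the module note describes."""

    name = "annulus"

    def __init__(self, inner_radius, outer_radius):
        self.inner_radius = float(inner_radius)
        self.outer_radius = float(outer_radius)
        self.param_1 = self.inner_radius
        self.param_2 = self.outer_radius
        self.param_3 = 0.0

    def contains(self, dx, dy):
        """Assumed membership rule: inner_radius <= distance <= outer_radius."""
        d = math.hypot(dx, dy)
        return self.inner_radius <= d <= self.outer_radius


ring = AnnulusSketch(1, 2)
print(ring.param_1, ring.param_2, ring.param_3)  # 1.0 2.0 0.0
print(ring.contains(0, 0), ring.contains(2, 0))  # False True
```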
-
class geopyspark.geotrellis.neighborhoods.Circle(radius)
A circle neighborhood.
Parameters: radius (int or float) – The radius of the circle that determines which cells fall within the bounding box.
-
radius
int or float – The radius of the circle that determines which cells fall within the bounding box.
-
param_1
float – Same as radius.
-
param_2
float – Unused param for Circle. Is 0.0.
-
param_3
float – Unused param for Circle. Is 0.0.
-
name
str – The name of the neighborhood, “circle”.
Note
Cells that lie exactly on the radius of the circle are a part of the neighborhood.
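To see which cells a Circle neighborhood covers, it helps to enumerate the cell offsets around the focus. The sketch below is a hypothetical helper, not geopyspark code. It assumes distance is measured between cell centers on an integer grid, and it includes cells exactly on the radius, as the note above states.

```python
def circle_offsets(radius):
    """Offsets (dx, dy) of cells whose centers lie within `radius` of the focus.
    Cells exactly on the radius are included, matching the note above."""
    r = int(radius)
    return [
        (dx, dy)
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
        if dx * dx + dy * dy <= radius * radius
    ]


print(len(circle_offsets(1)))  # 5  (focus plus its 4 edge neighbors)
print(len(circle_offsets(2)))  # 13
```

With radius 1 the diagonal neighbors (distance √2) fall outside, while (±1, 0) and (0, ±1) sit exactly on the radius and are kept.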
-
class geopyspark.geotrellis.neighborhoods.Nesw(extent)
A neighborhood that includes the column and the row that intersect at the focus.
Parameters: extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
-
extent
int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
-
param_1
float – Same as extent.
-
param_2
float – Unused param for Nesw. Is 0.0.
-
param_3
float – Unused param for Nesw. Is 0.0.
-
name
str – The name of the neighborhood, “nesw”.
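A Nesw neighborhood is a plus-shaped cross: the focus plus the cells that lie north, east, south, and west of it, up to extent cells away. The hypothetical helper below enumerates those offsets, assuming an integer extent counts whole cells.

```python
def nesw_offsets(extent):
    """Offsets covered by a Nesw neighborhood: the focus plus `extent` cells
    north, east, south, and west of it (the focus row and column only)."""
    e = int(extent)
    offsets = {(0, 0)}
    for step in range(1, e + 1):
        offsets.update({(step, 0), (-step, 0), (0, step), (0, -step)})
    return offsets


print(len(nesw_offsets(2)))  # 9, i.e. 4 * extent + 1
```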
-
class geopyspark.geotrellis.neighborhoods.Wedge(radius, start_angle, end_angle)
A wedge neighborhood.
Parameters:
- radius (int or float) – The radius of the wedge.
- start_angle (int or float) – The starting angle of the wedge in degrees.
- end_angle (int or float) – The ending angle of the wedge in degrees.
-
radius
int or float – The radius of the wedge.
-
start_angle
int or float – The starting angle of the wedge in degrees.
-
end_angle
int or float – The ending angle of the wedge in degrees.
-
param_1
float – Same as radius.
-
param_2
float – Same as start_angle.
-
param_3
float – Same as end_angle.
-
name
str – The name of the neighborhood, “wedge”.
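A wedge is a circle restricted to an angular range. The hypothetical helper below sketches that test. It makes several assumptions not stated in the documentation: angles are measured counter-clockwise from the +x (east) axis with y pointing north, both ends of the range are inclusive, and a start angle greater than the end angle wraps through 0 degrees.

```python
import math


def in_wedge(dx, dy, radius, start_angle, end_angle):
    """Assumed membership rule for a Wedge: the offset (dx, dy) from the focus must lie
    within `radius`, and its angle (degrees, counter-clockwise from +x, y pointing
    north) must fall between start_angle and end_angle, wrapping past 360."""
    if dx * dx + dy * dy > radius * radius:
        return False
    if end_angle - start_angle >= 360:
        return True  # the wedge covers the full circle
    angle = math.degrees(math.atan2(dy, dx)) % 360
    span = (end_angle - start_angle) % 360
    return (angle - start_angle) % 360 <= span


print(in_wedge(1, 1, 2, 0, 90))     # True
print(in_wedge(-1, 0, 2, 0, 90))    # False
print(in_wedge(1, -1, 2, 270, 45))  # True (the range wraps through 0 degrees)
```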
geopyspark.geotrellis.rdd module
This module contains the RasterRDD and the TiledRasterRDD classes. Both of these classes are wrappers of their Scala counterparts. These will be used in lieu of actual PySpark RDDs when performing operations.
-
class geopyspark.geotrellis.rdd.CachableRDD
Base class for classes that wrap a Scala RDD instance through a py4j reference.
-
geopysc
GeoPyContext – The GeoPyContext being used this session.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala RDD class.
-
cache()
Persist this RDD with the default storage level (MEMORY_ONLY).
-
persist(storageLevel=StorageLevel(False, True, False, False, 1))
Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
unpersist()
Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds()
Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
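The default storage level StorageLevel(False, True, False, False, 1) follows PySpark's argument order (useDisk, useMemory, useOffHeap, deserialized, replication), which is PySpark's MEMORY_ONLY. The sketch below shows the delegation pattern that wrapped_rdds() implies: persist and unpersist are forwarded to every wrapped container. StorageLevelSketch, FakeScalaRDD, and CachableSketch are hypothetical stand-ins, not geopyspark or PySpark classes, and returning self from persist is a design choice borrowed from PySpark's RDD.persist.

```python
from collections import namedtuple

# Mirrors the argument order of PySpark's StorageLevel.
StorageLevelSketch = namedtuple(
    "StorageLevelSketch", ["useDisk", "useMemory", "useOffHeap", "deserialized", "replication"]
)
MEMORY_ONLY = StorageLevelSketch(False, True, False, False, 1)


class FakeScalaRDD:
    """Hypothetical stand-in for the py4j-wrapped Scala RDD; records the storage level."""

    def __init__(self):
        self.level = None

    def persist(self, level):
        self.level = level

    def unpersist(self):
        self.level = None


class CachableSketch:
    """Forwards persist/unpersist to every container returned by wrapped_rdds(),
    which by default is just [self.srdd]."""

    def __init__(self, srdd):
        self.srdd = srdd

    def wrapped_rdds(self):
        return [self.srdd]

    def persist(self, storageLevel=MEMORY_ONLY):
        for rdd in self.wrapped_rdds():
            rdd.persist(storageLevel)
        return self

    def cache(self):
        return self.persist(MEMORY_ONLY)

    def unpersist(self):
        for rdd in self.wrapped_rdds():
            rdd.unpersist()
        return self


scala_rdd = FakeScalaRDD()
wrapper = CachableSketch(scala_rdd).cache()
print(scala_rdd.level.useMemory)  # True
```

A subclass that wraps more than one Scala RDD would only need to override wrapped_rdds() for caching to cover all of them.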
-
class geopyspark.geotrellis.rdd.RasterRDD(geopysc, rdd_type, srdd)
A wrapper of an RDD that contains GeoTrellis rasters.
Represents an RDD that contains (K, V) pairs, where K is either ProjectedExtent or TemporalProjectedExtent depending on the rdd_type of the RDD, and V is a Raster.
The data held within the RDD has not been tiled, meaning the data has yet to be modified to fit a certain layout. See RasterRDD for more information.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows RasterRDD to access the various Scala methods.
-
geopysc
GeoPyContext – The GeoPyContext being used this session.
-
rdd_type
str – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterRDD to access the various Scala methods.
-
cache()
Persist this RDD with the default storage level (MEMORY_ONLY).
-
collect_metadata(extent=None, layout=None, crs=None, tile_size=256)
Iterates over the RDD records and generates layer metadata describing the contained rasters.
Parameters:
- extent (Extent, optional) – Specify the layout extent; layout must also be specified.
- layout (TileLayout, optional) – Specify the tile layout; extent must also be specified.
- crs (str or int, optional) – Ignore the CRS from the records and use the given one instead.
- tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.
Note
extent and layout must both be defined if they are to be used.
Returns: Metadata
Raises: TypeError – If either extent or layout is defined but the other is not.
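The rule that extent and layout must be given together is a common source of TypeError. The hypothetical check below mirrors the documented behavior so calling code can fail early. It returns True when an explicit layout will be used and False when collect_metadata would fall back to tile_size.

```python
def check_layout_args(extent=None, layout=None):
    """Hypothetical check mirroring the documented rule: extent and layout must be
    given together or not at all; otherwise a TypeError is raised."""
    if (extent is None) != (layout is None):
        raise TypeError("extent and layout must both be defined if either is used")
    return extent is not None  # True when an explicit layout will be used


print(check_layout_args())                  # False (tile_size is used instead)
print(check_layout_args("extent", "tile"))  # True
```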
-
convert_data_type(new_type)
Converts the underlying raster values to a new CellType.
Parameters: new_type (str) – The string representation of the CellType to convert to. It is represented by a constant such as INT16, FLOAT64UD, etc.
Returns: RasterRDD
Raises: ValueError – When an unsupported cell type is entered.
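The accepted strings are the values of the CELL_TYPES constant. A hypothetical pre-check against that list, copied from the constants module, catches typos before the conversion is sent to the JVM, failing with the same ValueError that convert_data_type is documented to raise.

```python
# Copied verbatim from the CELL_TYPES constant documented in the constants module.
CELL_TYPES = ['boolraw', 'int8raw', 'uint8raw', 'int16raw', 'uint16raw', 'int32raw',
              'float32raw', 'float64raw', 'bool', 'int8', 'uint8', 'int16', 'uint16',
              'int32', 'float32', 'float64', 'int8ud', 'uint8ud', 'int16ud', 'uint16ud',
              'int32ud', 'float32ud', 'float64ud']


def validate_cell_type(new_type):
    """Hypothetical pre-check: raise ValueError for any string outside CELL_TYPES."""
    if new_type not in CELL_TYPES:
        raise ValueError(f"{new_type!r} is not a supported cell type")
    return new_type


print(validate_cell_type("int16"))  # int16
```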
-
cut_tiles(layer_metadata, resample_method='NearestNeighbor')
Cuts tiles to the layout. May result in duplicate keys.
Parameters:
- layer_metadata (Metadata) – The Metadata of the RasterRDD instance.
- resample_method (str, optional) – The resample method to use for the reprojection. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns: TiledRasterRDD
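One way duplicate keys can arise is when two source rasters overlap the same layout tile. Each raster contributes its own piece of that tile, under the same key. The plain-Python sketch below lists the layout keys each raster touches and counts the repeats. The helper and sample extents are hypothetical, and rows are assumed to be counted from the northern edge.

```python
import math
from collections import Counter, namedtuple

# Stand-in for Extent (not geopyspark's class).
Extent = namedtuple("Extent", ["xmin", "ymin", "xmax", "ymax"])


def covered_keys(extent, layout_extent, tile_width, tile_height):
    """(col, row) keys of the layout tiles that an extent overlaps; rows count from the north."""
    col_min = math.floor((extent.xmin - layout_extent.xmin) / tile_width)
    col_max = math.ceil((extent.xmax - layout_extent.xmin) / tile_width) - 1
    row_min = math.floor((layout_extent.ymax - extent.ymax) / tile_height)
    row_max = math.ceil((layout_extent.ymax - extent.ymin) / tile_height) - 1
    return [(c, r) for r in range(row_min, row_max + 1) for c in range(col_min, col_max + 1)]


layout_extent = Extent(0, 0, 40, 20)                     # a 4 x 2 grid of 10 x 10 tiles
rasters = [Extent(0, 0, 15, 10), Extent(12, 0, 30, 10)]  # two overlapping source rasters

counts = Counter(k for r in rasters for k in covered_keys(r, layout_extent, 10, 10))
duplicates = sorted(k for k, n in counts.items() if n > 1)
print(duplicates)  # [(1, 1)]
```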
-
classmethod from_numpy_rdd(geopysc, rdd_type, numpy_rdd)
Creates a RasterRDD from a numpy RDD.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the GeoTiffs. This is represented by the constants SPATIAL and SPACETIME.
- numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either ProjectedExtents or TemporalProjectedExtents and rasters that are represented by a numpy array.
Returns: RasterRDD
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the RDD.
Returns: (float, float)
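Conceptually, get_min_max reduces over every cell of every raster and skips NoData. Below is a minimal single-machine sketch in which None stands in for NoData. The (min, max) order of the returned tuple is a choice made for this sketch, so check the actual method's return value before relying on a particular order.

```python
from typing import Iterable, List, Optional, Tuple

Tile = List[List[Optional[float]]]  # None stands in for NoData in this sketch


def min_max(tiles: Iterable[Tile]) -> Tuple[float, float]:
    """(minimum, maximum) over all non-NoData cells of all tiles."""
    lo, hi = float("inf"), float("-inf")
    for tile in tiles:
        for row in tile:
            for value in row:
                if value is None:
                    continue  # NoData cells do not contribute
                lo = min(lo, value)
                hi = max(hi, value)
    if lo > hi:
        raise ValueError("every cell is NoData")
    return lo, hi


print(min_max([[[3, None], [7, 1]], [[-2, 5]]]))  # (-2, 7)
```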
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
reclassify
(value_map, data_type, boundary_strategy='LessThanOrEqualTo', replace_nodata_with=None)¶ Changes the cell values of a raster according to the break that each value falls into.
Parameters:
- value_map (dict) – A dict whose keys represent values where a break should occur and whose values are the new values that the cells within each break should become.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
- boundary_strategy (str, optional) – How the cells should be classified along the breaks. This is represented by the following constants: GREATERTHAN, GREATERTHANOREQUALTO, LESSTHAN, LESSTHANOREQUALTO, and EXACT. If unspecified, then LESSTHANOREQUALTO will be used.
- replace_nodata_with (data_type, optional) – When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.
Note
NoData symbolizes a different value depending on whether data_type is int or float. For int, the constant NODATAINT can be used, which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.
Returns: RasterRDD
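To make the boundary strategies concrete, here is a minimal per-cell sketch of reclassification. Under LessThanOrEqualTo, for example, a value takes the mapping of the smallest break that is greater than or equal to it. The strategy strings follow the default 'LessThanOrEqualTo' shown in the signature, but the other string values are assumptions. Two further behaviours are also assumptions of this sketch: values that match no break become NoData, and None stands in for NODATAINT or float('nan').

```python
from typing import Dict

NODATA = None  # stand-in for NODATAINT (int layers) or float('nan') (float layers)


def reclassify_value(value, value_map: Dict[float, float],
                     boundary_strategy: str = "LessThanOrEqualTo",
                     replace_nodata_with=None):
    """Map one cell value through value_map using the given boundary strategy."""
    if value is NODATA:
        # NoData is handled separately: kept unless a replacement is requested.
        return NODATA if replace_nodata_with is None else replace_nodata_with
    breaks = sorted(value_map)
    if boundary_strategy == "LessThanOrEqualTo":
        matches = [b for b in breaks if value <= b]
        chosen = matches[0] if matches else None   # smallest break >= value
    elif boundary_strategy == "LessThan":
        matches = [b for b in breaks if value < b]
        chosen = matches[0] if matches else None   # smallest break > value
    elif boundary_strategy == "GreaterThanOrEqualTo":
        matches = [b for b in breaks if value >= b]
        chosen = matches[-1] if matches else None  # largest break <= value
    elif boundary_strategy == "GreaterThan":
        matches = [b for b in breaks if value > b]
        chosen = matches[-1] if matches else None  # largest break < value
    elif boundary_strategy == "Exact":
        chosen = value if value in value_map else None
    else:
        raise ValueError(f"unknown boundary strategy: {boundary_strategy}")
    # Assumption: values outside every break become NoData.
    return NODATA if chosen is None else value_map[chosen]


breaks = {10: 1, 20: 2, 30: 3}
print([reclassify_value(v, breaks) for v in (5, 10, 15, 35)])          # [1, 1, 2, None]
print([reclassify_value(v, breaks, "GreaterThan") for v in (5, 15, 35)])  # [None, 1, 3]
```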
-
reproject
(target_crs, resample_method='NearestNeighbor')¶ Reprojects every individual raster to target_crs. This does not sample past the tile boundary.
Parameters:
- target_crs (str or int) – The CRS to reproject to. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
- resample_method (str, optional) – The resample method to use for the reprojection. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns: RasterRDD
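The effect of reprojection on coordinates can be seen with the standard spherical Web Mercator formulas, which correspond to going from EPSG:4326 to target_crs=3857. This is only a coordinate-level sketch. It does not show how GeoPySpark or GeoTrellis resample cell values, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by Web Mercator


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float):
    """Project a longitude/latitude pair (EPSG:4326) to Web Mercator (EPSG:3857) metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


print(wgs84_to_web_mercator(180.0, 0.0))  # approximately (20037508.34, 0.0)
```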
-
tile_to_layout
(layer_metadata, resample_method='NearestNeighbor')¶ Cut tiles to layout and merge overlapping tiles. This will produce unique keys.
Parameters:
- layer_metadata (Metadata) – The Metadata of the RasterRDD instance.
- resample_method (str, optional) – The resample method to use when cutting the tiles to the layout. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns: TiledRasterRDD
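What sets tile_to_layout apart from cut_tiles is the merge step: pieces that land on the same key are combined into a single tile. The sketch below assumes a simple merge rule in which the first tile's cells win and its NoData cells are filled from later tiles. GeoTrellis' actual merge may differ in detail. merge and merge_by_key are hypothetical helpers, and None stands in for NoData.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Optional, Tuple

NODATA = None  # stand-in for the layer's NoData value
Tile = List[List[Optional[int]]]
Key = Tuple[int, int]


def merge(first: Tile, second: Tile) -> Tile:
    """Keep first's cells and fill its NoData cells from second."""
    return [
        [a if a is not NODATA else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, second)
    ]


def merge_by_key(pieces: List[Tuple[Key, Tile]]) -> Dict[Key, Tile]:
    """Collapse duplicate keys (as produced by cutting) into one tile per key."""
    grouped: Dict[Key, List[Tile]] = defaultdict(list)
    for key, tile in pieces:
        grouped[key].append(tile)
    return {key: reduce(merge, tiles) for key, tiles in grouped.items()}


pieces = [
    ((1, 2), [[1, NODATA], [NODATA, NODATA]]),
    ((1, 2), [[9, 2], [3, NODATA]]),
    ((0, 0), [[5, 5], [5, 5]]),
]
merged = merge_by_key(pieces)
print(merged[(1, 2)])  # [[1, 2], [3, None]]
```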
-
to_numpy_rdd
()¶ Converts a RasterRDD to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: pyspark.RDD
-
to_tiled_layer
(extent=None, layout=None, crs=None, tile_size=256, resample_method='NearestNeighbor')¶ Converts this RasterRDD to a TiledRasterRDD.
This method combines collect_metadata() and tile_to_layout() into one step.
Parameters:
- extent (Extent, optional) – Specify the layout extent; layout must also be specified.
- layout (TileLayout, optional) – Specify the tile layout; extent must also be specified.
- crs (str or int, optional) – Ignore the CRS from the records and use the given one instead.
- tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.
- resample_method (str, optional) – The resample method to use when cutting the tiles to the layout. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Note
extent and layout must both be defined if they are to be used.
Returns: TiledRasterRDD
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.geotrellis.rdd.
TiledRasterRDD
(geopysc, rdd_type, srdd)¶ Wraps an RDD of tiled GeoTrellis rasters.
Represents an RDD that contains (K, V) pairs, where K is either a SpatialKey or a SpaceTimeKey, depending on the rdd_type of the RDD, and V is a Raster.
The data held within the RDD is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see TiledRasterRDD.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the rasters. This is represented by the constants: SPATIAL and SPACETIME.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows TiledRasterRDD to access the various Scala methods.
-
geopysc
¶ GeoPyContext – The GeoPyContext being used this session.
-
rdd_type
¶ str – The spatial type of the rasters. This is represented by the constants: SPATIAL and SPACETIME.
-
srdd
¶ py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterRDD to access the various Scala methods.
-
cache
()¶ Persist this RDD with the default storage level (MEMORY_ONLY).
-
convert_data_type
(new_type)¶ Converts the underlying raster values to a new CellType.
Parameters: new_type (str) – The string representation of the CellType to convert to. It is represented by a constant such as INT16, FLOAT64UD, etc.
Returns: TiledRasterRDD
-
cost_distance
(geometries, max_distance)¶ Performs a cost-distance operation on the TileLayer.
Parameters:
- geometries (list) – A list of shapely geometries to be used as starting points.
Note
All geometries must be in the same CRS as the TileLayer.
- max_distance (int or float) – The maximum cost that a path may reach before the operation stops.
Returns: TiledRasterRDD
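Cost distance is commonly computed as a shortest-path search over the grid. The sketch below uses Dijkstra's algorithm on an 8-connected grid and makes three assumptions: a step costs the mean friction of the two cells times the step length (1 orthogonally, √2 diagonally); cells whose cost would exceed max_distance stay unreached (math.inf); and the friction grid and (row, col) source cells are hypothetical inputs standing in for the layer and the geometries. It illustrates the technique and is not the GeoTrellis implementation.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cost_distance(friction: List[List[float]], sources: Sequence[Tuple[int, int]],
                  max_distance: float) -> List[List[float]]:
    """Accumulated travel cost from the nearest source cell (Dijkstra, 8-connected).

    Step cost = mean friction of the two cells * step length (1 or sqrt(2)).
    Cells whose cost would exceed max_distance stay at math.inf.
    """
    rows, cols = len(friction), len(friction[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > cost[r][c]:
            continue  # stale queue entry; a cheaper path was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                step = math.sqrt(2) if dr and dc else 1.0
                nd = d + step * (friction[r][c] + friction[nr][nc]) / 2
                if nd <= max_distance and nd < cost[nr][nc]:
                    cost[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return cost


grid = [[1.0] * 3 for _ in range(3)]
costs = cost_distance(grid, [(0, 0)], max_distance=10.0)
print(costs[0][2], round(costs[2][2], 4))  # 2.0 2.8284
```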
-
classmethod
euclidean_distance
(geopysc, geometry, source_crs, zoom, cellType='float64')¶ Calculates a raster of Euclidean distances to a Shapely geometry.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
- source_crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
Note
This function may run very slowly for polygonal inputs if they cover many cells of the output raster.
Returns: RDD
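A brute-force sketch of a Euclidean distance raster, in which each cell holds the distance from its center to the nearest input point. It restricts the geometry to a list of points and lays an explicit extent and cols × rows grid over the area instead of a zoom level. It also assumes row 0 is the top row. All of these are simplifications for illustration, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def euclidean_distance_grid(xmin: float, ymin: float, xmax: float, ymax: float,
                            cols: int, rows: int,
                            points: List[Tuple[float, float]]) -> List[List[float]]:
    """Distance from each cell center to the nearest of the given points.

    Brute force: O(cells * points). Row 0 is the top row (largest y).
    """
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    grid = []
    for r in range(rows):
        y = ymax - (r + 0.5) * cell_h  # center of row r
        row = []
        for c in range(cols):
            x = xmin + (c + 0.5) * cell_w  # center of column c
            row.append(min(math.hypot(x - px, y - py) for px, py in points))
        grid.append(row)
    return grid


dist = euclidean_distance_grid(0, 0, 3, 3, 3, 3, [(0.5, 2.5)])
print(dist[0][:2], round(dist[2][2], 4))  # [0.0, 1.0] 2.8284
```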
-
focal
(operation, neighborhood=None, param_1=None, param_2=None, param_3=None)¶ Performs the given focal operation on the layers contained in the RDD.
Parameters:
- operation (str) – The focal operation. Represented by the constants: SUM, MIN, MAX, MEAN, MEDIAN, MODE, STANDARDDEVIATION, ASPECT, and SLOPE.
- neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood, or by the constants: ANNULUS, NEWS, SQUARE, WEDGE, and CIRCLE. Defaults to None.
- param_1 (int or float, optional) – If using SLOPE, then this is the zFactor; otherwise it is the first argument of neighborhood.
- param_2 (int or float, optional) – The second argument of the neighborhood.
- param_3 (int or float, optional) – The third argument of the neighborhood.
Note
The param arguments only need to be set if neighborhood is not an instance of Neighborhood, or if neighborhood is None.
Any param that is not set will default to 0.0.
If neighborhood is None, then operation must be either SLOPE or ASPECT.
Returns: TiledRasterRDD
Raises:
- ValueError – If operation is not a known operation.
- ValueError – If neighborhood is not a known neighborhood.
- ValueError – If neighborhood was not set, and operation is not SLOPE or ASPECT.
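For the MEAN operation with a SQUARE neighborhood, each output cell is the average of the window around it. The single-tile sketch below makes three assumptions. First, the square's size is given as a radius in cells, the number of cells from the center to the edge; how GeoPySpark maps param_1 onto this is an assumption. Second, cells outside the tile and NoData cells (None) are simply ignored. Third, the sketch works on one tile alone, whereas the real operation works on a tiled layer and can draw on neighboring tiles at tile borders.

```python
from typing import List, Optional

Tile = List[List[Optional[float]]]  # None stands in for NoData


def focal_mean_square(tile: Tile, radius: int = 1) -> Tile:
    """Mean of the (2*radius+1) x (2*radius+1) window around each cell.

    Cells outside the tile and NoData cells are ignored in this sketch.
    """
    rows, cols = len(tile), len(tile[0])
    out: Tile = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                tile[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if tile[rr][cc] is not None
            ]
            out_row.append(sum(window) / len(window) if window else None)
        out.append(out_row)
    return out


result = focal_mean_square([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result[1][1], result[0][0])  # 5.0 3.0
```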
-
classmethod
from_numpy_rdd
(geopysc, rdd_type, numpy_rdd, metadata)¶ Creates a TiledRasterRDD from a numpy RDD.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the rasters. This is represented by the constants: SPATIAL and SPACETIME.
- numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
- metadata (Metadata) – The Metadata of the TiledRasterRDD instance.
Returns: TiledRasterRDD
-
get_histogram
()¶ Returns an array of Java histogram objects, one for each band of the raster.
Parameters: None
Returns: An array of Java objects containing the histogram of each band.
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the RDD.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this RDD.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [float]
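The following shows what quantile breaks mean using an exact computation over an in-memory list of cell values. The quantile definition is an assumption of this sketch: break i is the smallest value with at least i/num_breaks of the data at or below it. The RDD method may instead estimate breaks from a histogram, which the separate exact-integer variant suggests, so its results can differ slightly.

```python
from typing import Iterable, List, Optional


def quantile_breaks(values: Iterable[Optional[float]], num_breaks: int) -> List[float]:
    """Exact breaks: the value at each i/num_breaks quantile, for i = 1..num_breaks."""
    data = sorted(v for v in values if v is not None)  # None stands in for NoData
    n = len(data)
    if n == 0 or num_breaks < 1:
        return []
    # Index ceil(i * n / num_breaks) - 1, computed with integer arithmetic.
    return [data[(i * n + num_breaks - 1) // num_breaks - 1] for i in range(1, num_breaks + 1)]


print(quantile_breaks(range(1, 101), 4))  # [25, 50, 75, 100]
```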
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this RDD. This version uses the FastMapHistogram, which counts exact integer values. If the RDD has too many distinct values, this can cause memory errors.
Parameters: num_breaks (int) – The number of breaks to return.
Returns: [int]
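An exact-count histogram keeps one counter per distinct integer, which is why its memory use grows with the number of distinct values. Here is a minimal sketch of the idea using collections.Counter as a stand-in; it is not the GeoTrellis FastMapHistogram class. Its quantile definition is an assumption: break i is the smallest value whose cumulative count reaches i/num_breaks of the total.

```python
from collections import Counter
from typing import Iterable, List


def exact_int_quantile_breaks(values: Iterable[int], num_breaks: int) -> List[int]:
    """Quantile breaks derived from an exact count of every distinct integer value."""
    counts = Counter(values)  # one entry per distinct value: memory grows with distinct count
    n = sum(counts.values())
    # Cumulative count each break must reach: ceil(i * n / num_breaks).
    targets = [(i * n + num_breaks - 1) // num_breaks for i in range(1, num_breaks + 1)]
    breaks, cumulative, t = [], 0, 0
    for value in sorted(counts):
        cumulative += counts[value]
        while t < len(targets) and cumulative >= targets[t]:
            breaks.append(value)
            t += 1
    return breaks


print(exact_int_quantile_breaks([1, 1, 1, 2, 2, 3, 4, 4, 4, 4], 2))  # [2, 4]
```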
-
is_floating_point_layer
()¶ Determines whether the content of the TiledRasterRDD is of floating point type.
Parameters: None
Returns: bool
-
layer_metadata
¶ Layer metadata associated with this layer.
-
lookup
(col, row)¶ Returns the value(s) in the image of a particular SpatialKey (given by col and row).
Parameters:
- col (int) – The SpatialKey column.
- row (int) – The SpatialKey row.
Returns: A list of numpy arrays (the tiles).
Raises:
- ValueError – If using lookup on a non-SPATIAL TiledRasterRDD.
- IndexError – If col and row are not within the TiledRasterRDD’s bounds.
-
mask
(geometries)¶ Masks the TiledRasterRDD so that only values that intersect the geometries will be available.
Parameters: geometries (list) – A list of shapely geometries to use as masks.
Note
All geometries must be in the same CRS as the TileLayer.
Returns: TiledRasterRDD
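A per-tile sketch of masking: a cell is kept only if its center lies inside the polygon, using the even-odd ray-casting test. Treating "intersects" as "cell center inside" is an assumption of this sketch, as are the row-0-at-top convention and the use of None for NoData. Holes and multipolygons are not handled, and mask_tile is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test of a point against a polygon's exterior ring."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # edge straddles the horizontal line through y
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def mask_tile(tile: List[List[Optional[int]]], extent: Tuple[float, float, float, float],
              ring: Sequence[Point]) -> List[List[Optional[int]]]:
    """Set cells whose centers fall outside the polygon to None (NoData)."""
    xmin, ymin, xmax, ymax = extent
    rows, cols = len(tile), len(tile[0])
    cell_w, cell_h = (xmax - xmin) / cols, (ymax - ymin) / rows
    return [
        [v if point_in_polygon(xmin + (c + 0.5) * cell_w, ymax - (r + 0.5) * cell_h, ring)
         else None
         for c, v in enumerate(row)]
        for r, row in enumerate(tile)
    ]


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(mask_tile([[1, 2, 3], [4, 5, 6], [7, 8, 9]], (0, 0, 3, 3), square))
# [[None, None, None], [4, 5, None], [7, 8, None]]
```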
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified, it defaults to MEMORY_ONLY.
-
polygonal_max
(geometry, data_type)¶ Finds the max value that is contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float, depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_mean
(geometry)¶ Finds the mean of all of the values that are contained within the given geometry.
Parameters: geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
Returns: float
-
polygonal_min
(geometry, data_type)¶ Finds the min value that is contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float, depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_sum
(geometry, data_type)¶ Finds the sum of all of the values that are contained within the given geometry.
Parameters:
- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or str) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKT string representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float, depending on data_type.
Raises: TypeError – If data_type is not an int or float.
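The polygonal summaries (max, min, sum, and mean) differ only in the final reduction over the cells that fall inside the geometry. The sketch below assumes the inside/outside decision for each cell has already been made, for example with a point-in-polygon test on cell centers, and that None marks NoData. Its data_type check mirrors the TypeError documented for polygonal_max, polygonal_min, and polygonal_sum. The helper name and the ValueError for an empty selection are choices made for this sketch only.

```python
from typing import Dict, List, Optional

Tile = List[List[Optional[float]]]  # None stands in for NoData


def polygonal_summary(tile: Tile, inside: List[List[bool]], data_type: type) -> Dict[str, float]:
    """min/max/sum/mean of the non-NoData cells flagged as inside the geometry."""
    if data_type not in (int, float):
        raise TypeError("data_type must be int or float")
    values = [
        data_type(v)
        for tile_row, mask_row in zip(tile, inside)
        for v, keep in zip(tile_row, mask_row)
        if keep and v is not None
    ]
    if not values:
        raise ValueError("no data cells inside the geometry")  # sketch-specific choice
    return {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "mean": sum(values) / len(values),
    }


tile = [[1, 2], [3, None]]
inside = [[True, False], [True, True]]
print(polygonal_summary(tile, inside, int))  # {'min': 1, 'max': 3, 'sum': 4, 'mean': 2.0}
```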
-
pyramid
(start_zoom, end_zoom, resample_method='NearestNeighbor')¶ Creates a pyramid of GeoTrellis layers where each layer represents a given zoom.
Parameters:
- start_zoom (int) – The zoom level where pyramiding should begin. Represents the level that is most zoomed in.
- end_zoom (int) – The zoom level where pyramiding should end. Represents the level that is most zoomed out.
- resample_method (str, optional) – The resample method to use when building each zoom level. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns: [TiledRasterRDD]
Raises:
- ValueError – If the given resample_method is not known.
- ValueError – If the col and row count is not a power of 2.
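Each step down the pyramid halves the resolution, which is where the power-of-two requirement comes from. This minimal sketch assumes a zoomed layout in which zoom z is a 2**z × 2**z grid of tiles. It builds coarser cells by averaging 2 × 2 blocks, which is the idea behind AVERAGE; other resample methods choose cell values differently. The helper names are hypothetical, and where GeoPySpark actually performs its power-of-two check is not shown here.

```python
from typing import List, Tuple


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def pyramid_levels(start_zoom: int, end_zoom: int) -> List[Tuple[int, int, int]]:
    """(zoom, layout_cols, layout_rows) from start_zoom down to end_zoom, inclusive.

    Assumes a zoomed layout in which zoom z is a 2**z x 2**z grid of tiles.
    """
    return [(z, 2 ** z, 2 ** z) for z in range(start_zoom, end_zoom - 1, -1)]


def downsample_average(tile: List[List[float]]) -> List[List[float]]:
    """Halve the resolution by averaging each 2 x 2 block of cells."""
    rows, cols = len(tile), len(tile[0])
    if rows < 2 or cols < 2 or not (is_power_of_two(rows) and is_power_of_two(cols)):
        raise ValueError("col and row counts must be powers of 2 (and at least 2)")
    return [
        [(tile[r][c] + tile[r][c + 1] + tile[r + 1][c] + tile[r + 1][c + 1]) / 4
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


print(pyramid_levels(3, 1))                  # [(3, 8, 8), (2, 4, 4), (1, 2, 2)]
print(downsample_average([[1, 3], [5, 7]]))  # [[4.0]]
```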
-
classmethod
rasterize
(geopysc, rdd_type, geometry, extent, crs, cols, rows, fill_value, instant=None)¶ Creates a TiledRasterRDD from a shapely geometry.
Parameters:
- geopysc (GeoPyContext) – The GeoPyContext being used this session.
- rdd_type (str) – The spatial type of the rasters. This is represented by the constants: SPATIAL and SPACETIME.
- geometry (str or shapely.geometry.Polygon) – The value to be turned into a raster. Can either be a string or a Polygon. If the value is a string, it must be a WKT string.
- extent (Extent) – The extent of the new raster.
- crs (str or int) – The CRS the new raster should be in.
- cols (int) – The number of cols the new raster should have.
- rows (int) – The number of rows the new raster should have.
- fill_value (int) – The value to fill the raster with.
Note
Only the area where the geometry intersects the extent will have this value. Any other area will be filled with GeoTrellis’ NoData value for int, which is represented in GeoPySpark as the constant NODATAINT.
- instant (int, optional) – Optional if the data has no time component (i.e. is SPATIAL). Otherwise, it is required and represents the time stamp of the data.
Returns: TiledRasterRDD
Raises: TypeError – If geometry is not a str or a Polygon; or if there was a mismatch in inputs, such as setting the rdd_type to SPATIAL but also setting instant.
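Rasterizing amounts to sampling the geometry at each cell of a cols × rows grid laid over extent. In this sketch the geometry is represented by a hypothetical covers(x, y) predicate that is tested at cell centers, so any shape test (for example, a point-in-polygon check) can be plugged in. The row-0-at-top convention and the value used for NODATA_INT are assumptions; in practice, use GeoPySpark's NODATAINT constant.

```python
from typing import Callable, List, Tuple

# Assumed to match GeoTrellis' int NoData (Int.MinValue); prefer GeoPySpark's NODATAINT.
NODATA_INT = -2 ** 31


def rasterize(extent: Tuple[float, float, float, float], cols: int, rows: int,
              fill_value: int, covers: Callable[[float, float], bool]) -> List[List[int]]:
    """Burn fill_value into every cell whose center the geometry covers."""
    xmin, ymin, xmax, ymax = extent
    cell_w, cell_h = (xmax - xmin) / cols, (ymax - ymin) / rows
    grid = []
    for r in range(rows):
        y = ymax - (r + 0.5) * cell_h  # row 0 is the top row
        grid.append([
            fill_value if covers(xmin + (c + 0.5) * cell_w, y) else NODATA_INT
            for c in range(cols)
        ])
    return grid


def disc(x: float, y: float) -> bool:
    """Hypothetical geometry: a disc of radius 1 centred at (1.5, 1.5)."""
    return (x - 1.5) ** 2 + (y - 1.5) ** 2 <= 1.0


for row in rasterize((0, 0, 3, 3), 3, 3, 7, disc):
    print(row)
```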
-
reclassify
(value_map, data_type, boundary_strategy='LessThanOrEqualTo', replace_nodata_with=None)¶ Changes the cell values of a raster according to the break that each value falls into.
Parameters:
- value_map (dict) – A dict whose keys represent values where a break should occur and whose values are the new values that the cells within each break should become.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
- boundary_strategy (str, optional) – How the cells should be classified along the breaks. This is represented by the following constants: GREATERTHAN, GREATERTHANOREQUALTO, LESSTHAN, LESSTHANOREQUALTO, and EXACT. If unspecified, then LESSTHANOREQUALTO will be used.
- replace_nodata_with (data_type, optional) – When remapping values, NoData values must be treated separately. If NoData values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, NoData values will be preserved.
Note
NoData symbolizes a different value depending on whether data_type is int or float. For int, the constant NODATAINT can be used, which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.
Returns: TiledRasterRDD
-
reproject
(target_crs, extent=None, layout=None, scheme='float', tile_size=256, resolution_threshold=0.1, resample_method='NearestNeighbor')¶ Reprojects the RDD as a tiled raster layer, sampling surrounding tiles.
Parameters:
- target_crs (str or int) – The CRS to reproject to. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
- extent (Extent, optional) – Specify the layout extent; layout must also be specified.
- layout (TileLayout, optional) – Specify the tile layout; extent must also be specified.
- scheme (str, optional) – Which LayoutScheme should be used. Represented by the constants: FLOAT and ZOOM. If not specified, then FLOAT is used.
- tile_size (int, optional) – Pixel dimensions of each tile, if not using layout.
- resolution_threshold (float, optional) – The percent difference between a cell size and a zoom level along with the resolution difference between the zoom level and the next one that is tolerated to snap to the lower-resolution zoom.
- resample_method (str, optional) – The resample method to use for the reprojection. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Note
extent and layout must both be defined if they are to be used.
Returns: TiledRasterRDD
Raises: TypeError – If either extent or layout is defined but the other is not.
-
stitch
()¶ Stitches all of the rasters within the RDD into one raster.
Note
This can only be used on SPATIAL TiledRasterRDDs.
Returns: Raster
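Stitching needs the grid position of each tile, which its SpatialKey provides. A minimal in-memory sketch that takes a dict mapping hypothetical (col, row) keys to tiles. It assumes all tiles share the same dimensions and that row indices increase downward. Cells not covered by any tile are left as None.

```python
from typing import Dict, List, Optional, Tuple

Tile = List[List[Optional[int]]]


def stitch(tiles: Dict[Tuple[int, int], Tile]) -> Tile:
    """Assemble equally sized tiles keyed by (col, row) into one grid."""
    cols = [c for c, _ in tiles]
    rows = [r for _, r in tiles]
    min_col, min_row = min(cols), min(rows)
    first = next(iter(tiles.values()))
    tile_rows, tile_cols = len(first), len(first[0])
    width = (max(cols) - min_col + 1) * tile_cols
    height = (max(rows) - min_row + 1) * tile_rows
    out: Tile = [[None] * width for _ in range(height)]  # missing tiles stay None
    for (col, row), tile in tiles.items():
        top, left = (row - min_row) * tile_rows, (col - min_col) * tile_cols
        for r, values in enumerate(tile):
            out[top + r][left:left + tile_cols] = values  # copy one tile row into place
    return out


print(stitch({(0, 0): [[1, 2], [3, 4]], (1, 0): [[5, 6], [7, 8]]}))
# [[1, 2, 5, 6], [3, 4, 7, 8]]
```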
-
tile_to_layout
(layout, resample_method='NearestNeighbor')¶ Cuts tiles to a given layout and merges overlapping tiles. This will produce unique keys.
Parameters:
- layout (TileLayout) – Specify the TileLayout to cut to.
- resample_method (str, optional) – The resample method to use when cutting the tiles to the layout. This is represented by the following constants: NEARESTNEIGHBOR, BILINEAR, CUBICCONVOLUTION, LANCZOS, AVERAGE, MODE, MEDIAN, MAX, and MIN. If none is specified, then NEARESTNEIGHBOR is used.
Returns: TiledRasterRDD
-
to_numpy_rdd
()¶ Converts a TiledRasterRDD to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: pyspark.RDD
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
zoom_level
¶ The zoom level of the RDD. Can be None.