What is GeoPySpark?¶
GeoPySpark is a Python language binding library of the Scala library, GeoTrellis. Like GeoTrellis, this project is released under the Apache 2 License.
GeoPySpark seeks to utilize GeoTrellis to allow for the reading, writing, and operating on raster data. Thus, it is able to scale to the data and still perform well.
In addition to raster processing, GeoPySpark allows for rasters to be rendered into PNGs. One of the goals of this project is to be able to process rasters at web speeds and to perform batch processing of large data sets.
Why GeoPySpark?¶
Raster processing in Python has come a long way; however, issues still arise as the size of the dataset increases. Whether it is performance or ease of use, these sorts of problems will become more common as larger amounts of data are made available to the public.
One could turn to GeoTrellis to resolve the aforementioned problems (and one should try it out!), yet this brings about new challenges. Scala, while a powerful language, has something of a steep learning curve. This can put off those who do not have the time and/or interest in learning a new language.
By combining the speed and scalability of Scala with the ease of use of Python, GeoPySpark is the remedy to this predicament.
A Quick Example¶
Here is a quick example of GeoPySpark. In the following code, we take NLCD data of the state of Pennsylvania from 2011, and do a masking operation on it with a Polygon that represents an area of interest. This masked layer is then saved.
If you wish to follow along with this example, you will need to download the NLCD data and unzip it. Running these two commands will complete these tasks for you:
curl -o /tmp/NLCD2011_LC_Pennsylvania.zip https://s3-us-west-2.amazonaws.com/prd-tnm/StagedProducts/NLCD/2011/landcover/states/NLCD2011_LC_Pennsylvania.zip?ORIG=513_SBDDG
unzip -d /tmp /tmp/NLCD2011_LC_Pennsylvania.zip
import geopyspark as gps
from pyspark import SparkContext
from shapely.geometry import box
# Create the SparkContext
conf = gps.create_geopyspark_conf(appName="geopyspark-example", master="local[*]")
sc = SparkContext(conf=conf)
# Read in the NLCD tif that has been saved locally.
# This tif represents the state of Pennsylvania.
raster_layer = gps.geotiff.get(layer_type=gps.LayerType.SPATIAL,
                               uri='/tmp/NLCD2011_LC_Pennsylvania.tif',
                               num_partitions=100)
# Tile the rasters within the layer and reproject them to Web Mercator.
tiled_layer = raster_layer.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)
# Creates a Polygon that covers roughly the north-west section of Philadelphia.
# This is the region that will be masked.
area_of_interest = box(-75.229225, 40.003686, -75.107345, 40.084375)
# Mask the tiles within the layer with the area of interest
masked = tiled_layer.mask(geometries=area_of_interest)
# We will now pyramid the masked TiledRasterLayer so that we can use it in a TMS server later.
pyramided_mask = masked.pyramid()
# Save each layer of the pyramid locally so that it can be accessed at a later time.
for pyramid in pyramided_mask.levels.values():
    gps.write(uri='file:///tmp/pa-nlcd-2011',
              layer_name='north-west-philly',
              tiled_raster_layer=pyramid)
Contact and Support¶
If you need help, have questions, or would like to talk to the developers (let us know what you’re working on!) you can contact us at:
As you may have noticed from the above links, those are links to the GeoTrellis Gitter channel and mailing list. This is because this project is currently an offshoot of GeoTrellis, and we will be using their mailing list and gitter channel as a means of contact. However, we will form our own if there is a need for it.
Changelog¶
0.1.0¶
The first release of GeoPySpark! After being in development for the past 6 months, it is now ready for its initial release! Since nothing has been changed or updated per se, we’ll just go over the features that will be present in 0.1.0.
geopyspark.geotrellis
- Create a RasterRDD from GeoTiffs that are stored locally, on S3, or on HDFS.
- Serialize Python RDDs to Scala and back.
- Perform various tiling operations such as tile_to_layout, cut_tiles, and pyramid.
- Stitch together a TiledRasterRDD to create one Raster.
- rasterize geometries and turn them into RasterRDD.
- reclassify values of Rasters in RDDs.
- Calculate cost_distance on a TiledRasterRDD.
- Perform local and focal operations on TiledRasterRDD.
- Read, write, and query GeoTrellis tile layers.
- Read tiles from a layer.
- Added PngRDD to make rendering to PNGs more efficient.
- Added RDDWrapper to provide more functionality to the RDD classes.
- Polygonal summary methods are now available to TiledRasterRDD.
- Euclidean distance added to TiledRasterRDD.
- Neighborhoods submodule added to make focal operations easier.
geopyspark.command
- GeoPySpark can now use a script to download the jar. Used when installing GeoPySpark from pip.
Documentation
- Added docstrings to all Python classes, methods, etc.
- Core-Concepts, rdd, geopycontext, and catalog.
- Ingesting and creating a tile server with a greyscale raster dataset.
- Ingesting and creating a tile server with data from Sentinel.
0.2.0¶
The second release of GeoPySpark has brought about massive changes to the library. Many more features have been added, and some have been taken away. The API has also been overhauled, and code written using the 0.1.0 code will not work with this version.
Because so much has changed over these past few months, only the most major changes will be discussed below.
geopyspark
- Removed GeoPyContext.
- Added the geopyspark_conf function, which is used to create a SparkConf for GeoPySpark.
- Changed how the environment is constructed when using GeoPySpark.
geopyspark.geotrellis
- A SparkContext instance no longer needs to be passed in for any class or function.
- Renamed RasterRDD and TiledRasterRDD to RasterLayer and TiledRasterLayer.
- Changed how tile_to_layout and reproject work.
- Broke out rasterize, hillshade, cost_distance, and euclidean_distance into their own, respective modules.
- Added the Pyramid class to layer.py.
- Renamed geotiff_rdd to geotiff.
- Broke out the options in geotiff.get.
- Constants are now organized by enum classes.
- Avro is no longer used for serialization/deserialization.
- ProtoBuf is now used for serialization/deserialization.
- Added the render module.
- Added the color module.
- Added the histogram module.
Documentation
- Updated all of the docstrings to reflect the new changes.
- All of the documentation has been updated to reflect the new changes.
- Example Jupyter notebooks have been added.
Contributing¶
We value all kinds of contributions from the community, not just actual code. Perhaps the easiest and yet one of the most valuable ways of helping us improve GeoPySpark is to ask questions, voice concerns or propose improvements on the GeoTrellis Mailing List. As of now, we will be using this to interact with our users. However, this could change depending on the volume/interest of users.
If you would like to contribute actual code in the form of bug fixes, new features, or other patches, this page gives you more info on how to do it.
Building GeoPySpark¶
- Install and set up Hadoop (the master branch is currently built with 2.0.1).
- Check out this repository.
- Pick the branch corresponding to the version you are targeting.
- Run make install to build GeoPySpark.
Style Guide¶
We try to follow the PEP 8 Style Guide for Python Code as closely as possible, although you will see some variations throughout the codebase. When in doubt, follow that guide.
Git Branching Model¶
The GeoPySpark team follows the standard practice of using the master branch as the main integration branch.
Git Commit Messages¶
We follow the ‘imperative present tense’ style for commit messages. (e.g. “Add new EnterpriseWidgetLoader instance”)
Issue Tracking¶
If you find a bug and would like to report it, please go there and create an issue. As always, if you need some help, join us on Gitter to chat with a developer. As with the mailing list, we will be using the GeoTrellis Gitter channel until the need arises to form our own.
Pull Requests¶
If you’d like to submit a code contribution, please fork GeoPySpark and send us a pull request against the master branch. Like any other open source project, we might ask you to go through some iterations of discussion and refinement before merging.
As part of the Eclipse IP Due Diligence process, you’ll need to do some extra work to contribute. This is part of the requirement for Eclipse Foundation projects (see this page in the Eclipse wiki). You’ll need to sign up for an Eclipse account with the same email you commit to GitHub with. See the Eclipse Contributor Agreement text below. Also, you’ll need to sign off on your commits, using the git commit -s flag. See https://help.github.com/articles/signing-tags-using-gpg/ for more info.
Eclipse Contributor Agreement (ECA)¶
Contributions to the project, no matter what kind, are always very welcome. Everyone who contributes code to GeoTrellis will be asked to sign the Eclipse Contributor Agreement. You can electronically sign the Eclipse Contributor Agreement here.
Editing these Docs¶
Contributions to these docs are welcome as well. To build them on your own
machine, ensure that sphinx
and make
are installed.
Installing Dependencies¶
Ubuntu 16.04¶
> sudo apt-get install python-sphinx python-sphinx-rtd-theme
Arch Linux¶
> sudo pacman -S python-sphinx python-sphinx_rtd_theme
MacOS¶
brew
doesn’t supply the sphinx binaries, so use pip
here.
Pip¶
> pip install sphinx sphinx_rtd_theme
Building the Docs¶
Assuming you’ve cloned the GeoTrellis repo, you can now build the docs yourself. Steps:
- Navigate to the docs/ directory
- Run make html
- View the docs in your browser by opening _build/html/index.html
Note
Changes you make will not be automatically applied; you will have to rebuild the docs yourself. Luckily the docs build in about a second.
File Structure¶
There is currently no file structure in place for the docs, though this will change soon.
Core Concepts¶
Because GeoPySpark is a binding of an existing project, GeoTrellis, some terminology and data representations have carried over. This section seeks to explain this jargon in addition to describing how GeoTrellis types are represented in GeoPySpark.
Rasters¶
GeoPySpark differs from other geospatial Python libraries, like rasterio, in how it represents rasters. In GeoPySpark, they are represented by the Tile class. This class contains a numpy array (referred to as cells) that represents the cells of the raster, in addition to other information regarding the data. Along with cells, a Tile can also have the no_data_value of the raster.
Note: All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.
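As a minimal sketch, a Tile can be built from a numpy array with the from_numpy_array class method (the cell values and no-data value here are illustrative):
import numpy as np
import geopyspark as gps

# A single-band, 4x4 raster; GeoPySpark treats all rasters as multiband,
# so the array has the shape (bands, rows, cols).
cells = np.array([[[0.0, 0.0, 1.0, 1.0],
                   [1.0, 2.0, 1.0, 1.0],
                   [1.0, 2.0, 2.0, 1.0],
                   [1.0, 1.0, 1.0, 0.0]]])

tile = gps.Tile.from_numpy_array(numpy_array=cells, no_data_value=-1.0)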
Extent¶
Describes the area on Earth a raster represents. This area is represented by coordinates that are in some Coordinate Reference System. Thus, depending on the system in use, the values that outline the extent can vary. An Extent can also be referred to as a bounding box.
Note: The values within the Extent must be floats and not doubles.
ProjectedExtent¶
ProjectedExtent describes the area on Earth a raster represents in addition to its CRS. Either an EPSG code or a proj4 string can be used to indicate the CRS of the ProjectedExtent.
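A sketch of constructing a ProjectedExtent from the extent above, once with an EPSG code and once with a proj4 string:
proj_extent = gps.ProjectedExtent(extent=extent, epsg=3857)

# Alternatively, a proj4 string can be given instead of an EPSG code.
proj_extent_from_proj4 = gps.ProjectedExtent(
    extent=extent,
    proj4='+proj=longlat +datum=WGS84 +no_defs')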
TemporalProjectedExtent¶
Similar to ProjectedExtent, TemporalProjectedExtent describes the area on Earth the raster represents, its CRS, and the time the data was collected. This point in time, called instant, is an instance of datetime.datetime.
TileLayout¶
TileLayout describes the grid that represents how rasters are organized and arranged in a layer. layoutCols and layoutRows detail how many columns and rows the grid itself has, respectively, while tileCols and tileRows tell how many columns and rows each individual raster has.
LayoutDefinition¶
LayoutDefinition describes both how the rasters are organized in a layer as well as the area covered by the grid.
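As a sketch, a 2x2 grid of 256x256 tiles over the extent above could be described like this:
tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)
layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout)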
Tiling Strategies¶
It is often the case that the exact layout of the layer is unknown. Rather than having to go through the effort of trying to figure out the optimal layout, there exist two different tiling strategies that will produce a layout based on the data they are given.
LocalLayout¶
LocalLayout is the first tiling strategy. It produces a layout where the grid is constructed over all of the pixels within a layer for a given tile size. The resulting layout will match the original resolution of the cells within the rasters.
Note: This layout cannot be used for creating display layers. Rather, it is best used for layers where operations and analysis will be performed.
GlobalLayout¶
The other tiling strategy is GlobalLayout, which makes a layout where the grid is constructed over the global extent of the CRS. The cell resolution of the resulting layer will be adjusted by a power of 2 for the CRS. Thus, using this strategy will result in either up or down sampling of the original raster.
Note: This layout strategy should be used when the resulting layer is to be displayed in a TMS server.
Note that GlobalLayout does not create a layout for a given zoom level by default. Rather, it determines what the zoom should be based on the size of the cells within the rasters. If you do want to create a layout for a specific zoom level, then the zoom parameter must be set.
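A sketch of constructing both strategies (the tile sizes and zoom are illustrative):
# Tiles of 256x256 cells at the source resolution.
local_layout = gps.LocalLayout(tile_size=256)

# Let GeoPySpark pick the zoom level based on the cell size.
global_layout = gps.GlobalLayout(tile_size=256)

# Force the layout to be built for a specific zoom level.
global_layout_zoom_12 = gps.GlobalLayout(zoom=12)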
SpatialKey¶
SpatialKey
s describe the positions of rasters within the grid of
the layout. This grid is a 2D plane where the location of a raster is
represented by a pair of coordinates, col
and row
, respectively.
As its name and attributes suggest, SpatialKey
deals solely with
spatial data.
SpaceTimeKey¶
Like SpatialKeys, SpaceTimeKeys describe the position of a raster in a layout. However, the grid is a 3D plane where the location of a raster is represented by a pair of coordinates, col and row, as well as a z value that represents a point in time called instant. Like the instant in TemporalProjectedExtent, this is also an instance of datetime.datetime. Thus, SpaceTimeKeys deal with spatial-temporal data.
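Both key types are simple namedtuples; a quick illustrative sketch:
import datetime

spatial_key = gps.SpatialKey(col=0, row=0)
space_time_key = gps.SpaceTimeKey(col=0, row=0,
                                  instant=datetime.datetime(2017, 1, 1))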
Bounds¶
Bounds represents the extent of the layout grid in terms of keys. It has both minKey and maxKey attributes. These can either be a SpatialKey or a SpaceTimeKey depending on the type of data within the layer. The minKey is the left, uppermost cell in the grid and the maxKey is the right, bottommost cell.
Metadata¶
Metadata
contains information of the values within a layer. This
data pertains to the layout, projection, and extent of the data
contained within the layer.
The below example shows how to construct Metadata by hand; however, this is almost never required, and Metadata can be produced using easier means. For RasterLayer, one calls the collect_metadata() method, and TiledRasterLayer has the layer_metadata attribute.
Working With Layers¶
How is Data Stored and Represented in GeoPySpark?¶
All data that is worked with in GeoPySpark is at some point stored
within an RDD
. Therefore, it is important to understand how
GeoPySpark stores, represents, and uses these RDD
s throughout the
library.
GeoPySpark does not work with PySpark RDD
s, but rather, uses
Python classes that are wrappers for Scala classes that contain and work
with a Scala RDD
. Specifically, these wrapper classes are
RasterLayer
and TiledRasterLayer
, which will be discussed in
more detail later.
Layers Are More Than RDDs¶
We refer to the Python wrapper classes as layers and not RDD
s for
two reasons: first, neither RasterLayer
or TiledRasterLayer
actually extends PySpark’s RDD
class; but more importantly, these
classes contain more information than just the RDD
. When we refer to
a “layer”, we mean both the RDD
and its attributes.
The RDD
s contained by GeoPySpark layers contain tuples which have
type (K, V)
, where K
represents the key, and V
represents
the value. V
will always be a Tile
, but K
differs depending
on both the wrapper class and the nature of the data itself. More on
this below.
RasterLayer¶
The RasterLayer
class deals with untiled data—that is, the
elements of the layer have not been normalized into a single unified
layout. Each raster element may have distinct resolutions or sizes; the
extents of the constituent rasters need not follow any orderly pattern.
Essentially, a RasterLayer
stores “raw” data, and its main purpose
is to act as a way station on the path to acquiring tiled data that
adheres to a specified layout.
The RDD
s contained by RasterLayer
objects have key type,
K
, of either ProjectedExtent
or TemporalProjectedExtent
,
when the layer type is SPATIAL
or SPACETIME
, respectively.
TiledRasterLayer¶
TiledRasterLayer
is the complement to RasterLayer
and is meant
to store tiled data. Tiled data has been fitted to a certain layout,
meaning that it has been regularly sampled, and it has been cut up into
uniformly-sized, non-overlapping pieces that can be indexed sensibly.
The benefit of having data in this state is that now it will be easy to
work with. It is with this class that the user will be able to, for
example, perform map algebra, create pyramids, and save the layer. See
below for the definitions and specific examples of these operations.
In the case of TiledRasterLayer
, K
is either SpatialKey
or
SpaceTimeKey
.
RasterLayer¶
Creating RasterLayers¶
There are just two ways to create a RasterLayer
: (1) through reading
GeoTiffs from the local file system, S3, or HDFS; or (2) from an
existing PySpark RDD.
From PySpark RDDs¶
The first option is to create a RasterLayer
from a PySpark RDD
via the from_numpy_rdd
class method. This step can be a bit more
involved, as it requires the data within the PySpark RDD to be formatted
in a specific way (see How is Data Stored and Represented in
GeoPySpark for
more information).
The following example constructs an RDD from a tuple. The first element is a ProjectedExtent because we have decided to make the data spatial. If we were dealing with spatial-temporal data, then TemporalProjectedExtent would be the first element. A Tile will always be the second element of the tuple.
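A sketch of such a construction, assuming the SparkContext sc from the quick example and that from_numpy_rdd takes the layer type and the RDD; the cell values and extent are illustrative:
import numpy as np
import geopyspark as gps

cells = np.zeros((1, 16, 16))
tile = gps.Tile.from_numpy_array(numpy_array=cells, no_data_value=-1.0)
extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
projected_extent = gps.ProjectedExtent(extent=extent, epsg=3857)

numpy_rdd = sc.parallelize([(projected_extent, tile)])
raster_layer = gps.RasterLayer.from_numpy_rdd(layer_type=gps.LayerType.SPATIAL,
                                              numpy_rdd=numpy_rdd)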
From GeoTiffs¶
The get
function in the geopyspark.geotrellis.geotiff
module
creates an instance of RasterLayer
from GeoTiffs. These files can be
located on either your local file system, HDFS, or S3. In this example,
a GeoTiff with spatial data is read locally.
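For example, the same call used in the quick example reads a GeoTiff from the local file system; the uri could just as well point at S3 (s3://...) or HDFS (hdfs://...):
raster_layer = gps.geotiff.get(layer_type=gps.LayerType.SPATIAL,
                               uri='/tmp/NLCD2011_LC_Pennsylvania.tif')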
Using RasterLayer¶
This next section goes over the methods of RasterLayer
. It should be
noted that not all methods contained within this class will be covered.
More information on the methods that deal with the visualization of the
contents of the layer can be found in the [visualization guide].
Converting to a Python RDD¶
By using to_numpy_rdd
, the base RasterLayer
will be serialized
into a Python RDD
. This will convert all of the first values within
each tuple to either ProjectedExtent
or TemporalProjectedExtent
,
and the second value to Tile
.
SpaceTime Layer to Spatial Layer¶
If you’re working with a spatial-temporal layer and would like to
convert it to a spatial layer, then you can use the to_spatial_layer
method. This changes the keys of the RDD
within the layer by
converting TemporalProjectedExtent
to ProjectedExtent
.
Collecting Metadata¶
The Metadata
of a layer contains information of the values within
it. This data pertains to the layout, projection, and extent of the data
found within the layer.
collect_metadata
will return the Metadata
of the layer that fits
the layout
given.
Reproject¶
reproject will change the projection of the rasters within the layer to the given target_crs. This method does not sample past the tiles’ boundaries.
Tiling Data to a Layout¶
tile_to_layout
will tile and format the rasters within a
RasterLayer
to a given layout. The result of this tiling is a new
instance of TiledRasterLayer
. This output contains the same data as
its source RasterLayer
, however, the information contained within it
will now be organized according to the given layout.
During this step it is also possible to reproject the RasterLayer
.
This can be done by specifying the target_crs
to reproject to.
Reprojecting using this method produces a different result than what is
returned by the reproject
method. Whereas the latter does not sample
past the boundaries of rasters within the layer, the former does. This
is important as anything with a GlobalLayout
needs to sample past
the boundaries of the rasters.
From Metadata¶
Create a TiledRasterLayer
that contains the layout from the given
Metadata
.
Note: If the specified target_crs
is different from what’s in
the metadata, then an error will be thrown.
From LayoutDefinition¶
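A sketch using a hand-built LayoutDefinition (the extent and grid sizes are illustrative):
extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)
layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout)

tiled_layer = raster_layer.tile_to_layout(layout=layout_definition)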
From LocalLayout¶
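A sketch using LocalLayout:
tiled_layer = raster_layer.tile_to_layout(layout=gps.LocalLayout(tile_size=256))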
From GlobalLayout¶
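A sketch using GlobalLayout, also reprojecting to Web Mercator as in the quick example:
tiled_layer = raster_layer.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)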
From A TiledRasterLayer¶
One can tile a RasterLayer to the same layout as an existing TiledRasterLayer.
Note: If the specified target_crs is different from the other layer’s, then an error will be thrown.
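A sketch, assuming an already tiled layer can be passed as the layout:
other_tiled_layer = raster_layer.tile_to_layout(layout=gps.GlobalLayout(), target_crs=3857)
tiled_layer = raster_layer.tile_to_layout(layout=other_tiled_layer)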
TiledRasterLayer¶
Creating TiledRasterLayers¶
For this guide, we will just go over one initialization method for
TiledRasterLayer
, from_numpy_rdd
. However, there are other ways
to create this class. These additional creation strategies can be found
in the [map algebra guide].
From PySpark RDD¶
Like RasterLayer
s, TiledRasterLayer
s can be created from
RDD
s using from_numpy_rdd
. What is different, however, is that
Metadata
must also be passed in during initialization. This makes
creating TiledRasterLayer
s this way a little bit more arduous.
The following example constructs an RDD
from a tuple. The first
element is a SpatialKey
because we have decided to make the data
spatial. If we were dealing with spatial-temporal data, then
SpaceTimeKey
would be the first element. Tile
will always be the
second element of the tuple.
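A sketch of such a construction, assuming the SparkContext sc from the quick example; the metadata values are illustrative:
import numpy as np
import geopyspark as gps

cells = np.zeros((1, 256, 256))
tile = gps.Tile.from_numpy_array(numpy_array=cells, no_data_value=-1.0)
numpy_rdd = sc.parallelize([(gps.SpatialKey(0, 0), tile)])

extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
layout = gps.TileLayout(layoutCols=1, layoutRows=1, tileCols=256, tileRows=256)
metadata = gps.Metadata(
    bounds=gps.Bounds(gps.SpatialKey(0, 0), gps.SpatialKey(0, 0)),
    crs='+proj=longlat +datum=WGS84 +no_defs ',
    cell_type='float64ud-1.0',
    extent=extent,
    layout_definition=gps.LayoutDefinition(extent, layout))

tiled_layer = gps.TiledRasterLayer.from_numpy_rdd(layer_type=gps.LayerType.SPATIAL,
                                                  numpy_rdd=numpy_rdd,
                                                  metadata=metadata)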
Using TiledRasterLayers¶
This section will go over the methods found within TiledRasterLayer
.
Like with RasterLayer
, not all methods within this class will be
covered in this guide. More information on the methods that deal with
the visualization of the contents of the layer can be found in the
[visualization guide]; and those that deal with map algebra can be found
in the [map algebra guide].
Converting to a Python RDD¶
By using to_numpy_rdd
, the base TiledRasterLayer
will be
serialized into a Python RDD
. This will convert all of the first
values within each tuple to either SpatialKey
or SpaceTimeKey
,
and the second value to Tile
.
SpaceTime Layer to Spatial Layer¶
If you’re working with a spatiotemporal layer and would like to convert
it to a spatial layer, then you can use the to_spatial_layer
method.
This changes the keys of the RDD
within the layer by converting
SpaceTimeKey
to SpatialKey
.
Repartitioning¶
While not an RDD
, TiledRasterLayer
does contain an underlying
RDD
, and thus, it can be repartitioned using the repartition
method.
Lookup¶
If there is a particular tile within the layer that is of interest, it
is possible to retrieve it as a Tile
using the lookup
method.
Masking¶
By using the mask method, the TiledRasterLayer can be masked using one or more Shapely geometries.
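For example, using the area of interest from the quick example:
from shapely.geometry import box

area_of_interest = box(-75.229225, 40.003686, -75.107345, 40.084375)
masked = tiled_layer.mask(geometries=area_of_interest)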
Normalize¶
normalize
will linearly transform the data within the layer such
that all values fall within a given range.
Pyramiding¶
When using a layer for a TMS server, it is important that the layer is
pyramided. That is, we create a level-of-detail hierarchy that covers
the same geographical extent, while each level of the pyramid uses one
quarter as many pixels as the next level. This allows us to zoom in and
out when the layer is being displayed without using extraneous detail.
The pyramid
method will produce an instance of Pyramid
that will
contain within it multiple TiledRasterLayer
s. Each layer
corresponds to a zoom level, and the number of levels depends on the zoom_level of the source layer, with the max zoom of the Pyramid being the source layer’s zoom_level and the lowest zoom being 0.
For more information on the Pyramid class, see the [visualization guide].
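A short sketch, assuming a tiled_layer produced with a GlobalLayout:
pyramid = tiled_layer.pyramid()

# Each zoom level maps to its own TiledRasterLayer.
for zoom, layer in pyramid.levels.items():
    print(zoom, layer)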
Reproject¶
This is similar to the reproject
method for RasterLayer
where
the reprojection will not sample past the tiles’ boundaries. This means
the layout of the tiles will be changed so that they will take on a
LocalLayout
rather than a GlobalLayout
(read more about these
layouts here). Because of
this, whatever zoom_level
the TiledRasterLayer
has will be
changed to 0 since the area being represented changes to just the tiles.
Stitching¶
Using stitch
will produce a single Tile
by stitching together
all of the tiles within the TiledRasterLayer
. This can only be done
with spatial layers, and is not recommended if the data contained within
the layer is large, as it can cause a crash due to the size of the
resulting Tile
.
Saving a Stitched Layer¶
The save_stitched
method both stitches and saves a layer as a
GeoTiff.
It is also possible to specify the region of the layer to be saved when it is stitched.
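A sketch of both operations on a spatial layer; the output path is illustrative and save_stitched is assumed to take a path parameter:
# Produce a single Tile from the whole (spatial) layer.
raster = tiled_layer.stitch()
print(raster.cells.shape)

# Stitch and write the result out as a GeoTiff in one step.
tiled_layer.save_stitched(path='/tmp/stitched.tif')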
Tiling Data to a Layout¶
This is similar to RasterLayer
‘s tile_to_layout
method, except
for one important detail. If performing a tile_to_layout
on a
TiledRasterLayer
that contains a zoom_level
, that zoom_level
could be lost or changed depending on the layout
and/or
target_crs
chosen. Thus, it is important to keep that in mind in
retiling a TiledRasterLayer
.
General Methods¶
There exist methods that are found in both RasterLayer
and
TiledRasterLayer
. These methods tend to perform more general
analysis/tasks, thus making them suitable for both classes. This next
section will go over these methods.
Note: In the following examples, both RasterLayers and TiledRasterLayers will be used. However, they can easily be substituted with the other class.
Selecting a SubSection of Bands¶
To select certain bands to work with, the bands
method will take
either a single or collection of band indices and will return the subset
as a new RasterLayer
or TiledRasterLayer
.
Note: There could be high performance costs if operations are performed between two sub-bands of a large dataset. Thus, if you’re working with a large amount of data, it is recommended to do band selection before reading them in.
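A short sketch; band indices are zero-based:
# Select a single band.
first_band = raster_layer.bands(0)

# Select a subset of bands.
subset = raster_layer.bands([0, 1])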
Converting the Data Type of the Rasters’ Cells¶
The convert_data_type
method will convert the types of the cells
within the rasters of the layer to a new data type. The noData
value
can also be set during this conversion, and if it’s not set, then there
will be no noData
value for the resulting rasters.
Reclassify Cell Values¶
reclassify
changes the cell values based on the value_map
and
classification_strategy
given. In addition to these two parameters,
the data_type
of the cells also needs to be given. This is either
int
or float
.
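A sketch, assuming the ClassificationStrategy enum is used for the classification_strategy parameter (the value map is illustrative):
reclassified = tiled_layer.reclassify(
    value_map={1: 10, 2: 20, 3: 30},
    data_type=int,
    classification_strategy=gps.ClassificationStrategy.EXACT)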
Mapping Over the Cells¶
It is possible to work with the cells within a layer directly via the
map_cells
method. This method takes a function that expects a numpy
array and a noData value as parameters, and returns a new numpy array.
Thus, the function given would have the following type signature:
def input_function(numpy_array: np.ndarray, no_data_value=None) -> np.ndarray
The given function is then applied to each Tile
in the layer.
Note: In order for this method to operate, the internal RDD
first needs to be deserialized from Scala to Python and then serialized
from Python back to Scala. Because of this, it is recommended to chain
together all functions to avoid unnecessary serialization overhead.
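A sketch that doubles every cell value:
def double_cells(cells, no_data_value=None):
    # cells is the numpy array of a Tile; the layer's noData value is passed along as well.
    return cells * 2

doubled = tiled_layer.map_cells(double_cells)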
Mapping Over Tiles¶
Like map_cells
, map_tiles
maps a given function over all of the
Tile
s within the layer. It takes a function that expects a
Tile
and returns a Tile
. Therefore, the input function’s type
signature would be this:
def input_function(tile: Tile) -> Tile
Note: In order for this method to operate, the internal RDD
first needs to be deserialized from Scala to Python and then serialized
from Python back to Scala. Because of this, it is recommended to chain
together all functions to avoid unnecessary serialization overhead.
Calculating the Histogram for the Layer¶
It is possible to calculate the histogram of a layer either by using the
get_histogram
or the get_class_histogram
method. Both of these
methods produce a Histogram
, however, the way the data is
represented within the resulting histogram differs depending on the
method used. get_histogram
will produce a histogram whose values are
float
s. Whereas get_class_histogram
returns a histogram whose
values are int
s.
For more information on the Histogram class, please see the Histogram [guide].
Finding the Quantile Breaks for the Layer¶
If you wish to find the quantile breaks for a layer without a
Histogram
, then you can use the get_quantile_breaks
method.
Quantile Breaks for Exact Ints¶
There is another version of get_quantile_breaks
called
get_quantile_breaks_exact_int
that will count exact integer values.
However, if there are too many values within the layer, then memory
errors could occur.
Finding the Min and Max Values of a Layer¶
The get_min_max
method will find the min and max value for the
layer. The result will always be (float, float)
regardless of the
data type of the cells.
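A one-line sketch:
min_value, max_value = tiled_layer.get_min_max()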
RDD Methods¶
As mentioned in the section on TiledRasterLayer
‘s repartition
method, TiledRasterLayer
has methods to work
with its internal RDD
. This holds true for RasterLayer
as well.
The following is a list of RDD methods, with examples, that are supported by both classes.
Cache¶
Persist¶
Unpersist¶
getNumberOfPartitions¶
Count¶
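A short sketch of these pass-through methods, assuming they mirror their RDD counterparts:
tiled_layer.cache()
tiled_layer.persist()
tiled_layer.unpersist()

num_partitions = tiled_layer.getNumberOfPartitions()
tile_count = tiled_layer.count()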
Catalog¶
The catalog
module allows for users to retrieve information, query,
and write to/from GeoTrellis layers.
What is a Catalog?¶
A catalog is a directory where saved layers and their attributes are
organized and stored in a certain manner. Within a catalog, there can
exist multiple layers from different data sets. Each of these layers, in
turn, are their own directories which contain two folders: one where the
data is stored and the other for the metadata. The data for each layer
is broken up into zoom levels and each level has its own folder within
the data folder of the layer. As for the metadata, it is also broken up
by zoom level and is stored as json
files within the metadata
folder.
Here’s an example directory structure of a catalog:
layer_catalog/
layer_a/
metadata_for_layer_a/
metadata_layer_a_zoom_0.json
....
data_for_layer_a/
0/
data
...
1/
data
...
...
layer_b/
...
Accessing Data¶
GeoPySpark supports a number of different backends to save and read information from. These are the currently supported backends:
- LocalFileSystem
- HDFS
- S3
- Cassandra
- HBase
- Accumulo
Each of these needs to be accessed via the URI
for the given system.
Here are example URI
s for each:
- Local Filesystem: file://my_folder/my_catalog/
- HDFS: hdfs://my_folder/my_catalog/
- S3: s3://my_bucket/my_catalog/
- Cassandra: cassandra://[user:password@]zookeeper[:port][/keyspace][?attributes=table1[&layers=table2]]
- HBase: hbase://zookeeper[:port][?master=host][?attributes=table1[&layers=table2]]
- Accumulo: accumulo://[user[:password]@]zookeeper/instance-name[?attributes=table1[&layers=table2]]
It is important to note that neither HBase nor Accumulo have native
support for URI
s. Thus, GeoPySpark uses its own pattern for these
two systems.
A Note on Formatting Tiles¶
A small, but important, note needs to be made about how tiles that are
saved and/or read in are formatted in GeoPySpark. All tiles will be
treated as a MultibandTile, regardless of whether they were one to begin with. This was a design choice that was made to simplify both the backend and the API of GeoPySpark.
Saving Data to a Backend¶
The write
function will save a given TiledRasterLayer
to a
specified backend. If the catalog does not exist when calling this
function, then it will be created along with the saved layer.
Note: It is not possible to save a layer to a catalog if the layer name and zoom already exist. If you wish to overwrite an existing, saved layer then it must be deleted before writing the new one.
Note: Saving a TiledRasterLayer
that does not have a
zoom_level
will save the layer to a zoom of 0. Thus, when it is read
back out from the catalog, the resulting TiledRasterLayer
will have
a zoom_level
of 0.
Saving a Spatial Layer¶
Saving a spatial layer is a straightforward task. All that needs to be supplied is a URI, the name of the layer, and the layer to be saved.
Saving a Spatial Temporal Layer¶
When saving a spatial-temporal layer, one needs to consider how the records within the catalog will be spaced, which in turn determines the resolution of the index. The TimeUnit enum class contains all available units of time that can be used to space apart data in the catalog.
Saving a Pyramid¶
For those that are unfamiliar with the Pyramid
class, please see the
[Pyramid section] of the visualization guide. Otherwise, please continue
on.
As of right now, there is no way to directly save a Pyramid. However, because a Pyramid is just a collection of TiledRasterLayers of different zooms, it is possible to iterate through the layers of the Pyramid and save each one individually.
Reading Metadata From a Saved Layer¶
It is possible to retrieve the Metadata
for a layer without reading
in the whole layer. This is done using the read_layer_metadata
function. There is no difference between spatial and spatial-temporal
layers when using this function.
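A sketch using the layer saved above:
metadata = gps.read_layer_metadata(uri='file:///tmp/pa-nlcd-2011',
                                   layer_name='north-west-philly',
                                   layer_zoom=11)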
Reading a Tile From a Saved Layer¶
One can read a single tile that has been saved to a layer using the
read_value
function. This will either return a Tile
or None
depending on whether or not the specified tile exists.
Reading a Tile From a Saved, Spatial Layer¶
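A sketch for a spatial layer; a spatial-temporal layer would additionally pass the zdt time stamp:
tile = gps.read_value(uri='file:///tmp/pa-nlcd-2011',
                      layer_name='north-west-philly',
                      layer_zoom=11,
                      col=0,
                      row=0)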
Reading a Tile From a Saved, Spatial-Temporal Layer¶
Reading a Layer¶
There are two ways one can read a layer in GeoPySpark: reading the entire layer or just portions of it. The former is discussed in this section. While all of the layer will be read, the function for doing so is called query. There is no difference between spatial and spatial-temporal layers when using this function.
Note: What distinguishes a full from a partial read is the parameters given to query. If no filters are given, then the whole layer is read.
Querying a Layer¶
When only a certain section of the layer is of interest, one can
retrieve these areas of the layer through the query
method.
Depending on the type of data being queried, there are a couple of ways
to filter what will be returned.
Querying a Spatial Layer¶
One can query an area of a spatial layer that covers the region of
interest by providing a geometry that represents this region. This area
can be represented as: shapely.geometry
(specifically Polygon
s
and MultiPolygon
s), the wkb
representation of the geometry, or
an Extent
.
Note: It is important that the given geometry is in the same projection as the queried layer. Otherwise, either the wrong area or nothing will be returned.
When the Queried Geometry is in the Same Projection as the Layer¶
By default, the query
function assumes that the geometry and layer
given are in the same projection.
When the Queried Geometry is in a Different Projection than the Layer¶
As stated above, it is important that both the geometry and layer are in
the same projection. If the two are in different CRSs, then this can be
resolved by setting the proj_query
parameter to whatever projection
the geometry is in.
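A sketch of both cases, assuming the layer written above and the query_proj parameter from the API reference:
from shapely.geometry import box

# Read the entire layer at zoom 11.
whole_layer = gps.query(uri='file:///tmp/pa-nlcd-2011',
                        layer_name='north-west-philly',
                        layer_zoom=11)

# Query only an area of interest; the box is in lat/long while the layer
# is in Web Mercator, so query_proj is set to the geometry's CRS.
area_of_interest = box(-75.229225, 40.003686, -75.107345, 40.084375)
queried = gps.query(uri='file:///tmp/pa-nlcd-2011',
                    layer_name='north-west-philly',
                    layer_zoom=11,
                    query_geom=area_of_interest,
                    query_proj=4326)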
Map Algebra¶
Given a set of raster layers, it may be desirable to combine and filter the content of those layers. This is the function of map algebra. Two classes of map algebra operations are provided by GeoPySpark: local and focal operations. Local operations individually consider the pixels or cells of one or more rasters, applying a function to the corresponding cell values. For example, adding two rasters’ pixel values to form a new layer is a local operation.
Focal operations consider a region around each pixel of an input raster and apply an operation to each region. The result of that operation is stored in the corresponding pixel of the output raster. For example, one might weight a 5x5 region centered at a pixel according to a 2d Gaussian to effect a blurring of the input raster. One might consider this roughly equivalent to a 2d convolution operation.
Note: Map algebra operations work only on TiledRasterLayer
s,
and if a local operation requires multiple inputs, those inputs must
have the same layout and projection.
Note: Throughout this guide, .lookup(0, 0)[0].cells is used on the resulting layer. This call simply retrieves the numpy array of the first tile within the layer.
Local Operations¶
Local operations on TiledRasterLayer
s can use int
s,
float
s, or other TiledRasterLayer
s. +
, -
, *
, and
/
are all of the local operations that are currently supported.
Pyramid
s can also be used in local operations. The types that can
be used in local operations with Pyramid
s are: int
s,
float
s, TiledRasterLayer
s, and other Pyramid
s.
Note: Like with TiledRasterLayer
, performing calculations on
multiple Pyramid
s or TiledRasterLayer
s means they must all
have the same layout and projection.
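A short sketch, assuming a tiled_layer like the one produced in the quick example:
# Local operations with scalars.
plus_one = tiled_layer + 1
halved = tiled_layer / 2

# Local operations between layers (layouts and projections must match).
summed = tiled_layer + plus_one

# Pyramids support the same operators.
pyramid_plus_one = tiled_layer.pyramid() + 1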
Focal Operations¶
Focal operations are performed in GeoPySpark by executing a given
operation on a neighborhood throughout each tile in the layer. One can
select a neighborhood to use from the Neighborhood
enum class.
Likewise, an operation can be chosen from the Operation enum class.
Mean¶
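A sketch of a focal mean over a square neighborhood, assuming operation, neighborhood, and param_1 are the parameter names of the focal method:
focal_mean = tiled_layer.focal(operation=gps.Operation.MEAN,
                               neighborhood=gps.Neighborhood.SQUARE,
                               param_1=1)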
Median¶
Mode¶
Sum¶
Standard Deviation¶
Min¶
Max¶
Slope¶
Aspect¶
Miscellaneous Raster Operations¶
There are other means to extract information from rasters and to create rasters that need to be presented. These are polygonal summaries, cost distance, and rasterization.
Polygonal Summary Methods¶
In addition to local and focal operations, polygonal summaries can also
be performed on TiledRasterLayer
s. These are operations that are executed in the areas where a given geometry intersects the layer.
Note: It is important that the given geometry is in the same projection as the layer. If it is not, then either incorrect and/or only partial results will be returned.
Polygonal Min¶
Polygonal Max¶
Polygonal Sum¶
Polygonal Mean¶
Cost Distance¶
cost_distance
is an iterative method for approximating the weighted
distance from a raster cell to a given geometry. The cost_distance
function takes in a geometry and a “friction layer” which essentially
describes how difficult it is to traverse each raster cell. Cells that
fall within the geometry have a final cost of zero, while friction cells
that contain noData values will correspond to noData values in the final
result. All other cells have a value that describes the minimum cost of
traversing from that cell to the geometry. If the friction layer is
uniform, this function approximates the Euclidean distance, modulo some
scalar value.
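A sketch, assuming cost_distance takes a friction layer, a list of geometries, and a maximum distance; friction_tiled_layer is a hypothetical TiledRasterLayer describing traversal cost:
from shapely.geometry import Point

cost = gps.cost_distance(friction_layer=friction_tiled_layer,
                         geometries=[Point(-75.15, 40.04)],
                         max_distance=144000.0)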
Rasterization¶
It may be desirable to convert vector data into a raster layer. For
this, we provide the rasterize
function, which determines the set of
pixel values covered by each vector element, and assigns a supplied
value to that set of pixels in a target raster. If, for example, one had
a set of polygons representing counties in the US, and a value for, say,
the median income within each county, a raster could be made
representing these data.
GeoPySpark’s rasterize
function takes a list of any number of
Shapely geometries, converts them to rasters, tiles the rasters to a
given layout, and then produces a TiledRasterLayer
with these tiled
values.
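A sketch, assuming rasterize takes geoms, crs, zoom, and fill_value parameters (the geometry and values are illustrative):
from shapely.geometry import LineString

line = LineString([(0.0, 0.0), (1.0, 1.0)])
rasterized = gps.rasterize(geoms=[line], crs=4326, zoom=11, fill_value=1)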
Rasterize MultiPolygons¶
Rasterize LineStrings¶
Rasterize Polygons and LineStrings¶
Ingesting an Image¶
This example shows how to ingest a grayscale image and save the results locally. It is assumed that you have already read through the documentation on GeoPySpark before beginning this tutorial.
Getting the Data¶
Before we can begin with the ingest, we must first download the data
from S3. This curl command will download a file from S3 and save it to
your /tmp
directory. The file being downloaded comes from the
Shuttle Radar Topography Mission
(SRTM) dataset, and
contains elevation data on the east coast of Sri Lanka.
A side note: Files can be retrieved directly from S3 using the methods shown in this tutorial. However, this could not be done in this instance due to permission requirements needed to access the file.
What is an Ingest?¶
Before continuing on, it would be best to briefly discuss what an ingest
actually is. When data is acquired, it may cover an arbitrary spatial
extent in an arbitrary projection. This data needs to be regularized to
some expected layout and cut into tiles. After this step, we will
possess a TiledRasterLayer
that can be analyzed and saved for later
use. For more information on layers and the data they hold, see the
layers guide.
The Code¶
With our file downloaded we can begin the ingest.
Setting Up the SparkContext¶
The first thing one needs to do when using GeoPySpark is to set up the SparkContext. Because GeoPySpark is backed by Spark, the pysc is needed to initialize our starting classes.
For those that are already familiar with Spark, you may already know
there are multiple ways to create a SparkContext
. When working with
GeoPySpark, it is advised to create this instance via SparkConf
.
There are numerous settings for SparkConf
, and some have to be
set a certain way in order for GeoPySpark to work. Thus,
geopyspark_conf
was created as way for a user to set the basic
parameters without having to worry about setting the other, required
fields.
Reading in the Data¶
After the creation of pysc
, we can now read in the data. For this
example, we will be reading in a single GeoTiff that contains spatial
data. Hence, we set the layer_type to LayerType.SPATIAL.
Tiling the Data¶
It is now time to format the data within the layer to our desired
layout. The aptly named tile_to_layout method will cut and arrange
the rasters in the layer to the layout of our choosing. This results in
us getting a new class instance of TiledRasterLayer
. For this
example, we will be tiling to a GlobalLayout
.
With our tiled data, we might like to make a tile server from it and
show it on a map at some point. Therefore, we have to make sure that
the tiles within the layer are in the right projection. We can do this
by setting the target_crs
parameter.
Pyramiding the Data¶
Now it’s time to pyramid! With our reprojected data, we will create an
instance of Pyramid
that contains 12 TiledRasterLayer
s, each one having its own zoom_level from 11 to 0.
Saving the Pyramid Locally¶
To save all of the TiledRasterLayer
s within pyramid_layer
, we just have to loop through the values of pyramid_layer.levels and write each layer locally.
Reading in Sentinel-2 Images¶
Sentinel-2 is an observation mission developed by the European Space Agency to monitor the surface of the Earth (see the official website).
Sets of images are taken of the surface where each image corresponds to
a specific wavelength. These images can provide useful data for a wide
variety of industries, however, the format they are stored in can prove
difficult to work with. This being JPEG 2000
(file extension
.jp2
), an image compression format for JPEGs that allows for
improved quality and compression ratio.
Why Use GeoPySpark¶
There are few libraries and/or applications that can work with
jp2
s and big data, which can make processing large amounts of
sentinel data difficult. However, by using GeoPySpark in conjunction
with the tools available in Python, we are able to read in and work with
large sets of sentinel imagery.
Getting the Data¶
Before we can start this tutorial, we will need to get the sentinel images. All sentinel data can be found on Amazon’s S3 service, and we will be downloading it straight from there.
We will download three different jp2
s that represent the same area
and time in different wavelengths: Aerosol detection (443 nm), Water
vapor (945 nm), and Cirrus (1375 nm). These bands are chosen because
they are all in the same 60m resolution. The tiles we will be working
with cover the eastern coast of Corsica taken on January 4th, 2017.
For more information on the way the data is stored on S3, please see this link.
The Code¶
Now that we have the files, we can begin to read them into GeoPySpark.
Reading in the JPEG 2000’s¶
rasterio
, being backed by GDAL, allows us to read in the jp2
s.
Once they are read in, we will then combine the three separate numpy
arrays into one. This combined array represents a single, multiband
raster.
Creating the RDD¶
With our raster data in hand, we can now begin the creation of a Python
RDD
. Please see the core concepts guide
for more information on what the following instances represent.
You may have noticed in the above code that we did something weird to get the CRS from the rasterio file. This had to be done because the way rasterio formats the projection of the read-in rasters is not compatible with how GeoPySpark expects the CRS to be. Thus, we had to do a bit of extra work to get it into the correct state.
Creating the Layer¶
From the RDD
, we can now create a RasterLayer
using the
from_numpy_rdd
method.
Where to Go From Here¶
By creating a RasterLayer
, we can now work with and analyze the data
within it. If you wish to know more about these operations, please see
the following guides: Layers Guide,
[map-algebra-guide], [visulation-guide], and the [catalog-guide].
geopyspark package¶
-
geopyspark.
geopyspark_conf
(master=None, appName=None, additional_jar_dirs=[])¶ Construct the base SparkConf for use with GeoPySpark. This configuration object may be used as is, or may be adjusted according to the user’s needs.
Note
The GEOPYSPARK_JARS_PATH environment variable may contain a colon-separated list of directories to search for JAR files to make available via the SparkConf.
Parameters: - master (string) – The master URL to connect to, such as “local” to run locally with one thread, “local[4]” to run locally with 4 cores, or “spark://master:7077” to run on a Spark standalone cluster.
- appName (string) – The name of the application, as seen in the Spark console
- additional_jar_dirs (list, optional) – A list of directory locations that might contain JAR files needed by the current script. Already includes $(cwd)/jars.
Returns: SparkConf
-
class
geopyspark.
Tile
¶ Represents a raster in GeoPySpark.
Note
All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.
Parameters: - cells (nd.array) – The raster data itself. It is contained within a NumPy array.
- data_type (str) – The data type of the values within
data
if they were in Scala. - no_data_value – The value that represents no data value in the raster. This can be represented by a variety of types depending on the value type of the raster.
-
cells
¶ nd.array – The raster data itself. It is contained within a NumPy array.
-
data_type
¶ str – The data type of the values within
data
if they were in Scala.
-
no_data_value
¶ The value that represents no data value in the raster. This can be represented by a variety of types depending on the value type of the raster.
-
cell_type
¶ Alias for field number 1
-
cells
Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
static
dtype_to_cell_type
(dtype)¶ Converts a
np.dtype
to the corresponding GeoPySparkcell_type
.Note
bool
,complex64
,complex128
, andcomplex256
, are currently not supportednp.dtype
s.Parameters: dtype (np.dtype) – The dtype
of the numpy array.Returns: str. The GeoPySpark cell_type
equivalent of thedtype
.Raises: TypeError
– If the givendtype
is not a supported data type.
-
classmethod
from_numpy_array
(numpy_array, no_data_value=None)¶ Creates an instance of
Tile
from a numpy array.Parameters: - numpy_array (np.array) –
The numpy array to be used to represent the cell values of the
Tile
.Note
GeoPySpark does not support arrays with the following data types:
bool
,complex64
,complex128
, andcomplex256
. - no_data_value (optional) – The value that represents no data value in the raster.
This can be represented by a variety of types depending on the value type of
the raster. If not given, then the value will be
None
.
Returns: - numpy_array (np.array) –
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
no_data_value
Alias for field number 2
-
class
geopyspark.
Extent
¶ The “bounding box” or geographic region of an area on Earth a raster represents.
Parameters: - xmin (float) – The minimum x coordinate.
- ymin (float) – The minimum y coordinate.
- xmax (float) – The maximum x coordinate.
- ymax (float) – The maximum y coordinate.
-
xmin
¶ float – The minimum x coordinate.
-
ymin
¶ float – The minimum y coordinate.
-
xmax
¶ float – The maximum x coordinate.
-
ymax
¶ float – The maximum y coordinate.
-
count
(value) → integer -- return number of occurrences of value¶
-
classmethod
from_polygon
(polygon)¶ Creates a new instance of
Extent
from a Shapely Polygon.The new
Extent
will contain the min and max coordinates of the Polygon; regardless of the Polygon’s shape.Parameters: polygon (shapely.geometry.Polygon) – A Shapely Polygon. Returns: Extent
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
to_polygon
¶ Converts this instance to a Shapely Polygon.
The resulting Polygon will be in the shape of a box.
Returns: shapely.geometry.Polygon
-
xmax
Alias for field number 2
-
xmin
Alias for field number 0
-
ymax
Alias for field number 3
-
ymin
Alias for field number 1
-
class
geopyspark.
ProjectedExtent
¶ Describes both the area on Earth a raster represents in addition to its CRS.
Parameters: - extent (
Extent
) – The area the raster represents. - epsg (int, optional) – The EPSG code of the CRS.
- proj4 (str, optional) – The Proj.4 string representation of the CRS.
-
epsg
¶ int, optional – The EPSG code of the CRS.
-
proj4
¶ str, optional – The Proj.4 string representation of the CRS.
Note
Either
epsg
orproj4
must be defined.-
count
(value) → integer -- return number of occurrences of value¶
-
epsg
Alias for field number 1
-
extent
Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
proj4
Alias for field number 2
- extent (
-
class
geopyspark.
TemporalProjectedExtent
¶ Describes the area on Earth the raster represents, its CRS, and the time the data was collected.
Parameters: - extent (
Extent
) – The area the raster represents. - instant (
datetime.datetime
) – The time stamp of the raster. - epsg (int, optional) – The EPSG code of the CRS.
- proj4 (str, optional) – The Proj.4 string representation of the CRS.
-
instant
¶ datetime.datetime
– The time stamp of the raster.
-
epsg
¶ int, optional – The EPSG code of the CRS.
-
proj4
¶ str, optional – The Proj.4 string representation of the CRS.
Note
Either
epsg
orproj4
must be defined.-
count
(value) → integer -- return number of occurrences of value¶
-
epsg
Alias for field number 2
-
extent
Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
instant
Alias for field number 1
-
proj4
Alias for field number 3
- extent (
-
class
geopyspark.
SpatialKey
(col, row)¶ -
col
¶ Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
row
¶ Alias for field number 1
-
-
class
geopyspark.
SpaceTimeKey
(col, row, instant)¶ -
col
¶ Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
instant
¶ Alias for field number 2
-
row
¶ Alias for field number 1
-
-
class
geopyspark.
Metadata
(bounds, crs, cell_type, extent, layout_definition)¶ Information of the values within a
RasterLayer
orTiledRasterLayer
. This data pertains to the layout and other attributes of the data within the classes.Parameters: - bounds (
Bounds
) – TheBounds
of the values in the class. - crs (str or int) – The
CRS
of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string. - cell_type (str or
CellType
) – The data type of the cells of the rasters. - extent (
Extent
) – TheExtent
that covers the all of the rasters. - layout_definition (
LayoutDefinition
) – TheLayoutDefinition
of all rasters.
-
crs
¶ str or int – The CRS of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
-
cell_type
¶ str – The data type of the cells of the rasters.
-
no_data_value
¶ int or float or None – The noData value of the rasters within the layer. This can either be
None
, anint
, or afloat
depending on thecell_type
.
-
tile_layout
¶ TileLayout
– TheTileLayout
that describes how the rasters are orginized.
-
layout_definition
¶ LayoutDefinition
– TheLayoutDefinition
of all rasters.
-
classmethod
from_dict
(metadata_dict)¶ Creates
Metadata
from a dictionary.Parameters: metadata_dict (dict) – The Metadata
of aRasterLayer
orTiledRasterLayer
instance that is indict
form.Returns: Metadata
-
to_dict
()¶ Converts this instance to a
dict
.Returns: dict
- bounds (
-
class
geopyspark.
TileLayout
(layoutCols, layoutRows, tileCols, tileRows)¶ -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
layoutCols
¶ Alias for field number 0
-
layoutRows
¶ Alias for field number 1
-
tileCols
¶ Alias for field number 2
-
tileRows
¶ Alias for field number 3
-
-
class
geopyspark.
GlobalLayout
(tile_size, zoom, threshold)¶ -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
threshold
¶ Alias for field number 2
-
tile_size
¶ Alias for field number 0
-
zoom
¶ Alias for field number 1
-
-
class
geopyspark.
LocalLayout
¶ TileLayout type that snaps the layer extent.
When passed in place of LayoutDefinition it signifies that a LayoutDefinition instance should be constructed over the envelope of the layer pixels with a given tile size. The resulting TileLayout will match the cell resolution of the source rasters.
Parameters: - tile_size (int, optional) – The number of columns and row pixels in each tile. If this
is
None
, then the sizes of each tile will be set usingtile_cols
andtile_rows
. - tile_cols (int, optional) – The number of column pixels in each tile. This supersedes
tile_size
. Meaning if this and tile_size are set, then this will be used for the number of column pixels. If None, then the number of column pixels will default to 256. - tile_rows (int, optional) – The number of row pixels in each tile. This supersedes tile_size. Meaning if this and tile_size are set, then this will be used for the number of row pixels. If None
, then the number of row pixels will default to 256.
-
tile_cols
¶ int – The number of column pixels in each tile
-
tile_rows
¶ int – The number of rows pixels in each tile. This supersedes
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
tile_cols
Alias for field number 0
-
tile_rows
Alias for field number 1
- tile_size (int, optional) – The number of columns and row pixels in each tile. If this
is
-
class
geopyspark.
LayoutDefinition
(extent, tileLayout)¶ -
count
(value) → integer -- return number of occurrences of value¶
-
extent
¶ Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
tileLayout
¶ Alias for field number 1
-
-
class
geopyspark.
Bounds
¶ Represents the grid that covers the area of the rasters in a Layer.
Parameters: - minKey (
SpatialKey
orSpaceTimeKey
) – The smallestSpatialKey
orSpaceTimeKey
. - maxKey – The largest
SpatialKey
orSpaceTimeKey
.
Returns: -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
maxKey
¶ Alias for field number 1
-
minKey
¶ Alias for field number 0
- minKey (
-
geopyspark.
RasterizerOptions
¶ alias of
RasterizeOption
-
geopyspark.
read_layer_metadata
(uri, layer_name, layer_zoom)¶ Reads the metadata from a saved layer without reading in the whole layer.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
Returns:
-
geopyspark.
read_value
(uri, layer_name, layer_zoom, col, row, zdt=None, store=None)¶ Reads a single
Tile
from a GeoTrellis catalog. Unlike other functions in this module, this will not return a TiledRasterLayer, but rather a GeoPySpark formatted raster.
Note
When requesting a tile that does not exist, None will be returned.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- col (int) – The col number of the tile within the layout. Cols run east to west.
- row (int) – The row number of the tile within the layout. Rows run north to south.
- zdt (datetime.datetime) – The time stamp of the tile if the data is spatial-temporal. This is represented as a datetime.datetime instance. The default value is None. If None, then only the spatial area will be queried.
- store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
Returns: Tile
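A short, hedged sketch of reading a single tile by its column and row within the layout; the catalog URI, layer name, and indices are hypothetical:
import geopyspark as gps
# Read the tile at column 10, row 12 of zoom level 10; None is returned if the tile does not exist.
tile = gps.read_value(uri='file:///tmp/catalog',
                      layer_name='example-layer',
                      layer_zoom=10,
                      col=10,
                      row=12)
if tile is not None:
    print(tile.cells.shape)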
-
geopyspark.
query
(uri, layer_name, layer_zoom=None, query_geom=None, time_intervals=None, query_proj=None, num_partitions=None, store=None)¶ Queries a single zoom layer from a GeoTrellis catalog given spatial and/or time parameters.
Note
The whole layer could still be read in if query_geom and/or time_intervals have not been set, or if the queried region contains the entire layer.
Parameters: - layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
- uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be queried.
- layer_zoom (int, optional) – The zoom level of the layer that is to be queried. If None, then the layer_zoom will be set to 0.
- query_geom (bytes or shapely.geometry or Extent, optional) – The desired spatial area to be returned. Can either be a string, a shapely geometry, an instance of Extent, or a WKB version of the geometry.
Note
Not all shapely geometries are supported. The following types are supported: * Point * Polygon * MultiPolygon
Note
Only layers that were made from spatial, singleband GeoTiffs can query a Point. All other types are restricted to Polygon and MultiPolygon.
If not specified, then the entire layer will be read.
- time_intervals ([datetime.datetime], optional) – A list of the time intervals to query. This parameter is only used when querying spatial-temporal data. The default value is None. If None, then only the spatial area will be queried.
- query_proj (int or str, optional) – The CRS of the queried geometry if it is different than the layer it is being filtered against. If they are different and this is not set, then the returned TiledRasterLayer could contain incorrect values. If None, then the geometry and layer are assumed to be in the same projection.
- num_partitions (int, optional) – Sets the RDD partition count when reading from the catalog.
- store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
Returns: TiledRasterLayer
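A minimal sketch of a spatial query against a hypothetical catalog, restricting the read to a bounding box supplied as a shapely geometry:
import geopyspark as gps
from shapely.geometry import box
# Hypothetical area of interest, expressed in the layer's CRS.
area_of_interest = box(-75.3, 39.9, -75.1, 40.1)
queried = gps.query(uri='file:///tmp/catalog',
                    layer_name='example-layer',
                    layer_zoom=10,
                    query_geom=area_of_interest,
                    num_partitions=20)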
-
geopyspark.
write
(uri, layer_name, tiled_raster_layer, index_strategy=<IndexingMethod.ZORDER: 'zorder'>, time_unit=None, store=None)¶ Writes a tile layer to a specified destination.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
- layer_name (str) – The name of the new tile layer.
- layer_zoom (int) – The zoom level the layer should be saved at.
- tiled_raster_layer (TiledRasterLayer) – The TiledRasterLayer to be saved.
- index_strategy (str or IndexingMethod) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. Can either be a string or an IndexingMethod attribute. The default method used is IndexingMethod.ZORDER.
- time_unit (str or TimeUnit, optional) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index. Meaning, what time intervals are used to separate each record. While this is set to None by default, it must be set if saving spatial-temporal data. Depending on the indexing method chosen, different time units are used.
- store (str or AttributeStore, optional) – AttributeStore instance or URI for layer metadata lookup.
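A hedged sketch of saving a spatial-temporal layer, where a time_unit must be supplied along with the indexing method; the layer variable, catalog URI, and layer name are hypothetical:
import geopyspark as gps
# tiled_layer is assumed to be an existing spatial-temporal TiledRasterLayer.
gps.write(uri='file:///tmp/catalog',
          layer_name='example-spacetime-layer',
          tiled_raster_layer=tiled_layer,
          index_strategy=gps.IndexingMethod.ZORDER,
          time_unit=gps.TimeUnit.DAYS)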
-
class
geopyspark.
AttributeStore
(uri)¶ AttributeStore provides a way to read and write GeoTrellis layer attributes.
Internally all attribute values are stored as JSON; here they are exposed as dictionaries. Classes that are often stored have .from_dict and .to_dict methods to bridge the gap:
import geopyspark as gps
store = gps.AttributeStore("s3://azavea-datahub/catalog")
hist = store.layer("us-nlcd2011-30m-epsg3857", zoom=7).read("histogram")
hist = gps.Histogram.from_dict(hist)
-
class
Attributes
(store, layer_name, layer_zoom)¶ Accessor class for all attributes for a given layer
-
delete
(name)¶ Delete attribute by name
Parameters: name (str) – Attribute name
-
layer_metadata
()¶
-
read
(name)¶ Read layer attribute by name as a dict
Parameters: name (str) – Attribute name Returns: Attribute value Return type: dict
-
write
(name, value)¶ Write layer attribute value as a dict
Parameters: - name (str) – Attribute name
- value (dict) – Attribute value
-
-
classmethod
AttributeStore.
build
(store)¶ Builds AttributeStore from URI or passes an instance through.
Parameters: store (str or AttributeStore) – URI for an AttributeStore, or an instance. Returns: AttributeStore
-
classmethod
AttributeStore.
cached
(uri)¶ Returns cached version of AttributeStore for URI or creates one
-
AttributeStore.
contains
(name, zoom=None)¶ Checks if this store contains metadata for a layer.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: bool
-
AttributeStore.
delete
(name, zoom=None)¶ Delete layer and all its attributes
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
-
AttributeStore.
layer
(name, zoom=None)¶ Layer Attributes object for a given layer.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: Attributes
-
AttributeStore.
layers
()¶ Lists the Attributes objects of all layers
Returns: [Attributes]
-
-
geopyspark.
get_colors_from_colors
(colors)¶ Returns a list of integer colors from a list of Color objects from the colortools package.
Parameters: colors ([colortools.Color]) – A list of color stops using colortools.Color Returns: [int]
-
geopyspark.
get_colors_from_matplotlib
(ramp_name, num_colors=256)¶ Returns a list of color breaks from the color ramps defined by Matplotlib.
Parameters: - ramp_name (str) – The name of a matplotlib color ramp. See the matplotlib documentation for a list of names and details on each color ramp.
- num_colors (int, optional) – The number of color breaks to derive from the named map.
Returns: [int]
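A short sketch of deriving color breaks from a matplotlib ramp and feeding them into a ColorMap; this assumes matplotlib is installed and that hist is a Histogram previously obtained from a layer (a hypothetical variable):
import geopyspark as gps
# Derive 100 packed RGBA integers from matplotlib's 'viridis' ramp.
colors = gps.get_colors_from_matplotlib(ramp_name='viridis', num_colors=100)
# hist is assumed to come from, e.g., tiled_layer.get_histogram().
color_map = gps.ColorMap.from_histogram(hist, colors)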
-
class
geopyspark.
ColorMap
(cmap)¶ A class that wraps a GeoTrellis ColorMap class.
Parameters: cmap (py4j.java_gateway.JavaObject) – The JavaObject
that represents the GeoTrellis ColorMap.-
cmap
¶ py4j.java_gateway.JavaObject – The
JavaObject
that represents the GeoTrellis ColorMap.
-
classmethod
build
(breaks, colors=None, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Given breaks and colors, build a
ColorMap
object.
Parameters: - breaks (dict or list or Histogram) – If a dict, then a mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity. If a list, then tile values that specify breaks in the color mapping. If a Histogram, then a histogram from which breaks can be derived.
- colors (str or list, optional) – If a str, then the name of a matplotlib color ramp. If a list, then either a list of colortools Color objects or a list of integers containing packed RGBA values. If None, then the ColorMap will be created from the breaks given.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
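A minimal sketch of building a ColorMap from an explicit break map; the break values and colors below are hypothetical:
import geopyspark as gps
# Values <= 3 map to red, values <= 4 map to green; colors are packed RGBA integers.
break_map = {3: 0xff0000ff, 4: 0x00ff00ff}
color_map = gps.ColorMap.build(breaks=break_map,
                               classification_strategy=gps.ClassificationStrategy.LESS_THAN_OR_EQUAL_TO)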
-
classmethod
from_break_map
(break_map, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts a dictionary mapping from tile values to colors to a ColorMap.
Parameters: - break_map (dict) – A mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
classmethod
from_colors
(breaks, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts lists of values and colors to a
ColorMap
.Parameters: - breaks (list) – The tile values that specify breaks in the color mapping.
- color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers—e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
classmethod
from_histogram
(histogram, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts a wrapped GeoTrellis histogram into a
ColorMap
.
Parameters: - histogram (Histogram) – A Histogram instance; specifies breaks
- color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
static
nlcd_colormap
()¶ Returns a color map for NLCD tiles.
Returns: ColorMap
-
-
class
geopyspark.
LayerType
¶ The type of the key within the tuple of the wrapped RDD.
-
SPACETIME
= 'spacetime'¶
-
SPATIAL
= 'spatial'¶
-
-
class
geopyspark.
IndexingMethod
¶ How the wrapped layer should be indexed when saved.
-
HILBERT
= 'hilbert'¶
-
ROWMAJOR
= 'rowmajor'¶
-
ZORDER
= 'zorder'¶
-
-
class
geopyspark.
ResampleMethod
¶ Resampling Methods.
-
AVERAGE
= 'Average'¶
-
BILINEAR
= 'Bilinear'¶
-
CUBIC_CONVOLUTION
= 'CubicConvolution'¶
-
CUBIC_SPLINE
= 'CubicSpline'¶
-
LANCZOS
= 'Lanczos'¶
-
MAX
= 'Max'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
NEAREST_NEIGHBOR
= 'NearestNeighbor'¶
-
-
class
geopyspark.
TimeUnit
¶ ZORDER time units.
-
DAYS
= 'days'¶
-
HOURS
= 'hours'¶
-
MILLIS
= 'millis'¶
-
MINUTES
= 'minutes'¶
-
MONTHS
= 'months'¶
-
SECONDS
= 'seconds'¶
-
YEARS
= 'years'¶
-
-
class
geopyspark.
Operation
¶ Focal operations.
-
ASPECT
= 'Aspect'¶
-
MAX
= 'Max'¶
-
MEAN
= 'Mean'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
SLOPE
= 'Slope'¶
-
STANDARD_DEVIATION
= 'StandardDeviation'¶
-
SUM
= 'Sum'¶
-
-
class
geopyspark.
Neighborhood
¶ Neighborhood types.
-
ANNULUS
= 'Annulus'¶
-
CIRCLE
= 'Circle'¶
-
NESW
= 'Nesw'¶
-
SQUARE
= 'Square'¶
-
WEDGE
= 'Wedge'¶
-
-
class
geopyspark.
ClassificationStrategy
¶ Classification strategies for color mapping.
-
EXACT
= 'Exact'¶
-
GREATER_THAN
= 'GreaterThan'¶
-
GREATER_THAN_OR_EQUAL_TO
= 'GreaterThanOrEqualTo'¶
-
LESS_THAN
= 'LessThan'¶
-
LESS_THAN_OR_EQUAL_TO
= 'LessThanOrEqualTo'¶
-
-
class
geopyspark.
CellType
¶ Cell types.
-
BOOL
= 'bool'¶
-
BOOLRAW
= 'boolraw'¶
-
FLOAT32
= 'float32'¶
-
FLOAT32RAW
= 'float32raw'¶
-
FLOAT64
= 'float64'¶
-
FLOAT64RAW
= 'float64raw'¶
-
INT16
= 'int16'¶
-
INT16RAW
= 'int16raw'¶
-
INT32
= 'int32'¶
-
INT32RAW
= 'int32raw'¶
-
INT8
= 'int8'¶
-
INT8RAW
= 'int8raw'¶
-
UINT16
= 'uint16'¶
-
UINT16RAW
= 'uint16raw'¶
-
UINT8
= 'uint8'¶
-
UINT8RAW
= 'uint8raw'¶
-
-
class
geopyspark.
ColorRamp
¶ ColorRamp names.
-
BLUE_TO_ORANGE
= 'BlueToOrange'¶
-
BLUE_TO_RED
= 'BlueToRed'¶
-
CLASSIFICATION_BOLD_LAND_USE
= 'ClassificationBoldLandUse'¶
-
CLASSIFICATION_MUTED_TERRAIN
= 'ClassificationMutedTerrain'¶
-
COOLWARM
= 'CoolWarm'¶
-
GREEN_TO_RED_ORANGE
= 'GreenToRedOrange'¶
-
HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM
= 'HeatmapBlueToYellowToRedSpectrum'¶
-
HEATMAP_DARK_RED_TO_YELLOW_WHITE
= 'HeatmapDarkRedToYellowWhite'¶
-
HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE
= 'HeatmapLightPurpleToDarkPurpleToWhite'¶
-
HEATMAP_YELLOW_TO_RED
= 'HeatmapYellowToRed'¶
-
Hot
= 'Hot'¶
-
INFERNO
= 'Inferno'¶
-
LIGHT_TO_DARK_GREEN
= 'LightToDarkGreen'¶
-
LIGHT_TO_DARK_SUNSET
= 'LightToDarkSunset'¶
-
LIGHT_YELLOW_TO_ORANGE
= 'LightYellowToOrange'¶
-
MAGMA
= 'Magma'¶
-
PLASMA
= 'Plasma'¶
-
VIRIDIS
= 'Viridis'¶
-
-
geopyspark.
cost_distance
(friction_layer, geometries, max_distance)¶ Performs cost distance of a TileLayer.
Parameters: - friction_layer (TiledRasterLayer) – TiledRasterLayer of a friction surface to traverse.
- geometries (list) –
A list of shapely geometries to be used as a starting point.
Note
All geometries must be in the same CRS as the TileLayer.
- max_distance (int or float) – The maximum cost that a path may reach before the operation stops. This value can be an int or float.
Returns: TiledRasterLayer
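A hedged sketch of computing cost distance from a set of starting points over a friction layer; friction_layer, the starting point, and the maximum distance are hypothetical, and the geometries must share the layer's CRS:
import geopyspark as gps
from shapely.geometry import Point
# friction_layer is assumed to be an existing TiledRasterLayer of traversal costs.
starting_points = [Point(-75.2, 40.0)]
cost = gps.cost_distance(friction_layer=friction_layer,
                         geometries=starting_points,
                         max_distance=144000.0)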
-
geopyspark.
euclidean_distance
(geometry, source_crs, zoom, cell_type=<CellType.FLOAT64: 'float64'>)¶ Calculates the Euclidean distance of a Shapely geometry.
Parameters: - geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
- source_crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- cell_type (str or
CellType
, optional) – The data type of the cells for the new layer. If not specified, thenCellType.FLOAT64
is used.
Note
This function may run very slowly for polygonal inputs if they cover many cells of the output raster.
Returns: TiledRasterLayer
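A small sketch of computing the Euclidean distance to a single point; the point coordinates, CRS, and zoom level are hypothetical:
import geopyspark as gps
from shapely.geometry import Point
point = Point(-75.15, 40.0)
# Distances are rasterized onto a zoom 10 layout.
distance_layer = gps.euclidean_distance(geometry=point,
                                        source_crs=4326,
                                        zoom=10)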
-
geopyspark.
hillshade
(tiled_raster_layer, band=0, azimuth=315.0, altitude=45.0, z_factor=1.0)¶ Computes Hillshade (shaded relief) from a raster.
The resulting raster will be a shaded relief map (a hill shading) based on the sun altitude, azimuth, and the z factor. The z factor is a conversion factor from map units to elevation units.
Returns a raster of ShortConstantNoDataCellType.
For descriptions of parameters, please see Esri Desktop’s description of Hillshade.
Parameters: - tiled_raster_layer (
TiledRasterLayer
) – The base layer that contains the rasters used to compute the hillshade. - band (int, optional) – The band of the raster to base the hillshade calculation on. Default is 0.
- azimuth (float, optional) – The azimuth angle of the source of light. Default value is 315.0.
- altitude (float, optional) – The angle of the altitude of the light above the horizon. Default is 45.0.
- z_factor (float, optional) – How many x and y units in a single z unit. Default value is 1.0.
Returns: TiledRasterLayer
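A hedged sketch of shading an elevation layer, assuming elevation_layer is an existing TiledRasterLayer of elevation values (a hypothetical variable):
import geopyspark as gps
shaded = gps.hillshade(tiled_raster_layer=elevation_layer,
                       band=0,
                       azimuth=315.0,
                       altitude=45.0,
                       z_factor=1.0)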
-
class
geopyspark.
Histogram
(scala_histogram)¶ A wrapper class for a GeoTrellis Histogram.
The underlying histogram is produced from the values within a
TiledRasterLayer
. The values represented by the histogram can either be Int or Float depending on the data type of the cells in the layer.
Parameters: scala_histogram (py4j.JavaObject) – An instance of the GeoTrellis histogram. -
scala_histogram
¶ py4j.JavaObject – An instance of the GeoTrellis histogram.
-
bin_counts
()¶ Returns a list of tuples where the key is the bin label value and the value is the label’s respective count.
Returns: [(int, int)] or [(float, int)]
-
bucket_count
()¶ Returns the number of buckets within the histogram.
Returns: int
-
cdf
()¶ Returns the cdf of the distribution of the histogram.
Returns: [(float, float)]
-
classmethod
from_dict
(value)¶ Constructs a Histogram from a dictionary
-
item_count
(item)¶ Returns the total number of times a given item appears in the histogram.
Parameters: item (int or float) – The value whose occurrences should be counted. Returns: The total count of the occurrences of item
in the histogram.Return type: int
-
max
()¶ The largest value of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
mean
()¶ Determines the mean of the histogram.
Returns: float
-
median
()¶ Determines the median of the histogram.
Returns: float
-
merge
(other_histogram)¶ Merges this instance of
Histogram
with another. The resulting Histogram will contain values from both Histograms.
Parameters: other_histogram (Histogram) – The Histogram that should be merged with this instance.
Returns: Histogram
-
min
()¶ The smallest value of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
min_max
()¶ The largest and smallest values of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: (int, int) or (float, float)
-
mode
()¶ Determines the mode of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
to_dict
()¶ Encodes histogram as a dictionary
Returns: dict
-
-
class
geopyspark.
RasterLayer
(layer_type, srdd)¶ A wrapper of an RDD that contains GeoTrellis rasters.
Represents a layer that wraps an RDD that contains (K, V). Where K is either ProjectedExtent or TemporalProjectedExtent depending on the layer_type of the RDD, and V being a Tile.
The data held within this layer has not been tiled. Meaning the data has yet to be modified to fit a certain layout. See raster_rdd for more information.
Parameters: - layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
¶ py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows RasterLayer to access the various Scala methods.
-
bands
(band)¶ Select a subsection of bands from the Tiles within the layer.
Note
There could be a potentially high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if selecting a band that is out of bounds then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.
Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can either be a single int, or a collection of ints.
Returns: RasterLayer with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [ProjectedExtent] or [TemporalProjectedExtent]
-
collect_metadata
(layout=LocalLayout(tile_cols=256, tile_rows=256))¶ Iterates over the RDD records and generates layer metadata describing the contained rasters.
Parameters: layout (LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
Returns: Metadata
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying raster values to a new CellType.
Parameters: - new_type (str or CellType) – The data type the cells should be converted to.
- no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:
Raises: ValueError – If no_data_value is set and the new_type contains raw values.
ValueError – If no_data_value is set and new_type is a boolean.
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd)¶ Create a
RasterLayer
from a numpy RDD.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either
ProjectedExtent
s orTemporalProjectedExtent
s and rasters that are represented by a numpy array.
Returns: - layer_type (str or
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with limited number values. If only single band is present histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only single band is present histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
layer_type
-
map_cells
(func)¶ Maps over the cells of each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func (cells, nd => cells) – A function that takes two arguements: cells
andnd
. Wherecells
is the numpy array andnd
is theno_data_value
of theTile
. It returnscells
which are the new cells values of theTile
represented as a numpy array.Returns: RasterLayer
-
map_tiles
(func)¶ Maps over each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func ( Tile
=>Tile
) – A function that takes aTile
and returns aTile
.Returns: RasterLayer
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
pysc
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A dict whose keys represent values where a break should occur and its values are the new value the cells within the break should become.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
- replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if data_type is int or float. For int, the constant NO_DATA_INT can be used which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.
Returns: RasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
srdd
-
tile_to_layout
(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reproject will be performed.
- resample_method (str or ResampleMethod, optional) – The cell resample method to use during the tiling operation. Default is ResampleMethods.NEAREST_NEIGHBOR.
Returns: TiledRasterLayer
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a
RDD[(K, bytes)]. Where K is either ProjectedExtent or TemporalProjectedExtent.
Parameters: - storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
- rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
- tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
- compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
- color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
- color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
- head_tags (dict, optional) – A dict where each key and value is a str.
- band_tags (list, optional) – A list of dicts where each key and value is a str.
- Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
-
to_numpy_rdd
()¶ Converts a RasterLayer to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map ( ColorMap
) – AColorMap
instance used to color the PNGs.Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a RasterLayer with a layout_type of LayoutType.SPACETIME to a RasterLayer with a layout_type of LayoutType.SPATIAL.
Parameters: target_time (datetime.datetime, optional) – The instance of interest. If set, the resulting RasterLayer will only contain keys that contained the given instance. If None, then all values within the layer will be kept.
Returns: RasterLayer
Raises: ValueError – If the layer already has a layout_type of LayoutType.SPATIAL.
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.
TiledRasterLayer
(layer_type, srdd)¶ Wraps an RDD of tiled, GeoTrellis rasters.
Represents an RDD that contains (K, V). Where K is either SpatialKey or SpaceTimeKey depending on the layer_type of the RDD, and V being a Tile.
The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.
Parameters: - layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
- srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
¶ py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows TiledRasterLayer to access the various Scala methods.
-
is_floating_point_layer
¶ bool – Whether the data within the
TiledRasterLayer
is floating point or not.
-
zoom_level
¶ int – The zoom level of the layer. Can be
None
.
-
bands
(band)¶ Select a subsection of bands from the Tiles within the layer.
Note
There could be a potentially high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if selecting a band that is out of bounds then the error returned will be a py4j.protocol.Py4JJavaError and not a normal Python error.
Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tiles. Can either be a single int, or a collection of ints.
Returns: TiledRasterLayer with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [SpatialKey] or [SpaceTimeKey]
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying raster values to a new CellType.
Parameters: - new_type (str or CellType) – The data type the cells should be converted to.
- no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns:
Raises: ValueError – If no_data_value is set and the new_type contains raw values.
ValueError – If no_data_value is set and new_type is a boolean.
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
focal
(operation, neighborhood=None, param_1=None, param_2=None, param_3=None)¶ Performs the given focal operation on the layers contained in the Layer.
Parameters: - operation (str or Operation) – The focal operation to be performed.
- neighborhood (str or Neighborhood, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance of Neighborhood, or by a constant.
- param_1 (int or float, optional) – If using Operation.SLOPE, then this is the zFactor, else it is the first argument of neighborhood.
- param_2 (int or float, optional) – The second argument of the neighborhood.
- param_3 (int or float, optional) – The third argument of the neighborhood.
Note
The params only need to be set if neighborhood is not an instance of Neighborhood or if neighborhood is None. Any param that is not set will default to 0.0.
If neighborhood is None then operation must be either Operation.SLOPE or Operation.ASPECT.
Returns: TiledRasterLayer
Raises: ValueError – If operation is not a known operation.
ValueError – If neighborhood is not a known neighborhood.
ValueError – If neighborhood was not set, and operation is not Operation.SLOPE or Operation.ASPECT.
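A minimal sketch of two focal calls on a hypothetical tiled_layer: a slope computation (no neighborhood required) and a focal mean over a square neighborhood:
import geopyspark as gps
# Slope only needs the operation; param_1 acts as the zFactor.
slope_layer = tiled_layer.focal(operation=gps.Operation.SLOPE, param_1=1.0)
# Focal mean over a square neighborhood with an extent of 1 cell.
mean_layer = tiled_layer.focal(operation=gps.Operation.MEAN,
                               neighborhood=gps.Neighborhood.SQUARE,
                               param_1=1.0)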
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd, metadata, zoom_level=None)¶ Create a TiledRasterLayer from a numpy RDD.
Parameters: - layer_type (str or LayerType) – What the layer type of the geotiffs are. This is represented by either constants within LayerType or by a string.
- numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either SpatialKey or SpaceTimeKey and rasters that are represented by a numpy array.
- metadata (Metadata) – The Metadata of the TiledRasterLayer instance.
- zoom_level (int, optional) – The zoom_level the resulting TiledRasterLayer should have. If None, then the returned layer’s zoom_level will be None.
Returns: TiledRasterLayer
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with limited number values. If only single band is present histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only single band is present histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
histogram_series
(geometries)¶
-
layer_type
-
lookup
(col, row)¶ Return the value(s) in the image of a particular SpatialKey (given by col and row).
Parameters: - col (int) – The SpatialKey column.
- row (int) – The SpatialKey row.
Returns: [Tile]
Raises: ValueError – If using lookup on a non LayerType.SPATIAL TiledRasterLayer.
IndexError – If col and row are not within the TiledRasterLayer’s bounds.
-
map_cells
(func)¶ Maps over the cells of each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (cells, nd => cells) – A function that takes two arguments: cells and nd. Where cells is the numpy array and nd is the no_data_value of the tile. It returns cells which are the new cell values of the tile represented as a numpy array.
Returns: TiledRasterLayer
-
map_tiles
(func)¶ Maps over each Tile within the layer with a given function.
Note
This operation first needs to deserialize the wrapped RDD into Python and then serialize the RDD back into a TiledRasterRDD once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.
Parameters: func (Tile => Tile) – A function that takes a Tile and returns a Tile.
Returns: TiledRasterLayer
-
mask
(geometries)¶ Masks the
TiledRasterLayer
so that only values that intersect the geometries will be available.Parameters: geometries (shapely.geometry or [shapely.geometry]) – Either a list of, or a single shapely geometry/ies to use for the mask/s.
Note
All geometries must be in the same CRS as the TileLayer.
Returns: TiledRasterLayer
-
max_series
(geometries)¶
-
mean_series
(geometries)¶
-
min_series
(geometries)¶
-
normalize
(new_min, new_max, old_min=None, old_max=None)¶ Normalizes the cell values of the rasters within the layer from an old range to a new range.
Note
If old_max - old_min <= 0 or new_max - new_min <= 0, then the normalization will fail.
Parameters: - old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used.
- old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used.
- new_min (int or float) – New minimum to normalize to.
- new_max (int or float) – New maximum to normalize to.
Returns:
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
polygonal_max
(geometry, data_type)¶ Finds the max value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_mean
(geometry)¶ Finds the mean of all of the values that are contained within the given geometry.
Parameters: geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
Returns: float
-
polygonal_min
(geometry, data_type)¶ Finds the min value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
polygonal_sum
(geometry, data_type)¶ Finds the sum of all of the values that are contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon or MultiPolygon that represents the area where the summary should be computed; or a WKB representation of the geometry.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on data_type.
Raises: TypeError – If data_type is not an int or float.
-
pyramid
(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Creates a layer Pyramid where the resolution is halved per level.
Parameters: resample_method (str or ResampleMethod, optional) – The resample method to use when building the pyramid. Default is ResampleMethods.NEAREST_NEIGHBOR.
Returns: Pyramid.
Raises: ValueError – If this layer layout is not of GlobalLayout type.
-
pysc
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A dict whose keys represent values where a break should occur and its values are the new value the cells within the break should become.
- data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or ClassificationStrategy, optional) – How the cells should be classified along the breaks. If unspecified, then ClassificationStrategy.LESS_THAN_OR_EQUAL_TO will be used.
- replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if data_type is int or float. For int, the constant NO_DATA_INT can be used which represents the NoData value for int in GeoTrellis. For float, float('nan') is used to represent NoData.
Returns: TiledRasterLayer
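A hedged sketch of reclassifying a hypothetical integer tiled_layer so that values up to 100 become 1 and values up to 200 become 2:
import geopyspark as gps
value_map = {100: 1, 200: 2}
reclassified = tiled_layer.reclassify(value_map=value_map,
                                      data_type=int,
                                      classification_strategy=gps.ClassificationStrategy.LESS_THAN_OR_EQUAL_TO)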
-
repartition
(num_partitions=None)¶ Repartition underlying RDD using HashPartitioner. If
num_partitions
is None, existing number of partitions will be used.Parameters: num_partitions (int, optional) – Desired number of partitions Returns: TiledRasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
save_stitched
(path, crop_bounds=None, crop_dimensions=None)¶ Stitch all of the rasters within the Layer into one raster and then saves it to a given path.
Parameters: - path (str) – The path of the geotiff to save. The path must be on the local file system.
- crop_bounds (Extent, optional) – The sub Extent with which to crop the raster before saving. If None, then the whole raster will be saved.
- crop_dimensions (tuple(int) or list(int), optional) – cols and rows of the image to save, represented as either a tuple or list. If None then all cols and rows of the raster will be saved.
Note
This can only be used on LayerType.SPATIAL TiledRasterLayers.
Note
If crop_dimensions is set then crop_bounds must also be set.
-
srdd
-
star_series
(geometries, fn)¶
-
stitch
()¶ Stitch all of the rasters within the Layer into one raster.
Note
This can only be used on
LayerType.SPATIAL
TiledRasterLayer
s.Returns: Tile
-
sum_series
(geometries)¶
-
tile_to_layout
(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reproject will be performed.
- resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethods.NEAREST_NEIGHBOR is used.
Returns: TiledRasterLayer
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a
RDD[(K, bytes)]. Where K is either SpatialKey or SpaceTimeKey.
Parameters: - storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
- rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
- tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None then the default size is (256, 256).
- compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
- color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
- color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
- head_tags (dict, optional) – A dict where each key and value is a str.
- band_tags (list, optional) – A list of dicts where each key and value is a str.
- Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
-
to_numpy_rdd
()¶ Converts a TiledRasterLayer to a numpy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map ( ColorMap
) – AColorMap
instance used to color the PNGs.Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a TiledRasterLayer with a layout_type of LayoutType.SPACETIME to a TiledRasterLayer with a layout_type of LayoutType.SPATIAL.
Parameters: target_time (datetime.datetime, optional) – The instance of interest. If set, the resulting TiledRasterLayer will only contain keys that contained the given instance. If None, then all values within the layer will be kept.
Returns: TiledRasterLayer
Raises: ValueError – If the layer already has a layout_type of LayoutType.SPATIAL.
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.
Pyramid
(levels)¶ Contains a list of TiledRasterLayers that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server.
Map algebra can be performed on instances of this class.
Parameters: levels (list or dict) – A list of TiledRasterLayers or a dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
layer_type
¶ LayerType – What the layer type of the geotiffs are.
-
levels
¶ dict – A dict of
TiledRasterLayer
s where the value is the layer itself and the key is its given zoom level.
-
max_zoom
¶ int – The highest zoom level of the pyramid.
-
is_cached
¶ bool – Signals whether or not the internal RDDs are cached. Default is
False
.
-
histogram
¶ Histogram – The Histogram that represents the layer with the max zoom. Will not be calculated unless the get_histogram() method is used. Otherwise, its value is None.
Raises: TypeError – If levels is neither a list nor a dict.-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
histogram
-
is_cached
-
layer_type
¶
-
levels
-
max_zoom
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
pysc
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.
Returns: [org.apache.spark.rdd.RDD]
-
-
class
geopyspark.
Square
(extent)¶
-
class
geopyspark.
Circle
(radius)¶ A circle neighborhood.
Parameters: radius (int or float) – The radius of the circle that determines which cells fall within the bounding box. -
radius
¶ int or float – The radius of the circle that determines which cells fall within the bounding box.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Unused param for
Circle
. Is 0.0.
-
param_3
¶ float – Unused param for
Circle
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is “circle”.
Note
Cells that lie exactly on the radius of the circle are a part of the neighborhood.
-
-
class
geopyspark.
Wedge
(radius, start_angle, end_angle)¶ A wedge neighborhood.
Parameters: - radius (int or float) – The radius of the wedge.
- start_angle (int or float) – The starting angle of the wedge in degrees.
- end_angle (int or float) – The ending angle of the wedge in degrees.
-
radius
¶ int or float – The radius of the wedge.
-
start_angle
¶ int or float – The starting angle of the wedge in degrees.
-
end_angle
¶ int or float – The ending angle of the wedge in degrees.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Same as
start_angle
.
-
param_3
¶ float – Same as
end_angle
.
-
name
¶ str – The name of the neighborhood which is, “wedge”.
-
class
geopyspark.
Nesw
(extent)¶ A neighborhood that includes a column and row intersection for the focus.
Parameters: extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes. -
extent
¶ int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
-
param_1
¶ float – Same as
extent
.
-
param_2
¶ float – Unused param for
Nesw
. Is 0.0.
-
param_3
¶ float – Unused param for
Nesw
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “nesw”.
-
-
class
geopyspark.
Annulus
(inner_radius, outer_radius)¶ An Annulus neighborhood.
Parameters: - inner_radius (int or float) – The radius of the inner circle.
- outer_radius (int or float) – The radius of the outer circle.
-
inner_radius
¶ int or float – The radius of the inner circle.
-
outer_radius
¶ int or float – The radius of the outer circle.
-
param_1
¶ float – Same as
inner_radius
.
-
param_2
¶ float – Same as
outer_radius
.
-
param_3
¶ float – Unused param for
Annulus
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “annulus”.
-
geopyspark.
rasterize
(geoms, crs, zoom, fill_value, cell_type=<CellType.FLOAT64: 'float64'>, options=None, num_partitions=None)¶ Rasterizes Shapely geometries.
Parameters: - geoms ([shapely.geometry]) – List of shapely geometries to rasterize.
- crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- fill_value (int or float) – Value to burn into pixels intersecting the geometry
- cell_type (str or CellType) – Which data type the cells should be when created. Defaults to CellType.FLOAT64.
- options (RasterizerOptions, optional) – Pixel intersection options.
- num_partitions (int, optional) – The number of repartitions Spark will make when the data is repartitioned. If None, then the data will not be repartitioned.
Returns: TiledRasterLayer
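A minimal sketch of burning a value into the cells covered by a hypothetical polygon:
import geopyspark as gps
from shapely.geometry import box
polygon = box(-75.3, 39.9, -75.1, 40.1)
# Cells intersecting the polygon receive the fill value 1 at zoom 10.
burned = gps.rasterize(geoms=[polygon],
                       crs=4326,
                       zoom=10,
                       fill_value=1)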
-
class
geopyspark.
TileRender
(render_function)¶ A Python implementation of the Scala geopyspark.geotrellis.tms.TileRender interface. Permits a callback from Scala to Python to allow for custom rendering functions.
Parameters: render_function (Tile => PIL.Image.Image) – A function to convert geopyspark.geotrellis.Tile to a PIL Image. -
render_function
¶ Tile => PIL.Image.Image – A function to convert geopyspark.geotrellis.Tile to a PIL Image.
-
TileRender.
renderEncoded
(scala_array)¶ A function to convert an array to an image.
Parameters: scala_array – A linear array of bytes representing the protobuf-encoded contents of a tile Returns: bytes representing an image
-
TileRender.
requiresEncoding
()¶
-
-
class
geopyspark.
TMS
(server)¶ Provides a TMS server for raster data.
In order to display raster data on a variety of different map interfaces (e.g., leaflet maps, geojson.io, GeoNotebook, and others), we provide the TMS class.
Parameters: server (JavaObject) – The Java TMSServer instance -
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
server
¶ JavaObject – The Java TMSServer instance
-
host
¶ str – The IP address of the host, if bound, else None
-
port
¶ int – The port number of the TMS server, if bound, else None
-
url_pattern
¶ string – The URI pattern for the current TMS service, with {z}, {x}, {y} tokens. Can be copied directly to services such as geojson.io.
-
bind
(host=None, requested_port=None)¶ Starts up a TMS server.
Parameters: - host (str, optional) – The target host. Typically “localhost”, “127.0.0.1”, or “0.0.0.0”. The latter will make the TMS service accessible from the world. If omitted, defaults to localhost.
- requested_port (optional, int) – A port number to bind the service to. If omitted, use a random available port.
-
classmethod
build
(source, display, allow_overzooming=True)¶ Builds a TMS server from one or more layers.
This function takes a SparkContext, a source or list of sources, and a display method and creates a TMS server to display the desired content. The display method is supplied as a ColorMap (only available when there is a single source), or a callable object which takes either a single tile input (when there is a single source) or a list of tiles (for multiple sources) and returns the bytes representing an image file for that tile.
Parameters: - source (tuple or list or Pyramid) – The tile sources to render. Tuple inputs are (str, str) pairs where the first component is the URI of a catalog and the second is the layer name. A list input may be any combination of tuples and Pyramids.
- display (ColorMap, callable) – Method for mapping tiles to images. A ColorMap may only be applied to a single input source. A callable will take a single numpy array for a single source, or a list of numpy arrays for multiple sources. In the case of multiple inputs, resampling may be required if the tile sources have different tile sizes. Returns bytes representing the resulting image.
- allow_overzooming (bool) – If set, viewing at zoom levels above the highest available zoom level will produce tiles that are resampled from the highest zoom level present in the data set.
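A hedged sketch of serving a pyramided layer over TMS; pyramided_layer is assumed to be an existing Pyramid, and the host and port values are hypothetical:
import geopyspark as gps
color_map = gps.ColorMap.nlcd_colormap()
tms = gps.TMS.build(source=pyramided_layer, display=color_map)
tms.bind(host='0.0.0.0', requested_port=8085)
print(tms.url_pattern)
tms.unbind()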
-
host
Returns the IP string of the server’s host if bound, else None.
Returns: (str)
-
port
Returns the port number for the current TMS server if bound, else None.
Returns: (int)
-
set_handshake
(handshake)¶
-
unbind
()¶ Shuts down the TMS service, freeing the assigned port.
-
url_pattern
Returns the URI for the tiles served by the present server. Contains {z}, {x}, and {y} tokens to be substituted for the desired zoom and x/y tile position.
Returns: (str)
-
geopyspark.geotrellis package¶
This subpackage contains the code that reads, writes, and processes data using GeoTrellis.
-
class
geopyspark.geotrellis.
Tile
¶ Represents a raster in GeoPySpark.
Note
All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.
Parameters: - cells (nd.array) – The raster data itself. It is contained within a NumPy array.
- data_type (str) – The data type of the values within
data
if they were in Scala. - no_data_value – The value that represents no data value in the raster. This can be represented by a variety of types depending on the value type of the raster.
-
cells
¶ nd.array – The raster data itself. It is contained within a NumPy array.
-
data_type
¶ str – The data type of the values within
data
if they were in Scala.
-
no_data_value
¶ The value that represents no data value in the raster. This can be represented by a variety of types depending on the value type of the raster.
-
cell_type
¶ Alias for field number 1
-
cells
Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
static
dtype_to_cell_type
(dtype)¶ Converts a
np.dtype
to the corresponding GeoPySparkcell_type
.Note
bool
,complex64
,complex128
, andcomplex256
, are currently not supportednp.dtype
s.Parameters: dtype (np.dtype) – The dtype
of the numpy array.Returns: str. The GeoPySpark cell_type
equivalent of thedtype
.Raises: TypeError
– If the givendtype
is not a supported data type.
-
classmethod
from_numpy_array
(numpy_array, no_data_value=None)¶ Creates an instance of
Tile
from a numpy array.Parameters: - numpy_array (np.array) –
The numpy array to be used to represent the cell values of the
Tile
.Note
GeoPySpark does not support arrays with the following data types:
bool
,complex64
,complex128
, andcomplex256
. - no_data_value (optional) – The value that represents no data value in the raster.
This can be represented by a variety of types depending on the value type of
the raster. If not given, then the value will be
None
.
Returns: Tile
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
no_data_value
Alias for field number 2
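For example, a hedged sketch of wrapping a NumPy array as a Tile; the (bands, rows, cols) shape used here is an assumption about how multiband data is arranged:
import numpy as np
import geopyspark as gps

# A single-band, 256x256 raster with -9999 marking NoData.
cells = np.zeros((1, 256, 256), dtype='int32')
tile = gps.Tile.from_numpy_array(cells, no_data_value=-9999)

print(tile.cells.shape)      # the wrapped NumPy array
print(tile.no_data_value)    # -9999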
-
class
geopyspark.geotrellis.
Extent
¶ The “bounding box” or geographic region of an area on Earth a raster represents.
Parameters: - xmin (float) – The minimum x coordinate.
- ymin (float) – The minimum y coordinate.
- xmax (float) – The maximum x coordinate.
- ymax (float) – The maximum y coordinate.
-
xmin
¶ float – The minimum x coordinate.
-
ymin
¶ float – The minimum y coordinate.
-
xmax
¶ float – The maximum x coordinate.
-
ymax
¶ float – The maximum y coordinate.
-
count
(value) → integer -- return number of occurrences of value¶
-
classmethod
from_polygon
(polygon)¶ Creates a new instance of
Extent
from a Shapely Polygon.The new
Extent
will contain the min and max coordinates of the Polygon, regardless of the Polygon’s shape.Parameters: polygon (shapely.geometry.Polygon) – A Shapely Polygon. Returns: Extent
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
to_polygon
¶ Converts this instance to a Shapely Polygon.
The resulting Polygon will be in the shape of a box.
Returns: shapely.geometry.Polygon
-
xmax
Alias for field number 2
-
xmin
Alias for field number 0
-
ymax
Alias for field number 3
-
ymin
Alias for field number 1
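A short sketch of building an Extent directly and from a Shapely Polygon:
import geopyspark as gps
from shapely.geometry import Polygon

extent = gps.Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0)

# The same bounding box derived from a Polygon's envelope.
polygon = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
from_poly = gps.Extent.from_polygon(polygon)

box_polygon = extent.to_polygon   # a box-shaped shapely.geometry.Polygon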
-
class
geopyspark.geotrellis.
ProjectedExtent
¶ Describes both the area on Earth a raster represents and its CRS.
Parameters: - extent (
Extent
) – The area the raster represents. - epsg (int, optional) – The EPSG code of the CRS.
- proj4 (str, optional) – The Proj.4 string representation of the CRS.
-
epsg
¶ int, optional – The EPSG code of the CRS.
-
proj4
¶ str, optional – The Proj.4 string representation of the CRS.
Note
Either
epsg
orproj4
must be defined.-
count
(value) → integer -- return number of occurrences of value¶
-
epsg
Alias for field number 1
-
extent
Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
proj4
Alias for field number 2
-
class
geopyspark.geotrellis.
TemporalProjectedExtent
¶ Describes the area on Earth the raster represents, its CRS, and the time the data was collected.
Parameters: - extent (
Extent
) – The area the raster represents. - instant (
datetime.datetime
) – The time stamp of the raster. - epsg (int, optional) – The EPSG code of the CRS.
- proj4 (str, optional) – The Proj.4 string representation of the CRS.
-
instant
¶ datetime.datetime
– The time stamp of the raster.
-
epsg
¶ int, optional – The EPSG code of the CRS.
-
proj4
¶ str, optional – The Proj.4 string representation of the CRS.
Note
Either
epsg
orproj4
must be defined.-
count
(value) → integer -- return number of occurrences of value¶
-
epsg
Alias for field number 2
-
extent
Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
instant
Alias for field number 1
-
proj4
Alias for field number 3
-
class
geopyspark.geotrellis.
SpatialKey
(col, row)¶ Represents the position of a raster within a grid. This grid is a 2D plane where raster positions are represented by a pair of coordinates.
Parameters: - col (int) – The column of the grid, the numbers run east to west.
- row (int) – The row of the grid, the numbers run north to south.
Returns: -
col
¶ Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
row
¶ Alias for field number 1
-
class
geopyspark.geotrellis.
SpaceTimeKey
(col, row, instant)¶ Represents the position of a raster within a grid. This grid is a 3D plane where raster positions are represented by a pair of coordinates as well as a z value that represents time.
Parameters: - col (int) – The column of the grid, the numbers run east to west.
- row (int) – The row of the grid, the numbers run north to south.
- instant (
datetime.datetime
) – The time stamp of the raster.
Returns: -
col
¶ Alias for field number 0
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
instant
¶ Alias for field number 2
-
row
¶ Alias for field number 1
-
class
geopyspark.geotrellis.
Metadata
(bounds, crs, cell_type, extent, layout_definition)¶ Information about the values within a
RasterLayer
orTiledRasterLayer
. This data pertains to the layout and other attributes of the data within the classes.Parameters: - bounds (
Bounds
) – TheBounds
of the values in the class. - crs (str or int) – The
CRS
of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string. - cell_type (str or
CellType
) – The data type of the cells of the rasters. - extent (
Extent
) – TheExtent
that covers all of the rasters. - layout_definition (
LayoutDefinition
) – TheLayoutDefinition
of all rasters.
-
crs
¶ str or int – The CRS of the data. Can either be the EPSG code, well-known name, or a PROJ.4 projection string.
-
cell_type
¶ str – The data type of the cells of the rasters.
-
no_data_value
¶ int or float or None – The noData value of the rasters within the layer. This can either be
None
, anint
, or afloat
depending on thecell_type
.
-
tile_layout
¶ TileLayout
– TheTileLayout
that describes how the rasters are organized.
-
layout_definition
¶ LayoutDefinition
– TheLayoutDefinition
of all rasters.
-
classmethod
from_dict
(metadata_dict)¶ Creates
Metadata
from a dictionary.Parameters: metadata_dict (dict) – The Metadata
of aRasterLayer
orTiledRasterLayer
instance that is indict
form.Returns: Metadata
-
to_dict
()¶ Converts this instance to a
dict
.Returns: dict
-
class
geopyspark.geotrellis.
TileLayout
(layoutCols, layoutRows, tileCols, tileRows)¶ Describes the grid in which the rasters within a Layer should be laid out.
Parameters: - layoutCols (int) – The number of columns of rasters that runs east to west.
- layoutRows (int) – The number of rows of rasters that runs north to south.
- tileCols (int) – The number of columns of pixels in each raster that runs east to west.
- tileRows (int) – The number of rows of pixels in each raster that runs north to south.
Returns: -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
layoutCols
¶ Alias for field number 0
-
layoutRows
¶ Alias for field number 1
-
tileCols
¶ Alias for field number 2
-
tileRows
¶ Alias for field number 3
-
class
geopyspark.geotrellis.
GlobalLayout
(tile_size, zoom, threshold)¶ TileLayout type that spans global CRS extent.
When passed in place of a LayoutDefinition, it signifies that a LayoutDefinition instance should be constructed such that it fits the global CRS extent. The cell resolution of the resulting layout will be one of the resolutions implied by the power-of-2 pyramid for that CRS. Tiling to this layout will likely result in either up-sampling or down-sampling the source raster.
Parameters: - tile_size (int) – The number of columns and row pixels in each tile.
- zoom (int, optional) – Override the zoom level in power of 2 pyramid.
- threshold (float, optional) – The percentage difference between a cell size and a zoom level and the resolution difference between that zoom level and the next that is tolerated to snap to the lower-resolution zoom level. For example, if this parameter is 0.1, that means we’re willing to downsample rasters with a higher resolution in order to fit them to some zoom level Z, if the difference in resolution is less than or equal to 10% of the difference between the resolutions of zoom level Z and zoom level Z+1.
Returns: -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
threshold
¶ Alias for field number 2
-
tile_size
¶ Alias for field number 0
-
zoom
¶ Alias for field number 1
-
class
geopyspark.geotrellis.
LocalLayout
¶ TileLayout type that snaps to the layer extent.
When passed in place of a LayoutDefinition, it signifies that a LayoutDefinition instance should be constructed over the envelope of the layer pixels with the given tile size. The resulting TileLayout will match the cell resolution of the source rasters.
Parameters: - tile_size (int, optional) – The number of columns and row pixels in each tile. If this
is
None
, then the sizes of each tile will be set usingtile_cols
andtile_rows
. - tile_cols (int, optional) – The number of column pixels in each tile. This supersedes
tile_size
. Meaning if this andtile_size
are set, then this will be used for the number of column pixels. IfNone
, then the number of column pixels will default to 256. - tile_rows (int, optional) – The number of row pixels in each tile. This supersedes
tile_size
. Meaning if this andtile_size
are set, then this will be used for the number of row pixels. IfNone
, then the number of row pixels will default to 256.
-
tile_cols
¶ int – The number of column pixels in each tile
-
tile_rows
¶ int – The number of row pixels in each tile. This supersedes tile_size.
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
tile_cols
Alias for field number 0
-
tile_rows
Alias for field number 1
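Both layout types can be passed wherever a layout is expected, for example to tile_to_layout, which is documented later in this module. A hedged sketch, assuming raster_layer is an existing RasterLayer:
import geopyspark as gps

# Keep the source resolution and snap to the layer's own extent,
# cutting 512x512 pixel tiles.
local_tiled = raster_layer.tile_to_layout(layout=gps.LocalLayout(tile_size=512))

# Fit the layer to a power-of-2 pyramid over the global Web Mercator extent.
global_tiled = raster_layer.tile_to_layout(layout=gps.GlobalLayout(tile_size=256),
                                           target_crs=3857)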
-
class
geopyspark.geotrellis.
LayoutDefinition
(extent, tileLayout)¶ Describes the layout of the rasters within a Layer and how they are projected.
Parameters: - extent (
Extent
) – TheExtent
of the layout. - tileLayout (
TileLayout
) – TheTileLayout
of how the rasters are laid out within the Layer.
Returns: -
count
(value) → integer -- return number of occurrences of value¶
-
extent
¶ Alias for field number 0
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
tileLayout
¶ Alias for field number 1
-
class
geopyspark.geotrellis.
Bounds
¶ Represents the grid that covers the area of the rasters in a Layer.
Parameters: - minKey (
SpatialKey
orSpaceTimeKey
) – The smallestSpatialKey
orSpaceTimeKey
. - maxKey – The largest
SpatialKey
orSpaceTimeKey
.
Returns: -
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
maxKey
¶ Alias for field number 1
-
minKey
¶ Alias for field number 0
-
geopyspark.geotrellis.
RasterizerOptions
¶ Represents options available to the geometry rasterizer.
Parameters: - includePartial (bool) – Include partial pixel intersection (default: True)
- sampleType (str) – ‘PixelIsArea’ or ‘PixelIsPoint’ (default: ‘PixelIsPoint’)
alias of
RasterizeOption
-
geopyspark.geotrellis.
read_layer_metadata
(uri, layer_name, layer_zoom)¶ Reads the metadata from a saved layer without reading in the whole layer.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
Returns: Metadata
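A hedged sketch; the catalog URI, layer name, and zoom level below are placeholders:
import geopyspark as gps

metadata = gps.read_layer_metadata(uri="file:///tmp/catalog",
                                   layer_name="my-layer",
                                   layer_zoom=10)

print(metadata.crs)          # projection of the saved layer
print(metadata.cell_type)    # cell type of the saved rasters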
-
geopyspark.geotrellis.
read_value
(uri, layer_name, layer_zoom, col, row, zdt=None, store=None)¶ Reads a single
Tile
from a GeoTrellis catalog. Unlike other functions in this module, this will not return aTiledRasterLayer
, but rather a GeoPySpark formatted raster.Note
When requesting a tile that does not exist,
None
will be returned.Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- col (int) – The col number of the tile within the layout. Cols run east to west.
- row (int) – The row number of the tile within the layout. Rows run north to south.
- zdt (
datetime.datetime
) – The time stamp of the tile if the data is spatial-temporal. This is represented as adatetime.datetime.
instance. The default value is,None
. IfNone
, then only the spatial area will be queried. - store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
Returns: Tile
-
geopyspark.geotrellis.
query
(uri, layer_name, layer_zoom=None, query_geom=None, time_intervals=None, query_proj=None, num_partitions=None, store=None)¶ Queries a single, zoom layer from a GeoTrellis catalog given spatial and/or time parameters.
Note
The whole layer could still be read in if
intersects
and/ortime_intervals
have not been set, or if the queried region contains the entire layer.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be queried.
- layer_zoom (int, optional) – The zoom level of the layer that is to be queried.
If
None
, then thelayer_zoom
will be set to 0. - query_geom (bytes or shapely.geometry or
Extent
, optional) – The desired spatial area to be returned. Can either be a shapely geometry, an instance of
Extent
, or a WKB version of the geometry.Note
Not all shapely geometries are supported. The following types are supported: * Point * Polygon * MultiPolygon
Note
Only layers that were made from spatial, singleband GeoTiffs can query a
Point
. All other types are restricted toPolygon
and MultiPolygon
.If not specified, then the entire layer will be read.
- time_intervals (
[datetime.datetime]
, optional) – A list of the time intervals to query. This parameter is only used when querying spatial-temporal data. The default value is,None
. IfNone
, then only the spatial area will be queried. - query_proj (int or str, optional) – The CRS of the queried geometry if it is different
than the layer it is being filtered against. If they are different and this is not set,
then the returned
TiledRasterLayer
could contain incorrect values. IfNone
, then the geometry and layer are assumed to be in the same projection. - num_partitions (int, optional) – Sets RDD partition count when reading from catalog.
- store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
Returns: TiledRasterLayer
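For instance, a sketch of querying only the tiles that intersect an area of interest; the catalog URI and layer name are placeholders, and the geometry is assumed to be in EPSG:4326:
import geopyspark as gps
from shapely.geometry import box

area_of_interest = box(-75.23, 40.00, -75.10, 40.08)

queried = gps.query(uri="file:///tmp/catalog",
                    layer_name="my-layer",
                    layer_zoom=11,
                    query_geom=area_of_interest,
                    query_proj=4326)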
-
geopyspark.geotrellis.
write
(uri, layer_name, tiled_raster_layer, index_strategy=<IndexingMethod.ZORDER: 'zorder'>, time_unit=None, store=None)¶ Writes a tile layer to a specified destination.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
- layer_name (str) – The name of the new, tile layer.
- layer_zoom (int) – The zoom level the layer should be saved at.
- tiled_raster_layer (
TiledRasterLayer
) – TheTiledRasterLayer
to be saved. - index_strategy (str or
IndexingMethod
) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. Can either be a string or an IndexingMethod
attribute. The default method used is,IndexingMethod.ZORDER
. - time_unit (str or
TimeUnit
, optional) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index; that is, what time intervals are used to separate each record. While this is set toNone
as default, it must be set if saving spatial-temporal data. Depending on the indexing method chosen, different time units are used. - store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
-
class
geopyspark.geotrellis.
AttributeStore
(uri)¶ AttributeStore provides a way to read and write GeoTrellis layer attributes.
Internally, all attribute values are stored as JSON; here they are exposed as dictionaries. Classes that are often stored have
.from_dict
and.to_dict
methods to bridge the gap:
import geopyspark as gps
store = gps.AttributeStore("s3://azavea-datahub/catalog")
hist = store.layer("us-nlcd2011-30m-epsg3857", zoom=7).read("histogram")
hist = gps.Histogram.from_dict(hist)
-
class
Attributes
(store, layer_name, layer_zoom)¶ Accessor class for all attributes for a given layer
-
delete
(name)¶ Delete attribute by name
Parameters: name (str) – Attribute name
-
layer_metadata
()¶
-
read
(name)¶ Read layer attribute by name as a dict
Parameters: name (str) – Returns: Attribute value Return type: dict
-
write
(name, value)¶ Write layer attribute value as a dict
Parameters: - name (str) – Attribute name
- value (dict) – Attribute value
-
-
classmethod
AttributeStore.
build
(store)¶ Builds AttributeStore from URI or passes an instance through.
Parameters: uri (str or AttributeStore) – URI for AttributeStore object or instance. Returns: AttributeStore
-
classmethod
AttributeStore.
cached
(uri)¶ Returns cached version of AttributeStore for URI or creates one
-
AttributeStore.
contains
(name, zoom=None)¶ Checks if this store contains metadata for a given layer.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: bool
-
AttributeStore.
delete
(name, zoom=None)¶ Delete layer and all its attributes
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
-
AttributeStore.
layer
(name, zoom=None)¶ Returns the Attributes object for the given layer.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: Attributes
-
AttributeStore.
layers
()¶ Lists the Attributes objects of all layers
Returns: [Attributes]
-
class
-
geopyspark.geotrellis.
get_colors_from_colors
(colors)¶ Returns a list of integer colors from a list of Color objects from the colortools package.
Parameters: colors ([colortools.Color]) – A list of color stops using colortools.Color Returns: [int]
-
geopyspark.geotrellis.
get_colors_from_matplotlib
(ramp_name, num_colors=256)¶ Returns a list of color breaks from the color ramps defined by Matplotlib.
Parameters: - ramp_name (str) – The name of a matplotlib color ramp. See the matplotlib documentation for a list of names and details on each color ramp.
- num_colors (int, optional) – The number of color breaks to derive from the named map.
Returns: [int]
-
class
geopyspark.geotrellis.
ColorMap
(cmap)¶ A class that wraps a GeoTrellis ColorMap class.
Parameters: cmap (py4j.java_gateway.JavaObject) – The JavaObject
that represents the GeoTrellis ColorMap.-
cmap
¶ py4j.java_gateway.JavaObject – The
JavaObject
that represents the GeoTrellis ColorMap.
-
classmethod
build
(breaks, colors=None, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Given breaks and colors, build a
ColorMap
object.Parameters: - breaks (dict or list or
Histogram
) – If adict
then a mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity. If alist
then tile values that specify breaks in the color mapping. If aHistogram
then a histogram from which breaks can be derived. - colors (str or list, optional) – If a
str
then the name of a matplotlib color ramp. If alist
then either a list of colortoolsColor
objects or a list of integers containing packed RGBA values. IfNone
, then theColorMap
will be created from thebreaks
given. - no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
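Two hedged sketches of building a ColorMap: one from explicit breaks and packed RGBA colors using from_colors (described below), and one from a layer histogram with a matplotlib ramp; tiled_layer is assumed to be an existing TiledRasterLayer:
import geopyspark as gps

# Explicit breaks mapped to packed RGBA integers.
from_breaks = gps.ColorMap.from_colors(breaks=[0, 100, 200],
                                       color_list=[0xff0000ff, 0x00ff00ff, 0x0000ffff])

# Breaks derived from a histogram, colored with a matplotlib ramp.
from_ramp = gps.ColorMap.build(breaks=tiled_layer.get_histogram(),
                               colors='viridis')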
-
classmethod
from_break_map
(break_map, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts a dictionary mapping from tile values to colors to a ColorMap.
Parameters: - break_map (dict) – A mapping from tile values to colors, the latter represented as integers e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
classmethod
from_colors
(breaks, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts lists of values and colors to a
ColorMap
.Parameters: - breaks (list) – The tile values that specify breaks in the color mapping.
- color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers—e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
classmethod
from_histogram
(histogram, color_list, no_data_color=0, fallback=0, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>)¶ Converts a wrapped GeoTrellis histogram into a
ColorMap
.Parameters: - histogram (
Histogram
) – AHistogram
instance; specifies breaks - color_list ([int]) – The colors corresponding to the values in the breaks list, represented as integers e.g., 0xff000080 is red at half opacity.
- no_data_color (int, optional) – A color to replace NODATA values with
- fallback (int, optional) – A color to replace cells that have no value in the mapping
- classification_strategy (str or
ClassificationStrategy
, optional) – A string giving the strategy for converting tile values to colors. e.g., ifClassificationStrategy.LESS_THAN_OR_EQUAL_TO
is specified, and the break map is {3: 0xff0000ff, 4: 0x00ff00ff}, then values up to 3 map to red, values from above 3 and up to and including 4 become green, and values over 4 become the fallback color.
Returns: ColorMap
-
static
nlcd_colormap
()¶ Returns a color map for NLCD tiles.
Returns: ColorMap
-
-
class
geopyspark.geotrellis.
LayerType
¶ The type of the key within the tuple of the wrapped RDD.
-
SPACETIME
= 'spacetime'¶
-
SPATIAL
= 'spatial'¶
-
-
class
geopyspark.geotrellis.
IndexingMethod
¶ How the wrapped layer should be indexed when saved.
-
HILBERT
= 'hilbert'¶
-
ROWMAJOR
= 'rowmajor'¶
-
ZORDER
= 'zorder'¶
-
-
class
geopyspark.geotrellis.
ResampleMethod
¶ Resampling Methods.
-
AVERAGE
= 'Average'¶
-
BILINEAR
= 'Bilinear'¶
-
CUBIC_CONVOLUTION
= 'CubicConvolution'¶
-
CUBIC_SPLINE
= 'CubicSpline'¶
-
LANCZOS
= 'Lanczos'¶
-
MAX
= 'Max'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
NEAREST_NEIGHBOR
= 'NearestNeighbor'¶
-
-
class
geopyspark.geotrellis.
TimeUnit
¶ ZORDER time units.
-
DAYS
= 'days'¶
-
HOURS
= 'hours'¶
-
MILLIS
= 'millis'¶
-
MINUTES
= 'minutes'¶
-
MONTHS
= 'months'¶
-
SECONDS
= 'seconds'¶
-
YEARS
= 'years'¶
-
-
class
geopyspark.geotrellis.
Operation
¶ Focal operations.
-
ASPECT
= 'Aspect'¶
-
MAX
= 'Max'¶
-
MEAN
= 'Mean'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
SLOPE
= 'Slope'¶
-
STANDARD_DEVIATION
= 'StandardDeviation'¶
-
SUM
= 'Sum'¶
-
-
class
geopyspark.geotrellis.
Neighborhood
¶ Neighborhood types.
-
ANNULUS
= 'Annulus'¶
-
CIRCLE
= 'Circle'¶
-
NESW
= 'Nesw'¶
-
SQUARE
= 'Square'¶
-
WEDGE
= 'Wedge'¶
-
-
class
geopyspark.geotrellis.
ClassificationStrategy
¶ Classification strategies for color mapping.
-
EXACT
= 'Exact'¶
-
GREATER_THAN
= 'GreaterThan'¶
-
GREATER_THAN_OR_EQUAL_TO
= 'GreaterThanOrEqualTo'¶
-
LESS_THAN
= 'LessThan'¶
-
LESS_THAN_OR_EQUAL_TO
= 'LessThanOrEqualTo'¶
-
-
class
geopyspark.geotrellis.
CellType
¶ Cell types.
-
BOOL
= 'bool'¶
-
BOOLRAW
= 'boolraw'¶
-
FLOAT32
= 'float32'¶
-
FLOAT32RAW
= 'float32raw'¶
-
FLOAT64
= 'float64'¶
-
FLOAT64RAW
= 'float64raw'¶
-
INT16
= 'int16'¶
-
INT16RAW
= 'int16raw'¶
-
INT32
= 'int32'¶
-
INT32RAW
= 'int32raw'¶
-
INT8
= 'int8'¶
-
INT8RAW
= 'int8raw'¶
-
UINT16
= 'uint16'¶
-
UINT16RAW
= 'uint16raw'¶
-
UINT8
= 'uint8'¶
-
UINT8RAW
= 'uint8raw'¶
-
-
class
geopyspark.geotrellis.
ColorRamp
¶ ColorRamp names.
-
BLUE_TO_ORANGE
= 'BlueToOrange'¶
-
BLUE_TO_RED
= 'BlueToRed'¶
-
CLASSIFICATION_BOLD_LAND_USE
= 'ClassificationBoldLandUse'¶
-
CLASSIFICATION_MUTED_TERRAIN
= 'ClassificationMutedTerrain'¶
-
COOLWARM
= 'CoolWarm'¶
-
GREEN_TO_RED_ORANGE
= 'GreenToRedOrange'¶
-
HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM
= 'HeatmapBlueToYellowToRedSpectrum'¶
-
HEATMAP_DARK_RED_TO_YELLOW_WHITE
= 'HeatmapDarkRedToYellowWhite'¶
-
HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE
= 'HeatmapLightPurpleToDarkPurpleToWhite'¶
-
HEATMAP_YELLOW_TO_RED
= 'HeatmapYellowToRed'¶
-
Hot
= 'Hot'¶
-
INFERNO
= 'Inferno'¶
-
LIGHT_TO_DARK_GREEN
= 'LightToDarkGreen'¶
-
LIGHT_TO_DARK_SUNSET
= 'LightToDarkSunset'¶
-
LIGHT_YELLOW_TO_ORANGE
= 'LightYellowToOrange'¶
-
MAGMA
= 'Magma'¶
-
PLASMA
= 'Plasma'¶
-
VIRIDIS
= 'Viridis'¶
-
-
geopyspark.geotrellis.
cost_distance
(friction_layer, geometries, max_distance)¶ Performs a cost distance operation on a TileLayer.
Parameters: - friction_layer (
TiledRasterLayer
) –TiledRasterLayer
of a friction surface to traverse. - geometries (list) –
A list of shapely geometries to be used as a starting point.
Note
All geometries must be in the same CRS as the TileLayer.
- max_distance (int or float) – The maximum cost that a path may reach before the operation
stops. This value can be an
int
orfloat
.
Returns: TiledRasterLayer
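A hedged sketch, assuming friction_layer is an existing TiledRasterLayer in Web Mercator and that the starting point is given in the same CRS:
import geopyspark as gps
from shapely.geometry import Point

start = Point(-8366731.0, 4855355.0)   # a point in the friction layer's CRS

cost = gps.cost_distance(friction_layer=friction_layer,
                         geometries=[start],
                         max_distance=144000.0)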
-
geopyspark.geotrellis.
euclidean_distance
(geometry, source_crs, zoom, cell_type=<CellType.FLOAT64: 'float64'>)¶ Calculates the Euclidean distance of a Shapely geometry.
Parameters: - geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
- source_crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- cell_type (str or
CellType
, optional) – The data type of the cells for the new layer. If not specified, thenCellType.FLOAT64
is used.
Note
This function may run very slowly for polygonal inputs if they cover many cells of the output raster.
Returns: TiledRasterLayer
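A small sketch; the point below is assumed to be in EPSG:4326 and the zoom level is arbitrary:
import geopyspark as gps
from shapely.geometry import Point

distance_layer = gps.euclidean_distance(geometry=Point(-75.16, 39.95),
                                        source_crs=4326,
                                        zoom=7)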
-
geopyspark.geotrellis.
hillshade
(tiled_raster_layer, band=0, azimuth=315.0, altitude=45.0, z_factor=1.0)¶ Computes Hillshade (shaded relief) from a raster.
The resulting raster will be a shaded relief map (a hill shading) based on the sun altitude, azimuth, and the z factor. The z factor is a conversion factor from map units to elevation units.
Returns a raster of ShortConstantNoDataCellType.
For descriptions of parameters, please see Esri Desktop’s description of Hillshade.
Parameters: - tiled_raster_layer (
TiledRasterLayer
) – The base layer that contains the rasters used to compute the hillshade. - band (int, optional) – The band of the raster to base the hillshade calculation on. Default is 0.
- azimuth (float, optional) – The azimuth angle of the source of light. Default value is 315.0.
- altitude (float, optional) – The angle of the altitude of the light above the horizon. Default is 45.0.
- z_factor (float, optional) – How many x and y units in a single z unit. Default value is 1.0.
Returns: TiledRasterLayer
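A hedged sketch, assuming elevation_layer is an existing TiledRasterLayer of elevation values:
import geopyspark as gps

shaded = gps.hillshade(tiled_raster_layer=elevation_layer,
                       band=0,
                       azimuth=315.0,
                       altitude=45.0,
                       z_factor=1.0)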
-
class
geopyspark.geotrellis.
Histogram
(scala_histogram)¶ A wrapper class for a GeoTrellis Histogram.
The underlying histogram is produced from the values within a
TiledRasterLayer
. These values represented by the histogram can either beInt
orFloat
depending on the data type of the cells in the layer.Parameters: scala_histogram (py4j.JavaObject) – An instance of the GeoTrellis histogram. -
scala_histogram
¶ py4j.JavaObject – An instance of the GeoTrellis histogram.
-
bin_counts
()¶ Returns a list of tuples where the key is the bin label value and the value is the label’s respective count.
Returns: [(int, int)] or [(float, int)]
-
bucket_count
()¶ Returns the number of buckets within the histogram.
Returns: int
-
cdf
()¶ Returns the cdf of the distribution of the histogram.
Returns: [(float, float)]
-
classmethod
from_dict
(value)¶ Creates a Histogram from a dictionary
-
item_count
(item)¶ Returns the total number of times a given item appears in the histogram.
Parameters: item (int or float) – The value whose occurrences should be counted. Returns: The total count of the occurrences of item
in the histogram.Return type: int
-
max
()¶ The largest value of the histogram.
This will return either an
int
orfloat
depending on the type of values within the histogram.Returns: int or float
-
mean
()¶ Determines the mean of the histogram.
Returns: float
-
median
()¶ Determines the median of the histogram.
Returns: float
-
merge
(other_histogram)¶ Merges this instance of
Histogram
with another. The resultingHistogram
will contain values from both Histograms.Parameters: other_histogram (
) – TheHistogram
that should be merged with this instance.Returns: Histogram
-
min
()¶ The smallest value of the histogram.
This will return either an
int
orfloat
depending on the type of values within the histogram.Returns: int or float
-
min_max
()¶ The largest and smallest values of the histogram.
This will return either an
int
orfloat
depending on the type of values within the histogram.Returns: (int, int) or (float, float)
-
mode
()¶ Determines the mode of the histogram.
This will return either an
int
orfloat
depending on the type of values within the histogram.Returns: int or float
-
quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
to_dict
()¶ Encodes histogram as a dictionary
Returns: dict
-
-
class
geopyspark.geotrellis.
RasterLayer
(layer_type, srdd)¶ A wrapper of an RDD that contains GeoTrellis rasters.
Represents a layer that wraps an RDD that contains
(K, V)
. WhereK
is eitherProjectedExtent
orTemporalProjectedExtent
depending on thelayer_type
of the RDD, andV
being aTile
.The data held within this layer has not been tiled, meaning it has yet to be modified to fit a certain layout. See raster_rdd for more information.
Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
¶ py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
bands
(band)¶ Select a subsection of bands from the
Tile
s within the layer.Note
There could be a high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a
py4j.protocol.Py4JJavaError
and not a normal Python error.Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tile
s. Can either be a single int, or a collection of ints.Returns: RasterLayer
with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [ProjectedExtent]
or [TemporalProjectedExtent]
-
collect_metadata
(layout=LocalLayout(tile_cols=256, tile_rows=256))¶ Iterates over the RDD records and generates layer metadata describing the contained rasters.
Parameters: layout (LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
Returns: Metadata
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying raster values to a new
CellType
.Parameters: - new_type (str or
CellType
) – The data type the cells should be converted to. - no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns: Raises: ValueError
– Ifno_data_value
is set and thenew_type
contains raw values.ValueError
– Ifno_data_value
is set andnew_type
is a boolean.
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd)¶ Create a
RasterLayer
from a numpy RDD.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either
ProjectedExtent
s orTemporalProjectedExtent
s and rasters that are represented by a numpy array.
Returns: RasterLayer
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
layer_type
-
map_cells
(func)¶ Maps over the cells of each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func (cells, nd => cells) – A function that takes two arguments: cells
andnd
. Wherecells
is the numpy array andnd
is theno_data_value
of theTile
. It returnscells
which are the new cells values of theTile
represented as a numpy array.Returns: RasterLayer
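For example, a sketch that clamps every cell value to a maximum of 100 while leaving NoData cells alone; raster_layer is assumed to be an existing RasterLayer:
import numpy as np

def clamp(cells, nd):
    # Keep NoData cells as they are; cap everything else at 100.
    return np.where(cells == nd, cells, np.minimum(cells, 100))

clamped = raster_layer.map_cells(clamp)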
-
map_tiles
(func)¶ Maps over each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func ( Tile
=>Tile
) – A function that takes aTile
and returns aTile
.Returns: RasterLayer
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
pysc
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A
dict
whose keys represent values where a break should occur and its values are the new value the cells within the break should become. - data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or
ClassificationStrategy
, optional) – How the cells should be classified along the breaks. If unspecified, thenClassificationStrategy.LESS_THAN_OR_EQUAL_TO
will be used. - replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if
data_type
is int or float. For int, the constantNO_DATA_INT
can be used which represents the NoData value for int in GeoTrellis. For float,float('nan')
is used to represent NoData.Returns: RasterLayer
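A short sketch of reclassifying an integer layer into three classes; raster_layer is assumed to be an existing RasterLayer and the break values are arbitrary:
import geopyspark as gps

reclassified = raster_layer.reclassify(
    value_map={10: 1, 20: 2, 30: 3},
    data_type=int,
    classification_strategy=gps.ClassificationStrategy.LESS_THAN_OR_EQUAL_TO)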
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
srdd
-
tile_to_layout
(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code,
well-known name, or a PROJ.4 string. If
None
, no reproject will be performed. - resample_method (str or
ResampleMethod
, optional) – The cell resample method to use during the tiling operation. Default is ResampleMethods.NEAREST_NEIGHBOR.
Returns: TiledRasterLayer
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a
RDD[(K, bytes)]
. WhereK
is eitherProjectedExtent
orTemporalProjectedExtent
.Parameters: - storage_method (str or
StorageMethod
, optional) – How the segments within the GeoTiffs should be arranged. Default isStorageMethod.STRIPED
. - rows_per_strip (int, optional) – How many rows should be in each strip segment of the
GeoTiffs if
storage_method
isStorageMethod.STRIPED
. IfNone
, then the strip size will default to a value that is 8K or less. - tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff
if
storage_method
isStorageMethod.TILED
. IfNone
then the default size is(256, 256)
. - compression (str or
Compression
, optional) – How the data should be compressed. Defaults toCompression.NO_COMPRESSION
. - color_space (str or
ColorSpace
, optional) – How the colors should be organized in the GeoTiffs. Defaults toColorSpace.BLACK_IS_ZERO
. - color_map (
ColorMap
, optional) – AColorMap
instance used to color the GeoTiffs to a different gradient. - head_tags (dict, optional) – A
dict
where each key and value is astr
. - band_tags (list, optional) – A
list
ofdict
s where each key and value is astr
. - Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
-
to_numpy_rdd
()¶ Converts a
RasterLayer
to a numpy RDD.Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map ( ColorMap
) – AColorMap
instance used to color the PNGs.Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a
RasterLayer
with alayout_type
ofLayoutType.SPACETIME
to aRasterLayer
with alayout_type
ofLayoutType.SPATIAL
.Parameters: target_time ( datetime.datetime
, optional) – The instance of interest. If set, the resultingRasterLayer
will only contain keys that contained the given instance. IfNone
, then all values within the layer will be kept.Returns: RasterLayer
Raises: ValueError
– If the layer already has alayout_type
ofLayoutType.SPATIAL
.
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.geotrellis.
TiledRasterLayer
(layer_type, srdd)¶ Wraps an RDD of tiled GeoTrellis rasters.
Represents an RDD that contains
(K, V)
. WhereK
is eitherSpatialKey
orSpaceTimeKey
depending on thelayer_type
of the RDD, andV
being aTile
.The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.
Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - srdd (py4j.java_gateway.JavaObject) – The corresponding Scala class. This is what allows
TiledRasterLayer
to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
¶ py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
is_floating_point_layer
¶ bool – Whether the data within the
TiledRasterLayer
is floating point or not.
-
zoom_level
¶ int – The zoom level of the layer. Can be
None
.
-
bands
(band)¶ Select a subsection of bands from the
Tile
s within the layer.Note
There could be a high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if a band that is out of bounds is selected, then the error returned will be a
py4j.protocol.Py4JJavaError
and not a normal Python error.Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tile
s. Can either be a single int, or a collection of ints.Returns: TiledRasterLayer
with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [SpatialKey]
or [SpaceTimeKey]
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying raster values to a new
CellType
.Parameters: - new_type (str or
CellType
) – The data type the cells should be converted to. - no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns: Raises: ValueError
– Ifno_data_value
is set and thenew_type
contains raw values.ValueError
– Ifno_data_value
is set andnew_type
is a boolean.
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
focal
(operation, neighborhood=None, param_1=None, param_2=None, param_3=None)¶ Performs the given focal operation on the layers contained in the Layer.
Parameters: - operation (str or
Operation
) – The focal operation to be performed. - neighborhood (str or
Neighborhood
, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance ofNeighborhood
, or by a constant. - param_1 (int or float, optional) – If using
Operation.SLOPE
, then this is the zFactor, else it is the first argument ofneighborhood
. - param_2 (int or float, optional) – The second argument of the
neighborhood
. - param_3 (int or float, optional) – The third argument of the
neighborhood
.
Note
param
only need to be set ifneighborhood
is not an instance ofNeighborhood
or ifneighborhood
isNone
.Any
param
that is not set will default to 0.0.If
neighborhood
isNone
thenoperation
must be eitherOperation.SLOPE
orOperation.ASPECT
.Returns: Raises: ValueError
– Ifoperation
is not a known operation.ValueError
– Ifneighborhood
is not a known neighborhood.ValueError
– Ifneighborhood
was not set, andoperation
is notOperation.SLOPE
orOperation.ASPECT
.
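Two hedged sketches, assuming tiled_layer is an existing TiledRasterLayer (an elevation layer in the slope case):
import geopyspark as gps

# A focal mean over a square neighborhood; param_1 is the Square's first argument (its extent).
smoothed = tiled_layer.focal(operation=gps.Operation.MEAN,
                             neighborhood=gps.Neighborhood.SQUARE,
                             param_1=1)

# Slope needs no neighborhood; param_1 is the zFactor in this case.
slope = tiled_layer.focal(operation=gps.Operation.SLOPE, param_1=1.0)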
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd, metadata, zoom_level=None)¶ Create a
TiledRasterLayer
from a numpy RDD.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either
SpatialKey
orSpaceTimeKey
and rasters that are represented by a numpy array. - metadata (
Metadata
) – TheMetadata
of theTiledRasterLayer
instance. - zoom_level (int, optional) – The
zoom_level
the resulting TiledRasterLayer should have. IfNone
, then the returned layer’szoom_level
will beNone
.
Returns: TiledRasterLayer
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
histogram_series
(geometries)¶
-
layer_type
-
lookup
(col, row)¶ Return the value(s) in the image of a particular
SpatialKey
(given by col and row).Parameters: - col (int) – The
SpatialKey
column. - row (int) – The
SpatialKey
row.
Returns: [
Tile
]Raises: ValueError
– If using lookup on a nonLayerType.SPATIAL
TiledRasterLayer
.IndexError
– If col and row are not within theTiledRasterLayer
‘s bounds.
-
map_cells
(func)¶ Maps over the cells of each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func (cells, nd => cells) – A function that takes two arguments: cells
andnd
. Wherecells
is the numpy array andnd
is theno_data_value
of the tile. It returnscells
which are the new cells values of the tile represented as a numpy array.Returns: TiledRasterLayer
-
map_tiles
(func)¶ Maps over each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func ( Tile
=>Tile
) – A function that takes aTile
and returns aTile
.Returns: TiledRasterLayer
-
mask
(geometries)¶ Masks the
TiledRasterLayer
so that only values that intersect the geometries will be available.Parameters: geometries (shapely.geometry or [shapely.geometry]) – Either a single shapely geometry or a list of geometries to use as the mask(s).
Note
All geometries must be in the same CRS as the TileLayer.
Returns: TiledRasterLayer
-
max_series
(geometries)¶
-
mean_series
(geometries)¶
-
min_series
(geometries)¶
-
normalize
(new_min, new_max, old_min=None, old_max=None)¶ Normalizes the cell values of the layer to a new range.
Note
If
old_max - old_min <= 0
ornew_max - new_min <= 0
, then the normalization will fail.Parameters: - old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used.
- old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used.
- new_min (int or float) – New minimum to normalize to.
- new_max (int or float) – New maximum to normalize to.
Returns:
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
polygonal_max
(geometry, data_type)¶ Finds the max value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
-
polygonal_mean
(geometry)¶ Finds the mean of all of the values that are contained within the given geometry.
Parameters: geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry.Returns: float
-
polygonal_min
(geometry, data_type)¶ Finds the min value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
-
polygonal_sum
(geometry, data_type)¶ Finds the sum of all of the values that are contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
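A hedged sketch of the polygonal summary methods above, assuming tiled_layer is an existing TiledRasterLayer and that the box is given in the layer's CRS:
from shapely.geometry import box

area_of_interest = box(-8375000.0, 4845000.0, -8350000.0, 4870000.0)

total = tiled_layer.polygonal_sum(geometry=area_of_interest, data_type=float)
mean_value = tiled_layer.polygonal_mean(geometry=area_of_interest)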
-
pyramid
(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Creates a layer
Pyramid
where the resolution is halved per level.Parameters: resample_method (str or ResampleMethod
, optional) – The resample method to use when building the pyramid. Default isResampleMethods.NEAREST_NEIGHBOR
.Returns: Pyramid
.Raises: ValueError
– If this layer layout is not ofGlobalLayout
type.
-
pysc
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A
dict
whose keys represent values where a break should occur and its values are the new value the cells within the break should become. - data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or
ClassificationStrategy
, optional) – How the cells should be classified along the breaks. If unspecified, thenClassificationStrategy.LESS_THAN_OR_EQUAL_TO
will be used. - replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if
data_type
is int or float. For int, the constantNO_DATA_INT
can be used which represents the NoData value for int in GeoTrellis. For float,float('nan')
is used to represent NoData.Returns: TiledRasterLayer
- value_map (dict) – A
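A minimal sketch of reclassify, assuming tiled_layer holds integer cells and that the constants are re-exported at the top level as in the quick example; cells <= 100 become 1 and cells <= 255 become 2 under the default strategy:
import geopyspark as gps
# Breaks and the values that cells within each break should become.
value_map = {100: 1, 255: 2}
reclassified = tiled_layer.reclassify(
    value_map=value_map,
    data_type=int,
    classification_strategy=gps.ClassificationStrategy.LESS_THAN_OR_EQUAL_TO)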
-
repartition
(num_partitions=None)¶ Repartition underlying RDD using HashPartitioner. If
num_partitions
is None, existing number of partitions will be used.Parameters: num_partitions (int, optional) – Desired number of partitions Returns: TiledRasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
save_stitched
(path, crop_bounds=None, crop_dimensions=None)¶ Stitch all of the rasters within the Layer into one raster and then saves it to a given path.
Parameters: - path (str) – The path of the geotiff to save. The path must be on the local file system.
- crop_bounds (
Extent
, optional) – The subExtent
with which to crop the raster before saving. IfNone
, then the whole raster will be saved. - crop_dimensions (tuple(int) or list(int), optional) – cols and rows of the image to save
represented as either a tuple or list. If
None
then all cols and rows of the raster will be saved.
Note
This can only be used on
LayerType.SPATIAL
TiledRasterLayer
s.Note
If
crop_dimensions
is set thencrop_bounds
must also be set.
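An illustrative sketch of save_stitched; the output paths and the crop extent below are hypothetical:
import geopyspark as gps
# Save the whole stitched layer to a local GeoTiff.
tiled_layer.save_stitched(path='/tmp/stitched.tif')
# Save a cropped portion, resized to 512 cols by 512 rows.
crop = gps.Extent(xmin=0.0, ymin=0.0, xmax=10.0, ymax=10.0)
tiled_layer.save_stitched(path='/tmp/stitched-cropped.tif',
                          crop_bounds=crop,
                          crop_dimensions=(512, 512))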
-
srdd
-
star_series
(geometries, fn)¶
-
stitch
()¶ Stitch all of the rasters within the Layer into one raster.
Note
This can only be used on
LayerType.SPATIAL
TiledRasterLayer
s.Returns: Tile
-
sum_series
(geometries)¶
-
tile_to_layout
(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code,
well-known name, or a PROJ.4 string. If
None
, no reproject will be performed. - resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns: - :param layout (
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a
RDD[(K, bytes)]
. WhereK
is eitherSpatialKey
orSpaceTimeKey
.Parameters: - storage_method (str or
StorageMethod
, optional) – How the segments within the GeoTiffs should be arranged. Default isStorageMethod.STRIPED
. - rows_per_strip (int, optional) – How many rows should be in each strip segment of the
GeoTiffs if
storage_method
isStorageMethod.STRIPED
. IfNone
, then the strip size will default to a value that is 8K or less. - tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff
if
storage_method
isStorageMethod.TILED
. IfNone
then the default size is(256, 256)
. - compression (str or
Compression
, optional) – How the data should be compressed. Defaults toCompression.NO_COMPRESSION
. - color_space (str or
ColorSpace
, optional) – How the colors should be organized in the GeoTiffs. Defaults toColorSpace.BLACK_IS_ZERO
. - color_map (
ColorMap
, optional) – AColorMap
instance used to color the GeoTiffs to a different gradient. - head_tags (dict, optional) – A
dict
where each key and value is astr
. - band_tags (list, optional) – A
list
ofdict
s where each key and value is astr
. - Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
- storage_method (str or
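A rough sketch of to_geotiff_rdd on a LayerType.SPATIAL layer; collecting to the driver and the key attributes (col, row) are assumptions that are only appropriate for small layers:
import geopyspark as gps
# Encode each tile in the layer as a tiled GeoTiff.
geotiff_rdd = tiled_layer.to_geotiff_rdd(storage_method=gps.StorageMethod.TILED,
                                         tile_dimensions=(256, 256))
# Write each (key, bytes) pair to its own local file; collect() is only safe for small layers.
for key, tiff_bytes in geotiff_rdd.collect():
    with open('/tmp/tile-{}-{}.tif'.format(key.col, key.row), 'wb') as f:
        f.write(tiff_bytes)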
-
to_numpy_rdd
()¶ Converts a
TiledRasterLayer
to a numpy RDD.Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map ( ColorMap
) – AColorMap
instance used to color the PNGs.Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a
TiledRasterLayer
with alayout_type
ofLayoutType.SPACETIME
to aTiledRasterLayer
with alayout_type
ofLayoutType.SPATIAL
.Parameters: target_time ( datetime.datetime
, optional) – The instance of interest. If set, the resultingTiledRasterLayer
will only contain keys that contained the given instance. IfNone
, then all values within the layer will be kept.Returns: TiledRasterLayer
Raises: ValueError
– If the layer already has alayout_type
ofLayoutType.SPATIAL
.
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
- layer_type (str or
-
class
geopyspark.geotrellis.
Pyramid
(levels)¶ Contains a list of
TiledRasterLayer
s that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server. Map algebra can be performed on instances of this class.
Parameters: levels (list or dict) – A list of TiledRasterLayer
s or a dict ofTiledRasterLayer
s where the value is the layer itself and the key is its given zoom level.-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
layer_type
¶ LayerType – What the layer type of the geotiffs are.
-
levels
¶ dict – A dict of
TiledRasterLayer
s where the value is the layer itself and the key is its given zoom level.
-
max_zoom
¶ int – The highest zoom level of the pyramid.
-
is_cached
¶ bool – Signals whether or not the internal RDDs are cached. Default is
False
.
-
histogram
¶ Histogram
– TheHistogram
that represents the layer with the max zoom. Will not be calculated unless the get_histogram()
method is used. Otherwise, its value isNone
.
Raises: TypeError
– Iflevels
is neither a list or dict.-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
histogram
-
is_cached
-
layer_type
¶
-
levels
-
max_zoom
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
pysc
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.
Returns: [org.apache.spark.rdd.RDD]
-
-
class
geopyspark.geotrellis.
Square
(extent)¶
-
class
geopyspark.geotrellis.
Circle
(radius)¶ A circle neighborhood.
Parameters: radius (int or float) – The radius of the circle that determines which cells fall within the bounding box. -
radius
¶ int or float – The radius of the circle that determines which cells fall within the bounding box.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Unused param for
Circle
. Is 0.0.
-
param_3
¶ float – Unused param for
Circle
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “circle”.
Note
Cells that lie exactly on the radius of the circle are a part of the neighborhood.
-
-
class
geopyspark.geotrellis.
Wedge
(radius, start_angle, end_angle)¶ A wedge neighborhood.
Parameters: - radius (int or float) – The radius of the wedge.
- start_angle (int or float) – The starting angle of the wedge in degrees.
- end_angle (int or float) – The ending angle of the wedge in degrees.
-
radius
¶ int or float – The radius of the wedge.
-
start_angle
¶ int or float – The starting angle of the wedge in degrees.
-
end_angle
¶ int or float – The ending angle of the wedge in degrees.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Same as
start_angle
.
-
param_3
¶ float – Same as
end_angle
.
-
name
¶ str – The name of the neighborhood which is, “wedge”.
-
class
geopyspark.geotrellis.
Nesw
(extent)¶ A neighborhood that includes a column and row intersection for the focus.
Parameters: extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes. -
extent
int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
-
param_1
¶ float – Same as
extent
.
-
param_2
¶ float – Unused param for
Nesw
. Is 0.0.
-
param_3
¶ float – Unused param for
Nesw
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “nesw”.
-
-
class
geopyspark.geotrellis.
Annulus
(inner_radius, outer_radius)¶ An Annulus neighborhood.
Parameters: - inner_radius (int or float) – The radius of the inner circle.
- outer_radius (int or float) – The radius of the outer circle.
-
inner_radius
¶ int or float – The radius of the inner circle.
-
outer_radius
¶ int or float – The radius of the outer circle.
-
param_1
¶ float – Same as
inner_radius
.
-
param_2
¶ float – Same as
outer_radius
.
-
param_3
¶ float – Unused param for
Annulus
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “annulus”.
-
geopyspark.geotrellis.
rasterize
(geoms, crs, zoom, fill_value, cell_type=<CellType.FLOAT64: 'float64'>, options=None, num_partitions=None)¶ Rasterizes Shapely geometries.
Parameters: - geoms ([shapely.geometry]) – List of shapely geometries to rasterize.
- crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- fill_value (int or float) – Value to burn into pixels intersecting the geometry
- cell_type (str or
CellType
) – Which data type the cells should be when created. Defaults toCellType.FLOAT64
. - options (
RasterizerOptions
, optional) – Pixel intersection options. - num_partitions (int, optional) – The number of repartitions Spark will make when the data is
repartitioned. If
None
, then the data will not be repartitioned.
Returns:
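A minimal, hypothetical sketch of rasterize, burning the value 1 into a single polygon at zoom 11 (the coordinates are arbitrary and assumed to be in EPSG:4326):
import geopyspark as gps
from shapely.geometry import box
geom = box(-75.2, 40.0, -75.1, 40.1)
rasterized = gps.rasterize(geoms=[geom],
                           crs=4326,
                           zoom=11,
                           fill_value=1,
                           cell_type=gps.CellType.INT32)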
-
class
geopyspark.geotrellis.
TileRender
(render_function)¶ A Python implementation of the Scala geopyspark.geotrellis.tms.TileRender interface. Permits a callback from Scala to Python to allow for custom rendering functions.
Parameters: render_function (Tile => PIL.Image.Image) – A function to convert geopyspark.geotrellis.Tile to a PIL Image. -
render_function
¶ Tile => PIL.Image.Image – A function to convert geopyspark.geotrellis.Tile to a PIL Image.
-
TileRender.
renderEncoded
(scala_array)¶ A function to convert an array to an image.
Parameters: scala_array – A linear array of bytes representing the protobuf-encoded contents of a tile Returns: bytes representing an image
-
TileRender.
requiresEncoding
()¶
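A hypothetical render function of the kind TileRender wraps: it takes a Tile and returns a PIL Image. It assumes numpy and Pillow are available and that the Tile exposes its numpy array as tile.cells with shape (bands, rows, cols):
import numpy as np
from PIL import Image
def render_tile(tile):
    # Use the first band and rescale it to the 0-255 range.
    band = tile.cells[0].astype('float64')
    spread = max(band.max() - band.min(), 1.0)
    scaled = np.uint8(255.0 * (band - band.min()) / spread)
    # Return an 8-bit greyscale image for this tile.
    return Image.fromarray(scaled, mode='L')
# render_tile could then be passed as the render_function when constructing TileRender.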
-
-
class
geopyspark.geotrellis.
TMS
(server)¶ Provides a TMS server for raster data.
In order to display raster data on a variety of different map interfaces (e.g., leaflet maps, geojson.io, GeoNotebook, and others), we provide the TMS class.
Parameters: server (JavaObject) – The Java TMSServer instance -
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
server
¶ JavaObject – The Java TMSServer instance
-
host
¶ str – The IP address of the host, if bound, else None
-
port
¶ int – The port number of the TMS server, if bound, else None
-
url_pattern
¶ string – The URI pattern for the current TMS service, with {z}, {x}, {y} tokens. Can be copied directly to services such as geojson.io.
-
bind
(host=None, requested_port=None)¶ Starts up a TMS server.
Parameters: - host (str, optional) – The target host. Typically “localhost”, “127.0.0.1”, or “0.0.0.0”. The latter will make the TMS service accessible from the world. If omitted, defaults to localhost.
- requested_port (optional, int) – A port number to bind the service to. If omitted, use a random available port.
-
classmethod
build
(source, display, allow_overzooming=True)¶ Builds a TMS server from one or more layers.
This function takes a SparkContext, a source or list of sources, and a display method and creates a TMS server to display the desired content. The display method is supplied as a ColorMap (only available when there is a single source), or a callable object which takes either a single tile input (when there is a single source) or a list of tiles (for multiple sources) and returns the bytes representing an image file for that tile.
Parameters: - source (tuple or list or
Pyramid
) – The tile sources to render. Tuple inputs are (str, str) pairs where the first component is the URI of a catalog and the second is the layer name. A list input may be any combination of tuples andPyramid
s. - display (ColorMap, callable) – Method for mapping tiles to images. ColorMap may only be applied to single input source. Callable will take a single numpy array for a single source, or a list of numpy arrays for multiple sources. In the case of multiple inputs, resampling may be required if the tile sources have different tile sizes. Returns bytes representing the resulting image.
- allow_overzooming (bool) – If set, viewing at zoom levels above the highest available zoom level will produce tiles that are resampled from the highest zoom level present in the data set.
- source (tuple or list or
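A hedged sketch of serving a Pyramid over TMS; it assumes pyramid is an existing Pyramid instance and that ColorMap.build accepting a Histogram and a color ramp name is available, as in the GeoPySpark visualization documentation:
import geopyspark as gps
# Color the layer using its own histogram and a built-in color ramp.
histogram = pyramid.get_histogram()
color_map = gps.ColorMap.build(breaks=histogram, colors='viridis')
# Build the TMS server, bind it to a port, and read back the tile URL pattern.
tms = gps.TMS.build(source=pyramid, display=color_map)
tms.bind(host='0.0.0.0', requested_port=8085)
print(tms.url_pattern)  # contains {z}, {x}, and {y} tokens
tms.unbind()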
-
host
Returns the IP string of the server’s host if bound, else None.
Returns: (str)
-
port
Returns the port number for the current TMS server if bound, else None.
Returns: (int)
-
set_handshake
(handshake)¶
-
unbind
()¶ Shuts down the TMS service, freeing the assigned port.
-
url_pattern
Returns the URI for the tiles served by the present server. Contains {z}, {x}, and {y} tokens to be substituted for the desired zoom and x/y tile position.
Returns: (str)
-
geopyspark.geotrellis.ProtoBufCodecs module¶
-
geopyspark.geotrellis.
protobuf
¶ alias of
geopyspark.geotrellis.protobuf
geopyspark.geotrellis.ProtoBufSerializer module¶
-
class
geopyspark.geotrellis.protobufserializer.
ProtoBufSerializer
(decoding_method, encoding_method)¶ The serializer used by a RDD to encode/decode values to/from Python.
Parameters: - decoding_method (func) – The decoding function for the values within the RDD.
- encoding_method (func) – The encoding function for the values within the RDD.
-
decoding_method
¶ func – The decoding function for the values within the RDD.
-
encoding_method
¶ func – The encoding function for the values within the RDD.
-
dumps
(obj)¶ Serialize an object into a byte array.
Note
When batching is used, this will be called with a list of objects.
Parameters: obj – The object to be serialized into a byte array. Returns: The byte array representation of the obj
.
-
loads
(obj)¶ Deserializes a byte array into a collection of Python objects.
Parameters: obj – The byte array representation of an object to be deserialized into the object. Returns: A list of deserialized objects.
geopyspark.geotrellis.catalog module¶
Methods for reading, querying, and saving tile layers to and from GeoTrellis Catalogs.
-
geopyspark.geotrellis.catalog.
read_layer_metadata
(uri, layer_name, layer_zoom)¶ Reads the metadata from a saved layer without reading in the whole layer.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
Returns:
-
geopyspark.geotrellis.catalog.
read_value
(uri, layer_name, layer_zoom, col, row, zdt=None, store=None)¶ Reads a single
Tile
from a GeoTrellis catalog. Unlike other functions in this module, this will not return aTiledRasterLayer
, but rather a GeoPySpark formatted raster.Note
When requesting a tile that does not exist,
None
will be returned.Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be read from.
- layer_zoom (int) – The zoom level of the layer that is to be read.
- col (int) – The col number of the tile within the layout. Cols run east to west.
- row (int) – The row number of the tile within the layout. Rows run north to south.
- zdt (
datetime.datetime
) – The time stamp of the tile if the data is spatial-temporal. This is represented as adatetime.datetime.
instance. The default value is,None
. IfNone
, then only the spatial area will be queried. - store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
Returns:
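A hypothetical sketch of read_value; the catalog URI, layer name, zoom, and key coordinates are placeholders, and the returned Tile is assumed to expose its numpy array as cells:
import geopyspark as gps
tile = gps.read_value(uri='file:///tmp/catalog',
                      layer_name='my-layer',
                      layer_zoom=11,
                      col=600,
                      row=770)
if tile is not None:
    print(tile.cells.shape)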
-
geopyspark.geotrellis.catalog.
query
(uri, layer_name, layer_zoom=None, query_geom=None, time_intervals=None, query_proj=None, num_partitions=None, store=None)¶ Queries a single, zoom layer from a GeoTrellis catalog given spatial and/or time parameters.
Note
The whole layer could still be read in if
intersects
and/ortime_intervals
have not been set, or if the queried region contains the entire layer.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - uri (str) – The Uniform Resource Identifier used to point towards the desired GeoTrellis catalog to be read from. The shape of this string varies depending on backend.
- layer_name (str) – The name of the GeoTrellis catalog to be queried.
- layer_zoom (int, optional) – The zoom level of the layer that is to be queried.
If
None
, then thelayer_zoom
will be set to 0. - query_geom (bytes or shapely.geometry or
Extent
, Optional) –The desired spatial area to be returned. Can either be a string, a shapely geometry, or instance of
Extent
, or a WKB version of the geometry.Note
Not all shapely geometries are supported. The following are the supported types: * Point * Polygon * MultiPolygon
Note
Only layers that were made from spatial, singleband GeoTiffs can query a
Point
. All other types are restricted toPolygon
and MultiPolygon
.If not specified, then the entire layer will be read.
- time_intervals (
[datetime.datetime]
, optional) – A list of the time intervals to query. This parameter is only used when querying spatial-temporal data. The default value is,None
. IfNone
, then only the spatial area will be queried. - query_proj (int or str, optional) – The CRS of the queried geometry if it is different
than the layer it is being filtered against. If they are different and this is not set,
then the returned
TiledRasterLayer
could contain incorrect values. IfNone
, then the geometry and layer are assumed to be in the same projection. - num_partitions (int, optional) – Sets RDD partition count when reading from catalog.
- store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
Returns: - layer_type (str or
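An illustrative sketch of query, reading back only the part of a saved layer that intersects a geometry; the URI, layer name, and coordinates are placeholders, and query_proj is given because the geometry is in EPSG:4326:
import geopyspark as gps
from shapely.geometry import box
area = box(-75.3, 39.9, -75.1, 40.1)
queried = gps.query(uri='file:///tmp/catalog',
                    layer_name='my-layer',
                    layer_zoom=11,
                    query_geom=area,
                    query_proj=4326)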
-
geopyspark.geotrellis.catalog.
write
(uri, layer_name, tiled_raster_layer, index_strategy=<IndexingMethod.ZORDER: 'zorder'>, time_unit=None, store=None)¶ Writes a tile layer to a specified destination.
Parameters: - uri (str) – The Uniform Resource Identifier used to point towards the desired location for the tile layer to be written to. The shape of this string varies depending on backend.
- layer_name (str) – The name of the new, tile layer.
- layer_zoom (int) – The zoom level the layer should be saved at.
- tiled_raster_layer (
TiledRasterLayer
) – TheTiledRasterLayer
to be saved. - index_strategy (str or
IndexingMethod
) – The method used to organize the saved data. Depending on the type of data within the layer, only certain methods are available. Can either be a string or an IndexingMethod
attribute. The default method used is,IndexingMethod.ZORDER
. - time_unit (str or
TimeUnit
, optional) – Which time unit should be used when saving spatial-temporal data. This controls the resolution of each index. That is, what time intervals are used to separate each record. While this is set to None
as default, it must be set if saving spatial-temporal data. Depending on the indexing method chosen, different time units are used. - store (str or
AttributeStore
, optional) –AttributeStore
instance or URI for layer metadata lookup.
-
class
geopyspark.geotrellis.catalog.
AttributeStore
(uri)¶ AttributeStore provides a way to read and write GeoTrellis layer attributes.
Internally all attribute values are stored as JSON; here they are exposed as dictionaries. Classes that are often stored have
.from_dict
and .to_dict
methods to bridge the gap:
import geopyspark as gps
store = gps.AttributeStore("s3://azavea-datahub/catalog")
hist = store.layer("us-nlcd2011-30m-epsg3857", zoom=7).read("histogram")
hist = gps.Histogram.from_dict(hist)
-
class
Attributes
(store, layer_name, layer_zoom)¶ Accessor class for all attributes for a given layer
-
delete
(name)¶ Delete attribute by name
Parameters: name (str) – Attribute name
-
read
(name)¶ Read layer attribute by name as a dict
Parameters: name (str) – Returns: Attribute value Return type: dict
-
write
(name, value)¶ Write layer attribute value as a dict
Parameters: - name (str) – Attribute name
- value (dict) – Attribute value
-
-
classmethod
AttributeStore.
build
(store)¶ Builds AttributeStore from URI or passes an instance through.
Parameters: uri (str or AttributeStore) – URI for AttributeStore object or instance. Returns: AttributeStore
-
classmethod
AttributeStore.
cached
(uri)¶ Returns cached version of AttributeStore for URI or creates one
-
AttributeStore.
contains
(name, zoom=None)¶ Checks if this store contains a layer metadata.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: bool
-
AttributeStore.
delete
(name, zoom=None)¶ Delete layer and all its attributes
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
-
AttributeStore.
layer
(name, zoom=None)¶ Layer Attributes object for a given layer.
Parameters: - name (str) – Layer name
- zoom (int, optional) – Layer zoom
Returns: Attributes
-
AttributeStore.
layers
()¶ List all layers Attributes objects
Returns: [Attributes]
-
class
geopyspark.geotrellis.constants module¶
Constants that are used by geopyspark.geotrellis
classes, methods, and functions.
-
class
geopyspark.geotrellis.constants.
LayerType
¶ The type of the key within the tuple of the wrapped RDD.
-
SPACETIME
= 'spacetime'¶ Indicates that the RDD contains (K, V) pairs, where the K has both a spatial and a time attribute. Both TemporalProjectedExtent and SpaceTimeKey are examples of this type of K.
-
SPATIAL
= 'spatial'¶ Indicates that the RDD contains (K, V) pairs, where the K has a spatial attribute but no time component. Both ProjectedExtent and SpatialKey are examples of this type of K.
-
-
class
geopyspark.geotrellis.constants.
IndexingMethod
¶ How the wrapped RDD should be indexed when saved.
-
HILBERT
= 'hilbert'¶ A key indexing method. Works only for RDDs that contain
SpatialKey
. This method provides the fastest lookup of all the key indexing methods; however, it does not give good locality guarantees. It is therefore recommended that this method only be used when locality is not important for your analysis.
-
ROWMAJOR
= 'rowmajor'¶
-
ZORDER
= 'zorder'¶ A key indexing method. Works for RDDs that contain both
SpatialKey
andSpaceTimeKey
. Note, indexes are determined by thex
,y
, and ifSPACETIME
, the temporal resolutions of a point. This is expressed in bits, and has a max value of 62. Thus if the sum of those resolutions are greater than 62, then the indexing will fail.
-
-
class
geopyspark.geotrellis.constants.
ResampleMethod
¶ Resampling Methods.
-
AVERAGE
= 'Average'¶
-
BILINEAR
= 'Bilinear'¶
-
CUBIC_CONVOLUTION
= 'CubicConvolution'¶
-
CUBIC_SPLINE
= 'CubicSpline'¶
-
LANCZOS
= 'Lanczos'¶
-
MAX
= 'Max'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
NEAREST_NEIGHBOR
= 'NearestNeighbor'¶
-
-
class
geopyspark.geotrellis.constants.
TimeUnit
¶ ZORDER time units.
-
DAYS
= 'days'¶
-
HOURS
= 'hours'¶
-
MILLIS
= 'millis'¶
-
MINUTES
= 'minutes'¶
-
MONTHS
= 'months'¶
-
SECONDS
= 'seconds'¶
-
YEARS
= 'years'¶
-
-
class
geopyspark.geotrellis.constants.
Operation
¶ Focal operations.
-
ASPECT
= 'Aspect'¶
-
MAX
= 'Max'¶
-
MEAN
= 'Mean'¶
-
MEDIAN
= 'Median'¶
-
MIN
= 'Min'¶
-
MODE
= 'Mode'¶
-
SLOPE
= 'Slope'¶
-
STANDARD_DEVIATION
= 'StandardDeviation'¶
-
SUM
= 'Sum'¶
-
-
class
geopyspark.geotrellis.constants.
Neighborhood
¶ Neighborhood types.
-
ANNULUS
= 'Annulus'¶
-
CIRCLE
= 'Circle'¶
-
NESW
= 'Nesw'¶
-
SQUARE
= 'Square'¶
-
WEDGE
= 'Wedge'¶
-
-
class
geopyspark.geotrellis.constants.
ClassificationStrategy
¶ Classification strategies for color mapping.
-
EXACT
= 'Exact'¶
-
GREATER_THAN
= 'GreaterThan'¶
-
GREATER_THAN_OR_EQUAL_TO
= 'GreaterThanOrEqualTo'¶
-
LESS_THAN
= 'LessThan'¶
-
LESS_THAN_OR_EQUAL_TO
= 'LessThanOrEqualTo'¶
-
-
class
geopyspark.geotrellis.constants.
CellType
¶ Cell types.
-
BOOL
= 'bool'¶
-
BOOLRAW
= 'boolraw'¶
-
FLOAT32
= 'float32'¶
-
FLOAT32RAW
= 'float32raw'¶
-
FLOAT64
= 'float64'¶
-
FLOAT64RAW
= 'float64raw'¶
-
INT16
= 'int16'¶
-
INT16RAW
= 'int16raw'¶
-
INT32
= 'int32'¶
-
INT32RAW
= 'int32raw'¶
-
INT8
= 'int8'¶
-
INT8RAW
= 'int8raw'¶
-
UINT16
= 'uint16'¶
-
UINT16RAW
= 'uint16raw'¶
-
UINT8
= 'uint8'¶
-
UINT8RAW
= 'uint8raw'¶
-
-
class
geopyspark.geotrellis.constants.
ColorRamp
¶ ColorRamp names.
-
BLUE_TO_ORANGE
= 'BlueToOrange'¶
-
BLUE_TO_RED
= 'BlueToRed'¶
-
CLASSIFICATION_BOLD_LAND_USE
= 'ClassificationBoldLandUse'¶
-
CLASSIFICATION_MUTED_TERRAIN
= 'ClassificationMutedTerrain'¶
-
COOLWARM
= 'CoolWarm'¶
-
GREEN_TO_RED_ORANGE
= 'GreenToRedOrange'¶
-
HEATMAP_BLUE_TO_YELLOW_TO_RED_SPECTRUM
= 'HeatmapBlueToYellowToRedSpectrum'¶
-
HEATMAP_DARK_RED_TO_YELLOW_WHITE
= 'HeatmapDarkRedToYellowWhite'¶
-
HEATMAP_LIGHT_PURPLE_TO_DARK_PURPLE_TO_WHITE
= 'HeatmapLightPurpleToDarkPurpleToWhite'¶
-
HEATMAP_YELLOW_TO_RED
= 'HeatmapYellowToRed'¶
-
Hot
= 'Hot'¶
-
INFERNO
= 'Inferno'¶
-
LIGHT_TO_DARK_GREEN
= 'LightToDarkGreen'¶
-
LIGHT_TO_DARK_SUNSET
= 'LightToDarkSunset'¶
-
LIGHT_YELLOW_TO_ORANGE
= 'LightYellowToOrange'¶
-
MAGMA
= 'Magma'¶
-
PLASMA
= 'Plasma'¶
-
VIRIDIS
= 'Viridis'¶
-
geopyspark.geotrellis.geotiff module¶
This module contains functions that create RasterLayer
from files.
-
geopyspark.geotrellis.geotiff.
get
(layer_type, uri, crs=None, max_tile_size=None, num_partitions=None, chunk_size=None, time_tag=None, time_format=None, s3_client=None)¶ Creates a
RasterLayer
from GeoTiffs that are located on the local file system,HDFS
, orS3
.Parameters: - layer_type (str or
LayerType
) –What the layer type of the geotiffs are. This is represented by either constants within
LayerType
or by a string.Note
All of the GeoTiffs must have the same spatial type.
- uri (str) – The path to a given file/directory.
- crs (str, optional) – The CRS that the output tiles should be
in. The CRS must be in the well-known name format. If
None
, then the CRS that the tiles were originally in will be used. - max_tile_size (int, optional) – The max size of each tile in the
resulting Layer. If the size is smaller than a read in tile,
then that tile will be broken into tiles of the specified
size. If
None
, then the whole tile will be read in. - num_partitions (int, optional) – The number of repartitions Spark
will make when the data is repartitioned. If
None
, then the data will not be repartitioned. - chunk_size (int, optional) – How many bytes of the file should be
read in at a time. If
None
, then files will be read in 65536 byte chunks. - time_tag (str, optional) – The name of the tiff tag that contains
the time stamp for the tile. If
None
, then the default value is:TIFFTAG_DATETIME
. - time_format (str, optional) – The pattern of the time stamp for
java.time.format.DateTimeFormatter to parse. If
None
, then the default value is:yyyy:MM:dd HH:mm:ss
. - s3_client (str, optional) –
Which
S3Client
to use when reading GeoTiffs from S3. There are currently two options:default
andmock
. IfNone
,defualt
is used.Note
mock
should only be used in unit tests and debugging.
Returns: - layer_type (str or
geopyspark.geotrellis.neighborhood module¶
Classes that represent the various neighborhoods used in focal functions.
Note
Once a parameter has been entered for any one of these classes it gets converted to a
float
if it was originally an int
.
-
class
geopyspark.geotrellis.neighborhood.
Circle
(radius)¶ A circle neighborhood.
Parameters: radius (int or float) – The radius of the circle that determines which cells fall within the bounding box. -
radius
¶ int or float – The radius of the circle that determines which cells fall within the bounding box.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Unused param for
Circle
. Is 0.0.
-
param_3
¶ float – Unused param for
Circle
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “circle”.
Note
Cells that lie exactly on the radius of the circle are a part of the neighborhood.
-
-
class
geopyspark.geotrellis.neighborhood.
Wedge
(radius, start_angle, end_angle)¶ A wedge neighborhood.
Parameters: - radius (int or float) – The radius of the wedge.
- start_angle (int or float) – The starting angle of the wedge in degrees.
- end_angle (int or float) – The ending angle of the wedge in degrees.
-
radius
¶ int or float – The radius of the wedge.
-
start_angle
¶ int or float – The starting angle of the wedge in degrees.
-
end_angle
¶ int or float – The ending angle of the wedge in degrees.
-
param_1
¶ float – Same as
radius
.
-
param_2
¶ float – Same as
start_angle
.
-
param_3
¶ float – Same as
end_angle
.
-
name
¶ str – The name of the neighborhood which is, “wedge”.
-
class
geopyspark.geotrellis.neighborhood.
Nesw
(extent)¶ A neighborhood that includes a column and row intersection for the focus.
Parameters: extent (int or float) – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes. -
extent
int or float – The extent of this neighborhood. This represents how many cells past the focus the bounding box goes.
-
param_1
¶ float – Same as
extent
.
-
param_2
¶ float – Unused param for
Nesw
. Is 0.0.
-
param_3
¶ float – Unused param for
Nesw
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “nesw”.
-
-
class
geopyspark.geotrellis.neighborhood.
Annulus
(inner_radius, outer_radius)¶ An Annulus neighborhood.
Parameters: - inner_radius (int or float) – The radius of the inner circle.
- outer_radius (int or float) – The radius of the outer circle.
-
inner_radius
¶ int or float – The radius of the inner circle.
-
outer_radius
¶ int or float – The radius of the outer circle.
-
param_1
¶ float – Same as
inner_radius
.
-
param_2
¶ float – Same as
outer_radius
.
-
param_3
¶ float – Unused param for
Annulus
. Is 0.0.
-
name
¶ str – The name of the neighborhood which is, “annulus”.
geopyspark.geotrellis.layer module¶
This module contains the RasterLayer
and the TiledRasterLayer
classes. Both of these
classes are wrappers of their Scala counterparts. These will be used in lieu of actual PySpark RDDs
when performing operations.
-
class
geopyspark.geotrellis.layer.
RasterLayer
(layer_type, srdd)¶ A wrapper of a RDD that contains GeoTrellis rasters.
Represents a layer that wraps a RDD that contains
(K, V)
. WhereK
is eitherProjectedExtent
orTemporalProjectedExtent
depending on thelayer_type
of the RDD, andV
being aTile
.The data held within this layer has not been tiled, meaning the data has yet to be modified to fit a certain layout. See raster_rdd for more information.
Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - srdd (py4j.java_gateway.JavaObject) – The coresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
bands
(band)¶ Select a subsection of bands from the
Tile
s within the layer.Note
There could be potential high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if selecting a band that is out of bounds then the error returned will be a
py4j.protocol.Py4JJavaError
and not a normal Python error.Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tile
s. Can either be a single int, or a collection of ints.Returns: RasterLayer
with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [ProjectedExtent]
or [TemporalProjectedExtent]
-
collect_metadata
(layout=LocalLayout(tile_cols=256, tile_rows=256))¶ Iterates over the RDD records and generates layer metadata describing the contained rasters.
Parameters: layout (LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
Returns: Metadata
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying, raster values to a new
CellType
.Parameters: - new_type (str or
CellType
) – The data type the cells should be to converted to. - no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns: Raises: ValueError
– Ifno_data_value
is set and thenew_type
contains raw values.ValueError
– Ifno_data_value
is set andnew_type
is a boolean.
- new_type (str or
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd)¶ Create a
RasterLayer
from a numpy RDD.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either
ProjectedExtent
s orTemporalProjectedExtent
s and rasters that are represented by a numpy array.
Returns: - layer_type (str or
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
map_cells
(func)¶ Maps over the cells of each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func (cells, nd => cells) – A function that takes two arguments: cells
andnd
. Wherecells
is the numpy array andnd
is theno_data_value
of theTile
. It returnscells
which are the new cells values of theTile
represented as a numpy array.Returns: RasterLayer
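A small sketch of a map_cells function: it receives the tile's numpy cells and the layer's NoData value and returns a new cell array (here every cell is simply doubled; raster_layer is assumed to exist):
def double_cells(cells, nd):
    # cells is a numpy array; nd is the NoData value (may be None or NaN).
    return cells * 2
doubled = raster_layer.map_cells(double_cells)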
-
map_tiles
(func)¶ Maps over each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func ( Tile
=>Tile
) – A function that takes aTile
and returns aTile
.Returns: RasterLayer
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A
dict
whose keys represent values where a break should occur and its values are the new value the cells within the break should become. - data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or
ClassificationStrategy
, optional) – How the cells should be classified along the breaks. If unspecified, thenClassificationStrategy.LESS_THAN_OR_EQUAL_TO
will be used. - replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if
data_type
is int or float. For int, the constantNO_DATA_INT
can be used which represents the NoData value for int in GeoTrellis. For float,float('nan')
is used to represent NoData.Returns: RasterLayer
- value_map (dict) – A
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
tile_to_layout
(layout=LocalLayout(tile_cols=256, tile_rows=256), target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (Metadata or TiledRasterLayer or LayoutDefinition or GlobalLayout or LocalLayout, optional) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code,
well-known name, or a PROJ.4 string. If
None
, no reproject will be performed. - resample_method (str or
ResampleMethod
, optional) – The cell resample method to be used during the tiling operation. Default is ResampleMethods.NEAREST_NEIGHBOR.
Returns: - :param layout (
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs which are then converted to bytes. This is returned as a
RDD[(K, bytes)]
. WhereK
is eitherProjectedExtent
orTemporalProjectedExtent
.Parameters: - storage_method (str or
StorageMethod
, optional) – How the segments within the GeoTiffs should be arranged. Default isStorageMethod.STRIPED
. - rows_per_strip (int, optional) – How many rows should be in each strip segment of the
GeoTiffs if
storage_method
isStorageMethod.STRIPED
. IfNone
, then the strip size will default to a value that is 8K or less. - tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff
if
storage_method
isStorageMethod.TILED
. IfNone
then the default size is(256, 256)
. - compression (str or
Compression
, optional) – How the data should be compressed. Defaults toCompression.NO_COMPRESSION
. - color_space (str or
ColorSpace
, optional) – How the colors should be organized in the GeoTiffs. Defaults toColorSpace.BLACK_IS_ZERO
. - color_map (
ColorMap
, optional) – AColorMap
instance used to color the GeoTiffs to a different gradient. - head_tags (dict, optional) – A
dict
where each key and value is astr
. - band_tags (list, optional) – A
list
ofdict
s where each key and value is astr
. - Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
- storage_method (str or
-
to_numpy_rdd
()¶ Converts a
RasterLayer
to a numpy RDD.Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs which are then converted to bytes. This is returned as a RDD[(K, bytes)].
Parameters: color_map ( ColorMap
) – AColorMap
instance used to color the PNGs.Returns: RDD[(K, bytes)]
-
to_spatial_layer
(target_time=None)¶ Converts a
RasterLayer
with alayout_type
ofLayoutType.SPACETIME
to aRasterLayer
with alayout_type
ofLayoutType.SPATIAL
.Parameters: target_time ( datetime.datetime
, optional) – The instance of interest. If set, the resultingRasterLayer
will only contain keys that contained the given instance. IfNone
, then all values within the layer will be kept.Returns: RasterLayer
Raises: ValueError
– If the layer already has alayout_type
ofLayoutType.SPATIAL
.
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
- layer_type (str or
-
class
geopyspark.geotrellis.layer.
TiledRasterLayer
(layer_type, srdd)¶ Wraps a RDD of tiled, GeoTrellis rasters.
Represents a RDD that contains
(K, V)
. WhereK
is eitherSpatialKey
orSpaceTimeKey
depending on thelayer_type
of the RDD, andV
being aTile
.The data held within the layer is tiled. This means that the rasters have been modified to fit a larger layout. For more information, see tiled-raster-rdd.
Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - srdd (py4j.java_gateway.JavaObject) – The coresponding Scala class. This is what allows
TiledRasterLayer
to access the various Scala methods.
-
pysc
¶ pyspark.SparkContext – The
SparkContext
being used this session.
-
srdd
py4j.java_gateway.JavaObject – The corresponding Scala class. This is what allows
RasterLayer
to access the various Scala methods.
-
is_floating_point_layer
¶ bool – Whether the data within the
TiledRasterLayer
is floating point or not.
-
zoom_level
¶ int – The zoom level of the layer. Can be
None
.
-
bands
(band)¶ Select a subsection of bands from the
Tile
s within the layer.Note
There could be potential high performance cost if operations are performed between two sub-bands of a large data set.
Note
Due to the nature of GeoPySpark’s backend, if selecting a band that is out of bounds then the error returned will be a
py4j.protocol.Py4JJavaError
and not a normal Python error.Parameters: band (int or tuple or list or range) – The band(s) to be selected from the Tile
s. Can either be a single int, or a collection of ints.Returns: TiledRasterLayer
with the selected bands.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
collect_keys
()¶ Returns a list of all of the keys in the layer.
Note
This method should only be called on layers with a smaller number of keys, as a large number could cause memory issues.
Returns: [SpatialKey]
or [SpaceTimeKey]
-
convert_data_type
(new_type, no_data_value=None)¶ Converts the underlying, raster values to a new
CellType
.Parameters: - new_type (str or
CellType
) – The data type the cells should be to converted to. - no_data_value (int or float, optional) – The value that should be marked as NoData.
Returns: Raises: ValueError
– Ifno_data_value
is set and thenew_type
contains raw values.ValueError
– Ifno_data_value
is set andnew_type
is a boolean.
- new_type (str or
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
focal
(operation, neighborhood=None, param_1=None, param_2=None, param_3=None)¶ Performs the given focal operation on the layers contained in the Layer.
Parameters: - operation (str or
Operation
) – The focal operation to be performed. - neighborhood (str or
Neighborhood
, optional) – The type of neighborhood to use in the focal operation. This can be represented by either an instance ofNeighborhood
, or by a constant. - param_1 (int or float, optional) – If using
Operation.SLOPE
, then this is the zFactor, else it is the first argument ofneighborhood
. - param_2 (int or float, optional) – The second argument of the
neighborhood
. - param_3 (int or float, optional) – The third argument of the
neighborhood
.
Note
param
only need to be set ifneighborhood
is not an instance ofNeighborhood
or ifneighborhood
isNone
.Any
param
that is not set will default to 0.0.If
neighborhood
isNone
thenoperation
must be eitherOperation.SLOPE
orOperation.ASPECT
.Returns: Raises: ValueError
– Ifoperation
is not a known operation.ValueError
– Ifneighborhood
is not a known neighborhood.ValueError
– Ifneighborhood
was not set, andoperation
is notOperation.SLOPE
orOperation.ASPECT
.
- operation (str or
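Two hedged sketches of focal, assuming tiled_layer is an existing TiledRasterLayer and that the constants are re-exported at the top level: computing slope (which needs no neighborhood; param_1 is the zFactor) and a focal mean over a square neighborhood:
import geopyspark as gps
# Slope needs no neighborhood; param_1 is the zFactor.
slope = tiled_layer.focal(operation=gps.Operation.SLOPE, param_1=1.0)
# Focal mean over a Square neighborhood that extends 1 cell past the focus.
smoothed = tiled_layer.focal(operation=gps.Operation.MEAN,
                             neighborhood=gps.Square(extent=1))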
-
classmethod
from_numpy_rdd
(layer_type, numpy_rdd, metadata, zoom_level=None)¶ Create a
TiledRasterLayer
from a numpy RDD.Parameters: - layer_type (str or
LayerType
) – What the layer type of the geotiffs are. This is represented by either constants withinLayerType
or by a string. - numpy_rdd (pyspark.RDD) – A PySpark RDD that contains tuples of either
SpatialKey
orSpaceTimeKey
and rasters that are represented by a numpy array. - metadata (
Metadata
) – TheMetadata
of theTiledRasterLayer
instance. - zoom_level (int, optional) – The
zoom_level
the resulting TiledRasterLayer should have. IfNone
, then the returned layer’szoom_level
will beNone
.
Returns: - layer_type (str or
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
get_class_histogram
()¶ Creates a
Histogram
of integer values. Suitable for classification rasters with a limited number of values. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_histogram
()¶ Creates a
Histogram
for each band in the layer. If only a single band is present, the histogram is returned directly.Returns: Histogram
or [Histogram
]
-
get_min_max
()¶ Returns the maximum and minimum values of all of the rasters in the layer.
Returns: (float, float)
-
get_quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [float]
-
get_quantile_breaks_exact_int
(num_breaks)¶ Returns quantile breaks for this Layer. This version uses the
FastMapHistogram
, which counts exact integer values. If your layer has too many values, this can cause memory errors.Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
lookup
(col, row)¶ Return the value(s) in the image of a particular
SpatialKey
(given by col and row).Parameters: - col (int) – The
SpatialKey
column. - row (int) – The
SpatialKey
row.
Returns: [
Tile
]Raises: ValueError
– If using lookup on a nonLayerType.SPATIAL
TiledRasterLayer
.IndexError
– If col and row are not within theTiledRasterLayer
‘s bounds.
- col (int) – The
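A brief sketch of lookup; the col and row values are placeholders, and the returned list is assumed to contain Tile instances whose numpy arrays are available as cells:
# Fetch the value(s) at a single SpatialKey; raises IndexError if the key is out of bounds.
tiles = tiled_layer.lookup(col=1450, row=996)
print(tiles[0].cells.shape)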
-
map_cells
(func)¶ Maps over the cells of each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func (cells, nd => cells) – A function that takes two arguments: cells
andnd
. Wherecells
is the numpy array andnd
is theno_data_value
of the tile. It returnscells
which are the new cells values of the tile represented as a numpy array.Returns: TiledRasterLayer
-
map_tiles
(func)¶ Maps over each
Tile
within the layer with a given function.Note
This operation first needs to deserialize the wrapped
RDD
into Python and then serialize theRDD
back into aTiledRasterRDD
once the mapping is done. Thus, it is advised to chain together operations to reduce performance cost.Parameters: func ( Tile
=>Tile
) – A function that takes aTile
and returns aTile
.Returns: TiledRasterLayer
-
mask
(geometries)¶ Masks the
TiledRasterLayer
so that only values that intersect the geometries will be available.Parameters: geometries (shapely.geometry or [shapely.geometry]) – Either a single shapely geometry or a list of geometries to use as the mask(s).
Note
All geometries must be in the same CRS as the TileLayer.
Returns: TiledRasterLayer
-
normalize
(new_min, new_max, old_min=None, old_max=None)¶ Normalizes the values of the rasters within the layer to a new range.
Note
If
old_max - old_min <= 0
ornew_max - new_min <= 0
, then the normalization will fail.Parameters: - old_min (int or float, optional) – Old minimum. If not given, then the minimum value of this layer will be used.
- old_max (int or float, optional) – Old maximum. If not given, then the maximum value of this layer will be used.
- new_min (int or float) – New minimum to normalize to.
- new_max (int or float) – New maximum to normalize to.
Returns:
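A minimal sketch of normalize, rescaling the layer's values to the 0-1 range; the old range is computed explicitly here with get_min_max, though it can be omitted:
old_min, old_max = tiled_layer.get_min_max()
normalized = tiled_layer.normalize(new_min=0.0, new_max=1.0,
                                   old_min=old_min, old_max=old_max)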
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
polygonal_max
(geometry, data_type)¶ Finds the max value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
-
polygonal_mean
(geometry)¶ Finds the mean of all of the values that are contained within the given geometry.
Parameters: geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A Shapely Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry.Returns: float
-
polygonal_min
(geometry, data_type)¶ Finds the min value that is contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
-
polygonal_sum
(geometry, data_type)¶ Finds the sum of all of the values that are contained within the given geometry.
Parameters: - geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
Polygon
orMultiPolygon
that represents the area where the summary should be computed; or a WKB representation of the geometry. - data_type (type) – The type of the values within the rasters. Can either be int or float.
Returns: int or float depending on
data_type
.Raises: TypeError
– Ifdata_type
is not an int or float.- geometry (shapely.geometry.Polygon or shapely.geometry.MultiPolygon or bytes) – A
Shapely
-
pyramid
(resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Creates a layer
Pyramid
where the resolution is halved per level.Parameters: resample_method (str or ResampleMethod
, optional) – The resample method to use when building the pyramid. Default isResampleMethods.NEAREST_NEIGHBOR
.Returns: Pyramid
.Raises: ValueError
– If this layer layout is not ofGlobalLayout
type.
-
reclassify
(value_map, data_type, classification_strategy=<ClassificationStrategy.LESS_THAN_OR_EQUAL_TO: 'LessThanOrEqualTo'>, replace_nodata_with=None)¶ Changes the cell values of a raster based on how the data is broken up.
Parameters: - value_map (dict) – A
dict
whose keys represent values where a break should occur and its values are the new value the cells within the break should become. - data_type (type) – The type of the values within the rasters. Can either be int or float.
- classification_strategy (str or
ClassificationStrategy
, optional) – How the cells should be classified along the breaks. If unspecified, thenClassificationStrategy.LESS_THAN_OR_EQUAL_TO
will be used. - replace_nodata_with (data_type, optional) – When remapping values, nodata values must be treated separately. If nodata values are intended to be replaced during the reclassify, this variable should be set to the intended value. If unspecified, nodata values will be preserved.
Note
NoData symbolizes a different value depending on if
data_type
is int or float. For int, the constantNO_DATA_INT
can be used which represents the NoData value for int in GeoTrellis. For float,float('nan')
is used to represent NoData.Returns: TiledRasterLayer
- value_map (dict) – A
-
repartition
(num_partitions=None)¶ Repartition underlying RDD using HashPartitioner. If
num_partitions
is None, existing number of partitions will be used.Parameters: num_partitions (int, optional) – Desired number of partitions Returns: TiledRasterLayer
-
reproject
(target_crs, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Reproject rasters to
target_crs
. The reproject does not sample past tile boundary.Parameters: - target_crs (str or int) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string.
- resample_method (str or
ResampleMethod
, optional) – The resample method to use for the reprojection. If none is specified, thenResampleMethods.NEAREST_NEIGHBOR
is used.
Returns:
-
save_stitched
(path, crop_bounds=None, crop_dimensions=None)¶ Stitch all of the rasters within the Layer into one raster and then saves it to a given path.
Parameters: - path (str) – The path of the geotiff to save. The path must be on the local file system.
- crop_bounds (
Extent
, optional) – The subExtent
with which to crop the raster before saving. IfNone
, then the whole raster will be saved. - crop_dimensions (tuple(int) or list(int), optional) – cols and rows of the image to save
represented as either a tuple or list. If
None
then all cols and rows of the raster will be saved.
Note
This can only be used on
LayerType.SPATIAL
TiledRasterLayer
s.Note
If
crop_dimensions
is set thencrop_bounds
must also be set.
-
stitch
()¶ Stitch all of the rasters within the Layer into one raster.
Note
This can only be used on LayerType.SPATIAL TiledRasterLayers.
Returns: Tile
-
tile_to_layout
(layout, target_crs=None, resample_method=<ResampleMethod.NEAREST_NEIGHBOR: 'NearestNeighbor'>)¶ Cut tiles to a given layout and merge overlapping tiles. This will produce unique keys.
Parameters: - layout (LayoutDefinition or Metadata or TiledRasterLayer or GlobalLayout or LocalLayout) – Target raster layout for the tiling operation.
- target_crs (str or int, optional) – Target CRS of reprojection. Either EPSG code, well-known name, or a PROJ.4 string. If None, no reproject will be performed.
- resample_method (str or ResampleMethod, optional) – The resample method to use for the reprojection. If none is specified, then ResampleMethods.NEAREST_NEIGHBOR is used.
Returns: TiledRasterLayer
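A hedged sketch of re-tiling an existing layer; tiled_layer is an assumed TiledRasterLayer, and LocalLayout is assumed to be exposed at the top level of the geopyspark package like GlobalLayout:
import geopyspark as gps

# Re-tile the assumed layer to a LocalLayout with the default tile size.
retiled = tiled_layer.tile_to_layout(layout=gps.LocalLayout())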
-
to_geotiff_rdd
(storage_method=<StorageMethod.STRIPED: 'Striped'>, rows_per_strip=None, tile_dimensions=(256, 256), compression=<Compression.NO_COMPRESSION: 'NoCompression'>, color_space=<ColorSpace.BLACK_IS_ZERO: 1>, color_map=None, head_tags=None, band_tags=None)¶ Converts the rasters within this layer to GeoTiffs, which are then converted to bytes. This is returned as an RDD[(K, bytes)], where K is either SpatialKey or SpaceTimeKey.
Parameters: - storage_method (str or StorageMethod, optional) – How the segments within the GeoTiffs should be arranged. Default is StorageMethod.STRIPED.
- rows_per_strip (int, optional) – How many rows should be in each strip segment of the GeoTiffs if storage_method is StorageMethod.STRIPED. If None, then the strip size will default to a value that is 8K or less.
- tile_dimensions ((int, int), optional) – The length and width for each tile segment of the GeoTiff if storage_method is StorageMethod.TILED. If None, then the default size is (256, 256).
- compression (str or Compression, optional) – How the data should be compressed. Defaults to Compression.NO_COMPRESSION.
- color_space (str or ColorSpace, optional) – How the colors should be organized in the GeoTiffs. Defaults to ColorSpace.BLACK_IS_ZERO.
- color_map (ColorMap, optional) – A ColorMap instance used to color the GeoTiffs to a different gradient.
- head_tags (dict, optional) – A dict where each key and value is a str.
- band_tags (list, optional) – A list of dicts where each key and value is a str.
- Note – For more information on the contents of the tags, see www.gdal.org/gdal_datamodel.html
Returns: RDD[(K, bytes)]
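A hedged sketch of pulling one tile back as GeoTiff bytes and writing it locally; tiled_layer and the output path are assumptions for illustration:
# Convert every tile in the assumed layer to GeoTiff bytes using the defaults.
geotiff_rdd = tiled_layer.to_geotiff_rdd()

# Pull back one (key, bytes) record and write it to disk for inspection.
key, tiff_bytes = geotiff_rdd.first()
with open('/tmp/example-tile.tif', 'wb') as f:
    f.write(tiff_bytes)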
-
to_numpy_rdd
()¶ Converts a TiledRasterLayer to a NumPy RDD.
Note
Depending on the size of the data stored within the RDD, this can be an expensive operation and should be used with caution.
Returns: RDD
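For instance, a single record could be pulled back to the driver for inspection; tiled_layer is an assumed, small TiledRasterLayer, and the cells attribute on the returned tile is assumed to hold a NumPy array:
# Collect one (key, tile) record from the assumed layer.
numpy_rdd = tiled_layer.to_numpy_rdd()
key, tile = numpy_rdd.first()
print(tile.cells.shape)  # `cells` is assumed to be the tile's numpy array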
-
to_png_rdd
(color_map)¶ Converts the rasters within this layer to PNGs, which are then converted to bytes. This is returned as an RDD[(K, bytes)].
Parameters: color_map (ColorMap) – A ColorMap instance used to color the PNGs.
Returns: RDD[(K, bytes)]
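A minimal sketch, assuming tiled_layer is an existing TiledRasterLayer and color_map is a geopyspark ColorMap instance built elsewhere for the layer's value range:
# Render each tile of the assumed layer to PNG bytes.
png_rdd = tiled_layer.to_png_rdd(color_map=color_map)

# Write the first rendered tile to disk to eyeball the result.
key, png_bytes = png_rdd.first()
with open('/tmp/example-tile.png', 'wb') as f:
    f.write(png_bytes)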
-
to_spatial_layer
(target_time=None)¶ Converts a TiledRasterLayer with a layout_type of LayoutType.SPACETIME to a TiledRasterLayer with a layout_type of LayoutType.SPATIAL.
Parameters: target_time (datetime.datetime, optional) – The instance of interest. If set, the resulting TiledRasterLayer will only contain keys that contained the given instance. If None, then all values within the layer will be kept.
Returns: TiledRasterLayer
Raises: ValueError – If the layer already has a layout_type of LayoutType.SPATIAL.
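As a sketch, assuming spacetime_layer is an existing SPACETIME TiledRasterLayer and that the chosen timestamp actually occurs in the data:
import datetime

# Keep only the keys for this (assumed) instant and drop the time dimension.
target = datetime.datetime(2011, 1, 1, tzinfo=datetime.timezone.utc)
spatial_layer = spacetime_layer.to_spatial_layer(target_time=target)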
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns the list of RDD-containing objects wrapped by this object. The default implementation assumes that the subclass contains a single RDD container, srdd, which implements the persist() and unpersist() methods.
-
class
geopyspark.geotrellis.layer.
Pyramid
(levels)¶ Contains a list of TiledRasterLayers that make up a tile pyramid. Each layer represents a level within the pyramid. This class is used when creating a tile server. Map algebra can be performed on instances of this class (a brief sketch follows the method list below).
Parameters: levels (list or dict) – A list of TiledRasterLayers or a dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.
-
pysc
¶ pyspark.SparkContext – The SparkContext being used this session.
-
layer_type
¶ LayerType – What the layer type of the geotiffs are.
-
levels
¶ dict – A dict of TiledRasterLayers where the value is the layer itself and the key is its given zoom level.
-
max_zoom
¶ int – The highest zoom level of the pyramid.
-
is_cached
¶ bool – Signals whether or not the internal RDDs are cached. Default is False.
-
histogram
¶ Histogram – The Histogram that represents the layer with the max zoom. Will not be calculated unless the get_histogram() method is used. Otherwise, its value is None.
Raises: TypeError – If levels is neither a list nor a dict.
-
cache
()¶ Persist this RDD with the default storage level (C{MEMORY_ONLY}).
-
count
()¶ Returns how many elements are within the wrapped RDD.
Returns: The number of elements in the RDD. Return type: Int
-
getNumPartitions
()¶ Returns the number of partitions set for the wrapped RDD.
Returns: The number of partitions. Return type: Int
-
persist
(storageLevel=StorageLevel(False, True, False, False, 1))¶ Set this RDD’s storage level to persist its values across operations after the first time it is computed. This can only be used to assign a new storage level if the RDD does not have a storage level set yet. If no storage level is specified defaults to (C{MEMORY_ONLY}).
-
unpersist
()¶ Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
-
wrapped_rdds
()¶ Returns a list of the wrapped, Scala RDDs within each layer of the pyramid.
Returns: [org.apache.spark.rdd.RDD]
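A small, hedged sketch of working with an existing Pyramid; pyramided is assumed to come from TiledRasterLayer.pyramid(), and the arithmetic on the last line relies on the map-algebra support mentioned in the class description (the exact operators available may vary by version):
# `pyramided` is an assumed Pyramid, e.g. produced by a TiledRasterLayer.
print(sorted(pyramided.levels))  # zoom levels present in the pyramid

# Keep the pyramid's RDDs in memory between uses.
pyramided.cache()

# Map algebra applied to every level of the pyramid.
doubled = pyramided * 2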
-
geopyspark.geotrellis.cost_distance module¶
-
geopyspark.geotrellis.cost_distance.
cost_distance
(friction_layer, geometries, max_distance)¶ Performs a cost distance operation on a TileLayer.
Parameters: - friction_layer (TiledRasterLayer) – TiledRasterLayer of a friction surface to traverse.
- geometries (list) – A list of shapely geometries to be used as a starting point.
Note
All geometries must be in the same CRS as the TileLayer.
- max_distance (int or float) – The maximum cost that a path may reach before the operation stops. This value can be an int or float.
Returns: TiledRasterLayer
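A hedged usage sketch, assuming friction is an existing TiledRasterLayer of travel costs and that the starting point below (an arbitrary coordinate) is in the same CRS as the layer:
from shapely.geometry import Point
from geopyspark.geotrellis.cost_distance import cost_distance

# `friction` is an assumed TiledRasterLayer representing a friction surface.
start = Point(-75.15, 39.95)
cost = cost_distance(friction_layer=friction,
                     geometries=[start],
                     max_distance=144000.0)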
geopyspark.geotrellis.euclidean_distance module¶
-
geopyspark.geotrellis.euclidean_distance.
euclidean_distance
(geometry, source_crs, zoom, cell_type=<CellType.FLOAT64: 'float64'>)¶ Calculates the Euclidean distance of a Shapely geometry.
Parameters: - geometry (shapely.geometry) – The input geometry to compute the Euclidean distance for.
- source_crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- cell_type (str or CellType, optional) – The data type of the cells for the new layer. If not specified, then CellType.FLOAT64 is used.
Note
This function may run very slowly for polygonal inputs if they cover many cells of the output raster.
Returns: TiledRasterLayer
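For instance (the point, CRS, and zoom level below are arbitrary choices for illustration):
from shapely.geometry import Point
from geopyspark.geotrellis.euclidean_distance import euclidean_distance

# Distance-to-point raster at zoom 7; geometry and CRS are illustrative.
distance_layer = euclidean_distance(geometry=Point(-75.15, 39.95),
                                    source_crs=4326,
                                    zoom=7)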
geopyspark.geotrellis.hillshade module¶
-
geopyspark.geotrellis.hillshade.
hillshade
(tiled_raster_layer, band=0, azimuth=315.0, altitude=45.0, z_factor=1.0)¶ Computes Hillshade (shaded relief) from a raster.
The resulting raster will be a shaded relief map (a hill shading) based on the sun altitude, azimuth, and the z factor. The z factor is a conversion factor from map units to elevation units.
Returns a raster of ShortConstantNoDataCellType.
For descriptions of parameters, please see Esri Desktop’s description of Hillshade.
Parameters: - tiled_raster_layer (TiledRasterLayer) – The base layer that contains the rasters used to compute the hillshade.
- band (int, optional) – The band of the raster to base the hillshade calculation on. Default is 0.
- azimuth (float, optional) – The azimuth angle of the source of light. Default value is 315.0.
- altitude (float, optional) – The angle of the altitude of the light above the horizon. Default is 45.0.
- z_factor (float, optional) – How many x and y units in a single z unit. Default value is 1.0.
Returns: TiledRasterLayer
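A hedged sketch, assuming elevation_layer is a TiledRasterLayer of elevation values whose vertical units match the horizontal map units (otherwise adjust z_factor):
from geopyspark.geotrellis.hillshade import hillshade

# `elevation_layer` is an assumed TiledRasterLayer of elevation data.
shaded = hillshade(tiled_raster_layer=elevation_layer,
                   band=0,
                   azimuth=315.0,
                   altitude=45.0,
                   z_factor=1.0)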
geopyspark.geotrellis.rasterize module¶
-
geopyspark.geotrellis.rasterize.
rasterize
(geoms, crs, zoom, fill_value, cell_type=<CellType.FLOAT64: 'float64'>, options=None, num_partitions=None)¶ Rasterizes Shapely geometries.
Parameters: - geoms ([shapely.geometry]) – List of shapely geometries to rasterize.
- crs (str or int) – The CRS of the input geometry.
- zoom (int) – The zoom level of the output raster.
- fill_value (int or float) – Value to burn into pixels intersecting the geometry.
- cell_type (str or CellType) – Which data type the cells should be when created. Defaults to CellType.FLOAT64.
- options (RasterizerOptions, optional) – Pixel intersection options.
- num_partitions (int, optional) – The number of repartitions Spark will make when the data is repartitioned. If None, then the data will not be repartitioned.
Returns: TiledRasterLayer
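An illustrative sketch; the polygon, CRS, zoom level, and burn value are arbitrary:
from shapely.geometry import box
from geopyspark.geotrellis.rasterize import rasterize

# Burn the value 1 into every pixel the polygon touches at zoom 11.
aoi = box(-122.45, 37.75, -122.40, 37.80)
rasterized = rasterize(geoms=[aoi], crs=4326, zoom=11, fill_value=1)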
geopyspark.geotrellis.histogram module¶
This module contains the Histogram class, which is a wrapper of the GeoTrellis Histogram class.
-
class
geopyspark.geotrellis.histogram.
Histogram
(scala_histogram)¶ A wrapper class for a GeoTrellis Histogram.
The underlying histogram is produced from the values within a TiledRasterLayer. These values represented by the histogram can either be Int or Float depending on the data type of the cells in the layer.
Parameters: scala_histogram (py4j.JavaObject) – An instance of the GeoTrellis histogram.
-
scala_histogram
¶ py4j.JavaObject – An instance of the GeoTrellis histogram.
-
bin_counts
()¶ Returns a list of tuples where the key is the bin label value and the value is the label’s respective count.
Returns: [(int, int)] or [(float, int)]
-
bucket_count
()¶ Returns the number of buckets within the histogram.
Returns: int
-
cdf
()¶ Returns the cdf of the distribution of the histogram.
Returns: [(float, float)]
-
classmethod
from_dict
(value)¶ Creates a Histogram from its dictionary representation.
-
item_count
(item)¶ Returns the total number of times a given item appears in the histogram.
Parameters: item (int or float) – The value whose occurrences should be counted.
Returns: The total count of the occurrences of item in the histogram.
Return type: int
-
max
()¶ The largest value of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
mean
()¶ Determines the mean of the histogram.
Returns: float
-
median
()¶ Determines the median of the histogram.
Returns: float
-
merge
(other_histogram)¶ Merges this instance of Histogram with another. The resulting Histogram will contain values from both Histograms.
Parameters: other_histogram (Histogram) – The Histogram that should be merged with this instance.
Returns: Histogram
-
min
()¶ The smallest value of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
min_max
()¶ The largest and smallest values of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: (int, int) or (float, float)
-
mode
()¶ Determines the mode of the histogram.
This will return either an int or float depending on the type of values within the histogram.
Returns: int or float
-
quantile_breaks
(num_breaks)¶ Returns quantile breaks for this Layer.
Parameters: num_breaks (int) – The number of breaks to return. Returns: [int]
-
to_dict
()¶ Encodes histogram as a dictionary
Returns: dict
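Putting a few of these methods together; hist and other_hist are assumed Histogram instances obtained elsewhere (for example from a layer or a pyramid), and the break count is arbitrary:
# `hist` and `other_hist` are assumed Histogram instances.
print(hist.min_max())            # smallest and largest value
print(hist.mean())               # mean of the distribution
print(hist.quantile_breaks(5))   # five quantile break values

# Merge with another histogram covering different data.
combined = hist.merge(other_hist)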
-