# Core Concepts¶

Because GeoPySpark is a binding of an existing project, GeoTrellis, some terminology and data representations have carried over. This section seeks to explain this jargon in addition to describing how GeoTrellis types are represented in GeoPySpark.

Before begining, all examples in this guide need the following boilerplate code:

```import datetime
import numpy as np
import geopyspark as gps
from shapely.geometry import box
```

## Rasters¶

GeoPySpark differs in how it represents rasters from other geo-spatial Python libraries like rasterIO. In GeoPySpark, they are represented by the `Tile` class. This class contains a numpy array (refered to as `cells`) that represents the cells of the raster in addition to other information regarding the data. Along with `cells`, `Tile` can also have the `no_data_value` of the raster.

Note: All rasters in GeoPySpark are represented as having multiple bands, even if the original raster just contained one.

```arr = np.array([[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2]]], dtype=np.int16)

# The resulting Tile will set -10 as the no_data_value for the raster
gps.Tile.from_numpy_array(numpy_array=arr, no_data_value=-10)
```
```# The resulting Tile will have no no_data_value
gps.Tile.from_numpy_array(numpy_array=arr)
```

## Extent¶

Describes the area on Earth a raster represents. This area is represented by coordinates that are in some Coordinate Reference System. Thus, depending on the system in use, the values that outline the `Extent` can vary. `Extent` can also be refered to as a bounding box.

Note: The values within the `Extent` must be `float`s and not `double`s.

```extent = gps.Extent(0.0, 0.0, 10.0, 10.0)
extent
```

## ProjectedExtent¶

`ProjectedExtent` describes both the area on Earth a raster represents in addition to its CRS. Either the EPSG code or a proj4 string can be used to indicate the CRS of the `ProjectedExtent`.

```# Using an EPSG code

gps.ProjectedExtent(extent=extent, epsg=3857)
```
```# Using a Proj4 String

proj4 = "+proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378137 +b=6378137 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs "
gps.ProjectedExtent(extent=extent, proj4=proj4)
```

## TemporalProjectedExtent¶

Similar to `ProjectedExtent`, `TemporalProjectedExtent` describes the area on Earth the raster represents, its CRS, and the time the data was represents. This point of time, called `instant`, is an instance of `datetime.datetime`.

```time = datetime.datetime.now()
gps.TemporalProjectedExtent(extent=extent, instant=time, epsg=3857)
```

## TileLayout¶

`TileLayout` describes the grid which represents how rasters are orginized and assorted in a layer. `layoutCols` and `layoutRows` detail how many columns and rows the grid itself has, respectively. While `tileCols` and `tileRows` tell how many columns and rows each individual raster has.

```# Describes a layer where there are four rasters in a 2x2 grid. Each raster has 256 cols and rows.

tile_layout = gps.TileLayout(layoutCols=2, layoutRows=2, tileCols=256, tileRows=256)
tile_layout
```

## LayoutDefinition¶

`LayoutDefinition` describes both how the rasters are orginized in a layer as well as the area covered by the grid.

```layout_definition = gps.LayoutDefinition(extent=extent, tileLayout=tile_layout)
layout_definition
```

## Tiling Strategies¶

It is often the case that the exact layout of the layer is unknown. Rather than having to go through the effort of trying to figure out the optimal layout, there exists two different tiling strategies that will produce a layout based on the data they are given.

### LocalLayout¶

`LocalLayout` is the first tiling strategy that produces a layout where the grid is constructed over all of the pixels within a layer of a given tile size. The resulting layout will match the original resolution of the cells within the rasters.

Note: This layout cannot be used for creating display layers. Rather, it is best used for layers where operations and analysis will be performed.

```# Creates a LocalLayout where each tile within the grid will be 256x256 pixels.
gps.LocalLayout()
```
```# Creates a LocalLayout where each tile within the grid will be 512x512 pixels.
gps.LocalLayout(tile_size=512)
```
```# Creates a LocalLayout where each tile within the grid will be 256x512 pixels.
gps.LocalLayout(tile_cols=256, tile_rows=512)
```

### GlobalLayout¶

The other tiling strategy is `GlobalLayout` which makes a layout where the grid is constructed over the global extent CRS. The cell resolution of the resulting layer be multiplied by a power of 2 for the CRS. Thus, using this strategy will result in either up or down sampling of the original raster.

Note: This layout strategy should be used when the resulting layer is to be dispalyed in a TMS server.

```# Creates a GobalLayout instance with the default values
gps.GlobalLayout()
```
```# Creates a GlobalLayout instance for a zoom of 12
gps.GlobalLayout(zoom=12)
```

You may have noticed from the above two examples that `GlobalLayout` does not create layout for a given zoom level by default. Rather, it determines what the zoom should be based on the size of the cells within the rasters. If you do want to create a layout for a specific zoom level, then the `zoom` parameter must be set.

## SpatialKey¶

`SpatialKey`s describe the positions of rasters within the grid of the layout. This grid is a 2D plane where the location of a raster is represented by a pair of coordinates, `col` and `row`, respectively. As its name and attributes suggest, `SpatialKey` deals solely with spatial data.

```gps.SpatialKey(col=0, row=0)
```

## SpaceTimeKey¶

Like `SpatialKey`s, `SpaceTimeKey`s describe the position of a raster in a layout. However, the grid is a 3D plane where a location of a raster is represented by a pair of coordinates, `col` and `row`, as well as a z value that represents a point in time called, `instant`. Like the `instant` in `TemporalProjectedExtent`, this is also an instance of `datetime.datetime`. Thus, `SpaceTimeKey`s deal with spatial-temporal data.

```gps.SpaceTimeKey(col=0, row=0, instant=time)
```

## Bounds¶

`Bounds` represents the the extent of the layout grid in terms of keys. It has both a `minKey` and a `maxKey` attributes. These can either be a `SpatialKey` or a `SpaceTimeKey` depending on the type of data within the layer. The `minKey` is the left, uppermost cell in the grid and the `maxKey` is the right, bottommost cell.

```# Creating a Bounds from SpatialKeys

min_spatial_key = gps.SpatialKey(0, 0)
max_spatial_key = gps.SpatialKey(10, 10)

bounds = gps.Bounds(min_spatial_key, max_spatial_key)
bounds
```
```# Creating a Bounds from SpaceTimeKeys

min_space_time_key = gps.SpaceTimeKey(0, 0, 1.0)
max_space_time_key = gps.SpaceTimeKey(10, 10, 1.0)

gps.Bounds(min_space_time_key, max_space_time_key)
```

## Metadata¶

`Metadata` contains information of the values within a layer. This data pertains to the layout, projection, and extent of the data contained within the layer.

The below example shows how to construct `Metadata` by hand, however, this is almost never required and `Metadata` can be produced using easier means. For `RasterLayer`, one can call the method, `collect_metadata()` and `TiledRasterLayer` has the attribute, `layer_metadata`.

```# Creates Metadata for a layer with rasters that have a cell type of int16 with the previously defined
# bounds, crs, extent, and layout definition.
gps.Metadata(bounds=bounds,
crs=proj4,
cell_type=gps.CellType.INT16.value,
extent=extent,
layout_definition=layout_definition)
```

## Features¶

`Feature`s are shapely geometries with some kind of associated metadata. The main purpose of this type is to provide a means of rasterizing multiple geometries that have different cell values and who may overlap.

### CellValue¶

While in practice a `Feature` can have any kind of metadata, `CellValue` needs to be used with `Feature`s that are to be rasterized. `CellValue` has two parameters: `value` and `zindex`. `value` is the value of all the cells that intersect the `Feature`’s geometry. `zindex` of a `Feature` determines what value a cell will be if more than one geometry intersects it. The higher the `zindex`, the more priority it has.

```geom1 = box(0, 0, 15, 15)
geom2 = box(100, 26, 109, 208)
geom3 = box(610, 215, 1000, 500)

cell_value1 = gps.CellValue(value=1, zindex=1)
cell_value2 = gps.CellValue(value=2, zindex=2)
cell_value3 = gps.CellValue(value=3, zindex=3)

# Will not be selected if feature2 and/or feature3 also intersects target cell
feature1 = gps.Feature(geometry=geom1, properties=cell_value1)

# Will not be selected if feature3 also intersects target cell
feature2 = gps.Feature(geometry=geom2, properties=cell_value2)

# Will always be selected
feature3 = gps.Feature(geometry=geom3, properties=cell_value3)
```