Ingesting an Image

This example shows how to ingest a grayscale image and save the results locally. It is assumed that you have already read through the documentation on GeoPySpark before beginning this tutorial.

Getting the Data

Before we can begin with the ingest, we must first download the data from S3. This curl command will download a file from S3 and save it to your /tmp direcotry. The file being downloaded comes from the Shuttle Radar Topography Mission (SRTM) dataset, and contains elevation data on the east coast of Sri Lanka.

A side note: Files can be retrieved directly from S3 using the methods shown in this tutorial. However, this could not be done in this instance due to permission requirements needed to access the file.

curl -o /tmp/cropped.tif https://s3.amazonaws.com/geopyspark-test/example-files/cropped.tif

What is an Ingest?

Before continuing on, it would be best to briefly discuss what an ingest actually is. When data is acquired, it may cover an arbitrary spatial extent in an arbitrary projection. This data needs to be regularized to some expected layout and cut into tiles. After this step, we will possess a TiledRasterLayer that can be analyzed and saved for later use. For more information on layers and the data they hold, see the layers guide.

The Code

With our file downloaded we can begin the ingest.

import geopyspark as gps

from pyspark import SparkContext

Setting Up the SparkContext

The first thing one needs to do when using GeoPySpark is to setup SparkContext. Because GeoPySpark is backed by Spark, the pysc is needed to initialize our starting classes.

For those that are already familiar with Spark, you may already know there are multiple ways to create a SparkContext. When working with GeoPySpark, it is advised to create this instance via SparkConf. There are numerous settings for SparkConf, and some have to be set a certain way in order for GeoPySpark to work. Thus, geopyspark_conf was created as way for a user to set the basic parameters without having to worry about setting the other, required fields.

conf = gps.geopyspark_conf(master="local[*]", appName="ingest-example")
pysc = SparkContext(conf=conf)

Reading in the Data

After the creation of pysc, we can now read in the data. For this example, we will be reading in a single GeoTiff that contains spatial data. Hence, why we set the layer_type to LayerType.SPATIAL.

raster_layer = gps.geotiff.get(layer_type=gps.LayerType.SPATIAL, uri="file:///tmp/cropped.tif")

Tiling the Data

It is now time to format the data within the layer to our desired layout. The aptly named, tile_to_layout, method will cut and arrange the rasters in the layer to the layout of our choosing. This results in us getting a new class instance of TiledRasterLayer. For this example, we will be tiling to a GlobalLayout.

With our tiled data, we might like to make a tile server from it and show it in on a map at some point. Therefore, we have to make sure that the tiles within the layer are in the right projection. We can do this by setting the target_crs parameter.

tiled_raster_layer = raster_layer.tile_to_layout(gps.GlobalLayout(), target_crs=3857)
tiled_raster_layer

Pyramiding the Data

Now it’s time to pyramid! With our reprojected data, we will create an instance of Pyramid that contains 12 TiledRasterLayers. Each one having it’s own zoom_level from 11 to 0.

pyramided_layer = tiled_raster_layer.pyramid()
pyramided_layer.max_zoom
pyramided_layer.levels

Saving the Pyramid Locally

To save all of the TiledRasterLayers within pyramid_layer, we just have to loop through values of pyramid_layer.level and write each layer locally.

for tiled_layer in pyramided_layer.levels.values():
    gps.write(uri="file:///tmp/ingested-image", layer_name="ingested-image", tiled_raster_layer=tiled_layer)