Ingesting a Grayscale Image

This example shows how to ingest a grayscale image and save the results locally.

The Code

Here’s the code to run the ingest.

from geopyspark.geopycontext import GeoPyContext
from geopyspark.geotrellis.constants import SPATIAL, ZOOM
from geopyspark.geotrellis.catalog import write
from geopyspark.geotrellis.geotiff_rdd import get

geopysc = GeoPyContext(appName="python-ingest", master="local[*]")

# Read the GeoTiff from the local filesystem
rdd = get(geopysc, SPATIAL, "file:///tmp/cropped.tif")

metadata = rdd.collect_metadata()

# tile the rdd to the layout defined in the metadata
laid_out = rdd.tile_to_layout(metadata)

# reproject the tiled rasters using a ZoomedLayoutScheme
reprojected = laid_out.reproject("EPSG:3857", scheme=ZOOM)

# pyramid the TiledRasterRDD to create 12 new TiledRasterRDDs
# one for each zoom level
pyramided = reprojected.pyramid(start_zoom=12, end_zoom=1)

# Save each TiledRasterRDD locally
for tiled in pyramided:
    write("file:///tmp/python-catalog", "python-ingest", tiled)

Running the Code

Before you can run this example, you need to download the example file. Run this command to save it in the /tmp directory:

curl -o /tmp/cropped.tif https://s3.amazonaws.com/geopyspark-test/example-files/cropped.tif

There are two ways to run the code.

The first is to copy and paste the code into a console, such as IPython, and run it there.

The second is to save the code in a Python file. To run it, go to the directory that contains the file and run this command:

python3 file.py

Replace file.py with the name you gave the file.

Breaking Down the Code

Now that the code has been written, let’s go through it step by step to see what’s actually going on.

Reading in the Data

geopysc = GeoPyContext(appName="python-ingest", master="local[*]")

# Read the GeoTiff from the local filesystem
rdd = get(geopysc, SPATIAL, "file:///tmp/cropped.tif")

Before doing anything with GeoPySpark, it’s best to create a GeoPyContext instance. This acts as a wrapper for SparkContext and provides some useful, behind-the-scenes methods for other GeoPySpark functions.

With geopysc created, we can now read the data. For this example, we will be reading a single GeoTiff that contains only spatial data (hence SPATIAL). This creates an instance of RasterRDD, which allows us to start working with our data.

Collecting the Metadata

metadata = rdd.collect_metadata()

Before we can begin formatting the data to our desired layout, we must first collect the metadata of the entire RDD. The metadata contains the TileLayout that the data will be formatted to. There are various ways to collect the metadata depending on how you want the layout to look (see collect_metadata()), but for this example we will use the default parameters.
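
To get a feel for what a tile layout describes, here is a rough, library-free sketch. It is not how collect_metadata() derives its result; the helper name tile_layout and the 256-pixel tile size are assumptions made purely for illustration.

```python
import math


def tile_layout(raster_cols, raster_rows, tile_size=256):
    """Return (layout_cols, layout_rows): how many square tiles are
    needed to cover a raster of the given pixel dimensions.

    Hypothetical helper; the 256-pixel default is an assumption.
    """
    layout_cols = math.ceil(raster_cols / tile_size)
    layout_rows = math.ceil(raster_rows / tile_size)
    return layout_cols, layout_rows


# A 1000 x 600 pixel raster needs 4 tile columns and 3 tile rows.
print(tile_layout(1000, 600))  # → (4, 3)
```

The layout only says how the tile grid is arranged; the pixels themselves are not moved until the data is tiled.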

Tiling the Data

# tile the rdd to the layout defined in the metadata
laid_out = rdd.tile_to_layout(metadata)

# reproject the tiled rasters using a ZoomedLayoutScheme
reprojected = laid_out.reproject("EPSG:3857", scheme=ZOOM)

With the metadata collected, it is now time to format the data within the RDD to our desired layout. The aptly named tile_to_layout() method will cut and arrange the rasters in the RDD to the layout within the metadata, giving us a new instance of TiledRasterRDD.
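
The idea of cutting a raster into a grid of tiles can be sketched in plain Python. This is only a conceptual illustration, not GeoPySpark's implementation: the function cut_into_tiles is hypothetical, it works on nested lists rather than rasters, and edge tiles are simply left smaller instead of being padded.

```python
def cut_into_tiles(grid, tile_size):
    """Split a 2D list into tiles keyed by (col, row) in the tile grid.

    Hypothetical helper for illustration. Edge tiles are smaller when
    the grid dimensions are not a multiple of tile_size.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    tiles = {}
    for top in range(0, rows, tile_size):
        for left in range(0, cols, tile_size):
            key = (left // tile_size, top // tile_size)
            tiles[key] = [
                row[left:left + tile_size]
                for row in grid[top:top + tile_size]
            ]
    return tiles


# A 4 x 4 grid of cell values 0..15, cut into four 2 x 2 tiles.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = cut_into_tiles(grid, 2)
print(tiles[(1, 0)])  # → [[2, 3], [6, 7]]
```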

Having this new instance will allow us to perform the final steps of our ingest. While the tiles are now in the correct layout, their CRS is not what we want. It would be great if we could make a tile server from our ingested data, but to do that we’ll have to change the projection, which is what reproject() does. If you wish to pyramid your data, it must have a scheme of ZOOM before the pyramiding takes place. Read more about why here.
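
For intuition about the target projection, the sketch below converts a single longitude/latitude pair to EPSG:3857 (Web Mercator) coordinates using the standard spherical Mercator formula. It assumes the input is in degrees of longitude and latitude, which may not match the CRS of the example file, and the function name is hypothetical. reproject() handles whole rasters, including resampling, which this sketch does not attempt.

```python
import math

# Radius of the sphere used by EPSG:3857 (the WGS84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters.

    Hypothetical helper for illustration; valid for latitudes
    strictly between -90 and 90 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


x, y = lonlat_to_web_mercator(-75.16, 39.95)
print(round(x), round(y))
```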

Pyramiding the Data

# pyramid the TiledRasterRDD to create 12 new TiledRasterRDDs,
# one for each zoom level
pyramided = reprojected.pyramid(start_zoom=12, end_zoom=1)

Now it’s time to pyramid! Using our reprojected data, we can create 12 new instances of TiledRasterRDD. Each instance represents the data within the RDD at a specific zoom level. Note: The start_zoom is always the larger number when pyramiding.
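
The count of 12 follows from the zoom range being inclusive at both ends. The sketch below lists the levels and, assuming the usual web-map convention in which zoom level z covers the world with a 2^z by 2^z grid of tiles, shows how quickly the grid shrinks as the zoom decreases. The helper zoom_levels is hypothetical and is not part of GeoPySpark.

```python
def zoom_levels(start_zoom, end_zoom):
    """Return the zoom levels from start_zoom down to end_zoom, inclusive.

    Hypothetical helper for illustration only.
    """
    if start_zoom < end_zoom:
        raise ValueError("start_zoom must be the larger number")
    return list(range(start_zoom, end_zoom - 1, -1))


levels = zoom_levels(12, 1)
print(len(levels))  # → 12

# Assumed convention: zoom z has a 2**z by 2**z world-wide tile grid.
for zoom in levels:
    side = 2 ** zoom
    print(f"zoom {zoom}: {side} x {side} tiles")
```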

Saving the Ingest Locally

# Save each TiledRasterRDD locally
for tiled in pyramided:
    write("file:///tmp/python-catalog", "python-ingest", tiled)

All that’s left to do now is to save it. Since pyramided is a list of TiledRasterRDD instances, we can loop through it and save each element one at a time.