geopyspark package

geopyspark

class geopyspark.geopycontext.AvroRegistry

Holds the encoding/decoding methods needed to bring a Scala RDD to/from Python.

classmethod create_partial_tuple_decoder(key_type=None, value_type=None)

Creates a partial tuple decoder function.

Parameters:
  • key_type (str, optional) – The type of the key in the tuple.
  • value_type (str, optional) – The type of the value in the tuple.
Returns:

A partial tuple_decoder function that requires a schema_dict to execute.

classmethod create_partial_tuple_encoder(key_type=None, value_type=None)

Creates a partial tuple encoder function.

Parameters:
  • key_type (str, optional) – The type of the key in the tuple.
  • value_type (str, optional) – The type of the value in the tuple.
Returns:

A partial tuple_encoder function that requires an obj to execute.
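
The partial encoder/decoder pattern described above can be sketched with functools.partial. This is a minimal illustration, not geopyspark's implementation: the key/value codecs (encode_key, decode_key, encode_value, decode_value) are hypothetical, and the "_1"/"_2" dict layout is an assumption made for the example.

```python
from functools import partial

# Hypothetical key/value codecs, for illustration only.
def encode_key(key):
    col, row = key
    return {"col": col, "row": row}

def decode_key(d):
    return (d["col"], d["row"])

def encode_value(cells):
    return {"cells": list(cells)}

def decode_value(d):
    return d["cells"]

# Stand-ins with the same signatures as the documented static methods.
# Assumption: the key is stored under "_1" and the value under "_2".
def tuple_encoder(obj, key_encoder=None, value_encoder=None):
    key, value = obj
    return {"_1": key_encoder(key), "_2": value_encoder(value)}

def tuple_decoder(schema_dict, key_decoder=None, value_decoder=None):
    return (key_decoder(schema_dict["_1"]), value_decoder(schema_dict["_2"]))

# Bind the codecs now; the remaining argument (obj or schema_dict) comes later.
partial_encoder = partial(tuple_encoder, key_encoder=encode_key, value_encoder=encode_value)
partial_decoder = partial(tuple_decoder, key_decoder=decode_key, value_decoder=decode_value)

encoded = partial_encoder(((3, 7), [1, 2, 3]))
decoded = partial_decoder(encoded)
```

Binding the codecs up front leaves a one-argument function, which is the shape a serializer needs when it calls an encoding or decoding method per record.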

classmethod tile_decoder(schema_dict)

Decodes a TILE into Python.

Parameters:
  • schema_dict (dict) – The dict representation of the AvroSchema.
Returns:

Tile

classmethod tile_encoder(obj)

Encodes a TILE to send to Scala.

Parameters:
  • obj (dict) – The dict representation of TILE.
Returns:

avro_schema_dict (dict)

static tuple_decoder(schema_dict, key_decoder=None, value_decoder=None)

Decodes a tuple into Python.

Parameters:
  • schema_dict (dict) – The dict representation of the AvroSchema.
  • key_decoder (func, optional) – The decoding function of the key.
  • value_decoder (func, optional) – The decoding function of the value.
Returns:

tuple

static tuple_encoder(obj, key_encoder=None, value_encoder=None)

Encodes a tuple to send to Scala.

Parameters:
  • obj (tuple) – The tuple to be encoded.
  • key_encoder (func, optional) – The encoding function of the key.
  • value_encoder (func, optional) – The encoding function of the value.
Returns:

avro_schema_dict (dict)

class geopyspark.geopycontext.AvroSerializer(schema, decoding_method=None, encoding_method=None)

The serializer used by an RDD to encode/decode values to/from Python.

Parameters:
  • schema (str) – The AvroSchema of the RDD.
  • decoding_method (func, optional) – The decoding function for the values within the RDD.
  • encoding_method (func, optional) – The encoding function for the values within the RDD.
schema

str – The AvroSchema of the RDD.

decoding_method

func, optional – The decoding function for the values within the RDD.

encoding_method

func, optional – The encoding function for the values within the RDD.

dumps(obj)

Serialize an object into a byte array.

Note

When batching is used, this will be called with a list of objects.

Parameters:
  • obj – The object to be serialized into a byte array.
Returns:

The byte array representation of obj.

loads(obj)

Deserializes a byte array into a collection of Python objects.

Parameters:
  • obj – The byte array to be deserialized.
Returns:

A list of deserialized objects.

schema_dict

The schema values in a dict.
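
The dumps/loads contract above can be sketched with a toy serializer. This is only an illustration under stated assumptions: JsonSketchSerializer is a hypothetical class that writes JSON bytes rather than Avro binary, and it assumes schema_dict is simply the parsed JSON form of the schema string.

```python
import json

class JsonSketchSerializer:
    """Toy stand-in mirroring the documented interface; uses JSON, not Avro."""

    def __init__(self, schema, decoding_method=None, encoding_method=None):
        self.schema = schema
        self.decoding_method = decoding_method
        self.encoding_method = encoding_method

    @property
    def schema_dict(self):
        # Assumption: the schema string is JSON, as Avro schemas are.
        return json.loads(self.schema)

    def dumps(self, obj):
        # With batching, obj arrives as a list; wrap a single object in a list.
        items = obj if isinstance(obj, list) else [obj]
        if self.encoding_method is not None:
            items = [self.encoding_method(item) for item in items]
        return json.dumps(items).encode("utf-8")

    def loads(self, obj):
        # Always returns a list of deserialized objects.
        items = json.loads(obj.decode("utf-8"))
        if self.decoding_method is not None:
            items = [self.decoding_method(item) for item in items]
        return items

schema = '{"type": "record", "name": "Point", "fields": []}'
ser = JsonSketchSerializer(schema, decoding_method=tuple, encoding_method=list)
payload = ser.dumps([(1, 2), (3, 4)])   # batched call: a list of objects
restored = ser.loads(payload)
single = ser.loads(ser.dumps((5, 6)))   # unbatched call: one object
```

Both the batched and unbatched calls come back from loads as a list, which matches the "Returns" description of loads above.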

class geopyspark.geopycontext.GeoPyContext(pysc=None, **kwargs)

A wrapper of SparkContext. It adds methods that help send information to, and receive information from, Python.

Parameters:
  • pysc (pyspark.SparkContext, optional) – An existing SparkContext.
  • **kwargs – GeoPyContext can create a SparkContext if given its constructing arguments.

Note

If both pysc and kwargs are set, pysc will be used.

pysc

pyspark.SparkContext – The wrapped SparkContext.

sc

org.apache.spark.SparkContext – The Scala SparkContext derived from the Python one.

Raises:

TypeError – If neither a SparkContext nor its constructing arguments are given.
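
The argument handling described above (pysc takes precedence over kwargs, and TypeError is raised when neither is given) can be sketched as follows. This is a minimal sketch, not the library's constructor: FakeSparkContext and resolve_context are hypothetical names used so the example runs without Spark.

```python
class FakeSparkContext:
    """Hypothetical stand-in for pyspark.SparkContext, so no Spark is needed."""

    def __init__(self, **kwargs):
        self.conf = kwargs

def resolve_context(pysc=None, **kwargs):
    """Pick the context to wrap, following the documented precedence."""
    if pysc is not None:
        # An existing context wins, even when kwargs are also supplied.
        return pysc
    if kwargs:
        # No existing context: build one from the constructing arguments.
        return FakeSparkContext(**kwargs)
    raise TypeError("Either a SparkContext or its constructing arguments must be given")

existing = FakeSparkContext(appName="existing")
chosen = resolve_context(existing, appName="ignored")
created = resolve_context(appName="example", master="local[*]")
```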

Examples

Creating GeoPyContext from an existing SparkContext.

>>> from pyspark import SparkContext
>>> from geopyspark.geopycontext import GeoPyContext
>>> sc = SparkContext(appName="example", master="local[*]")
>>> geopysc = GeoPyContext(sc)

Creating GeoPyContext from the constructing arguments of SparkContext.

>>> geopysc = GeoPyContext(appName="example", master="local[*]")

create_python_rdd(jrdd, serializer)

Creates a Python RDD from an RDD that came from Scala.

Parameters:
  • jrdd (org.apache.spark.api.java.JavaRDD) – The RDD that came from Scala.
  • serializer (AvroSerializer or pyspark.serializers.AutoBatchedSerializer(AvroSerializer)) – An instance of AvroSerializer that is either alone, or wrapped by AutoBatchedSerializer.
Returns:

pyspark.RDD

create_schema(key_type)

Creates an AvroSchema.

Parameters:
  • key_type (str) – The type of K in the (K, V) tuples of the RDD.
Returns:

An AvroSchema for the types within the RDD.

static map_key_input(key_type, is_boundable)

Gets the mapped GeoTrellis type from the key_type.

Parameters:
  • key_type (str) – The type of K in the (K, V) tuples of the RDD.
  • is_boundable (bool) – Whether K is boundable.
Returns:

The corresponding GeoTrellis type.