geopyspark package

geopyspark

class geopyspark.geopycontext.AvroRegistry

Holds the encoding/decoding methods needed to bring a Scala RDD to/from Python.

classmethod create_partial_tuple_decoder(key_type=None, value_type=None)

Creates a partial tuple decoder function.

Parameters:	key_type (str, optional) – The type of the key in the tuple.
	value_type (str, optional) – The type of the value in the tuple.
Returns:	A partial tuple_decoder function that requires a schema_dict to execute.
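The "partial" idea above can be sketched with `functools.partial`. This is a minimal illustration that runs without geopyspark, not the library's implementation: `sketch_tuple_decoder`, `decode_extent`, and the `'_1'`/`'_2'` field names are assumptions made for the example.

```python
from functools import partial


def sketch_tuple_decoder(schema_dict, key_decoder=None, value_decoder=None):
    """Stand-in decoder; the '_1'/'_2' field names are assumed, not confirmed."""
    key = schema_dict["_1"]
    value = schema_dict["_2"]
    # Apply a decoder only when one was supplied.
    if key_decoder is not None:
        key = key_decoder(key)
    if value_decoder is not None:
        value = value_decoder(value)
    return (key, value)


def decode_extent(d):
    """Hypothetical key decoder: turns a dict of bounds into a tuple."""
    return (d["xmin"], d["ymin"], d["xmax"], d["ymax"])


# Bind the key decoder now; schema_dict is supplied later, once per record.
partial_decoder = partial(sketch_tuple_decoder, key_decoder=decode_extent)

record = {"_1": {"xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10}, "_2": [1, 2, 3]}
decoded = partial_decoder(record)
```

Binding the decoders ahead of time leaves a function of one argument, which is the shape a per-record decoding step would typically expect.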

classmethod create_partial_tuple_encoder(key_type=None, value_type=None)

Creates a partial tuple encoder function.

Parameters:	key_type (str, optional) – The type of the key in the tuple.
	value_type (str, optional) – The type of the value in the tuple.
Returns:	A partial tuple_encoder function that requires an obj to execute.

classmethod tile_decoder(schema_dict)

Decodes a TILE into Python.

Parameters:	schema_dict (dict) – The dict representation of the AvroSchema.
Returns:	Tile

classmethod tile_encoder(obj)

Encodes a TILE to send to Scala.

Parameters:	obj (dict) – The `dict` representation of `TILE`.
Returns:	avro_schema_dict (`dict`)

static tuple_decoder(schema_dict, key_decoder=None, value_decoder=None)

Decodes a tuple into Python.

Parameters:	schema_dict (dict) – The `dict` representation of the AvroSchema.
	key_decoder (func, optional) – The decoding function of the key.
	value_decoder (func, optional) – The decoding function of the value.
Returns:	tuple

static tuple_encoder(obj, key_encoder=None, value_encoder=None)

Encodes a tuple to send to Scala.

Parameters:	obj (tuple) – The tuple to be encoded.
	key_encoder (func, optional) – The encoding function of the key.
	value_encoder (func, optional) – The encoding function of the value.
Returns:	avro_schema_dict (`dict`)
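To show how an encoder and a decoder of this shape pair up, here is a small round-trip sketch. It is an illustration under assumptions, not the library's code: `sketch_encode_pair`, `sketch_decode_pair`, the `'_1'`/`'_2'` field names, and the key helpers are all hypothetical.

```python
def sketch_encode_pair(obj, key_encoder=None, value_encoder=None):
    """Stand-in encoder: turns a (key, value) tuple into a dict."""
    key, value = obj
    if key_encoder is not None:
        key = key_encoder(key)
    if value_encoder is not None:
        value = value_encoder(value)
    # '_1' and '_2' are assumed field names for the two tuple slots.
    return {"_1": key, "_2": value}


def sketch_decode_pair(schema_dict, key_decoder=None, value_decoder=None):
    """Stand-in decoder: the inverse of sketch_encode_pair."""
    key, value = schema_dict["_1"], schema_dict["_2"]
    if key_decoder is not None:
        key = key_decoder(key)
    if value_decoder is not None:
        value = value_decoder(value)
    return (key, value)


# Hypothetical key helpers: a (col, row) tuple stored as a dict.
def encode_key(k):
    return {"col": k[0], "row": k[1]}


def decode_key(d):
    return (d["col"], d["row"])


original = ((3, 7), "payload")
encoded = sketch_encode_pair(original, key_encoder=encode_key)
restored = sketch_decode_pair(encoded, key_decoder=decode_key)
```

The value passes through untouched here because no value encoder or decoder was supplied, mirroring the optional arguments in the signatures above.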

class geopyspark.geopycontext.AvroSerializer(schema, decoding_method=None, encoding_method=None)

The serializer used by an RDD to encode/decode values to/from Python.

Parameters:	schema (str) – The AvroSchema of the RDD.
	decoding_method (func, optional) – The decoding function for the values within the RDD.
	encoding_method (func, optional) – The encoding function for the values within the RDD.

schema: str – The AvroSchema of the RDD.

decoding_method: func, optional – The decoding function for the values within the RDD.

encoding_method: func, optional – The encoding function for the values within the RDD.

dumps(obj)

Serializes an object into a byte array.

Note

When batching is used, this will be called with a list of objects.

Parameters:	obj – The object to be serialized into a byte array.
Returns:	The byte array representation of the `obj`.

loads(obj)

Deserializes a byte array into a collection of Python objects.

Parameters:	obj – The byte array representation of the object to be deserialized.
Returns:	A list of deserialized objects.
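The `dumps`/`loads` contract described above (bytes out, a list of objects back, with a list passed in when batching is used) can be sketched as follows. This is only an illustration of the interface shape: `SketchSerializer` is hypothetical, and JSON stands in for Avro so the example runs without any dependencies.

```python
import json


class SketchSerializer:
    """Stand-in serializer. JSON replaces Avro here purely for illustration."""

    def __init__(self, decoding_method=None, encoding_method=None):
        self.decoding_method = decoding_method
        self.encoding_method = encoding_method

    def dumps(self, obj):
        # With batching, obj arrives as a list; otherwise wrap the single object.
        # (A single object that is itself a list would need extra handling.)
        items = obj if isinstance(obj, list) else [obj]
        if self.encoding_method is not None:
            items = [self.encoding_method(item) for item in items]
        return json.dumps(items).encode("utf-8")

    def loads(self, data):
        # Always returns a list of deserialized objects.
        items = json.loads(data.decode("utf-8"))
        if self.decoding_method is not None:
            items = [self.decoding_method(item) for item in items]
        return items


serializer = SketchSerializer(
    decoding_method=lambda d: (d["_1"], d["_2"]),        # assumed field names
    encoding_method=lambda t: {"_1": t[0], "_2": t[1]},  # assumed field names
)

payload = serializer.dumps([("a", 1), ("b", 2)])  # batched call
restored = serializer.loads(payload)
single = serializer.loads(serializer.dumps(("c", 3)))  # unbatched call
```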

schema_dict: The schema values in a dict.

class geopyspark.geopycontext.GeoPyContext(pysc=None, **kwargs)

A wrapper of SparkContext. It adds methods that help with sending and receiving information to and from Python.

Parameters:	pysc (pyspark.SparkContext, optional) – An existing `SparkContext`.
	**kwargs – `GeoPyContext` can create a `SparkContext` if given its constructing arguments.

Note

If both pysc and kwargs are set, pysc is used and kwargs are ignored.

pysc: pyspark.SparkContext – The wrapped SparkContext.

sc: org.apache.spark.SparkContext – The Scala SparkContext derived from the Python one.

Raises:	`TypeError` – If neither a `SparkContext` nor its constructing arguments are given.

Examples

Creating GeoPyContext from an existing SparkContext.

>>> from pyspark import SparkContext
>>> from geopyspark.geopycontext import GeoPyContext
>>> sc = SparkContext(appName="example", master="local[*]")
>>> geopysc = GeoPyContext(sc)  # wraps the existing SparkContext

Creating GeoPyContext from the constructing arguments of SparkContext.

>>> from geopyspark.geopycontext import GeoPyContext
>>> geopysc = GeoPyContext(appName="example", master="local[*]")  # creates its own SparkContext
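The constructor rules stated above (an existing context takes precedence, keyword arguments are the fallback, and `TypeError` is raised when neither is given) can be sketched without Spark installed. Both `FakeSparkContext` and `SketchContext` are hypothetical stand-ins written for this example; they are not how `GeoPyContext` is implemented.

```python
class FakeSparkContext:
    """Placeholder for pyspark.SparkContext so the sketch runs without Spark."""

    def __init__(self, **kwargs):
        self.conf = kwargs


class SketchContext:
    """Stand-in wrapper that mirrors the documented precedence rules."""

    def __init__(self, pysc=None, **kwargs):
        if pysc is not None:
            # An existing context wins; any kwargs are ignored.
            self.pysc = pysc
        elif kwargs:
            # No context given: build one from the constructing arguments.
            self.pysc = FakeSparkContext(**kwargs)
        else:
            raise TypeError(
                "Either a SparkContext or its constructing arguments must be given"
            )


existing = FakeSparkContext(appName="existing")
from_existing = SketchContext(existing, appName="ignored")
from_kwargs = SketchContext(appName="example", master="local[*]")

try:
    SketchContext()
    raised = False
except TypeError:
    raised = True
```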

create_python_rdd(jrdd, serializer)

Creates a Python RDD from an RDD that came from Scala.

Parameters:	jrdd (org.apache.spark.api.java.JavaRDD) – The RDD that came from Scala.
	serializer (`AvroSerializer` or pyspark.serializers.AutoBatchedSerializer(AvroSerializer)) – An instance of `AvroSerializer`, either on its own or wrapped by `AutoBatchedSerializer`.
Returns:	`pyspark.RDD`

create_schema(key_type)

Creates an AvroSchema.

Parameters:	key_type (str) – The type of the `K` in the tuple, `(K, V)` in the RDD.
Returns:	An AvroSchema for the types within the RDD.

static map_key_input(key_type, is_boundable)

Gets the mapped GeoTrellis type from the key_type.

Parameters:	key_type (str) – The type of the `K` in the tuple, `(K, V)` in the RDD.
	is_boundable (bool) – Whether `K` is boundable.
Returns:	The corresponding GeoTrellis type.