geopyspark package
geopyspark
class geopyspark.geopycontext.AvroRegistry
    Holds the encoding/decoding methods needed to bring a Scala RDD to/from Python.
    classmethod create_partial_tuple_decoder(key_type=None, value_type=None)
        Creates a partial tuple decoder function.

        Parameters:
            - key_type (str, optional) – The type of the key in the tuple.
            - value_type (str, optional) – The type of the value in the tuple.

        Returns: A partial tuple_decoder function that requires a schema_dict to execute.
    classmethod create_partial_tuple_encoder(key_type=None, value_type=None)
        Creates a partial tuple encoder function.

        Parameters:
            - key_type (str, optional) – The type of the key in the tuple.
            - value_type (str, optional) – The type of the value in the tuple.

        Returns: A partial tuple_encoder function that requires an obj to execute.
    classmethod tile_decoder(schema_dict)
        Decodes a TILE into Python.

        Parameters: schema_dict (dict) – The dict representation of the AvroSchema.

        Returns: Tile
    classmethod tile_encoder(obj)
        Encodes a TILE to send to Scala.

        Parameters: obj (dict) – The dict representation of TILE.

        Returns: avro_schema_dict (dict)
    static tuple_decoder(schema_dict, key_decoder=None, value_decoder=None)
        Decodes a tuple into Python.

        Parameters:
            - schema_dict (dict) – The dict representation of the AvroSchema.
            - key_decoder (func, optional) – The decoding function of the key.
            - value_decoder (func, optional) – The decoding function of the value.

        Returns: tuple
    static tuple_encoder(obj, key_encoder=None, value_encoder=None)
        Encodes a tuple to send to Scala.

        Parameters:
            - obj (tuple) – The tuple to be encoded.
            - key_encoder (func, optional) – The encoding function of the key.
            - value_encoder (func, optional) – The encoding function of the value.

        Returns: avro_schema_dict (dict)
class geopyspark.geopycontext.AvroSerializer(schema, decoding_method=None, encoding_method=None)
    The serializer used by an RDD to encode/decode values to/from Python.

    Parameters:
        - schema (str) – The AvroSchema of the RDD.
        - decoding_method (func, optional) – The decoding function for the values within the RDD.
        - encoding_method (func, optional) – The encoding function for the values within the RDD.
    schema
        str – The AvroSchema of the RDD.

    decoding_method
        func, optional – The decoding function for the values within the RDD.

    encoding_method
        func, optional – The encoding function for the values within the RDD.
    dumps(obj)
        Serialize an object into a byte array.

        Note: When batching is used, this will be called with a list of objects.

        Parameters: obj – The object to be serialized into a byte array.

        Returns: The byte array representation of the obj.
    loads(obj)
        Deserializes a byte array into a collection of Python objects.

        Parameters: obj – The byte array representation of an object to be deserialized.

        Returns: A list of deserialized objects.
    schema_dict
        The schema values in a dict.
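The dumps/loads contract above can be illustrated with a minimal stand-in that uses pickle in place of Avro (AvroSerializer itself encodes against its AvroSchema; this sketch only mirrors the interface):

```python
import pickle

# Minimal stand-in honoring the same contract as AvroSerializer:
# dumps() turns objects into a byte array, loads() returns a *list*
# of deserialized objects (the batching note above is why).
class StandInSerializer:
    def dumps(self, obj):
        # When batching is used, obj may already be a list of objects.
        batch = obj if isinstance(obj, list) else [obj]
        return pickle.dumps(batch)

    def loads(self, data):
        return pickle.loads(data)

ser = StandInSerializer()
payload = ser.dumps([(1, 'a'), (2, 'b')])
assert ser.loads(payload) == [(1, 'a'), (2, 'b')]
```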
class geopyspark.geopycontext.GeoPyContext(pysc=None, **kwargs)
    A wrapper of SparkContext. This wrapper provides extra functionality through methods that help with sending/receiving information to/from Python.

    Parameters:
        - pysc (pyspark.SparkContext, optional) – An existing SparkContext.
        - **kwargs – GeoPyContext can create a SparkContext if given its constructing arguments.

    Note: If both pysc and kwargs are set, then pysc will be used.
    pysc
        pyspark.SparkContext – The wrapped SparkContext.

    sc
        org.apache.spark.SparkContext – The Scala SparkContext derived from the Python one.

    Raises: TypeError – If neither a SparkContext nor its constructing arguments are given.

    Examples

    Creating a GeoPyContext from an existing SparkContext:

    >>> sc = SparkContext(appName="example", master="local[*]")
    >>> geopysc = GeoPyContext(sc)

    Creating a GeoPyContext from the constructing arguments of SparkContext:

    >>> geopysc = GeoPyContext(appName="example", master="local[*]")
    create_python_rdd(jrdd, serializer)
        Creates a Python RDD from an RDD that came from Scala.

        Parameters:
            - jrdd (org.apache.spark.api.java.JavaRDD) – The RDD that came from Scala.
            - serializer (AvroSerializer or pyspark.serializers.AutoBatchedSerializer(AvroSerializer)) – An instance of AvroSerializer, either alone or wrapped by AutoBatchedSerializer.

        Returns: pyspark.RDD
    create_schema(key_type)
        Creates an AvroSchema.

        Parameters: key_type (str) – The type of the K in the tuple, (K, V), in the RDD.

        Returns: An AvroSchema for the types within the RDD.
    static map_key_input(key_type, is_boundable)
        Gets the mapped GeoTrellis type from the key_type.

        Parameters:
            - key_type (str) – The type of the K in the tuple, (K, V), in the RDD.
            - is_boundable (bool) – Is K boundable.

        Returns: The corresponding GeoTrellis type.
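A static mapping like this is typically a lookup table keyed on (key_type, is_boundable). The sketch below is a hypothetical illustration only — the key_type strings and GeoTrellis type names are placeholders, not geopyspark's actual mapping:

```python
# Hypothetical lookup table: the key_type strings and GeoTrellis
# type names here are placeholders, not geopyspark's real mapping.
_KEY_MAP = {
    ("spatial", True): "SpatialKey",
    ("spatial", False): "ProjectedExtent",
    ("spacetime", True): "SpaceTimeKey",
    ("spacetime", False): "TemporalProjectedExtent",
}

def map_key_input(key_type, is_boundable):
    """Return the mapped GeoTrellis type name for key_type."""
    try:
        return _KEY_MAP[(key_type, is_boundable)]
    except KeyError:
        raise ValueError("Could not map key type: {}".format(key_type))
```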