geopyspark package¶
geopyspark¶
-
class
geopyspark.geopycontext.AvroRegistry¶ Holds the encoding/decoding methods needed to bring a Scala RDD to/from Python.
-
classmethod
create_partial_tuple_decoder(key_type=None, value_type=None)¶ Creates a partial tuple decoder function.
Parameters: - key_type (str, optional) – The type of the key in the tuple.
- value_type (str, optional) – The type of the value in the tuple.
Returns: A partial tuple_decoder function that requires a schema_dict to execute.
-
classmethod
create_partial_tuple_encoder(key_type=None, value_type=None)¶ Creates a partial tuple encoder function.
Parameters: - key_type (str, optional) – The type of the key in the tuple.
- value_type (str, optional) – The type of the value in the tuple.
Returns: A partial tuple_encoder function that requires an obj to execute.
-
classmethod
tile_decoder(schema_dict)¶ Decodes a TILE into Python.
Parameters: schema_dict (dict) – The dict representation of the AvroSchema.
Returns: Tile
-
classmethod
tile_encoder(obj)¶ Encodes a TILE to send to Scala.
Parameters: obj (dict) – The dict representation of TILE.
Returns: avro_schema_dict (dict)
-
static
tuple_decoder(schema_dict, key_decoder=None, value_decoder=None)¶ Decodes a tuple into Python.
Parameters: - schema_dict (dict) – The dict representation of the AvroSchema.
- key_decoder (func, optional) – The decoding function of the key.
- value_decoder (func, optional) – The decoding function for the value.
Returns: tuple
-
static
tuple_encoder(obj, key_encoder=None, value_encoder=None)¶ Encodes a tuple to send to Scala.
Parameters: - obj (tuple) – The tuple to be encoded.
- key_encoder (func, optional) – The encoding function of the key.
- value_encoder (func, optional) – The encoding function for the value.
Returns: avro_schema_dict (dict)
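The relationship between tuple_encoder and create_partial_tuple_encoder can be sketched with functools.partial. This is a minimal illustration of the pattern only, not geopyspark's implementation: every name prefixed with sketch_ or SKETCH_, the type names "text" and "number", and the shape of the returned dict are assumptions made for the example.

```python
from functools import partial


def sketch_tuple_encoder(obj, key_encoder=None, value_encoder=None):
    """Stand-in tuple encoder: applies one encoder to each tuple element."""
    key, value = obj
    return {"key": key_encoder(key), "value": value_encoder(value)}


# Hypothetical lookup from a type name to an encoding function.
SKETCH_ENCODERS = {
    "text": lambda s: s.upper(),
    "number": lambda n: {"n": n},
}


def sketch_create_partial_tuple_encoder(key_type=None, value_type=None):
    """Bind the element encoders now; the tuple itself is supplied later."""
    return partial(
        sketch_tuple_encoder,
        key_encoder=SKETCH_ENCODERS[key_type],
        value_encoder=SKETCH_ENCODERS[value_type],
    )


# The partial only needs the object to encode, mirroring "requires an obj".
encoder = sketch_create_partial_tuple_encoder(key_type="text", value_type="number")
encoded = encoder(("a", 1))
```

Binding the element encoders ahead of time yields a one-argument callable, which is convenient when a serializer expects a single function to apply to every record.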
-
class
geopyspark.geopycontext.AvroSerializer(schema, decoding_method=None, encoding_method=None)¶ The serializer used by an RDD to encode/decode values to/from Python.
Parameters: - schema (str) – The AvroSchema of the RDD.
- decoding_method (func, optional) – The decoding function for the values within the RDD.
- encoding_method (func, optional) – The encoding function for the values within the RDD.
-
schema¶ str – The AvroSchema of the RDD.
-
decoding_method¶ func, optional – The decoding function for the values within the RDD.
-
encoding_method¶ func, optional – The encoding function for the values within the RDD.
-
dumps(obj)¶ Serialize an object into a byte array.
Note
When batching is used, this will be called with a list of objects.
Parameters: obj – The object to be serialized into a byte array.
Returns: The byte array representation of the obj.
-
loads(obj)¶ Deserializes a byte array into a collection of Python objects.
Parameters: obj – The byte array representation of the object to be deserialized.
Returns: A list of deserialized objects.
-
schema_dict¶ The schema values in a dict.
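The dumps/loads contract described above (a single object or, with batching, a list of objects goes in; a list of objects comes back) can be sketched as follows. This is a rough stand-in under stated assumptions: JSON replaces Avro so the example runs without extra dependencies, and SketchSerializer and the two lambda functions are hypothetical names, not part of geopyspark.

```python
import json


class SketchSerializer:
    """Stand-in serializer with optional per-value encode/decode hooks."""

    def __init__(self, decoding_method=None, encoding_method=None):
        self.decoding_method = decoding_method
        self.encoding_method = encoding_method

    def dumps(self, obj):
        # With batching, obj is already a list; otherwise wrap the single object.
        items = obj if isinstance(obj, list) else [obj]
        if self.encoding_method is not None:
            items = [self.encoding_method(item) for item in items]
        return json.dumps(items).encode("utf-8")

    def loads(self, data):
        # Always returns a list, even when a single object was serialized.
        items = json.loads(data.decode("utf-8"))
        if self.decoding_method is not None:
            items = [self.decoding_method(item) for item in items]
        return items


serializer = SketchSerializer(
    decoding_method=lambda d: (d["k"], d["v"]),        # dict -> tuple
    encoding_method=lambda t: {"k": t[0], "v": t[1]},  # tuple -> dict
)

batch = serializer.loads(serializer.dumps([("a", 1), ("b", 2)]))
single = serializer.loads(serializer.dumps(("c", 3)))
```

Here both calls produce lists: the batched input round-trips to two tuples, and the single tuple comes back as a one-element list.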
-
class
geopyspark.geopycontext.GeoPyContext(pysc=None, **kwargs)¶ A wrapper of SparkContext. This wrapper provides extra functionality by providing methods that help with sending/receiving information to/from Python.
Parameters: - pysc (pyspark.SparkContext, optional) – An existing SparkContext.
- **kwargs – GeoPyContext can create a SparkContext if given its constructing arguments.
Note
If both pysc and kwargs are set, pysc will be used.
-
pysc¶ pyspark.SparkContext – The wrapped SparkContext.
-
sc¶ org.apache.spark.SparkContext – The Scala SparkContext derived from the Python one.
Raises: TypeError – If neither a SparkContext nor its constructing arguments are given.
Examples
Creating GeoPyContext from an existing SparkContext.
>>> from pyspark import SparkContext
>>> from geopyspark.geopycontext import GeoPyContext
>>> sc = SparkContext(appName="example", master="local[*]")
>>> SparkContext
>>> geopysc = GeoPyContext(sc)
>>> GeoPyContext
Creating GeoPyContext from the constructing arguments of SparkContext.
>>> geopysc = GeoPyContext(appName="example", master="local[*]")
>>> GeoPyContext
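The constructor rules stated above (an existing context wins over keyword arguments, and a TypeError is raised when neither is given) can be sketched without Spark. This is an illustration of the described behaviour only: SketchBackend and SketchContext are hypothetical stand-ins for SparkContext and GeoPyContext, and the error message is invented for the example.

```python
class SketchBackend:
    """Hypothetical stand-in for SparkContext; just records its arguments."""

    def __init__(self, **kwargs):
        self.options = kwargs


class SketchContext:
    """Hypothetical stand-in for the wrapper's construction logic."""

    def __init__(self, backend=None, **kwargs):
        if backend is not None:
            # An existing backend takes precedence over any keyword arguments.
            self.backend = backend
        elif kwargs:
            # Otherwise build a backend from its constructing arguments.
            self.backend = SketchBackend(**kwargs)
        else:
            raise TypeError("expected an existing backend or its constructing arguments")


existing = SketchBackend(appName="example", master="local[*]")

wrapped = SketchContext(existing, appName="ignored")
built = SketchContext(appName="example", master="local[*]")

try:
    SketchContext()
    raised = False
except TypeError:
    raised = True
```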
-
create_python_rdd(jrdd, serializer)¶ Creates a Python RDD from an RDD that came from Scala.
Parameters: - jrdd (org.apache.spark.api.java.JavaRDD) – The RDD that came from Scala.
- serializer (AvroSerializer or pyspark.serializers.AutoBatchedSerializer(AvroSerializer)) – An instance of AvroSerializer that is either alone or wrapped by AutoBatchedSerializer.
Returns: pyspark.RDD
-
create_schema(key_type)¶ Creates an AvroSchema.
Parameters: key_type (str) – The type of the K in the tuple, (K, V), in the RDD.
Returns: An AvroSchema for the types within the RDD.
-
static
map_key_input(key_type, is_boundable)¶ Gets the mapped GeoTrellis type from the key_type.
Parameters: - key_type (str) – The type of the K in the tuple, (K, V), in the RDD.
- is_boundable (bool) – Whether K is boundable.
Returns: The corresponding GeoTrellis type.