geopyspark.vector_pipe.osm_reader module
geopyspark.vector_pipe.osm_reader.from_orc(source)
Reads OSM data from an ORC file located either locally or on S3. The data is returned as an instance of FeaturesCollection.
Parameters: source (str) – The path or URI of the ORC file to read. Can be either a local file or a file on S3.
Note
Reading a file from S3 requires additional setup, which depends on the environment and on the URI scheme used to read the file.
The parameters to set for each scheme are listed below. When reading a file on EMR, the access key and secret key do not need to be set.
- If using s3a://, then the following SparkConf parameters need to be set:
  - spark.hadoop.fs.s3a.impl
  - spark.hadoop.fs.s3a.access.key
  - spark.hadoop.fs.s3a.secret.key
- If using s3n://, then the following SparkConf parameters need to be set:
  - spark.hadoop.fs.s3n.access.key
  - spark.hadoop.fs.s3n.secret.key
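As a rough illustration of the s3a:// case, the parameters listed above could be collected in a small helper and applied to a SparkConf. This is only a sketch: the function names s3a_spark_settings and build_conf are hypothetical and not part of GeoPySpark, and the value used for spark.hadoop.fs.s3a.impl is the standard Hadoop S3A file system class, which this page does not itself specify.

```python
def s3a_spark_settings(access_key=None, secret_key=None):
    """Return the SparkConf parameters for reading from s3a:// (hypothetical helper).

    Both credentials are optional so that they can be left out on EMR.
    """
    settings = {
        # Assumption: the standard Hadoop S3A file system implementation class.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }
    if access_key is not None:
        settings["spark.hadoop.fs.s3a.access.key"] = access_key
    if secret_key is not None:
        settings["spark.hadoop.fs.s3a.secret.key"] = secret_key
    return settings


def build_conf(settings):
    """Apply the settings to a new SparkConf (requires pyspark to be installed)."""
    # Imported lazily so the helper above works without pyspark.
    from pyspark import SparkConf

    return SparkConf().setAll(list(settings.items()))
```

Calling build_conf(s3a_spark_settings("YOUR_KEY", "YOUR_SECRET_KEY")) would then give a SparkConf to pass on when the SparkContext is created. On EMR, s3a_spark_settings() with no arguments sets only the implementation class.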
An alternative to passing your S3 credentials to SparkConf is to export them as environment variables:
AWS_ACCESS_KEY_ID=YOUR_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
Returns: FeaturesCollection
geopyspark.vector_pipe.osm_reader.from_dataframe(dataframe)
Reads OSM data from a Spark DataFrame. The data is returned as an instance of FeaturesCollection.
Parameters: dataframe (DataFrame) – A Spark DataFrame that contains the OSM data.
Returns: FeaturesCollection
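The two readers in this module differ only in their input, so a thin wrapper can choose between them. The sketch below assumes that GeoPySpark is installed and that a path or URI is always passed as a str. The name read_osm is hypothetical and not part of the library.

```python
def read_osm(source_or_dataframe):
    """Read OSM data from an ORC path/URI or from a Spark DataFrame.

    Hypothetical convenience wrapper around the two functions documented above.
    """
    # Imported lazily so that defining this function does not require geopyspark.
    from geopyspark.vector_pipe import osm_reader

    if isinstance(source_or_dataframe, str):
        # A string is treated as a local path or S3 URI to an ORC file.
        return osm_reader.from_orc(source_or_dataframe)
    # Anything else is assumed to be a Spark DataFrame holding the OSM data.
    return osm_reader.from_dataframe(source_or_dataframe)
```

For example, read_osm("s3a://some-bucket/data.orc") would call from_orc, while passing a DataFrame would call from_dataframe. The bucket name here is a placeholder.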