geopyspark.vector_pipe.osm_reader module

geopyspark.vector_pipe.osm_reader.from_orc(source)

Reads OSM data from an ORC file located either locally or on S3. The resulting data is returned as an instance of FeaturesCollection.

Parameters: source (str) – The path or URI of the ORC file to be read. Can be either a local file or a file on S3.

Note

Reading a file from S3 requires additional setup that depends on the environment and on how the file is being read.

The parameters that need to be set depend on the URI scheme used to read the file. If the file is being read on EMR, however, the access key and secret key do not need to be set.

If using s3a://, then the following SparkConf parameters need to be set:
  • spark.hadoop.fs.s3a.impl
  • spark.hadoop.fs.s3a.access.key
  • spark.hadoop.fs.s3a.secret.key
If using s3n://, then the following SparkConf parameters need to be set:
  • spark.hadoop.fs.s3n.access.key
  • spark.hadoop.fs.s3n.secret.key

An alternative to passing your S3 credentials to SparkConf is to export them as environment variables:

  • AWS_ACCESS_KEY_ID=YOUR_KEY
  • AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
Returns: FeaturesCollection
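
The following is a minimal sketch of reading an ORC file from S3 over s3a://, using the SparkConf parameters listed in the note above. The filesystem implementation class, bucket path, and credential values are placeholders rather than values taken from these docs, and the SparkContext setup shown here is simplified; in practice it would follow whatever GeoPySpark configuration the environment already uses.

    from pyspark import SparkConf, SparkContext
    from geopyspark.vector_pipe import osm_reader

    conf = SparkConf().setAppName("osm-ingest")
    # s3a:// requires these three SparkConf parameters (see the note above).
    # The implementation class is the commonly used Hadoop S3A filesystem;
    # the key values are placeholders.
    conf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    conf.set("spark.hadoop.fs.s3a.access.key", "YOUR_KEY")
    conf.set("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")

    sc = SparkContext(conf=conf)

    # Read the ORC file into a FeaturesCollection (the bucket path is hypothetical).
    features = osm_reader.from_orc("s3a://my-bucket/osm/washington.orc")
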
geopyspark.vector_pipe.osm_reader.from_dataframe(dataframe)

Reads OSM data from a Spark DataFrame. The resulting data is returned as an instance of FeaturesCollection.

Parameters: dataframe (DataFrame) – A Spark DataFrame that contains the OSM data.
Returns: FeaturesCollection
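
As a minimal sketch, the DataFrame would typically come from Spark's own ORC reader; the file path below is a placeholder.

    from pyspark.sql import SparkSession
    from geopyspark.vector_pipe import osm_reader

    spark = SparkSession.builder.appName("osm-dataframe").getOrCreate()

    # Load the OSM ORC file into a Spark DataFrame, then convert it into a
    # FeaturesCollection (the path is a placeholder).
    osm_dataframe = spark.read.orc("/data/osm/washington.orc")
    features = osm_reader.from_dataframe(osm_dataframe)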