Mongo Spark Connector Python

Connecting Python Spark to MongoDB (昊天博客)

Prerequisites

Have MongoDB up and running and Spark 2.2.x downloaded. See the introduction and the SQL section
for more information on getting started.

You can run the interactive pyspark shell like so:
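A minimal sketch of such an invocation, assuming a local MongoDB at `127.0.0.1`, a `test.coll` namespace, and connector 2.2.0 built for Scala 2.11 (all three are assumptions, so adjust them to your setup). The helper name `pyspark_command` is hypothetical; it only assembles the arguments so each piece is easy to see, and the printed line is what you would run from the Spark directory:

```python
import shlex

# Assumptions (not from the original text): a local MongoDB instance,
# the test.coll namespace, and connector 2.2.0 built for Scala 2.11.
MONGO_URI = "mongodb://127.0.0.1/test.coll"
CONNECTOR_PACKAGE = "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0"


def pyspark_command(uri=MONGO_URI, package=CONNECTOR_PACKAGE):
    """Build the argument list for launching pyspark with the connector."""
    return [
        "./bin/pyspark",
        # Default source and target collection for reads and writes.
        "--conf", "spark.mongodb.input.uri=" + uri,
        "--conf", "spark.mongodb.output.uri=" + uri,
        # Ask Spark to download the connector and its dependencies.
        "--packages", package,
    ]


# Print a copy-pasteable shell command.
print(" ".join(shlex.quote(arg) for arg in pyspark_command()))
```

Setting both the input and output URI means later reads and writes can omit the database and collection unless they need a different one.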

The Python API Basics

The Python API works via DataFrames and uses the underlying Scala DataFrame implementation.

DataFrames and Datasets

Creating a DataFrame is easy: you can load the data via DefaultSource ("com.mongodb.spark.sql.DefaultSource").

First, we load the following data into an empty collection:
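The data itself is not reproduced here, so the rows below are hypothetical sample data chosen to fit the later examples (names with ages). This is a sketch, assuming `spark` is the SparkSession provided by the pyspark shell and that `spark.mongodb.output.uri` was set at launch; `save_characters` is an illustrative helper name, not part of the connector:

```python
# Hypothetical sample data: (name, age) pairs, not taken from the original text.
CHARACTERS = [
    ("Bilbo Baggins", 50),
    ("Gandalf", 1000),
    ("Thorin", 195),
    ("Balin", 178),
    ("Kili", 77),
    ("Fili", 82),
]

MONGO_SOURCE = "com.mongodb.spark.sql.DefaultSource"


def save_characters(spark, rows=CHARACTERS):
    """Write the rows to the collection named in spark.mongodb.output.uri.

    `spark` is assumed to be the SparkSession that the pyspark shell provides.
    """
    characters = spark.createDataFrame(rows, ["name", "age"])
    # mode("overwrite") replaces whatever the collection already holds.
    characters.write.format(MONGO_SOURCE).mode("overwrite").save()
    return characters
```

In the shell this would be called as `save_characters(spark)`.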

Then load the characters into a DataFrame via the standard source method:
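A minimal sketch of that read, assuming `spark` is the shell's SparkSession and that `spark.mongodb.input.uri` already points at the collection. The helper name `load_characters` is hypothetical; the calls are the standard PySpark `DataFrameReader` ones:

```python
MONGO_SOURCE = "com.mongodb.spark.sql.DefaultSource"


def load_characters(spark):
    """Read the collection named in spark.mongodb.input.uri into a DataFrame."""
    df = spark.read.format(MONGO_SOURCE).load()
    # The connector infers the schema by sampling documents; print it to inspect.
    df.printSchema()
    return df
```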

Will return:

Alternatively, you can specify the database and collection while reading the DataFrame:
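One way this might look, assuming the 2.x connector accepts `database` and `collection` as read options (that is my understanding of its configuration keys, so check it against the connector version you run). The helper names and the `people`/`contacts` values in the usage note are hypothetical:

```python
MONGO_SOURCE = "com.mongodb.spark.sql.DefaultSource"


def read_options(database, collection):
    """Per-read overrides for the database and collection."""
    return {"database": database, "collection": collection}


def load_collection(spark, database, collection):
    """Read a specific collection instead of the one in the input URI."""
    reader = spark.read.format(MONGO_SOURCE)
    # options(**kwargs) sets several reader options in one call.
    return reader.options(**read_options(database, collection)).load()
```

For example, `load_collection(spark, "people", "contacts")` would read `people.contacts` while still taking the host from `spark.mongodb.input.uri`.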

To write a DataFrame to a collection:
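A sketch of such a write, under the same assumption that `database` and `collection` are valid per-operation options in the 2.x connector. The helper name `write_collection` and its default of `append` are illustrative choices, not something the original specifies:

```python
MONGO_SOURCE = "com.mongodb.spark.sql.DefaultSource"


def write_collection(df, database, collection, mode="append"):
    """Write a DataFrame to the given database and collection.

    mode is a standard Spark save mode such as "append" or "overwrite".
    """
    (
        df.write.format(MONGO_SOURCE)
        .mode(mode)
        .option("database", database)
        .option("collection", collection)
        .save()
    )
```

Using `append` adds documents to an existing collection, whereas `overwrite` replaces its contents, so the default here is the less destructive of the two.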

SQL

As in the Scala examples, SQL can be used to filter data. In the following example, we register a temp table and then filter and output
the characters with ages under 100:
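A minimal sketch of that flow, assuming `df` is a DataFrame with `name` and `age` columns loaded from MongoDB and `spark` is the shell's SparkSession. The view name `characters` and the helper name are hypothetical. It uses `createOrReplaceTempView`, which replaced the deprecated `registerTempTable` in Spark 2.x:

```python
# The view name "characters" is an illustrative choice.
QUERY = "SELECT name, age FROM characters WHERE age < 100"


def characters_under_100(spark, df):
    """Register df as a temporary view and return the rows with age < 100."""
    df.createOrReplaceTempView("characters")
    result = spark.sql(QUERY)
    # show() prints the matching rows as a text table.
    result.show()
    return result
```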

Outputs: