PySpark: Parallel read from a database

How to leverage Spark to read from a database in parallel

Spark Parallelization
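import os

# subquery returning the smallest and largest id; JDBC sources need the alias
q = '(select min(id) as min, max(id) as max from table_name where condition) as bounds'
# Spark expects a full JDBC URL; 5432 is the Postgres default port, db_name is a placeholder
db_url = 'jdbc:postgresql://localhost:5432/db_name'
partitions = os.cpu_count() * 2  # a good starting point
conn_properties = {
    'user': 'username',
    'password': 'password',
    'driver': 'org.postgresql.Driver',  # assuming we have Postgres
}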
# given that we partition our data by id, get the minimum and the maximum id:
bounds =
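The listing breaks off at the bounds lookup, so here is a minimal sketch of how the remaining steps could look, not necessarily the original code. It assumes an existing SparkSession is passed in as `spark`, that the Postgres JDBC driver is on the classpath, and that `id` is a numeric column. The helper names `fetch_bounds`, `partitioned_read_kwargs` and `read_in_parallel` are made up for illustration. Only `spark.read.jdbc` with its `column`, `lowerBound`, `upperBound` and `numPartitions` arguments is the actual PySpark API.

```python
import os


def fetch_bounds(spark, url, bounds_query, properties):
    """Run the min/max subquery and return (lower, upper)."""
    row = spark.read.jdbc(url=url, table=bounds_query, properties=properties).collect()[0]
    if row["min"] is None or row["max"] is None:
        # min()/max() return NULL when no rows match the condition
        raise ValueError("bounds query matched no rows")
    return row["min"], row["max"]


def partitioned_read_kwargs(url, table, column, lower, upper, num_partitions, properties):
    """Collect the arguments for a partitioned spark.read.jdbc call."""
    return {
        "url": url,
        "table": table,
        "column": column,            # numeric column used to split the read
        "lowerBound": lower,
        "upperBound": upper,
        "numPartitions": num_partitions,
        "properties": properties,
    }


def read_in_parallel(spark, url, table, bounds_query, properties,
                     column="id", num_partitions=None):
    """Fetch the id bounds, then issue one JDBC query per partition."""
    if num_partitions is None:
        # same starting point as above; cpu_count() may return None
        num_partitions = (os.cpu_count() or 1) * 2
    lower, upper = fetch_bounds(spark, url, bounds_query, properties)
    kwargs = partitioned_read_kwargs(
        url, table, column, lower, upper, num_partitions, properties
    )
    return spark.read.jdbc(**kwargs)
```

With the variables defined earlier, a call could look like `df = read_in_parallel(spark, db_url, '(select * from table_name where condition) as t', q, conn_properties, num_partitions=partitions)`. Note that Spark uses `lowerBound` and `upperBound` only to compute the partition stride, not to filter rows, so the `where condition` has to be part of the table subquery as well.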

