PostgreSQL: having fun with Python

Do not ask why; pretend it is more about having fun than about something useful, and of course it is not about running this in production

In my case I have remote data living in a 3rd party service. I managed to get it into Google Sheets, but it is not so easy to do the usual stuff like joins and unions there

So I was curious what it might look like in Postgres

The demo is done in Docker, so it is not persistent in any way

Start Postgres

docker run -it --rm --name=pg -e POSTGRES_PASSWORD='P@ssword' -e POSTGRES_USER='demo' -e POSTGRES_DB='demo' -e PYTHONPATH=/var/lib/postgres/ExternalDataSource -p 5432:5432 postgres:16

Install Python and the PL/Python language extension

docker exec -it pg apt update
docker exec -it pg apt install -y python3 postgresql-plpython3-16

Enable and check the language

docker exec -it pg psql -U demo demo -c 'SELECT * FROM pg_language'
docker exec -it pg psql -U demo demo -c 'CREATE LANGUAGE plpython3u'
docker exec -it pg psql -U demo demo -c 'SELECT * FROM pg_language'
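Side note: on modern Postgres, CREATE EXTENSION is the preferred way to enable the language (the CREATE LANGUAGE form above still works):

docker exec -it pg psql -U demo demo -c 'CREATE EXTENSION IF NOT EXISTS plpython3u'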

Jump into psql

docker exec -it pg psql -U demo demo

Create a Postgres Python hello world function

CREATE OR REPLACE FUNCTION hello(name text) RETURNS text AS
$BODY$
# the function body is plain Python; arguments arrive as Python values
return 'Hello ' + name
$BODY$
LANGUAGE plpython3u;

and test it:

select hello('Alex');
   hello
------------
 Hello Alex
(1 row)

And now the fun part

CREATE OR REPLACE FUNCTION get_remote_data() RETURNS TABLE (user_id int, id int, title text) AS $$
import json
from urllib.request import urlopen

# fetch posts from a demo JSON API right inside the function body
url = 'https://jsonplaceholder.typicode.com/posts'
response = urlopen(url)

if response.getcode() == 200:
    data = json.loads(response.read().decode('utf-8'))
    # each yielded tuple becomes one row of the result set
    for post in data:
        yield (post['userId'], post['id'], post['title'])
else:
    raise Exception('Error fetching data')
$$ LANGUAGE plpython3u;

And its usage

SELECT * FROM get_remote_data() LIMIT 3;
 user_id | id |                                   title
---------+----+----------------------------------------------------------------------------
       1 |  1 | sunt aut facere repellat provident occaecati excepturi optio reprehenderit
       1 |  2 | qui est esse
       1 |  3 | ea molestias quasi exercitationem repellat qui ipsa sit aut
(3 rows)

SELECT COUNT(*) FROM get_remote_data();
 count
-------
   100
(1 row)

3rd party libraries are also accessible, e.g. requests:

docker exec -it pg apt install -y python3-requests

CREATE OR REPLACE FUNCTION get_remote_data2() RETURNS TABLE (user_id int, id int, title text) AS $$
import requests

r = requests.get('https://jsonplaceholder.typicode.com/posts')
r.raise_for_status()  # fail loudly on non-2xx responses
for post in r.json():
    yield (post['userId'], post['id'], post['title'])
$$ LANGUAGE plpython3u;
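
And a quick sanity check that it returns the same rows as the urllib version:

SELECT COUNT(*) FROM get_remote_data2();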

With this in place I can query an external data source as if it were a simple table
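
For example, a join against a local table (the authors table below is a made-up illustration, not part of the demo data):

CREATE TABLE authors (id int PRIMARY KEY, name text);
INSERT INTO authors VALUES (1, 'Alice'), (2, 'Bob');

-- remote rows join with local rows like any other table source
SELECT a.name, r.id, r.title
FROM get_remote_data() AS r
JOIN authors AS a ON a.id = r.user_id
LIMIT 3;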

And even if I still want Google Sheets, I can always expose the query results as some kind of CSV and fetch it with the IMPORTDATA function
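
A minimal sketch of that idea, assuming the file ends up published somewhere Sheets can reach (the URL is a placeholder); note there is no -it here since the output is redirected to a file:

docker exec pg psql -U demo demo -c "COPY (SELECT * FROM get_remote_data()) TO STDOUT WITH (FORMAT csv, HEADER)" > posts.csv

and then in Google Sheets something like =IMPORTDATA("https://example.com/posts.csv")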

Notes:

  • in SQL Server the same (and even more) is doable by embedding a compiled .NET DLL, which allows even crazier things
  • there is the so-called Foreign Data Wrapper (FDW) concept in Postgres, with a huge number of ready-to-use extensions for querying external data sources, e.g. running SQL on top of Elasticsearch
  • there is the multicorn2 FDW that is supposed to run Python scripts, but I am not sure it is needed here
  • for cases when performance is critical, such calls could in theory be cached inside Postgres, but in the end performance-critical things should probably use an external tool to sync the data; in my case it is fine even if the request takes a few seconds (see the caching sketch after this list)
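
One way such caching could look, as a sketch (the materialized view is my assumption, not something the demo above used):

-- snapshot the remote call into a local materialized view
CREATE MATERIALIZED VIEW remote_posts AS
SELECT * FROM get_remote_data();

-- subsequent queries hit the local snapshot, no HTTP call involved
SELECT COUNT(*) FROM remote_posts;

-- re-run the remote call only when fresher data is needed
REFRESH MATERIALIZED VIEW remote_posts;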