PostgreSQL: having fun with Python
Do not ask why; pretend it is more about having fun than about something useful, and of course it is not about running this in production.
In my case I have remote data living in a 3rd party service. I managed to get it into Google Sheets, but it is not so easy to do the usual stuff like join and union there.
So I was curious what it might look like in Postgres.
The demo runs in Docker, so it is not persistent in any way.
Start Postgres:
docker run -it --rm --name=pg -e POSTGRES_PASSWORD='P@ssword' -e POSTGRES_USER='demo' -e POSTGRES_DB='demo' -e PYTHONPATH=/var/lib/postgres/ExternalDataSource -p 5432:5432 postgres:16
Install the Python language extension:
docker exec -it pg apt update
docker exec -it pg apt install -y python3 postgresql-plpython3-16
Enable and check the language:
docker exec -it pg psql -U demo demo -c 'SELECT * FROM pg_language'
docker exec -it pg psql -U demo demo -c 'CREATE EXTENSION plpython3u'
docker exec -it pg psql -U demo demo -c 'SELECT * FROM pg_language'
Jump into psql:
docker exec -it pg psql -U demo demo
Create a Postgres Python hello world function:
CREATE OR REPLACE FUNCTION hello(name text) RETURNS text AS
$BODY$
return 'Hello ' + name
$BODY$
LANGUAGE plpython3u;
and test it:
select hello('Alex');
hello
------------
Hello Alex
(1 row)
And now the fun part:
CREATE OR REPLACE FUNCTION get_remote_data() RETURNS TABLE (user_id int, id int, title text) AS $$
import json
from urllib.request import urlopen

url = 'https://jsonplaceholder.typicode.com/posts'
response = urlopen(url)
if response.getcode() == 200:
    data = json.loads(response.read().decode('utf-8'))
    for post in data:
        yield (post['userId'], post['id'], post['title'])
else:
    raise Exception('Error fetching data')
$$ LANGUAGE plpython3u;
And its usage:
SELECT * FROM get_remote_data() LIMIT 3;
user_id | id | title
---------+----+----------------------------------------------------------------------------
1 | 1 | sunt aut facere repellat provident occaecati excepturi optio reprehenderit
1 | 2 | qui est esse
1 | 3 | ea molestias quasi exercitationem repellat qui ipsa sit aut
(3 rows)
SELECT COUNT(*) FROM get_remote_data();
count
-------
100
(1 row)
3rd party libraries are also accessible, e.g. requests:
docker exec -it pg apt install -y python3-requests
CREATE OR REPLACE FUNCTION get_remote_data2() RETURNS TABLE (user_id int, id int, title text) AS $$
import requests

r = requests.get('https://jsonplaceholder.typicode.com/posts')
for post in r.json():
    yield (post['userId'], post['id'], post['title'])
$$ LANGUAGE plpython3u;
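And it works the same way:
SELECT * FROM get_remote_data2() LIMIT 3;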
With this in play I can query an external data source as if it were a simple table.
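For example, the kind of aggregation that is painful in Sheets becomes a one-liner (this simply groups the placeholder posts by user):
SELECT user_id, COUNT(*) AS posts
FROM get_remote_data()
GROUP BY user_id
ORDER BY user_id;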
And even if I still want Google Sheets, I can always expose the query results as some kind of CSV and fetch it with the IMPORTDATA function.
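A minimal sketch of the CSV part, run from the host shell (actually publishing the file at a URL that IMPORTDATA can fetch is a separate exercise):
docker exec pg psql -U demo demo -c "COPY (SELECT * FROM get_remote_data()) TO STDOUT WITH (FORMAT csv, HEADER)" > posts.csv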
Notes:
- in SQL Server the same (and even more) is doable by embedding a compiled .NET dll, which allows even crazier things
- there is the Foreign Data Wrapper (FDW) concept in Postgres, with a huge number of ready-to-use extensions for querying external data sources - e.g. running SQL queries on top of Elasticsearch (see the sketch after this list)
- there is the multicorn2 FDW that is supposed to run Python scripts, but I am not sure it is needed here
- for performance-critical cases such calls could, in theory, cache themselves inside Postgres (a materialized view sketch follows below), but in the end it should probably be an external tool that syncs the data; in my case it is fine even if the request runs for a few seconds
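To give the FDW note some shape, here is a minimal sketch using the built-in postgres_fdw; the host, database and credentials are made up for illustration:
CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'other-host', dbname 'otherdb');
CREATE USER MAPPING FOR demo SERVER remote_pg
    OPTIONS (user 'remote_user', password 'remote_password');
IMPORT FOREIGN SCHEMA public FROM SERVER remote_pg INTO public;
And the simplest built-in way to cache such calls, as mentioned in the last note, is a materialized view on top of the function:
CREATE MATERIALIZED VIEW posts_cache AS SELECT * FROM get_remote_data();
SELECT COUNT(*) FROM posts_cache;       -- served from local storage, no HTTP call
REFRESH MATERIALIZED VIEW posts_cache;  -- re-runs the remote fetch on demand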