Show
added a comment - - edited Here is a summary what I have seen so far with a standard SQLAlchemy `execute()` method. I tried several variations: single row vs. multi-row inserts, and using RETURNING with engines that support it. I tested it on a simple table with few columns and integer primary key column. Script is here: https://gist.github.com/andy-slac/7e1e473dd49ced02dd844fda7c835e8e
Both Oracle and Postgres support implicit_returning option (enabled by default) which adds RETURNING to single-row INSERTs to return value of the added primary key. It does nothing for multi-row inserts.
Oracle backend
To support autoincrement Oracle needs separate SEQUENCE defined, which needs to be given to a PK Column constructor: https://docs.sqlalchemy.org/en/13/dialects/oracle.html#auto-increment-behavior
Implicit RETURNING, single-row insert:
generated SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (test_table_ret_id_seq.nextval, :val1, :val2) RETURNING test_table_ret.id INTO :ret_0
params: { 'val1': 1, 'val2': 10, 'ret_0': <cx_Oracle.STRING with value [None, ..., None] > }
result.rowcount: 1
result.inserted_primary_key: list of one integer, inserted ID
no rows returned in the result
Explicit RETURNING, single row insert:
SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (test_table_ret_id_seq.nextval, :val1, :val2) RETURNING test_table_ret.id INTO :ret_0
params: { 'val1': 1, 'val2': 10, 'ret_0': <cx_Oracle.STRING with value [None, ..., None] > }
result.rowcount: 1
result.inserted_primary_key: Not defined
result has single row with the value of ID inserted
Implicit RETURNING, multi-row insert:
SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (test_table_ret_id_seq.nextval, :val1, :val2)
params: [{'val1': 1, 'val2': 10}, {'val1': 2, 'val2': 20}, {'val1': 3, 'val2': 30}]
result.rowcount: 3
result.inserted_primary_key: Not defined
no rows returned in the result
Explicit RETURNING, multi-row insert:
SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (test_table_ret_id_seq.nextval, :val1, :val2) RETURNING test_table_ret.id INTO :ret_0
params: [{'val1': 1, 'val2': 10, 'ret_0': None}, {'val1': 2, 'val2': 20, 'ret_0': None}, {'val1': 3, 'val2': 30, 'ret_0': None}]
result.rowcount: 3
result.inserted_primary_key: Not defined
no rows returned in the result
Postgres backend
Like Oracle Postgres needs a sequence to generate auto-increment values, Postgres can define sequence itself if PK column has special type (SERIAL/BIGSERIAL) but using explicit sequence a la Oracle works too: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#sequences-serial-identity
Implicit RETURNING, single-row insert:
generated SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (nextval('test_table_ret_id_seq'), %(val1)s, %(val2)s) RETURNING test_table_ret.id
params: {'val1': 1, 'val2': 10}
result.rowcount: 1
result.inserted_primary_key: list of one integer, inserted ID
result.lastrowid: 0 (probably meaningless)
no rows returned in the result
Explicit RETURNING, single row insert:
generated SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (nextval('test_table_ret_id_seq'), %(val1)s, %(val2)s) RETURNING test_table_ret.id
params: {'val1': 1, 'val2': 10}
result.rowcount: 1
result.inserted_primary_key: Not defined
result.lastrowid: 0 (probably meaningless)
result has single row with the value of ID inserted
Implicit RETURNING, multi-row insert:
SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (nextval('test_table_ret_id_seq'), %(val1)s, %(val2)s)
params: ({'val1': 1, 'val2': 10}, {'val1': 2, 'val2': 20}, {'val1': 3, 'val2': 30})
result.rowcount: 3
result.inserted_primary_key: Not defined
result.lastrowid: 0 (probably meaningless)
no rows returned in the result
Explicit RETURNING, multi-row insert:
SQL: INSERT INTO test_table_ret (id, val1, val2) VALUES (nextval('test_table_ret_id_seq'), %(val1)s, %(val2)s) RETURNING test_table_ret.id
params: ({'val1': 1, 'val2': 10}, {'val1': 2, 'val2': 20}, {'val1': 3, 'val2': 30})
result.rowcount: 3
result.inserted_primary_key: Not defined
result.lastrowid: 0 (probably meaningless)
no rows returned in the result
SQLite backend
SQLite has no support for RETURNING.
single-row insert:
generated SQL: INSERT INTO test_table_ret (val1, val2) VALUES (?, ?)
params: (1, 10)
result.rowcount: 1
result.inserted_primary_key: list of one integer, inserted ID
result.lastrowid: number (same as in inserted_primary_key)
no rows returned in the result
multi-row insert:
SQL: INSERT INTO test_table_ret (val1, val2) VALUES (?, ?)
params: ((1, 10), (2, 20), (3, 30))
result.rowcount: 3
result.inserted_primary_key: Not defined
result.lastrowid: None
no rows returned in the result
Summary I guess is that:
supporting auto-increment feature is not hard, just need a Sequence defined for Oracle and optionally for Postgres (I think it can even be shared between multiple tables)
returning generated PK value by default only works for single-row insert, there is no special anything needed, result.inserted_primary_key has the value
multi-row insert has no standard way of returning values to client for any of the backends, there may be backend-specific ways to do that though, will look at it next.
General comment for autoincrement feature - its implementation is backend-specific and is done very differently in sqlalchemy for every separate backend:
Relying on autoincrement with bulk inserts is somewhat tricky - one would have to retrieve IDs of the inserted rows and associate them with client-side data. Oracle and Postgres support INSERT ... RETURNING syntax so it is possible to specify what has to be returned from INSERT. SQLAlchemy also provides inserted_primary_key and lastrowid attributes for ResultSet which work for a single-row insert (that should be OK for us as we do not really use "true" bulk insert syntax). Combining multi-row insert and conflict resolution means that number of rows actually inserted can be different from number of rows supplied, and associating two may not be trivial at all. I think the only way out of that is to insert one row at a time and check whether it was inserted OK and find its primary key.
Just to mention another approach - we could use explicit sequence calls to obtain a set of IDs and then insert new rows with the known IDs (for SQLite we'd have simulate a sequence). Drawback here is that we'll need multiple calls a sequence to generate multiple IDs and each call is basically a separate query and separate round trip to a server so effectively it means doubling the number of queries sent to a server.
I want to experiment a little bit with all three backends to check for any non-trivial issues.