SQL Select Validator

class ConstraintValidator

This validator checks different aspects of a SQL query. You are expected to subclass it and implement its abstract methods.

abstract allowed_joins() Sequence[JoinCondition]

Returns all of the tables allowed to be connected to the query via a JOIN and the equi-join conditions that must be met for the join to be valid.

Returns:

The sequence of allowed joins.

Return type:

Sequence[JoinCondition]
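As a rough illustration, an override might return one entry per permitted equi-join. The JoinCondition class below is a hypothetical stand-in with assumed field names (first, second); the library's real constructor may differ, and the table names are borrowed from a film rental schema purely as an example.

```python
from dataclasses import dataclass
from typing import Sequence


# Hypothetical stand-in for the library's JoinCondition; the real
# constructor and field names may differ.
@dataclass(frozen=True)
class JoinCondition:
    first: str   # "table.column" on one side of the equi-join
    second: str  # "table.column" on the other side


class RentalValidator:
    """Illustrative subclass body; only allowed_joins() is shown."""

    def allowed_joins(self) -> Sequence[JoinCondition]:
        # One entry per join that a query is permitted to contain.
        return [
            JoinCondition("film.film_id", "inventory.film_id"),
            JoinCondition("inventory.inventory_id", "rental.inventory_id"),
            JoinCondition("rental.customer_id", "customer.customer_id"),
        ]


joins = RentalValidator().allowed_joins()
```

Any table that does not appear in one of the returned conditions cannot be joined into the query.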

can_use_function(function: str) bool

Returns whether or not a SQL function is allowed to be used anywhere in the query. By default, this checks the function against the list of safe functions that we have curated by hand.

Parameters:

function (str) – The lowercase name of the function.

Returns:

Whether or not the function is allowed.

Return type:

bool
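A minimal sketch of tightening the default check in a subclass. BaseValidator and its SAFE_FUNCTIONS set are hypothetical stand-ins for ConstraintValidator and its curated list, whose actual contents are not reproduced here; BLOCKED is likewise illustrative.

```python
class BaseValidator:
    # Hypothetical stand-in for the curated list of safe functions.
    SAFE_FUNCTIONS = frozenset({"count", "sum", "lower", "upper"})

    def can_use_function(self, function: str) -> bool:
        # The name is expected to arrive already lowercased.
        return function in self.SAFE_FUNCTIONS


class StricterValidator(BaseValidator):
    # Functions this application never wants, even if they are "safe".
    BLOCKED = frozenset({"upper"})

    def can_use_function(self, function: str) -> bool:
        # Keep the default check, then subtract the blocked names.
        return super().can_use_function(function) and function not in self.BLOCKED


validator = StricterValidator()
```

Calling the parent implementation first keeps the curated list as the baseline, so the override can only narrow what is allowed.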

condition_column_allowed(fq_column: FqColumn) bool

Checks if a column is allowed to be used in a WHERE, JOIN, HAVING, or ORDER BY clause. By default, this calls select_column_allowed(). If you override this method and want to preserve that behavior, you should call select_column_allowed() yourself.

Parameters:

fq_column (FqColumn) – The fully-qualified column.

Returns:

Whether or not the column is allowed to be used in a condition.

Return type:

bool
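The following sketch shows one way an override could permit extra columns in conditions while still deferring to the select check. FqColumn here is a hypothetical stand-in with assumed table and column fields, and CONDITION_ONLY is an illustrative name, not something the library defines.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the library's FqColumn.
@dataclass(frozen=True)
class FqColumn:
    table: str
    column: str


# Columns that may be filtered or sorted on but never selected.
CONDITION_ONLY = {("rental", "customer_id")}
SELECTABLE = {("film", "title"), ("film", "film_id")}


class Validator:
    def select_column_allowed(self, column: FqColumn) -> bool:
        return (column.table, column.column) in SELECTABLE

    def condition_column_allowed(self, fq_column: FqColumn) -> bool:
        if (fq_column.table, fq_column.column) in CONDITION_ONLY:
            return True
        # Preserve the default behavior by deferring to the select check.
        return self.select_column_allowed(fq_column)


v = Validator()
```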

abstract max_limit() int | None

Returns the maximum number of rows that a query can return. If None, there is no limit.

This value is also used to inform 🧩 Reconstruction. If this function provides a limit, but the query does not, or the query provides a higher limit, the query will be reconstructed to include the correct limit.

Returns:

The maximum number of rows that can be returned by a query, or None if unlimited.

Return type:

int | None
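The limit rule described above can be sketched as a small function. effective_limit is a hypothetical helper written for illustration, not part of the library; it only shows which limit a query would end up with under the rule as stated.

```python
from typing import Optional


def effective_limit(query_limit: Optional[int], max_limit: Optional[int]) -> Optional[int]:
    """Return the LIMIT a query should carry, given the validator's maximum."""
    if max_limit is None:
        # No maximum configured: the query's own limit (if any) stands.
        return query_limit
    if query_limit is None or query_limit > max_limit:
        # Missing or too-high limit: fall back to the maximum.
        return max_limit
    return query_limit
```

For instance, with a maximum of 20, a query with no LIMIT or with LIMIT 100 ends up with 20, while LIMIT 5 is left alone.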

abstract parameterized_constraints() Sequence[ParameterizedConstraint]

Returns a sequence of constraints that must exist in either the WHERE clause of the query or in a JOIN condition. It doesn’t matter where the constraint is, as long as it exists and is required (i.e. not part of an optional condition).

Returns:

The sequence of required constraints.

Return type:

Sequence[ParameterizedConstraint]
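To make "required" concrete, here is a toy model of a condition tree. It is only one reading of the rule, not the library's implementation, and every class and function name in it (Cmp, And, Or, is_required) is hypothetical. A constraint counts as required if it cannot be bypassed: any branch of an AND may supply it, but every branch of an OR must.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Cmp:
    column: str       # e.g. "rental.customer_id"
    placeholder: str  # e.g. "customer_id" for :customer_id


@dataclass(frozen=True)
class And:
    parts: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Node", ...]


Node = Union[Cmp, And, Or]


def is_required(node: Node, target: Cmp) -> bool:
    if isinstance(node, Cmp):
        return node == target
    if isinstance(node, And):
        # One conjunct enforcing the constraint is enough.
        return any(is_required(p, target) for p in node.parts)
    # Or: every alternative must enforce it, otherwise it is optional.
    return bool(node.parts) and all(is_required(p, target) for p in node.parts)


owner = Cmp("rental.customer_id", "customer_id")
other = Cmp("film.rating", "rating")
```

Under this model, `owner AND other` satisfies the constraint, while `owner OR other` does not, because the second branch lets rows through without it.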

abstract requester_identities() Sequence[ParameterizedConstraint]

Returns the possible identities of the requester, as represented in the database. This is used to instruct the LLM how to constrain the query that it generates. Only one of these identities needs to match for the query to be compliant.

The reason that we return a sequence, and not a single identity, is that sometimes an LLM will specify the constraint as part of a JOIN condition, and not a WHERE condition. In that case, the column in the JOIN condition may not match the column you expect.

For example, consider selecting films for a customer, constrained by the customer id. The LLM may give you a query like this:

SELECT f.title
FROM film f
JOIN inventory i ON f.film_id=i.film_id
JOIN rental r ON i.inventory_id=r.inventory_id
JOIN customer c ON r.customer_id=c.customer_id
WHERE c.customer_id=:customer_id

Or you may receive a query like this:

SELECT f.title
FROM film f
JOIN inventory i ON f.film_id=i.film_id
JOIN rental r
    ON i.inventory_id=r.inventory_id
    AND r.customer_id=:customer_id

Both rental.customer_id and customer.customer_id are valid requester identities, so you need to specify both of them by returning a heimdallm.bifrosts.sql.common.ParameterizedConstraint for each of them.

Returns:

The sequence of possible requester identities.

Return type:

Sequence[ParameterizedConstraint]
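For the film example above, an override might look roughly like this. The ParameterizedConstraint class below is a local stand-in so the snippet runs on its own; its column and placeholder field names are assumptions, so check them against the real heimdallm.bifrosts.sql.common.ParameterizedConstraint before relying on them.

```python
from dataclasses import dataclass
from typing import Sequence


# Local stand-in; field names are assumed, not taken from the library.
@dataclass(frozen=True)
class ParameterizedConstraint:
    column: str       # fully-qualified column holding the requester's id
    placeholder: str  # named parameter, e.g. "customer_id" for :customer_id


class CustomerValidator:
    """Illustrative subclass body; only requester_identities() is shown."""

    def requester_identities(self) -> Sequence[ParameterizedConstraint]:
        # Either column may carry the :customer_id constraint,
        # depending on whether it appears in WHERE or in a JOIN.
        return [
            ParameterizedConstraint(column="customer.customer_id", placeholder="customer_id"),
            ParameterizedConstraint(column="rental.customer_id", placeholder="customer_id"),
        ]


identities = CustomerValidator().requester_identities()
```

Both entries share one placeholder because both queries bind the same :customer_id parameter; only the column differs.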

abstract select_column_allowed(column: FqColumn) bool

Check that a fully-qualified column is allowed to be selected in the SELECT clause. Use this to restrict the columns and tables that can be selected.

This value is also used to inform 🧩 Reconstruction. Columns that do not pass this check will be removed from the query.

Parameters:

column (FqColumn) – The fully-qualified column.

Returns:

Whether or not the column is allowed to be selected.

Return type:

bool
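A minimal sketch of a per-table allowlist, with a small filter that imitates the removal of disallowed columns. FqColumn is a hypothetical stand-in with assumed table and column fields, and keep_allowed is an illustrative helper; the library's own reconstruction works on the parsed query, not on a Python list.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


# Hypothetical stand-in for the library's FqColumn.
@dataclass(frozen=True)
class FqColumn:
    table: str
    column: str


# Columns that may appear in the SELECT clause, keyed by table.
ALLOWED: Dict[str, Set[str]] = {
    "film": {"title", "release_year"},
    "customer": {"first_name"},
}


class Validator:
    def select_column_allowed(self, column: FqColumn) -> bool:
        # Unknown tables fall through to an empty set and are rejected.
        return column.column in ALLOWED.get(column.table, set())


def keep_allowed(selected: List[FqColumn], validator: Validator) -> List[FqColumn]:
    # Imitates dropping columns that fail the check, preserving order.
    return [c for c in selected if validator.select_column_allowed(c)]


selected = [
    FqColumn("film", "title"),
    FqColumn("customer", "email"),
    FqColumn("customer", "first_name"),
]
kept = keep_allowed(selected, Validator())
```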