@miguelgrinberg (Collaborator) commented Aug 15, 2024

This change adds support for semantic text, introduced in Elasticsearch 8.15, plus an example application. Unfortunately running this application requires a somewhat beefy ES instance, so I'm not going to add integration tests.


name: str
summary: str
content: Any = dsl.mapped_field(
@miguelgrinberg (Collaborator, Author):

I'm typing the semantic text field as Any because it has different types during ingest and search. On ingest it is a plain string, while on search it is returned as an object with the original string in the text attribute. The object also includes an inference attribute with the autogenerated chunks and their embeddings.
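
To make the two shapes concrete, here is a minimal sketch of a helper that recovers the original string in either case. The helper name `original_text` is hypothetical and not part of the library, and the layout of the returned object (beyond the `text` and `inference` attributes mentioned above) is an assumption for illustration.

```python
from types import SimpleNamespace
from typing import Any, Mapping


def original_text(content: Any) -> str:
    """Return the original string from a semantic text value (hypothetical helper)."""
    if isinstance(content, str):
        # Ingest side: the field is still a plain string.
        return content
    if isinstance(content, Mapping):
        # Search side, dict form: the string is stored under "text".
        return content["text"]
    # Search side, attribute-style object: the string is in `.text`.
    return content.text


ingested = "Employees accrue vacation days monthly."
# Assumed shape of a search result; the inner "chunks" layout is illustrative only.
returned = SimpleNamespace(text=ingested, inference=SimpleNamespace(chunks=[]))
assert original_text(ingested) == original_text(returned) == ingested
```

A helper like this keeps the `Any` annotation on the field while giving calling code a single `str`-typed access point.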

@pquentin (Member) left a comment

Thanks! LGTM.



async def search(query: str) -> dsl.AsyncSearch[WorkplaceDoc]:
    return WorkplaceDoc.search()[:5].query(
@pquentin (Member):

nit: Doing [:5] before the query call is a bit confusing IMO.
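
As a rough illustration of why this is a readability nit rather than a bug: in a chainable builder where every call returns a modified copy, slicing before or after `query` produces the same request. The `ToySearch` class below is a hypothetical stand-in written for this sketch, not the library's search class, and the `field`/`query` values are made up.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ToySearch:
    """Hypothetical stand-in for a chainable search builder."""

    query_body: Optional[Dict[str, Any]] = None
    start: int = 0
    size: Optional[int] = None

    def query(self, name: str, **params: Any) -> "ToySearch":
        # Return a copy with the query set; pagination is left untouched.
        return replace(self, query_body={name: params})

    def __getitem__(self, window: slice) -> "ToySearch":
        # Return a copy with pagination set; the query is left untouched.
        start = window.start or 0
        size = None if window.stop is None else window.stop - start
        return replace(self, start=start, size=size)

    def to_dict(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {"query": self.query_body, "from": self.start}
        if self.size is not None:
            body["size"] = self.size
        return body


slice_first = ToySearch()[:5].query("semantic", field="content", query="vacation policy")
slice_last = ToySearch().query("semantic", field="content", query="vacation policy")[:5]
assert slice_first.to_dict() == slice_last.to_dict()
```

Putting the slice last reads in the same order as the request is usually described: what to match first, then how many results to return.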


async def search(query: str) -> dsl.AsyncSearch[WorkplaceDoc]:
    return WorkplaceDoc.search()[:5].query(
        "semantic",
@pquentin (Member):

Any reason not to use the class here? One advantage is that it avoids typos.

@miguelgrinberg (Collaborator, Author):

No reason at all, except that historically it has been preferred to use the names. I actually prefer the classes myself and started using them on my tests. I'll update this.

@miguelgrinberg added the "backport 8.x" (Backport to 8.x) label Aug 19, 2024
@miguelgrinberg merged commit 7fa4f8c into elastic:main Aug 19, 2024
@miguelgrinberg deleted the semantic-text-support branch August 19, 2024 10:44
github-actions bot pushed a commit that referenced this pull request Aug 19, 2024
…#1881) * Added support for the `semantic_text` field and `semantic` query type * Fix nltk code... again * feedback (cherry picked from commit 7fa4f8c)
miguelgrinberg added a commit that referenced this pull request Aug 19, 2024
…#1881) (#1882) * Added support for the `semantic_text` field and `semantic` query type * Fix nltk code... again * feedback (cherry picked from commit 7fa4f8c) Co-authored-by: Miguel Grinberg
miguelgrinberg added a commit to miguelgrinberg/elasticsearch-dsl-py that referenced this pull request Dec 9, 2024
…elastic#1881) * Added support for the `semantic_text` field and `semantic` query type * Fix nltk code... again * feedback
2 participants: @miguelgrinberg, @pquentin