pgvector-java

pgvector support for Java, Kotlin, Groovy, and Scala

Supports JDBC, Spring JDBC, Groovy SQL, and Slick

Getting Started

For Maven, add to pom.xml under <dependencies>:

<dependency>
  <groupId>com.pgvector</groupId>
  <artifactId>pgvector</artifactId>
  <version>0.1.6</version>
</dependency>

For sbt, add to build.sbt:

libraryDependencies += "com.pgvector" % "pgvector" % "0.1.6"

For other build tools, see this page.

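And follow the instructions for your database library: JDBC (Java), Spring JDBC, Hibernate, R2DBC, JDBC (Kotlin), JDBC (Groovy), Groovy SQL, JDBC (Scala), or Slick.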

Or check out some examples:

JDBC (Java)

Import the PGvector class

import com.pgvector.PGvector;

Enable the extension

Statement setupStmt = conn.createStatement();
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector");

Register the types with your connection

PGvector.registerTypes(conn);

Create a table

Statement createStmt = conn.createStatement();
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))");

Insert a vector

PreparedStatement insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)");
insertStmt.setObject(1, new PGvector(new float[] {1, 1, 1}));
insertStmt.executeUpdate();

Get the nearest neighbors

PreparedStatement neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5");
neighborStmt.setObject(1, new PGvector(new float[] {1, 1, 1}));
ResultSet rs = neighborStmt.executeQuery();
while (rs.next()) {
    System.out.println((PGvector) rs.getObject("embedding"));
}

Add an approximate index

Statement indexStmt = conn.createStatement();
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)");
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)");

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
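For intuition about what each operator class ranks by, here is a minimal plain-Java sketch of the three measures. It is not how pgvector computes them and it does not use this library; the class name DistanceSketch and its method names are hypothetical, chosen only for illustration.

```java
// Illustrative sketch only: DistanceSketch and its methods are hypothetical
// names, not part of pgvector-java or pgvector.
final class DistanceSketch {

    // Euclidean (L2) distance: what vector_l2_ops and the <-> operator rank by.
    static double l2Distance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Inner (dot) product: the measure behind vector_ip_ops.
    static double innerProduct(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine distance (1 - cosine similarity): the measure behind vector_cosine_ops.
    // The result is NaN when either vector has zero magnitude.
    static double cosineDistance(float[] a, float[] b) {
        double norms = Math.sqrt(innerProduct(a, a)) * Math.sqrt(innerProduct(b, b));
        return 1.0 - innerProduct(a, b) / norms;
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        float[] query = {1, 1, 1};
        float[] item = {1, 2, 3};
        System.out.println("L2 distance:     " + l2Distance(query, item));
        System.out.println("Inner product:   " + innerProduct(query, item));
        System.out.println("Cosine distance: " + cosineDistance(query, item));
    }
}
```

Cosine distance ignores magnitude, so {1, 1, 1} and {2, 2, 2} are at cosine distance 0 even though their L2 distance is not 0. Whichever measure you pick, the index operator class should match the distance used in the query's ORDER BY.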

See a full example

Spring JDBC

Import the PGvector class

import com.pgvector.PGvector;

Enable the extension

jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");

Create a table

jdbcTemplate.execute("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))");

Insert a vector

Object[] insertParams = new Object[] { new PGvector(new float[] {1, 1, 1}) };
jdbcTemplate.update("INSERT INTO items (embedding) VALUES (?)", insertParams);

Get the nearest neighbors

Object[] neighborParams = new Object[] { new PGvector(new float[] {1, 1, 1}) };
List<Map<String, Object>> rows = jdbcTemplate.queryForList("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5", neighborParams);
for (Map<String, Object> row : rows) {
    System.out.println(row.get("embedding"));
}

Add an approximate index

jdbcTemplate.execute("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)");
// or
jdbcTemplate.execute("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)");

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Hibernate

Hibernate 6.4+ has a vector module (use this instead of com.pgvector.pgvector).

For Maven, add to pom.xml under <dependencies>:

<dependency>
  <groupId>org.hibernate.orm</groupId>
  <artifactId>hibernate-vector</artifactId>
  <version>6.4.0.Final</version>
</dependency>

Define an entity

import jakarta.persistence.*;
import org.hibernate.annotations.Array;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

@Entity
class Item {
    @Id
    @GeneratedValue
    private Long id;

    @Column
    @JdbcTypeCode(SqlTypes.VECTOR)
    @Array(length = 3) // dimensions
    private float[] embedding;

    public void setEmbedding(float[] embedding) {
        this.embedding = embedding;
    }
}

Insert a vector

Item item = new Item();
item.setEmbedding(new float[] {1, 1, 1});
entityManager.persist(item);

Get the nearest neighbors

List<Item> items = entityManager
    .createQuery("FROM Item ORDER BY l2_distance(embedding, :embedding) LIMIT 5", Item.class)
    .setParameter("embedding", new float[] {1, 1, 1})
    .getResultList();

See a full example

R2DBC

R2DBC PostgreSQL 1.0.3+ supports the vector type (use this instead of com.pgvector.pgvector).

For Maven, add to pom.xml under <dependencies>:

<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>r2dbc-postgresql</artifactId>
  <version>1.0.3.RELEASE</version>
</dependency>

JDBC (Kotlin)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

val setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the types with your connection

PGvector.registerTypes(conn)

Create a table

val createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

val insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, PGvector(floatArrayOf(1.0f, 1.0f, 1.0f)))
insertStmt.executeUpdate()

Get the nearest neighbors

val neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, PGvector(floatArrayOf(1.0f, 1.0f, 1.0f)))
val rs = neighborStmt.executeQuery()
while (rs.next()) {
    println(rs.getObject("embedding") as PGvector?)
}

Add an approximate index

val indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

JDBC (Groovy)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

def setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the types with your connection

PGvector.registerTypes(conn)

Create a table

def createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

def insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, new PGvector([1, 1, 1] as float[]))
insertStmt.executeUpdate()

Get the nearest neighbors

def neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, new PGvector([1, 1, 1] as float[]))
def rs = neighborStmt.executeQuery()
while (rs.next()) {
    println((PGvector) rs.getObject("embedding"))
}

Add an approximate index

def indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Groovy SQL

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

sql.execute "CREATE EXTENSION IF NOT EXISTS vector"

Create a table

sql.execute "CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))"

Insert a vector

def params = [new PGvector([1, 1, 1] as float[])]
sql.executeInsert "INSERT INTO items (embedding) VALUES (?)", params

Get the nearest neighbors

def params = [new PGvector([1, 1, 1] as float[])]
sql.eachRow("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5", params) { row ->
    println row.embedding
}

Add an approximate index

sql.execute "CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)"
// or
sql.execute "CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)"

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

JDBC (Scala)

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

val setupStmt = conn.createStatement()
setupStmt.executeUpdate("CREATE EXTENSION IF NOT EXISTS vector")

Register the types with your connection

PGvector.registerTypes(conn)

Create a table

val createStmt = conn.createStatement()
createStmt.executeUpdate("CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))")

Insert a vector

val insertStmt = conn.prepareStatement("INSERT INTO items (embedding) VALUES (?)")
insertStmt.setObject(1, new PGvector(Array[Float](1, 1, 1)))
insertStmt.executeUpdate()

Get the nearest neighbors

val neighborStmt = conn.prepareStatement("SELECT * FROM items ORDER BY embedding <-> ? LIMIT 5")
neighborStmt.setObject(1, new PGvector(Array[Float](1, 1, 1)))
val rs = neighborStmt.executeQuery()
while (rs.next()) {
  println(rs.getObject("embedding").asInstanceOf[PGvector])
}

Add an approximate index

val indexStmt = conn.createStatement()
indexStmt.executeUpdate("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
indexStmt.executeUpdate("CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Slick

Import the PGvector class

import com.pgvector.PGvector

Enable the extension

db.run(sqlu"CREATE EXTENSION IF NOT EXISTS vector")

Add a vector column

class Items(tag: Tag) extends Table[(String)](tag, "items") {
  def embedding = column[String]("embedding", O.SqlType("vector(3)"))
  def * = (embedding)
}

Insert a vector

val embedding = new PGvector(Array[Float](1, 1, 1)).toString
db.run(sqlu"INSERT INTO items (embedding) VALUES ($embedding::vector)")

Get the nearest neighbors

val embedding = new PGvector(Array[Float](1, 1, 1)).toString
db.run(sql"SELECT * FROM items ORDER BY embedding <-> $embedding::vector LIMIT 5".as[(String)])

Add an approximate index

db.run(sqlu"CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)")
// or
db.run(sqlu"CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)")

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

See a full example

Reference

Vectors

Create a vector from an array

PGvector vec = new PGvector(new float[] {1, 2, 3});

Or a List<T>

List<Float> list = List.of(Float.valueOf(1), Float.valueOf(2), Float.valueOf(3));
PGvector vec = new PGvector(list);

Get an array

float[] arr = vec.toArray();
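The Slick examples above pass a vector to SQL as text and cast it with ::vector. As a rough illustration of that text form, the sketch below formats a float array as a bracketed, comma-separated literal and parses one back. VectorLiteralSketch is a hypothetical helper written for this note, not part of the library, and the exact string PGvector.toString() produces is assumed to be equivalent rather than confirmed here; in practice PGvector does this conversion for you.

```java
import java.util.Arrays;

// Illustrative sketch only: VectorLiteralSketch is a hypothetical helper,
// not part of pgvector-java.
final class VectorLiteralSketch {

    // Formats a float array as vector text, e.g. [1.0,2.0,3.0].
    static String toLiteral(float[] values) {
        // Arrays.toString yields "[1.0, 2.0, 3.0]"; drop the spaces.
        return Arrays.toString(values).replace(" ", "");
    }

    // Parses vector text such as "[1,2,3]" back into a float array.
    static float[] parse(String literal) {
        String trimmed = literal.trim();
        if (trimmed.length() < 3 || !trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected a non-empty [..] literal: " + literal);
        }
        String[] parts = trimmed.substring(1, trimmed.length() - 1).split(",");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String literal = toLiteral(new float[] {1, 2, 3});
        System.out.println(literal);
        System.out.println(Arrays.toString(parse(literal)));
    }
}
```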

Half Vectors

Create a half vector from an array

PGhalfvec vec = new PGhalfvec(new float[] {1, 2, 3});

Or a List<T>

List<Float> list = List.of(Float.valueOf(1), Float.valueOf(2), Float.valueOf(3));
PGhalfvec vec = new PGhalfvec(list);

Get an array

float[] arr = vec.toArray();

Binary Vectors

Create a binary vector from a byte array

PGbit vec = new PGbit(new byte[] {(byte) 0b00000000, (byte) 0b11111111});

Or a boolean array

PGbit vec = new PGbit(new boolean[] {true, false, true});

Or a string

PGbit vec = new PGbit("101");

Get the length (number of bits)

int length = vec.length();

Get a byte array

byte[] bytes = vec.toByteArray();

Or a boolean array

boolean[] bits = vec.toArray();
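To show how the byte array, boolean array, and string forms relate, here is a small plain-Java sketch. BitSketch is a hypothetical helper, not part of the library, and it assumes each byte is read most-significant bit first, which mirrors how PostgreSQL writes bit strings; PGbit's own internals are not shown here.

```java
// Illustrative sketch only: BitSketch is a hypothetical helper, not part of
// pgvector-java. Bit order (most-significant bit first) is an assumption.
final class BitSketch {

    // Expands packed bytes into one boolean per bit, MSB first.
    static boolean[] fromBytes(byte[] bytes) {
        boolean[] bits = new boolean[bytes.length * 8];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = ((bytes[i / 8] >> (7 - (i % 8))) & 1) == 1;
        }
        return bits;
    }

    // Renders bits as a string of '0' and '1' characters, e.g. "101".
    static String toBitString(boolean[] bits) {
        StringBuilder sb = new StringBuilder(bits.length);
        for (boolean bit : bits) {
            sb.append(bit ? '1' : '0');
        }
        return sb.toString();
    }

    // Parses a string of '0' and '1' characters into bits.
    static boolean[] fromBitString(String text) {
        boolean[] bits = new boolean[text.length()];
        for (int i = 0; i < bits.length; i++) {
            char c = text.charAt(i);
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            bits[i] = c == '1';
        }
        return bits;
    }

    public static void main(String[] args) {
        byte[] packed = {(byte) 0b00000000, (byte) 0b11111111};
        System.out.println(toBitString(fromBytes(packed)));
        System.out.println(toBitString(new boolean[] {true, false, true}));
    }
}
```

A byte array always describes a multiple of 8 bits, while the boolean array and string forms can describe any length, such as the 3-bit value "101".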

Sparse Vectors

Create a sparse vector from an array

PGsparsevec vec = new PGsparsevec(new float[] {1, 0, 2, 0, 3, 0});

Or a map of non-zero elements

Map<Integer, Float> map = new HashMap<Integer, Float>();
map.put(Integer.valueOf(0), Float.valueOf(1));
map.put(Integer.valueOf(2), Float.valueOf(2));
map.put(Integer.valueOf(4), Float.valueOf(3));
PGsparsevec vec = new PGsparsevec(map, 6);

Note: Indices start at 0

Get the number of dimensions

int dim = vec.getDimensions();

Get the indices of non-zero elements

int[] indices = vec.getIndices();

Get the values of non-zero elements

float[] values = vec.getValues();

Get an array

float[] arr = vec.toArray();
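The relationship between the dense array, the 0-based map, and the dimension count can be sketched in plain Java as follows. SparseSketch is a hypothetical helper, not part of the library. The toText method reflects pgvector's documented sparsevec text form, which uses 1-based indices, so it adds 1 to each 0-based key; the number formatting shown is this sketch's own choice.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: SparseSketch is a hypothetical helper, not part
// of pgvector-java.
final class SparseSketch {

    // Expands a map of 0-based index -> value into a dense array.
    static float[] toDense(Map<Integer, Float> nonZero, int dimensions) {
        float[] dense = new float[dimensions];
        for (Map.Entry<Integer, Float> entry : nonZero.entrySet()) {
            int index = entry.getKey();
            if (index < 0 || index >= dimensions) {
                throw new IllegalArgumentException("Index out of range: " + index);
            }
            dense[index] = entry.getValue();
        }
        return dense;
    }

    // Collects the non-zero elements of a dense array, keyed by 0-based index.
    static TreeMap<Integer, Float> toSparse(float[] dense) {
        TreeMap<Integer, Float> nonZero = new TreeMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0f) {
                nonZero.put(i, dense[i]);
            }
        }
        return nonZero;
    }

    // Renders sparsevec text such as {1:1.0,3:2.0,5:3.0}/6 (1-based indices).
    static String toText(Map<Integer, Float> nonZero, int dimensions) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        // A TreeMap copy gives a stable, ascending index order.
        for (Map.Entry<Integer, Float> entry : new TreeMap<>(nonZero).entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(entry.getKey() + 1).append(':').append(entry.getValue());
            first = false;
        }
        return sb.append("}/").append(dimensions).toString();
    }

    public static void main(String[] args) {
        float[] dense = {1, 0, 2, 0, 3, 0};
        TreeMap<Integer, Float> nonZero = toSparse(dense);
        System.out.println(nonZero);
        System.out.println(toText(nonZero, dense.length));
    }
}
```

The dimension count has to be supplied alongside the map because trailing zeros, like the last element of {1, 0, 2, 0, 3, 0}, leave no entry behind.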

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/pgvector/pgvector-java.git
cd pgvector-java
createdb pgvector_java_test
mvn test

To run an example:

cd examples/loading
createdb pgvector_example
mvn package
java -jar target/example-jar-with-dependencies.jar
