The main component of AI DIAL, which provides unified API to different chat completion and embedding models, assistants, and applications

epam/ai-dial-core

DIAL Core

Overview

Note

DIAL Core is an HTTP proxy that provides a unified API to different chat completion and embedding models and applications. It is written in Java 21 and built on top of Eclipse Vert.x.

Read more about the DIAL Core


Build 🏗

DIAL Core depends on JClouds packages published to GitHub Packages. GitHub doesn't provide anonymous access to packages.

You therefore need to pass GitHub credentials to access the published JClouds packages. See the code snippet below:

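    repositories {
        maven {
            // JClouds packages published to GitHub Packages
            url = uri("https://maven.pkg.github.com/epam/jclouds")
            credentials {
                // Gradle properties take precedence; environment variables are the fallback
                username = project.findProperty("gpr.user") ?: System.getenv("GPR_USERNAME")
                password = project.findProperty("gpr.key") ?: System.getenv("GPR_PASSWORD")
            }
        }
        mavenCentral()
    }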

Important

Set the environment variables GPR_USERNAME and GPR_PASSWORD to valid values, where GPR_USERNAME is your GitHub username and GPR_PASSWORD is a GitHub personal access token.

Important

The access token requires the permission read:packages.

See the GitHub documentation for details on generating a personal access token.

Build the project with Gradle and Java 21:

./gradlew build 

Run ▶️

Run the project with Gradle:

./gradlew :server:run 

Or run the com.epam.aidial.core.server.AiDial class from your favorite IDE.


Helm Deployment

You can deploy DIAL Core on a Kubernetes cluster using the umbrella dial Helm chart, which also deploys other DIAL components. Alternatively, you can use the dial-core Helm chart to deploy just Core.

Note

Refer to Examples for guidelines.

In either case, your Helm values file must provide the application configuration described in the Configuration section.


Configuration ⚙️

Static settings

Note

Static settings are read on startup and cannot be changed while the application is running. Refer to example to view a sample configuration file.

Priority order:

  1. Environment variables with extra "aidial." prefix. E.g. "aidial.server.port", "aidial.config.files".
  2. File specified in "AIDIAL_SETTINGS" environment variable.
  3. Default resource file: src/main/resources/aidial.settings.json.
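
The priority order above can be sketched as a simple lookup chain. This is a minimal illustration, not DIAL Core's actual loader: it assumes that the first entry in the list has the highest priority, that settings are available as flattened key/value maps, and that the class name `SettingsPrioritySketch` and the method `resolve` are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the lookup order; names and the flattened-key model are assumptions.
final class SettingsPrioritySketch {

    // Returns the value for 'key' from the first source that defines it:
    // 1) environment variable with the "aidial." prefix,
    // 2) the file referenced by AIDIAL_SETTINGS,
    // 3) the default resource file.
    static Optional<String> resolve(String key,
                                    Map<String, String> environment,
                                    Map<String, String> settingsFile,
                                    Map<String, String> defaults) {
        String fromEnvironment = environment.get("aidial." + key);
        if (fromEnvironment != null) {
            return Optional.of(fromEnvironment);
        }
        String fromFile = settingsFile.get(key);
        if (fromFile != null) {
            return Optional.of(fromFile);
        }
        return Optional.ofNullable(defaults.get(key));
    }
}
```

For example, with a hypothetical default of `server.port = 8080`, an environment entry `aidial.server.port = 9090` would win over both files.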

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `config.files` | `aidial.config.json` | No | List of paths to dynamic settings. Refer to example of the file with dynamic settings. |
| `config.reload` | `60000` | No | Config reload interval in milliseconds. |
| `vertx.*` | - | No | Vert.x settings. Refer to vertx.io to learn more. |
| `server.*` | - | No | Vert.x HTTP server settings for incoming requests. Refer to HTTP server options to learn more. |
| `client.*` | - | No | Vert.x HTTP client settings for outbound requests. Refer to HTTP client options to learn more. |
| `webSocketClient.*` | - | No | Vert.x WebSocket client settings for outbound requests. Refer to WebSocket client options to learn more. |
| `invitations.ttlInSeconds` | `259200` | No | Invitation time to live in seconds. |
| `perRequestApiKey.ttl` | `1800` | No | TTL of a per-request API key, in seconds. |
| `asyncTaskExecutor.useVirtualThreads` | `true` | No | Determines whether blocking tasks run on virtual threads or on platform threads. |
| `config.jsonMergeStrategy.overwriteArrays` | `false` | No | Specifies the merging strategy for JSON arrays. If set to true, arrays are overwritten. Otherwise, they are concatenated. |

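
The two array strategies behind `config.jsonMergeStrategy.overwriteArrays` can be illustrated with a small sketch. It assumes the strategy applies when one JSON document is merged over another, and it models JSON objects and arrays with plain `java.util` maps and lists instead of the JSON types DIAL Core uses internally. The class `JsonMergeSketch` and the method `merge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two array-merge strategies; not DIAL Core's code.
final class JsonMergeSketch {

    // Merges 'override' into a copy of 'base'. Nested objects are merged recursively.
    // Arrays are replaced when overwriteArrays is true and concatenated otherwise.
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base,
                                     Map<String, Object> override,
                                     boolean overwriteArrays) {
        Map<String, Object> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> entry : override.entrySet()) {
            Object left = result.get(entry.getKey());
            Object right = entry.getValue();
            if (left instanceof Map && right instanceof Map) {
                result.put(entry.getKey(),
                        merge((Map<String, Object>) left, (Map<String, Object>) right, overwriteArrays));
            } else if (left instanceof List && right instanceof List && !overwriteArrays) {
                List<Object> joined = new ArrayList<>((List<Object>) left);
                joined.addAll((List<Object>) right);
                result.put(entry.getKey(), joined);
            } else {
                // Scalars, type mismatches, and arrays in overwrite mode: the later value wins.
                result.put(entry.getKey(), right);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> first = Map.of("roles", List.of("a", "b"));
        Map<String, Object> second = Map.of("roles", List.of("c"));
        System.out.println(merge(first, second, false)); // prints {roles=[a, b, c]}
        System.out.println(merge(first, second, true));  // prints {roles=[c]}
    }
}
```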

Identity Providers Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `identityProviders` | - | Yes | Map of identity providers. Note: At least one identity provider must be provided. Refer to examples to view available providers. Refer to IDP Configuration to view guidelines for configuring supported providers. |
| `identityProviders.*.jwksUrl` | - | Optional | URL of the JWKS provider. Required if disableJwtVerification is set to false. Note: Either jwksUrl or userInfoEndpoint must be provided. |
| `identityProviders.*.userInfoEndpoint` | - | Optional | URL of the user info endpoint. Note: Either jwksUrl or userInfoEndpoint must be provided when disableJwtVerification is unset. Refer to Google example. |
| `identityProviders.*.rolePath` | - | Yes | Path(s) to the claim with user roles in the JWT token or user info response, e.g. resource_access.chatbot-ui.roles or just roles. Can be a single String or an Array of Strings. Refer to IDP Configuration to view guidelines for configuring supported providers. |
| `identityProviders.*.projectPath` | - | No | Path to the claim in the JWT token or user info response from which the project name can be taken, e.g. azp, aud or some.path.client. Must be a single String. Refer to IDP Configuration to view guidelines for configuring supported providers. |
| `identityProviders.*.rolesDelimiter` | - | No | Delimiter used to split roles into an array when the list of roles is presented as a single String, e.g. "rolesDelimiter": " " |
| `identityProviders.*.loggingKey` | - | No | User information to search for in the claims of the JWT token. email or sub should be sufficient in most cases. Note: email might be unavailable for some IDPs. Please check your IDP documentation in this case. |
| `identityProviders.*.loggingSalt` | - | No | Salt used to hash user information for logging. |
| `identityProviders.*.positiveCacheExpirationMs` | `600000` | No | How long to retain a JWKS response in the cache after a successful response. |
| `identityProviders.*.negativeCacheExpirationMs` | `10000` | No | How long to retain a JWKS response in the cache after a failed response. |
| `identityProviders.*.issuerPattern` | - | No | Regexp used to match the claim "iss" to the identity provider. |
| `identityProviders.*.disableJwtVerification` | `false` | No | Disables JWT verification. Note: userInfoEndpoint must be unset if the flag is set to true. |
| `identityProviders.*.audience` | - | No | If set, the value is validated against the claim aud in the JWT. |
| `identityProviders.*.userDisplayName` | - | No | Path to the claim in the JWT token or user info response from which the user display name can be taken. |

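
To make `rolePath` and `rolesDelimiter` more concrete, here is a minimal sketch of how roles could be extracted from a decoded token payload. It assumes that path segments are separated by dots and that the payload is already parsed into nested maps. How DIAL Core actually walks the claims is not described here, and the class `RolePathSketch` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of rolePath / rolesDelimiter handling; not DIAL Core's code.
final class RolePathSketch {

    // Walks a dot-separated path such as "resource_access.chatbot-ui.roles"
    // through nested claims. Returns null when any segment is missing.
    static Object resolve(Map<String, Object> claims, String path) {
        Object current = claims;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Reads roles from the claim at 'rolePath'. A list claim is used as is;
    // a string claim is split by 'rolesDelimiter' when a delimiter is configured.
    static List<String> roles(Map<String, Object> claims, String rolePath, String rolesDelimiter) {
        Object value = resolve(claims, rolePath);
        List<String> roles = new ArrayList<>();
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                roles.add(String.valueOf(item));
            }
        } else if (value instanceof String) {
            String text = (String) value;
            if (rolesDelimiter == null) {
                roles.add(text);
            } else {
                roles.addAll(Arrays.asList(text.split(Pattern.quote(rolesDelimiter))));
            }
        }
        return roles;
    }
}
```

With a payload whose `roles` claim is the single string `"admin user"` and `"rolesDelimiter": " "`, this sketch yields the two roles `admin` and `user`.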

Toolsets Security Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `toolsets.security.authorizationServers` | - | No | Path(s) to the authorization server URLs trusted to issue access tokens for MCP clients. |
| `toolsets.security.resourceSchema` | `https` | No | Scheme of the resource server. This URL scheme is used to construct the resource identifier for token validation, as defined in RFC 9728. If not specified, the default value is applied. |
| `toolsets.security.resourceHost` | - | No | The public, fully-qualified hostname of this resource server (e.g., api.example.com). This is used to construct the resource identifier for token validation per RFC 9728. If not set, the host is derived from the incoming request. |
| `toolsets.security.scopesSupported` | - | No | List of scope values, as defined in OAuth 2.0 [RFC6749], that are used in authorization requests to request access to this protected resource. |
| `toolsets.security.kms.provider` | `unencrypted` | No | Specifies the KMS provider. Supported providers: aws, azure, gcp, unencrypted |
| `toolsets.security.kms.keyId` | - | No | Identifies the KMS key to use in the encryption operation. |
| `toolsets.security.kms.region` | - | No | Geo region where the KMS is located. Required if provider is set to aws. |
| `toolsets.security.kms.encryptionAlgorithm` | - | No | Encryption algorithm. Required if provider is set to azure. Note: Refer to aws, azure to get the list of supported algorithms. The default value for aws is SYMMETRIC_DEFAULT |
| `toolsets.security.kms.cache.enabled` | `true` | No | Determines whether the CEK cache is enabled. |
| `toolsets.security.kms.cache.maxSize` | `10000` | No | Maximum number of cached CEKs. |
| `toolsets.security.kms.cache.expiration` | `600000` | No | Expiration in milliseconds for cached CEKs. |
| `toolsets.security.encryption.algorithm` | `AES` | No | The encryption algorithm to use for content encryption operations. Commonly "AES", but may be changed to another algorithm supported by the JCE provider. |
| `toolsets.security.encryption.keySize` | `256` | No | Key size in bits for the encryption algorithm. For AES, valid values are 128, 192, or 256, depending on the algorithm and provider policy. |
| `toolsets.security.encryption.cipherTransformation` | `AES/GCM/NoPadding` | No | The cipher transformation specifying the algorithm, mode, and padding (e.g., "AES/GCM/NoPadding"). Must be compatible with the selected algorithm. |
| `toolsets.security.encryption.ivLengthBytes` | `12` | No | Length of the initialization vector (IV) in bytes. For AES-GCM, 12 bytes (96 bits) is recommended by NIST. |
| `toolsets.security.encryption.gcmTagLengthBits` | `128` | No | Length of the authentication tag in bits when using GCM mode. NIST recommends 128 bits for maximum integrity protection. |

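
The default `toolsets.security.encryption.*` values map directly onto the standard JCE API. The sketch below shows one way these five settings could be used together. It is an illustration only: the class `ContentEncryptionSketch`, its methods, and the choice to prepend the IV to the ciphertext are assumptions, and key management through a KMS is left out.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch using the documented defaults; not DIAL Core's code.
final class ContentEncryptionSketch {
    static final String ALGORITHM = "AES";                   // encryption.algorithm
    static final int KEY_SIZE_BITS = 256;                    // encryption.keySize
    static final String TRANSFORMATION = "AES/GCM/NoPadding"; // encryption.cipherTransformation
    static final int IV_LENGTH_BYTES = 12;                   // encryption.ivLengthBytes
    static final int GCM_TAG_LENGTH_BITS = 128;              // encryption.gcmTagLengthBits

    // Generates a fresh content encryption key.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance(ALGORITHM);
            generator.init(KEY_SIZE_BITS);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts with a random IV and returns IV followed by ciphertext and tag
    // (this layout is an assumption made for the sketch).
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_LENGTH_BITS, iv));
            byte[] encrypted = cipher.doFinal(plaintext);
            byte[] message = new byte[iv.length + encrypted.length];
            System.arraycopy(iv, 0, message, 0, iv.length);
            System.arraycopy(encrypted, 0, message, iv.length, encrypted.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the first bytes of the message and decrypts the rest.
    static byte[] decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(GCM_TAG_LENGTH_BITS, message, 0, IV_LENGTH_BYTES));
            return cipher.doFinal(message, IV_LENGTH_BYTES, message.length - IV_LENGTH_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A fresh random IV per message matters here: reusing an IV with the same key breaks the security guarantees of GCM.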

Storage Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `storage.provider` | `filesystem` | Yes | Specifies the blob storage provider. Supported providers: s3, aws-s3, azureblob, google-cloud-storage, filesystem. See examples in the sections below. |
| `storage.endpoint` | - | Optional | Specifies the endpoint URL for S3-compatible storages. Note: The setting might be required, depending on the concrete provider. |
| `storage.identity` | - | Optional | Blob storage access key. Can be optional for filesystem, aws-s3, google-cloud-storage providers. Refer to sections in this document dedicated to specific storage providers. |
| `storage.credential` | - | Optional | Blob storage secret key. Can be optional for filesystem, aws-s3, google-cloud-storage providers. |
| `storage.bucket` | - | No | Blob storage bucket. |
| `storage.overrides.*` | - | No | Key-value pairs to override storage settings. `*` can be any specific blob storage setting to be overridden. Refer to examples in the sections below. |
| `storage.createBucket` | `false` | No | Indicates whether the bucket should be created on start-up. |
| `storage.prefix` | - | No | Base prefix for all stored resources. It allows the same bucket to be used for different environments, e.g. dev, prod, pre-prod. Must not contain path separators or any invalid characters. |
| `storage.maxUploadedFileSize` | `536870912` | No | Maximum size in bytes of an uploaded file. If the size of an uploaded file exceeds the limit, the server returns HTTP code 413. |


Encryption Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `encryption.secret` | - | No | Secret used for AES encryption of a prefix to the bucket blob storage. The value should be a randomly generated string. |
| `encryption.key` | - | No | Key used for AES encryption of a prefix to the bucket blob storage. The value should be a randomly generated string. |


Resources Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `resources.maxSize` | `67108864` | No | Maximum allowed size in bytes for a resource. |
| `resources.maxSizeToCache` | `1048576` | No | Maximum size in bytes for a resource to be cached in Redis. |
| `resources.syncPeriod` | `60000` | No | Period in milliseconds that defines how frequently to check for resources to sync. |
| `resources.syncDelay` | `120000` | No | Delay in milliseconds before a resource is written back to object storage after its last modification. |
| `resources.syncBatch` | `4096` | No | How many resources to sync in one go. |
| `resources.cacheExpiration` | `300000` | No | Expiration in milliseconds for synced resources in Redis. |
| `resources.compressionMinSize` | `256` | No | Compress a resource with gzip if its size in bytes is greater than or equal to this value. |

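
The `resources.compressionMinSize` threshold can be read as the following rule. This is a minimal sketch based only on the description above, using the standard `java.util.zip` API. The class `ResourceCompressionSketch` and its methods are hypothetical, and how DIAL Core marks a stored resource as compressed is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the compression threshold; not DIAL Core's code.
final class ResourceCompressionSketch {
    // Default value of resources.compressionMinSize, in bytes.
    static final int COMPRESSION_MIN_SIZE = 256;

    // A resource is compressed when its size is greater than or equal to the threshold.
    static boolean shouldCompress(byte[] body) {
        return body.length >= COMPRESSION_MIN_SIZE;
    }

    // Returns the body unchanged when it is below the threshold, gzip-compressed otherwise.
    static byte[] encode(byte[] body) {
        if (!shouldCompress(body)) {
            return body;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The threshold exists because gzip adds a fixed header and trailer, so compressing very small payloads tends to make them larger.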

Redis Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `redis.singleServerConfig.address` | - | Yes | Redis single server address, e.g. "redis://host:port". Either singleServerConfig or clusterServersConfig must be provided. |
| `redis.clusterServersConfig.nodeAddresses` | - | Yes | JSON array with Redis cluster server addresses, e.g. `["redis://host1:port1","redis://host2:port2"]`. Either singleServerConfig or clusterServersConfig must be provided. |
| `redis.provider.*` | - | No | Provider-specific settings. |
| `redis.provider.name` | - | Yes | Provider name. The valid values are aws-elasti-cache (see instructions), gcp-memory-store (see instructions), azure-redis-cache (see instructions). |
| `redis.provider.userId` | - | Yes | IAM-enabled user ID. Note: Applies to aws-elasti-cache. |
| `redis.provider.accountName` | - | Yes | The resource name of the service account for which the credentials are requested, in the following format: `projects/-/serviceAccounts/{ACCOUNT_EMAIL_OR_UNIQUEID}`. The `-` wildcard character is required; replacing it with a project ID is invalid. Note: Applies to gcp-memory-store. |
| `redis.provider.region` | - | Yes | Geo region where the cache is located. Note: Applies to aws-elasti-cache. |
| `redis.provider.clusterName` | - | Yes | Redis cluster name. Note: Applies to aws-elasti-cache. |
| `redis.provider.serverless` | - | Yes | Indicates whether the cache is serverless. Note: Applies to aws-elasti-cache. |


Access Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `access.admin.rules` | - | No | Matches claims from identity providers against the rules to determine whether a user is allowed to perform admin actions (READ and WRITE access to any resource, approving publication requests from DIAL users). See the configuration example below the table. |
| `access.createCodeAppRoles` | - | No | The list of user roles allowed to create custom code applications or run the code interpreter. Note: Calls made with a per-request key are permitted even if the originator doesn't have permissions. |

Configuration example of access.admin.rules for DIAL Core:

    "access": {
      "admin": {
        "rules": [
          {"function": "EQUAL", "source": "roles", "targets": ["admin"]}
        ]
      }
    }

Where:

  • function is a matching function, one of TRUE (any user is admin), FALSE (no one is admin), EQUAL, CONTAIN, REGEX.
  • source is the path to the claim in the JWT token payload that is evaluated against the targets.
  • targets is an array of values that the system checks for in the source claim.

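
One possible reading of the admin rule functions is sketched below. Only TRUE and FALSE are fully specified in this document. The semantics used here for EQUAL (a claim value equals a target), CONTAIN (a claim value contains a target as a substring), and REGEX (a claim value fully matches a target pattern) are assumptions made for illustration, as are the class `AdminRuleSketch` and its methods.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of admin rule evaluation; EQUAL/CONTAIN/REGEX semantics are assumed.
final class AdminRuleSketch {
    enum Function { TRUE, FALSE, EQUAL, CONTAIN, REGEX }

    // 'claimValues' are the values found at the rule's 'source' path in the token payload.
    // The rule matches when any claim value satisfies the function for any target.
    static boolean matches(Function function, List<String> claimValues, List<String> targets) {
        if (function == Function.TRUE) {
            return true;
        }
        if (function == Function.FALSE) {
            return false;
        }
        for (String value : claimValues) {
            for (String target : targets) {
                if (test(function, value, target)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean test(Function function, String value, String target) {
        switch (function) {
            case EQUAL:
                return value.equals(target);
            case CONTAIN:
                return value.contains(target);
            case REGEX:
                return Pattern.matches(target, value);
            default:
                return false;
        }
    }
}
```

Under these assumptions, a rule with function EQUAL, source roles, and targets ["admin"] grants admin access to a user whose roles claim includes admin.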

Applications Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `applications.includeCustomApps` | `false` | No | Indicates whether applications should be included in the OpenAI listing (required for Code Apps, Custom Apps, Quick Apps, etc.). |
| `applications.controllerEndpoint` | - | No | The endpoint of the Application Controller Web Service that manages deployments for applications with functions. |
| `applications.controllerTimeout` | `240000` | No | The timeout of operations to the Application Controller Web Service. |


Code Interpreter Configurations

| Setting | Default | Required | Description |
| --- | --- | --- | --- |
| `codeInterpreter.sessionImage` | - | No | The code interpreter session image to use. |
| `codeInterpreter.sessionProxyUrl` | - | No | If set, the code interpreter is deployed as a pod instead of a Knative deployment and all requests are proxied through an nginx proxy. |
| `codeInterpreter.sessionTtl` | `600000` | No | The session time to live after the last API call. |
| `codeInterpreter.checkPeriod` | `10000` | No | The interval at which to check active sessions for expiration. |
| `codeInterpreter.checkSize` | `256` | No | The maximum number of active sessions to check in a single check. |

Storage requirements

DIAL Core stores user data in the following storages:

  • Blob Storage keeps permanent data.
  • Redis keeps volatile in-memory data for fast access.

Note

Refer to Storage Requirements to learn more.

Dynamic settings

Note

Dynamic settings are stored in JSON files specified via the "config.files" static setting and are reloaded at the interval specified via the "config.reload" static setting. Refer to example.

Dynamic settings can include the following parameters:

| Parameter | Description |
| --- | --- |
| `routes` | A list of registered routes in DIAL Core. Refer to Routes to see dynamic settings. |
| `interceptors` | A list of deployed DIAL Interceptors and their parameters. Refer to Interceptors to see dynamic settings. |
| `globalInterceptors` | A list of interceptors to be executed for any deployment on a chat completion request. Refer to Interceptors to learn more. |
| `applications` | A list of deployed applications and their parameters. Refer to Applications to see dynamic settings. |
| `models` | A list of deployed models and their parameters. Refer to Models to see dynamic settings. |
| `toolsets` | A list of available toolsets and their parameters. Refer to Toolsets to see dynamic settings. |
| `roles` | API key or JWT roles and their parameters. Refer to Roles to see dynamic settings. |
| `keys` | API keys and their parameters. Refer to API Keys to see dynamic settings. |
| `retriableErrorCodes` | List of Retriable Error Codes for handling outages at LLM Providers. This list extends the existing error codes (429, 502, 503, 504) but doesn't override them. |
| `applicationTypeSchemas` | Map of application schemas where the key is the schema ID and the value is the schema itself in JSON format. All schemas must conform to the root schema https://dial.epam.com/application_type_schemas/schema#. See link |

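
The "extends but doesn't override" behavior of `retriableErrorCodes` amounts to a set union with the built-in codes. A minimal sketch, with the hypothetical class `RetriableCodesSketch` and method `effectiveCodes`:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of how configured codes extend the built-in ones; not DIAL Core's code.
final class RetriableCodesSketch {
    // Codes that are always retriable according to the table above.
    static final Set<Integer> DEFAULT_CODES = Set.of(429, 502, 503, 504);

    // Returns the built-in codes plus any configured ones. Built-in codes are never removed.
    static Set<Integer> effectiveCodes(Set<Integer> configured) {
        Set<Integer> codes = new TreeSet<>(DEFAULT_CODES);
        codes.addAll(configured);
        return codes;
    }
}
```

For instance, configuring only 500 (a hypothetical value) would leave 429, 502, 503, and 504 retriable as well.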

License

Copyright (C) 2024 EPAM Systems

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
