llama : second attempt to refactor vision API #11292
Conversation
ngxson commented Jan 18, 2025 • edited
ngxson commented Jan 19, 2025
Hi @ggerganov @slaren, I would like to ask for an early review from you before proceeding further. What will be interesting to discuss here is the usage of the new API, as demoed in the newly added example.

I'm already able to make llava and mobilevlm work with it. Things that are different from the initial discussion in #8010:

And things that are still messy and will need more work:

I would love to hear your opinions about this. Thank you!
```cpp
if (ctx.ctx_ggml) {
    ggml_free(ctx.ctx_ggml);
}
ggml_init_params params = {
    /*.mem_size   =*/ ggml_tensor_overhead(),
    /*.mem_buffer =*/ NULL,
    /*.no_alloc   =*/ true,
};
ctx.ctx_ggml = ggml_init(params);
ctx.output = ggml_dup_tensor(ctx.ctx_ggml, output_node);
ggml_backend_alloc_ctx_tensors_from_buft(ctx.ctx_ggml, ctx.model->buft);
ggml_backend_tensor_copy(output_node, ctx.output);
```
@slaren Not sure if there is a better way, but I'm using a hacky solution here.

Without a dedicated context (and `ggml_backend_tensor_copy`), the underlying buffer is reallocated before the next `llama_decode`, rendering the data unusable.
If the vision part uses the same scheduler as the `llama_context`, that's unavoidable. You could pre-allocate the tensor in a different buffer to avoid the copy, but that's an optimization that can be done later.
If we have a separate encoder context for the clip model, the decoder context could reference tensors from it directly. They would be interpreted as inputs for the decoder.
slaren commented Jan 20, 2025
I am just wondering, is there any reason to expose the patches/slices to the user at all? Can the user do anything with the patches other than just immediately call `llama_vision_encode` and throw them away? If not, then maybe that could be hidden entirely from the user and `llama_vision_encode` could take directly an image.
danbev commented Jan 20, 2025
@ngxson I'll take a closer look at this today, specifically at how this could work with a cross-attention model like Llama 3.2 Vision 👍 One thing that is related to this work is something we discussed about how these models should be provided. I initially thought that creating a single .gguf for Llama 3.2 which contained both the vision encoder and the language model would be the way to go, but as can be read in the linked discussion, having separate models is probably a better solution. It would be great to get some clarification regarding this and whether vision encoders should be separate .gguf models.
ngxson commented Jan 20, 2025 • edited
@slaren In my first proposal, I made …
ngxson commented Jan 20, 2025 • edited
Btw, I have repeatedly been asked about …
ggerganov left a comment • edited
Adding some thoughts that I have so far.
Continuing along the idea of having separate models and contexts for the encoder and the decoder, I think that with a proper llama_batch abstraction we can have the following API:
```cpp
// vision
patches0 = llama_vision_tokenize(ctx_enc_v, img0);
patches1 = llama_vision_tokenize(ctx_enc_v, img1);

llama_batch_add_image(batch_enc_v, patches0);
llama_batch_add_image(batch_enc_v, patches1);

llama_encode(ctx_enc_v, batch_enc_v);

embd_enc_v = llama_get_embeddings(ctx_enc_v);

// audio
mel0 = llama_audio_tokenize(ctx_enc_a, audio0);
mel1 = llama_audio_tokenize(ctx_enc_a, audio1);

llama_batch_add_audio(batch_enc_a, mel0);
llama_batch_add_audio(batch_enc_a, mel1);

llama_encode(ctx_enc_a, batch_enc_a);

embd_enc_a = llama_get_embeddings(ctx_enc_a);

// text + vision + audio
tokens0 = llama_tokenize(ctx_dec, tokens0);
tokens1 = llama_tokenize(ctx_dec, tokens1);

llama_batch_add_text      (batch_dec, tokens0);
llama_batch_add_embd_image(batch_dec, embd_enc_v);
llama_batch_add_embd_audio(batch_dec, embd_enc_a);
llama_batch_add_text      (batch_dec, tokens1);

llama_decode(ctx_dec, batch_dec);
```

For cross-attention models such as Llama 3.2 Vision and Whisper, the decoding context `ctx_dec` could be initialized with a reference to the encoder context:

```cpp
llama_context_params cparams_dec;

cparams_dec.ctx_cross[0] = ctx_enc_v;
cparams_dec.ctx_cross[1] = ctx_enc_a;
```

Edit: extended the example with audio input as well.
src/llama-vision.cpp Outdated
```cpp
static ggml_cgraph * clip_image_build_graph(clip_context & ctx, int batch_size, clip_image_size & image_size) {
    auto & model   = *ctx.model;
    auto & hparams = ctx.model->hparams;

    const int   hidden_size = hparams.hidden_size;
    const int   n_head      = hparams.n_head;
    const int   d_head      = hidden_size / n_head;
    const int   patch_size  = hparams.patch_size;
    const float eps         = hparams.eps;

    const int num_patches   = ((image_size.width / patch_size) * (image_size.height / patch_size));
    const int num_positions = num_patches + (model.class_embedding ? 1 : 0);

    LLAMA_LOG_DEBUG("%s: num_patches = %d\n", __func__, num_patches);
```
The clip graph should be constructed as any other graph in src/llama.cpp, llm_build_context.
I'm not sure how to do this right now, as I can't see how I can re-use the existing `build_*` functions to make the cgraph of vision models "blend in" with the rest of `llm_build_context`.

But what I did so far is to make an equivalent called `llama_vision_graph_builder`. This is meant to be a temporary solution, to simplify the migration in the near future.

Could you please have a look at my `llama_vision_graph_builder` to see how it can be merged into `llm_build_context`? Thanks!
src/llama-vision.cpp Outdated
```cpp
    delete p;
}

int32_t llama_vision_encode(struct llama_context * ctx, llama_vision_patches * p) {
```
Don't think we need a separate function - we should be able to reuse `llama_encode`.
ngxson Jan 21, 2025 • edited
Hmm I don't think we can do this right now, as it requires llama_batch to also accept image tokens.
Do you think it's ok to keep llama_vision_encode(llama_img_tokens &) and refactor llama_batch later on?
src/llama-vision.cpp Outdated
```cpp
struct llama_vision_patches * llama_vision_patches_init(
        struct llama_context * ctx,
        llama_vision_bitmap * bmp) {
    clip_context & vctx = ctx->vctx;
    if (vctx.model->hparams.arch == VISION_ARCH_MINICPMV) {
        return new llama_vision_patches(clip_image_preprocess_minicpmv(vctx, *bmp));
    }
    return new llama_vision_patches(clip_image_preprocess(vctx, *bmp));
}
```
ggerganov Jan 20, 2025 • edited
I agree that the analogy of "tokenization" in the context of vision models is the conversion of "images -> patches". So the patches could be considered as "image tokens" and it seems reasonable to have a separate function to create patches, since this would have to be performed on the CPU.
> I am just wondering, is there any reason to expose the patches/slices to the user at all? Can the user do anything with the patches other than just immediately call `llama_vision_encode` and throw them away? If not, then maybe that could be hidden entirely from the user and `llama_vision_encode` could take directly an image.
Even though the user cannot explicitly operate with the patches, it seems to make sense to expose this in order to be able to multi-thread the pre-processing step.
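As a rough illustration, exposing the pre-processing step would let an application parallelize it like the sketch below. This is not code from this PR; it assumes that `llama_vision_patches_init` (shown earlier in this review) can safely be called concurrently for different bitmaps:

```cpp
#include <future>
#include <vector>

#include "llama.h" // assuming the vision API from this PR is declared here

// Pre-process ("tokenize") a batch of images on the CPU in parallel.
static std::vector<llama_vision_patches *> preprocess_parallel(
        llama_context * ctx,
        const std::vector<llama_vision_bitmap *> & bitmaps) {
    std::vector<std::future<llama_vision_patches *>> futures;
    futures.reserve(bitmaps.size());
    for (llama_vision_bitmap * bmp : bitmaps) {
        // each image is pre-processed independently on its own thread
        futures.push_back(std::async(std::launch::async, [ctx, bmp]() {
            return llama_vision_patches_init(ctx, bmp);
        }));
    }
    std::vector<llama_vision_patches *> patches;
    patches.reserve(futures.size());
    for (auto & f : futures) {
        patches.push_back(f.get()); // the results can later be reused with different contexts
    }
    return patches;
}
```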
Note that we should also consider the case of Whisper in the context of this abstraction. The whisper model takes raw input audio in PCM format, which is first pre-processed into a mel spectrogram. This pre-processing step, similar to the image pre-processing for CLIP and the text tokenization in text models, is performed on the CPU and can be multi-threaded. Of course, any of the three types of pre-processing could be implemented on the GPU with enough effort, but the important aspect is that this pre-processing can be done in parallel for different inputs and, once computed, can be reused with different contexts.
In all cases, the pre-processed input is passed to the transformer graph and the first step is always to convert this input into embeddings. For text, this conversion is trivial - `ggml_get_rows(w, tokens)`. For Whisper, this process involves a couple of convolutions of the mel spectrogram:
For CLIP, this appears to be again a convolution operator applied to the pre-processed input (the image patches) in order to obtain the initial embeddings:
All these conversions of the pre-processed input (tokens, mel, patches) into the initial embeddings should be implemented in a single place: build_inp_embd().
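To make that concrete, here is a rough sketch of what a unified `build_inp_embd()` could look like in ggml terms. Everything below (struct names, tensor shapes, the dispatch enum) is an illustrative assumption, not actual llama.cpp code:

```cpp
#include "ggml.h"

// Illustrative types only - the real llama.cpp structures and names differ.
enum inp_type { INP_TEXT, INP_IMAGE, INP_AUDIO };

struct inp_model {
    ggml_tensor * tok_embd;     // [n_embd, n_vocab]   - text token embeddings
    ggml_tensor * patch_embd_w; // [p, p, 3, n_embd]   - CLIP patch embedding kernel
    ggml_tensor * conv1_w;      // [3, n_mel, n_embd]  - Whisper conv stem, layer 1
    ggml_tensor * conv2_w;      // [3, n_embd, n_embd] - Whisper conv stem, layer 2
    int           patch_size;
};

struct inp_data {
    inp_type      type;
    ggml_tensor * tokens; // I32 [n_tokens]        - tokenized text
    ggml_tensor * pixels; // F32 [W, H, 3]         - pre-processed image
    ggml_tensor * mel;    // F32 [n_frames, n_mel] - pre-processed audio
};

// Convert any pre-processed input (tokens, patches, mel) into the initial embeddings.
static ggml_tensor * build_inp_embd(ggml_context * ctx0, const inp_model & model, const inp_data & inp) {
    switch (inp.type) {
        case INP_TEXT:
            // text: trivial row lookup
            return ggml_get_rows(ctx0, model.tok_embd, inp.tokens);               // [n_embd, n_tokens]
        case INP_IMAGE: {
            // CLIP-style: 2D convolution over the image with stride = patch size
            ggml_tensor * cur = ggml_conv_2d(ctx0, model.patch_embd_w, inp.pixels,
                    model.patch_size, model.patch_size, 0, 0, 1, 1);              // [W/p, H/p, n_embd]
            cur = ggml_reshape_2d(ctx0, cur, cur->ne[0]*cur->ne[1], cur->ne[2]);  // [n_patches, n_embd]
            return ggml_cont(ctx0, ggml_transpose(ctx0, cur));                    // [n_embd, n_patches]
        }
        case INP_AUDIO: {
            // Whisper-style: two 1D convolutions over the mel spectrogram
            ggml_tensor * cur = ggml_gelu(ctx0, ggml_conv_1d_ph(ctx0, model.conv1_w, inp.mel, 1, 1));
            cur = ggml_gelu(ctx0, ggml_conv_1d_ph(ctx0, model.conv2_w, cur, 2, 1));
            return ggml_cont(ctx0, ggml_transpose(ctx0, cur));                    // [n_embd, n_frames/2]
        }
    }
    return nullptr;
}
```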
ngxson Jan 20, 2025 • edited
> I agree that the analogy of "tokenization" in the context of vision models is the conversion of "images -> patches". So the patches could be considered as "image tokens" and it seems reasonable to have a separate function to create patches
Makes sense then. I realized that I had always associated the notion of "token" with "text", but a quick Google search tells me that: "In LLMs, a token is a basic unit of input or output [...]"
In that sense, I would propose calling it `llama_vision_img_tokens` (though it can be a bit confusing, because the user may expect it to be a `std::vector` due to the plural "tokens"):
```cpp
// Structure represents the basic input unit of vision model
// This can be a processed image or slices of images under the hood
struct llama_vision_img_tokens;

// User must reserve N number of tokens in tokenized text prompt for each image
int32_t llama_vision_get_n_tokens(const llama_vision_img_tokens * img_tokens);
```
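For example, a caller could combine this with the "tokenize" function floated earlier in this thread roughly as follows. This is purely hypothetical glue; none of these functions exist yet:

```cpp
#include <cstdint>

#include "llama.h" // assuming the proposed vision API would live here

// Hypothetical: how many placeholder positions to reserve in the text prompt for one image.
static int32_t n_placeholder_tokens(llama_context * ctx, llama_vision_bitmap * bmp) {
    // llama_vision_tokenize is the name proposed earlier in this thread (not an existing function)
    llama_vision_img_tokens * img_tokens = llama_vision_tokenize(ctx, bmp);
    return llama_vision_get_n_tokens(img_tokens);
}
```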
danbev commented Jan 22, 2025
@ngxson Sorry about the delay. I've been able to "force" support for mllama using the latest vision API, that is, get an example working. I'm now going to iterate on this and try to figure out how cross-attention will work. Just wanted to let you know that some progress is being made. There is an issue I'm having with the vocab size which I'm not exactly sure how to handle. If anyone has some thoughts around this, please let me know.
ngxson commented Jan 22, 2025 • edited
@danbev No worries, I was busy with minicpm-v too. It's still not fully working (inference works, but the llava-uhd preprocessor is still missing). Will have a look at your implementation of mllama very soon.
ngxson commented Jan 22, 2025
So, minicpm-v template is more complicated because it contains bot the image and all the slices. Here is what it looks like in
To get rid of this complication, my idea is to have the embeddings of these tokens ( This will make this formatting transparent to the text tokenizer, but will require embeddings of these tokens to be stored as one-hot vectors in the vision model (of course we can use |
ngxson commented Jan 23, 2025
Ok so I managed to get minicpm-v kinda working out of the box with the API (no changes to user-space code are required). Upon giving it the Win XP wallpaper Bliss, it says:

It currently operates with a resized version of the image (like llava), so the performance will be bad for bigger images (with more details). I'll get llava-uhd to work, which breaks the image into slices and thus allows the LLM to "see" the image at different zoom levels, preserving details.
ngxson commented Mar 1, 2025 • edited
Try another image / format / resolution. I'd recommend you pinpoint the problem on your side first, to avoid spamming this thread with too much data. And again, nothing is guaranteed to work. This is a WIP. (I hid your comments because they take too much space and make the thread hard for me to follow.)
AIWintermuteAI commented Mar 1, 2025
Sure, no worries! I'll use collapsible text next time I need to post large logs, thanks for the reminder. I'll try testing with some more images and I guess see what can be done about …
AIWintermuteAI commented Mar 1, 2025 • edited
Update: … Then everything works!
ngxson commented Mar 1, 2025 • edited
OK so Phi-4-multimodal-instruct is a bit more messy. Traditional vision models are simple: just 2 separate transformers, one for the vision encoder and one for the language decoder. However, on Phi-4, embedding data from the vision/audio encoder must also be processed using a dedicated LoRA adapter applied on top of the language decoder.

Very technical details

Normal vision models:

```mermaid
flowchart TD
  image --> vision_transformer
  vision_transformer[[vision_transformer]] --> embd_input
  text_input --> embd_input
  embd_input --> text_transformer[[text_transformer]]
  text_transformer --> text_output
```

Phi-4 multimodal:

```mermaid
flowchart TD
  image --> vision_transformer[[vision_transformer]]
  vision_transformer --> embd_input
  audio --> audio_transformer[[audio_transformer]]
  audio_transformer --> embd_input
  text_input --> embd_input
  embd_input --> text_transformer
  subgraph text_transformer
    vision_LoRA[[vision_LoRA]]
    audio_LoRA[[audio_LoRA]]
    base_model[[base_model]]
  end
  text_transformer --> text_output
```

Diagram from the paper:

For now, I've been able to convert only the text/language part. Turns out, it's just a simple Phi-4-mini-instruct under the hood, so nothing interesting. This is also mentioned in the paper:
Update: the LoRA part is very complicated to implement right now, so it will be left for dedicated research/a PR in the future.

revert Phi-4-mm since we cannot support LoRA for now, too complicated
lucasjinreal commented Mar 9, 2025
Hello, is the qwen2.5 vl conversion script from raw safetensors into GGUF supported now? Also curious what the standard way to support a new model in convert_hf_to_gguf.py is; it looks a little bit tricky, needing to handle very specific tensor names in various model architectures.
matheusfrancisco commented Apr 2, 2025 • edited
Is there any intention to support unsloth/Llama-3.2-11B-Vision-Instruct?
Fix #8010
Supersede #9687
Important
Please do NOT upload GGUF files produced via this PR to the internet. Other people won't know how to use them and they will complain.
Then,
Goals of this PR:
- `llama_vision` API
- equivalent of the `Processor` class on the HF library

Things that will be done in follow-up PRs: