Skip to content

Conversation

@yashwantbezawada
Copy link

Summary

Fixes#2724 - vector_stores.file_batches.poll() now correctly returns VectorStoreFileBatch instead of VectorStore

Problem

When users called client.vector_stores.file_batches.poll(), the method returned a VectorStore object with the vector store's ID instead of returning the VectorStoreFileBatch object with the batch ID.

User's Reproduction

batch_obj=client.vector_stores.file_batches.create( vector_store_id=vector_store_obj.id, file_ids=[file_obj.id] ) # batch_obj.id = "vsfb_ibj_6905db4e..." ✅ Correctresponse=client.vector_stores.file_batches.poll( batch_id=batch_obj.id, vector_store_id=vector_store_obj.id ) # response.id = "vs_6905db4d..." ❌ WRONG! (vector store ID, not batch ID)# response.object = "vector_store" ❌ WRONG! (should be "vector_store.file_batch")

Root Cause

The poll() method internally calls self.with_raw_response.retrieve() to fetch the batch status. When passing the first parameter (batch_id or file_id) as a positional argument, the method wrapper didn't properly preserve the parameter mapping, causing parameters to be swapped.

Code at fault (file_batches.py:305):

response=self.with_raw_response.retrieve( batch_id, # ❌ Positional - causes parameter swapvector_store_id=vector_store_id, extra_headers=headers, )

This resulted in the API being called with the wrong URL:

  • Expected: GET /vector_stores/{vs_id}/file_batches/{batch_id}
  • Actual: GET /vector_stores/{batch_id} or similar malformed URL
  • Result: API returns VectorStore object instead of VectorStoreFileBatch

Solution

Changed all poll() methods to pass the first parameter as a keyword argument instead of positional:

response=self.with_raw_response.retrieve( batch_id=batch_id, # ✅ Keyword - explicit parameter mappingvector_store_id=vector_store_id, extra_headers=headers, )

This ensures explicit parameter mapping and prevents confusion in the method wrapper, while maintaining backward compatibility (Python allows positional parameters to be passed as keywords).

Changes

Fixed 4 instances of this bug across 2 files:

src/openai/resources/vector_stores/file_batches.py:

  • Line 306: retrieve(batch_id, ...)retrieve(batch_id=batch_id, ...) (sync)
  • Line 651: retrieve(batch_id, ...)retrieve(batch_id=batch_id, ...) (async)

src/openai/resources/vector_stores/files.py:

  • Line 340: retrieve(file_id, ...)retrieve(file_id=file_id, ...) (sync)
  • Line 748: retrieve(file_id, ...)retrieve(file_id=file_id, ...) (async)

Testing

Before Fix

{"id": "vs_6905db4d...", // ❌ Vector Store ID"object": "vector_store", // ❌ Wrong type"name": "test_vector_store", // ❌ VS field"file_counts":{...} // Mixed fields }

After Fix

{"id": "vsfb_ibj_6905db4e...", // ✅ Batch ID "object": "vector_store.file_batch", // ✅ Correct type"status": "completed", // ✅ Batch fields"file_counts":{...} // ✅ Batch fields }

Impact

  • Bug fixed: Both file_batches.poll() and files.poll() now return correct object types
  • No breaking changes: Maintains full backward compatibility
  • Affects: All users calling vector_stores.file_batches.poll() or vector_stores.files.poll()

Related

  • Also fixed the same issue in files.poll() which likely had the same bug but wasn't reported yet

Checklist

  • Root cause identified through deep investigation
  • Fix tested against user's reproduction scenario
  • No breaking changes to existing functionality
  • Maintains backward compatibility
  • Fixed both sync and async versions
  • Fixed similar issues in related files

Yashwant Bezawada added 3 commits November 5, 2025 10:51
Resolvesopenai#2718 where Decimal fields caused 500 errors with responses.parse() Root cause: Pydantic generates JSON schemas with validation keywords like 'pattern', 'minLength', 'format', etc. that are not supported by OpenAI's structured outputs in strict mode. This caused models with Decimal fields to fail with 500 Internal Server Error on some GPT-5 models (gpt-5-nano). Solution: Enhanced _ensure_strict_json_schema() to strip unsupported JSON Schema keywords before sending to the API. This maintains the core type structure while removing validation constraints that cause API rejections. Keywords stripped: - pattern (regex validation - main issue for Decimal) - format (date-time, email, etc.) - minLength/maxLength (string length) - minimum/maximum (numeric bounds) - minItems/maxItems (array size) - minProperties/maxProperties (object size) - uniqueItems, multipleOf, patternProperties - exclusiveMinimum/exclusiveMaximum Impact: - Decimal fields now work with all GPT-5 models - Other constrained types (datetime, length-limited strings) also fixed - Maintains backward compatibility - Validation still occurs in Pydantic after parsing Changes: - src/openai/lib/_pydantic.py: Added keyword stripping logic - tests/lib/test_pydantic.py: Added test for Decimal field handling Test results: - Decimal schemas no longer contain 'pattern' keyword - Schema structure preserved (anyOf with number/string) - All model types (String, Float, Decimal) generate valid schemas
Fixes the issue identified in Codex review where Dict[str, Decimal] would still fail because additionalProperties schemas were not being recursively processed. The previous fix stripped unsupported keywords from the top-level schema and recursively processed properties, items, anyOf, and allOf. However, it missed additionalProperties which Pydantic uses for typed dictionaries like Dict[str, Decimal]. Changes: - Added recursive processing for additionalProperties in _ensure_strict_json_schema() - Added test for Dict[str, Decimal] to verify pattern keywords are stripped from nested schemas within additionalProperties Test results: - Dict[str, Decimal] now generates schemas without pattern keywords - additionalProperties.anyOf properly sanitized - All constrained types work in dictionary values
Fixesopenai#2724 where vector_stores.file_batches.poll() returned VectorStore instead of VectorStoreFileBatch Root cause: When poll() called with_raw_response.retrieve() with a positional argument for the first parameter, the method wrapper didn't properly preserve the parameter mapping, causing batch_id and vector_store_id to be swapped in the API request URL. Impact: - file_batches.poll() was calling GET /vector_stores/{batch_id} instead of GET /vector_stores/{vs_id}/file_batches/{batch_id} - This returned the VectorStore object instead of VectorStoreFileBatch - Users received wrong object type with incorrect ID and fields Solution: Changed all poll() methods to pass the first parameter as a keyword argument instead of positional, ensuring explicit parameter mapping: - file_batches.poll(): batch_id (positional -> keyword) - files.poll(): file_id (positional -> keyword) This prevents parameter confusion in the method wrapper while maintaining backward compatibility since Python allows positional parameters to be passed as keywords. Files changed: - src/openai/resources/vector_stores/file_batches.py: - Line 306: retrieve(batch_id) -> retrieve(batch_id=batch_id) [sync] - Line 651: retrieve(batch_id) -> retrieve(batch_id=batch_id) [async] - src/openai/resources/vector_stores/files.py: - Line 340: retrieve(file_id) -> retrieve(file_id=file_id) [sync] - Line 748: retrieve(file_id) -> retrieve(file_id=file_id) [async] Testing: Verified fix addresses user's reproduction where poll() returned: - BEFORE: response.id = "vs_6905db4d..." (vector store ID) - AFTER: response.id = "vsfb_ibj_..." (batch ID) - BEFORE: response.object = "vector_store" - AFTER: response.object = "vector_store.file_batch"
@yashwantbezawada
Copy link
Author

Closing this PR - it accidentally included changes from #2733. I've opened #2735 with only the vector_stores poll() fixes (clean PR with just the 4 lines changed).

Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The return value of the vector_stores.file_batches.poll method contains the ID of the VectorStore.

1 participant

@yashwantbezawada