Skip to content

Conversation

Copy link
Contributor

CopilotAI commented Dec 11, 2025

Summary

Adds quickstart documentation for Microsoft.Extensions.DataIngestion library, demonstrating complete ETL pipeline for RAG scenarios.

Contributes to #50534

Changes

Documentation

  • New quickstart: docs/ai/quickstarts/process-data.md
    • Document reading with MarkdownReader
    • AI-powered enrichment (image alt-text, summaries)
    • Semantic chunking with embedding-based similarity
    • Vector storage using SQLite
    • Interactive semantic search
  • Dual-platform support via zone pivots (OpenAI/Azure OpenAI)
  • Added to "Chat with your data (RAG)" section in TOC

Code Snippets

  • Complete compilable C# projects for both platforms
  • Demonstrates pipeline composition: reader → enricher → chunker → writer
  • Includes sample data and region markers for doc references
// Compose data ingestion pipelineusingIngestionPipeline<string>pipeline=new(reader,chunker,writer,loggerFactory){DocumentProcessors={imageAlternativeTextEnricher},ChunkProcessors={summaryEnricher}};awaitforeach(varresultinpipeline.ProcessAsync(newDirectoryInfo("./data"),searchPattern:"*.md")){Console.WriteLine($"Completed processing '{result.DocumentId}'. Succeeded: '{result.Succeeded}'.");}

Based on sample from https://github.com/luisquintanilla/DataIngestion and blog announcement at https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • devblogs.microsoft.com
    • Triggering command: /usr/bin/curl curl -s REDACTED (dns block)
    • Triggering command: /usr/bin/wget wget -q -O /tmp/blog.html REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Add a quickstart or tutorial for processing custom data based on the data ingestion sample described in https://devblogs.microsoft.com/dotnet/introducing-data-ingestion-building-blocks-preview/. It should live under the "Chat with your data" section of the AI TOC and be titled "Process data" or something similar. This task contributes to #50534.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.


Internal previews

📄 File🔗 Preview link
docs/ai/quickstarts/process-data.mdProcess custom data for AI applications
docs/ai/quickstarts/snippets/process-data/data/sample.mddocs/ai/quickstarts/snippets/process-data/data/sample
docs/ai/quickstarts/structured-output.mdRequest a response with structured output
docs/ai/quickstarts/text-to-image.mdQuickstart - Generate images from text using AI
docs/ai/toc.ymldocs/ai/toc

CopilotAI changed the title [WIP] Add quickstart tutorial for processing custom dataAdd data ingestion quickstart for processing custom dataDec 11, 2025
CopilotAI requested a review from gewarrenDecember 11, 2025 23:42
Sign up for freeto join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@gewarren