Merged
189 changes: 118 additions & 71 deletions docs/CustomizeSchemaData.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/CustomizeSystemPrompts.md
@@ -51,4 +51,4 @@ For the complete DSL reference, expression language, domain adaptation examples,

## Schema-Specific Prompts

Schema-specific prompts are managed directly in the individual schema .py file that is created. The field descriptions in your schema class act as prompts for the LLM during data extraction and mapping. See [Customizing Schema and Data](./CustomizeSchemaData.md) for details on how to write effective field descriptions.
Schema-specific prompts are managed directly in the individual schema JSON file. The field descriptions in your schema act as prompts for the LLM during data extraction and mapping. See [Customizing Schema and Data](./CustomizeSchemaData.md) for details on how to write effective field descriptions.
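Since the field descriptions in the JSON descriptor double as extraction prompts, they deserve the same care as any system prompt. A minimal sketch of such a descriptor, with hypothetical field names (`invoice_number` and `total_amount` are illustrative, not part of the solution):

```python
import json

# Hypothetical invoice descriptor; every "description" string is what the
# LLM receives as the extraction prompt for that field.
invoice_descriptor = {
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The unique invoice identifier, usually printed "
                           "near the top of the document (e.g. 'INV-2024-0031').",
        },
        "total_amount": {
            "type": "number",
            "description": "The grand total including tax, as a plain number "
                           "without currency symbols or thousands separators.",
        },
    },
    "required": ["invoice_number"],
}

# The descriptor is plain JSON, so it round-trips losslessly to the .json
# file that gets registered.
print(json.dumps(invoice_descriptor, indent=2))
```

Concrete, unambiguous descriptions (expected location, format, what to exclude) generally extract better than one-word labels.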
2 changes: 1 addition & 1 deletion docs/GoldenPathWorkflows.md
@@ -121,7 +121,7 @@ The final stage applies **YAML-based rules** to detect missing documents and cro

1. **Create Custom Schema**
- Follow the [Custom Schema Guide](./CustomizeSchemaData.md)
- Define your document structure and required fields (Pydantic model)
- Define your document structure and required fields (JSON Schema)

2. **Register Your Schema**
- Add your schema to `schema_info.json` and run `register_schema.py`
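A registration entry can be sketched as follows. The exact shape of `schema_info.json` is not shown in this diff, so this assumes an entry mirroring the `Schema` entity's fields (`ClassName`, `Description`, `FileName`, `ContentType`); all values are illustrative:

```python
import json

# Hypothetical schema_info.json entry. Note the .json FileName: legacy .py
# schema files are rejected at registration time.
entry = {
    "ClassName": "Invoice",
    "Description": "Extracts header fields and totals from supplier invoices.",
    "FileName": "invoice.json",
    "ContentType": "invoice",
}

# register_schema.py would submit this metadata alongside the schema file.
print(json.dumps(entry, indent=2))
```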
2 changes: 1 addition & 1 deletion docs/TechnicalArchitecture.md
@@ -209,7 +209,7 @@ Using Azure OpenAI Service, a deployment of the GPT-5.1 model is used during the
Using Azure Blob Storage, the solution uses multiple containers:
- **process-batch** – Claim batch manifests and batch-level artifacts.
- **cps-processes** – Source files for processing, intermediate results, and final output JSON files.
- **cps-configuration** – Schema `.py` files and configuration data.
- **cps-configuration** – Schema JSON files and configuration data.

### Azure Cosmos DB for MongoDB
Using Azure Cosmos DB for MongoDB, the solution uses multiple collections:
11 changes: 10 additions & 1 deletion infra/scripts/post_deployment.ps1
@@ -124,6 +124,15 @@ if (-not $ApiReady) {

Write-Host " Registering new schema '$ClassName'..."

# Only JSON Schema descriptors are accepted. The legacy .py format
# was removed as part of the schemavault RCE remediation.
$extension = [System.IO.Path]::GetExtension($SchemaFile).ToLowerInvariant()
if ($extension -ne '.json') {
Write-Host " Unsupported schema extension '$extension' for '$SchemaFile'. Only .json is accepted. Skipping..."
continue
}
$contentType = 'application/json'

# Build multipart form data
$dataPayload = @{ ClassName = $ClassName; Description = $Description } | ConvertTo-Json -Compress
$fileBytes = [System.IO.File]::ReadAllBytes($SchemaFile)
@@ -137,7 +146,7 @@
$dataPayload,
"--$boundary",
"Content-Disposition: form-data; name=`"file`"; filename=`"$fileName`"",
"Content-Type: text/x-python$LF",
"Content-Type: $contentType$LF",
[System.Text.Encoding]::UTF8.GetString($fileBytes),
"--$boundary--$LF"
) -join $LF
11 changes: 10 additions & 1 deletion infra/scripts/post_deployment.sh
@@ -136,10 +136,19 @@ else
echo " Registering new schema '$CLASS_NAME'..."
DATA_PAYLOAD="{\"ClassName\": \"$CLASS_NAME\", \"Description\": \"$DESCRIPTION\"}"

# Only JSON Schema descriptors are accepted. The legacy .py format
# was removed as part of the schemavault RCE remediation.
EXT=$(echo "${FILE_NAME##*.}" | tr '[:upper:]' '[:lower:]')
if [ "$EXT" != "json" ]; then
echo " Unsupported schema extension '.$EXT' for '$FILE_NAME'. Only .json is accepted. Skipping..."
continue
fi
CONTENT_TYPE="application/json"

RESPONSE=$(curl -s -w "\n%{http_code}" \
-X POST "$SCHEMAVAULT_URL" \
-F "data=$DATA_PAYLOAD" \
-F "file=@$SCHEMA_FILE;type=text/x-python" \
-F "file=@$SCHEMA_FILE;type=$CONTENT_TYPE" \
--connect-timeout 60)

HTTP_CODE=$(echo "$RESPONSE" | tail -1)
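Both deployment scripts assemble the same multipart request by hand. A stdlib-Python equivalent of the body they build can make the wire format easier to see; the field names (`data`, `file`) come from the scripts, while the function name is an illustrative assumption:

```python
import json
import pathlib
import uuid

def build_schema_upload(class_name: str, description: str, schema_file: str):
    """Build the multipart/form-data content type and body for registration."""
    path = pathlib.Path(schema_file)
    # Mirror the guard in the scripts: only JSON Schema descriptors are accepted.
    if path.suffix.lower() != ".json":
        raise ValueError(
            f"Unsupported schema extension '{path.suffix}'. Only .json is accepted."
        )
    boundary = uuid.uuid4().hex
    payload = json.dumps({"ClassName": class_name, "Description": description})
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="data"\r\n\r\n'
        f"{payload}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/json\r\n\r\n"
    ).encode()
    body = head + path.read_bytes() + f"\r\n--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", body
```

The returned content type and body can then be POSTed to the schema-vault URL with `urllib.request` or any HTTP client.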
10 changes: 7 additions & 3 deletions src/ContentProcessor/src/libs/pipeline/entities/schema.py
@@ -9,7 +9,7 @@ class file (in blob storage) that defines the structured output
"""

import datetime
from typing import Optional
from typing import Literal, Optional

from pydantic import BaseModel, Field

@@ -21,17 +21,21 @@ class Schema(BaseModel):

Attributes:
Id: Unique schema identifier.
ClassName: Python class name in the remote module.
ClassName: Class name to materialise from the schema artifact.
Description: Human-readable description.
FileName: Blob filename containing the schema class.
FileName: Blob filename containing the schema artifact.
ContentType: Target content type this schema handles.
Format: Storage format of the schema artifact. Always
``"json"`` — declarative JSON Schema descriptors are the
only supported format.
"""

Id: str
ClassName: str
Description: str
FileName: str
ContentType: str
Format: Literal["json"] = Field(default="json")
Created_On: Optional[datetime.datetime] = Field(default=None)
Updated_On: Optional[datetime.datetime] = Field(default=None)

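A minimal reproduction of the model shows how the `Literal["json"]` field turns the format restriction into a validation error rather than a convention (the field values below are illustrative):

```python
import datetime
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError

class Schema(BaseModel):
    Id: str
    ClassName: str
    Description: str
    FileName: str
    ContentType: str
    Format: Literal["json"] = Field(default="json")
    Created_On: Optional[datetime.datetime] = Field(default=None)
    Updated_On: Optional[datetime.datetime] = Field(default=None)

# Omitting Format defaults to "json"; any other value fails validation.
ok = Schema(
    Id="abc123",
    ClassName="Invoice",
    Description="Invoice extraction schema",
    FileName="invoice.json",
    ContentType="invoice",
)
assert ok.Format == "json"

try:
    Schema(**{**ok.model_dump(), "Format": "py"})
except ValidationError:
    print("legacy 'py' format rejected")
```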
17 changes: 13 additions & 4 deletions src/ContentProcessor/src/libs/pipeline/handlers/map_handler.py
@@ -28,7 +28,7 @@
from libs.pipeline.entities.pipeline_step_result import StepResult
from libs.pipeline.entities.schema import Schema
from libs.pipeline.queue_handler_base import HandlerBase
from libs.utils.remote_module_loader import load_schema_from_blob
from libs.utils.remote_schema_loader import load_schema_from_blob_json

logger = logging.getLogger(__name__)

@@ -151,12 +151,21 @@ async def execute(self, context: MessageContext) -> StepResult:
schema_id=context.data_pipeline.pipeline_status.schema_id,
)

# Load the schema class for structured output
schema_class = load_schema_from_blob(
# Load the schema class for structured output. Only JSON schemas
# are supported; the worker materialises the descriptor as an
# in-memory Pydantic model without ever executing uploaded code.
if not selected_schema.FileName.lower().endswith(".json"):
raise ValueError(
f"Schema {selected_schema.Id} has a non-JSON file "
f"'{selected_schema.FileName}'. Re-register the schema as a "
"JSON Schema (.json) document; legacy Python (.py) schemas "
"are no longer supported."
)
schema_class = load_schema_from_blob_json(
account_url=self.application_context.configuration.app_storage_blob_url,
container_name=f"{self.application_context.configuration.app_cps_configuration}/Schemas/{context.data_pipeline.pipeline_status.schema_id}",
blob_name=selected_schema.FileName,
module_name=selected_schema.ClassName,
model_name=selected_schema.ClassName,
)

# Invoke Model with Agent Framework SDK
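The loader itself is not shown in this diff, but the core technique can be sketched: fetch the JSON descriptor as plain data and build a Pydantic model with `create_model`, so no uploaded code is ever imported or executed. The function name and type mapping below are assumptions for illustration, not the repo's actual implementation:

```python
from typing import Any, Optional

from pydantic import BaseModel, Field, create_model

# Minimal JSON-Schema-type -> Python-type mapping; a real loader would also
# handle arrays, nested objects, enums, formats, etc.
_JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}

def model_from_descriptor(name: str, descriptor: dict) -> type[BaseModel]:
    """Materialise a Pydantic model from a JSON Schema object descriptor."""
    required = set(descriptor.get("required", []))
    fields: dict[str, Any] = {}
    for prop, spec in descriptor.get("properties", {}).items():
        py_type: Any = _JSON_TO_PY.get(spec.get("type", "string"), str)
        default: Any = ...  # Ellipsis marks the field as required
        if prop not in required:
            py_type = Optional[py_type]
            default = None
        fields[prop] = (py_type, Field(default, description=spec.get("description")))
    return create_model(name, **fields)

# Usage: the descriptor is plain data fetched from blob storage, never code.
Invoice = model_from_descriptor(
    "Invoice",
    {
        "type": "object",
        "required": ["invoice_number"],
        "properties": {
            "invoice_number": {"type": "string", "description": "Invoice ID."},
            "total_amount": {"type": "number", "description": "Grand total."},
        },
    },
)
inv = Invoice(invoice_number="INV-001")
```

The resulting class is a regular Pydantic model, so it can be passed wherever the old dynamically imported schema class was used for structured output.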
4 changes: 2 additions & 2 deletions src/ContentProcessor/src/libs/utils/__init__.py
@@ -8,8 +8,8 @@
base64_util: Base-64 encoding detection.
credential_util: Convenience re-export of credential and token-provider
helpers (mirrors azure_credential_utils).
remote_module_loader: Dynamically load Python modules from Azure Blob
Storage.
remote_schema_loader: Materialise Pydantic models from JSON Schema
descriptors stored in Azure Blob Storage (no code execution).
stopwatch: Lightweight elapsed-time measurement context manager.
utils: General-purpose JSON encoding, dict flattening, and value
comparison helpers.
65 changes: 0 additions & 65 deletions src/ContentProcessor/src/libs/utils/remote_module_loader.py

This file was deleted.
