
Eyeson AI Adapter

The Eyeson AI Adapter consists of 2 parts:

  1. The endpoint group forward stream
  2. A customer AI pipeline

AI adapter example schema

In the example above, a POST is sent to the endpoint /forward/source. It triggers the forwarding of the stream Source 3 to a URL defined in the command. The stream is forwarded as-is.
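
For illustration, here is a minimal Python sketch of such a command. It assumes the API key is sent in the Authorization header and that the request carries a forward identifier, the media type, and the target URL as JSON fields; those field names, the room id, and the ingress URL are placeholders, so check the Eyeson API reference for the exact schema.

```python
import requests

API_KEY = "YOUR_EYESON_API_KEY"   # placeholder
ROOM_ID = "YOUR_ROOM_ID"          # placeholder

# Forward the raw source stream of one participant (e.g. "Source 3")
# to your AI ingress. All field names below are illustrative.
response = requests.post(
    f"https://api.eyeson.team/rooms/{ROOM_ID}/forward/source",
    headers={"Authorization": API_KEY},
    json={
        "forward_id": "source-3-to-ai",                # your own id for this forward
        "user_id": "participant-id-of-source-3",       # whose stream to forward
        "type": "video,audio",                         # which media to forward
        "url": "https://ai-ingress.example.com/whip",  # target defined in the command
    },
)
response.raise_for_status()
print("forwarding started:", response.status_code)
```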

At the AI ingress you need to extract the image and/or audio from the stream and prepare it for the AI pipeline.

tip

Remember that different models take different input parameters. One image model might accept a 150 × 150 pixel square image, another a 320 × 160 image.
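
As a minimal ingress-side sketch, the snippet below reads frames from the forwarded stream with OpenCV and resizes them for a model with a fixed input size. The stream URL is a placeholder, and the 320 × 160 target size simply mirrors the example in the tip above.

```python
import cv2

def preprocess_frame(frame, target_size=(320, 160)):
    """Resize and normalize one decoded frame for a model that
    expects a fixed input size (320 x 160 here is only an example)."""
    resized = cv2.resize(frame, target_size, interpolation=cv2.INTER_AREA)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)   # OpenCV decodes frames as BGR
    return rgb.astype("float32") / 255.0             # scale pixel values to [0, 1]

# Read frames from the forwarded stream; the URL is a placeholder for your ingress.
capture = cv2.VideoCapture("rtmp://ai-ingress.example.com/live/source3")
ok, frame = capture.read()
if ok:
    model_input = preprocess_frame(frame)
```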

At the end of the AI pipeline you can then define actions, which can be sent to the API as commands (see the sketch after this list), for example:

  • Triggering an alert and setting an overlay
  • Changing the layout and moving the focus to a certain stream
  • Writing a transcript to a file
  • Adding a screen capture to a report
  • and so on
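
As a rough sketch of such actions, the snippet below sends two commands to the REST API: setting a foreground layer as an alert overlay and moving the layout focus to one participant. The field names (url, z-index, layout, users[]) are assumptions about the layers and layout endpoints; verify them against the API reference.

```python
import requests

API_BASE = "https://api.eyeson.team"
ACCESS_KEY = "MEETING_ACCESS_KEY"   # placeholder

def trigger_alert_overlay(image_url):
    """Set a foreground layer as a visual alert."""
    requests.post(
        f"{API_BASE}/rooms/{ACCESS_KEY}/layers",
        data={"url": image_url, "z-index": "1"},   # field names are assumptions
    ).raise_for_status()

def focus_stream(user_id):
    """Move the layout focus to one participant."""
    requests.post(
        f"{API_BASE}/rooms/{ACCESS_KEY}/layout",
        data={"layout": "auto", "users[]": user_id},   # field names are assumptions
    ).raise_for_status()

# Example: the AI pipeline detected an incident on one stream.
trigger_alert_overlay("https://example.com/alert-overlay.png")
focus_stream("participant-id-of-source-3")
```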

Depending on your bandwidth, you can send multiple streams to one AI pipeline or one stream to multiple pipelines. This lets you mix and match the right intelligence for each of your sources.

danger

If a source stream stops delivering, the forward command is terminated. To keep the forwarding alive, you need to detect this at the ingress and restart it.
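
One way to handle this is a small watchdog on the ingress side that re-issues the forward command whenever the ingress stops receiving media. The sketch below is only illustrative; start_forward and ingress_is_receiving are hypothetical callables you would implement around your own forward request and ingress.

```python
import time

def keep_forward_alive(start_forward, ingress_is_receiving, check_interval=5.0):
    """Restart the forward whenever the ingress stops receiving data.

    Both callables are hypothetical and supplied by your own code:
      start_forward()        -- re-sends the POST /forward/... command
      ingress_is_receiving() -- True while media still arrives at the ingress
    """
    start_forward()
    while True:
        time.sleep(check_interval)
        if not ingress_is_receiving():
            # The forward was terminated (e.g. the source dropped out);
            # re-issue the command to bring it back.
            start_forward()
```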

AI Integration Models

Modern video communication systems are increasingly incorporating AI capabilities to enhance functionality and user experience. This section examines different approaches to AI integration, from device-level implementation to sophisticated multi-AI architectures.

Different Stages for AI Analytics in Video Communication

Stage 1: Compressed Video

Here the AI preselects sources to pass on to the next stage. Example: Motion detection automatically selects potential candidate sources.
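
As an illustration of such a preselection step, here is a simple frame-differencing sketch with OpenCV. Note that it works on decoded frames; compressed-domain detectors (for example, ones that analyze motion vectors) achieve the same preselection without full decoding. The threshold values are arbitrary examples.

```python
import cv2

def has_motion(prev_gray, frame, threshold=25, min_changed_ratio=0.01):
    """Return (motion_detected, current_gray) using frame differencing.

    threshold and min_changed_ratio are arbitrary example values."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev_gray is None:
        return False, gray
    diff = cv2.absdiff(prev_gray, gray)
    changed_ratio = (diff > threshold).mean()   # fraction of pixels that changed
    return changed_ratio > min_changed_ratio, gray
```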

Stage 2: Uncompressed Video

The AI narrows the selection by further analyzing the content. Example: Object detection further prioritizes streams.
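
A minimal sketch of such a prioritization step, using a pretrained YOLO model via the ultralytics package; the model file and the set of classes of interest are example choices only.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # small pretrained model, example choice

def stream_priority(frame, wanted=("person", "car")):
    """Score a frame by how many objects of interest it contains.

    The class names in `wanted` are example choices only."""
    result = model(frame, verbose=False)[0]
    names = [result.names[int(cls)] for cls in result.boxes.cls]
    return sum(1 for name in names if name in wanted)
```

Streams with a higher score could then be promoted in the layout or routed to a heavier pipeline.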

Stage 3: Composed Video

Multiple synchronized sources are merged into one composed uncompressed real-time stream. This provides a seamless view of the scene by integrating feeds such as drone video, body-mounted cameras, and collaboration cameras. The AI can analyze the composed video and see information in context. Example: Data from multiple sensors is synchronized to capture the scene context.

Stage 4: Distributed Interactive Video

Human operators, AI, and video sources (such as drones, bodycams, and sensors) can interact and orient. Example: The intelligent system loops in human operators (Human in the Loop) as necessary.

How AI is connected with humans and external sources like drones

Advanced Implementation: Edge MCU with Multiple AIs

The most sophisticated implementation involves an Edge MCU architecture that integrates multiple AI systems. This model provides several key advantages:

  1. Distributed AI Processing
  2. Real-time Stream Processing
  3. Intelligent Content Routing
  4. Enhanced Collaboration Features

The architecture follows this structure:

AI model with edge MCU and AIs

This configuration allows for:

  • Parallel AI processing
  • Intelligent stream filtering
  • Dynamic layout switching
  • Real-time content enrichment
  • Immediate collaboration integration

The multi-AI approach provides greater flexibility and functionality compared to single-AI implementations, while maintaining the efficiency benefits of edge computing.

AI Model suggestions for Edge AI

Here's a table giving you ideas on your options for Edge AI:

| Datasets | Examples of Models | Use Cases and Benefits |
| --- | --- | --- |
| Compressed Video | MobileNet - lightweight motion detection ● Coviar - action recognition | Real-time action or motion detection ● Stream filtering and prioritization -> Layout API ● Smaller dataset for lower CPU and energy consumption |
| Decompressed Video | YOLO - object detection ● C3D - motion and object detection ● I3D - action recognition ● ViT - image classification ● CLIP - image captioning | Real-time complex action detection ● Real-time complex object detection ● Stream filtering and prioritization -> Layout API ● Content enrichment (visual tagging, image captioning) ● Staged AI: stream routing to a cloud LLM for additional analysis |
| Composed Video | HMFNet ● V-JEPA - layouted video + audio + metadata; one composed uncompressed real-time stream of multiple synchronized sources | Real-time scene understanding ● Predictions and recommendations -> event predictions, live recommendations, automated additional content collection or suggestions (updated maps, additional sources) from LLMs ● Stream routing for escalation to LLMs ● Real-time data quality augmentation ● X synchronized sources: X videos with different angles in one video stream (targeting) |
| Individual or mixed Audio | NLP models: STT ● TTS ● Sentiment analysis | Real-time content enrichment ● Voice transcription, translation, sentiment analysis, sound analysis ● Layout piloting and stream routing ● Voice activation, keywords |

The forward stream endpoint group offers three endpoints (a usage sketch follows the list):

  • /forward/source
    This endpoint allows the forwarding of one source stream to an AI ingress.
  • /forward/mcu
    This endpoint forwards the whole One View; you would use this for transcription or reasoning.
  • /forward/playback
    This endpoint allows the forwarding of a playback.
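
As a usage sketch, the helper below wraps the MCU forward for a transcription or reasoning pipeline. As with the earlier example, the field names are assumptions and should be checked against the API reference.

```python
import requests

API_BASE = "https://api.eyeson.team"

def forward_one_view(api_key, room_id, ingress_url):
    """Forward the composed One View to an AI ingress, e.g. for transcription.

    Field names are assumptions -- check the API reference."""
    response = requests.post(
        f"{API_BASE}/rooms/{room_id}/forward/mcu",
        headers={"Authorization": api_key},
        json={"forward_id": "one-view-to-ai", "type": "video,audio", "url": ingress_url},
    )
    response.raise_for_status()
```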