directed acyclic graphs (dags)
The Streams Engine in the Instinct API uses Directed Acyclic Graphs (DAGs) as the fundamental structure for organizing data processing pipelines.
what is a dag (stream) in the instinct api?
In the Instinct API, streams are implemented as directed acyclic graphs (DAGs), a special type of graph where:
- Data flows in one direction (directed)
- There are no loops or cycles (acyclic)
This structure makes DAGs ideal for representing data processing pipelines, where data moves from source nodes through processing nodes to sink nodes, with each node performing a specific operation.
key properties in stream processing
directed
The "directed" property means that data flows in a specific direction, from upstream nodes to downstream nodes. In practical terms:
- Each connection (pipe) has a clear source and destination
- Data only flows in one direction along each pipe
- The direction of flow dictates the order of operations
acyclic
The "acyclic" property means that data never flows back to a node that has already processed it. Because no path through the graph returns to an earlier node, each data packet passes through a finite number of nodes, which rules out infinite processing loops.
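To make the acyclic property concrete, here is a minimal sketch of a cycle check using depth-first search. It assumes pipes are simplified to `(source, target)` pairs; the function name `find_cycle` is illustrative and not part of the Instinct API.

```python
from collections import defaultdict

def find_cycle(pipes):
    """Return one cycle as a list of node ids, or None if the graph is acyclic.

    `pipes` is an iterable of (source, target) pairs (a simplified, assumed form).
    """
    downstream = defaultdict(list)
    nodes = set()
    for source, target in pipes:
        downstream[source].append(target)
        nodes.update((source, target))

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    state = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in downstream[node]:
            if state[nxt] == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in sorted(nodes):
        if state[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

acyclic_pipes = [("eeg-source", "filter"), ("filter", "analyzer")]
cyclic_pipes = acyclic_pipes + [("analyzer", "filter")]  # feeds data back upstream

print(find_cycle(acyclic_pipes))  # None
print(find_cycle(cyclic_pipes))   # a path that starts and ends at the same node
```

A pipe from `analyzer` back to `filter` is enough to break the property, which is why a graph like `cyclic_pipes` is not a valid stream.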
dag implementation in the instinct api
components
In the Instinct API's implementation of DAGs:
- Nodes: The processing units that perform operations on data
- Pipes: The connections between nodes that define how data flows
- Ports: The interfaces on nodes where pipes connect (input or output)
- Stream: The complete DAG containing all nodes and pipes
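One way to picture how these four components relate is as plain data types. The sketch below is only a mental model: the field names `id`, `executable`, `config`, `source`, and `target` follow the stream configuration format used in this document, while the `Port` fields and the `downstream_of` helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Port:
    node_id: str
    name: str        # hypothetical port name, e.g. "out"
    direction: str   # "input" or "output"

@dataclass
class Node:
    id: str
    executable: str
    config: dict = field(default_factory=dict)

@dataclass(frozen=True)
class Pipe:
    id: str
    source: str  # id of the upstream node
    target: str  # id of the downstream node

@dataclass
class Stream:
    nodes: list
    pipes: list

    def downstream_of(self, node_id):
        """Ids of the nodes that receive data directly from `node_id`."""
        return [p.target for p in self.pipes if p.source == node_id]

stream = Stream(
    nodes=[Node("eeg-source", "headset-reader"), Node("filter", "signal-filter")],
    pipes=[Pipe("raw-data", "eeg-source", "filter")],
)
print(stream.downstream_of("eeg-source"))  # ['filter']
```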
representation in stream configuration
The Instinct API represents DAGs through node and pipe definitions in stream configurations:
{
  "nodes": [
    {
      "id": "eeg-source",
      "executable": "headset-reader",
      "config": {
        "channels": ["Fp1", "Fp2", "F3", "F4"],
        "sampleRate": 250
      }
    },
    {
      "id": "filter",
      "executable": "signal-filter",
      "config": {
        "highpass": 0.5,
        "lowpass": 50,
        "notch": 60
      }
    },
    {
      "id": "analyzer",
      "executable": "frequency-analyzer",
      "config": {
        "bands": ["alpha", "beta", "theta", "delta"],
        "windowSize": 256
      }
    },
    {
      "id": "visualizer",
      "executable": "data-visualizer",
      "config": {
        "displayMode": "spectral",
        "refreshRate": 10
      }
    }
  ],
  "pipes": [
    { "id": "raw-data", "source": "eeg-source", "target": "filter" },
    { "id": "filtered-data", "source": "filter", "target": "analyzer" },
    { "id": "analysis-results", "source": "analyzer", "target": "visualizer" }
  ]
}
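A configuration in this shape can be turned into an adjacency structure with very little code. The sketch below parses a trimmed copy of the configuration above (node `executable` and `config` fields omitted for brevity) and derives the source and sink nodes; `build_graph` is an illustrative helper, not an Instinct API function.

```python
import json

config_text = """
{
  "nodes": [
    {"id": "eeg-source"}, {"id": "filter"}, {"id": "analyzer"}, {"id": "visualizer"}
  ],
  "pipes": [
    {"id": "raw-data", "source": "eeg-source", "target": "filter"},
    {"id": "filtered-data", "source": "filter", "target": "analyzer"},
    {"id": "analysis-results", "source": "analyzer", "target": "visualizer"}
  ]
}
"""

def build_graph(config):
    """Map each node id to its direct downstream and upstream neighbours."""
    node_ids = [n["id"] for n in config["nodes"]]
    downstream = {nid: [] for nid in node_ids}
    upstream = {nid: [] for nid in node_ids}
    for pipe in config["pipes"]:
        downstream[pipe["source"]].append(pipe["target"])
        upstream[pipe["target"]].append(pipe["source"])
    return downstream, upstream

config = json.loads(config_text)
downstream, upstream = build_graph(config)

# Sources have no incoming pipes; sinks have no outgoing pipes.
sources = [n for n, ups in upstream.items() if not ups]
sinks = [n for n, downs in downstream.items() if not downs]
print(sources, sinks)  # ['eeg-source'] ['visualizer']
```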
dag operations in the instinct api
topological sorting
The Instinct API uses topological sorting to determine the execution sequence for nodes in a DAG. This ensures:
- A node processes a data packet only after every upstream node it depends on has produced its output
- The execution order respects the data dependencies defined by pipes
- The system can efficiently schedule parallel execution where dependencies allow
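The scheduling idea can be illustrated with Kahn's algorithm, grouped into levels so that nodes in the same level have no dependencies on each other and could run in parallel. This is a generic sketch of the technique, not the Instinct API's actual scheduler; `execution_levels` and the node names are illustrative.

```python
from collections import defaultdict

def execution_levels(node_ids, pipes):
    """Group nodes into levels; every node appears after all of its upstream nodes."""
    downstream = defaultdict(list)
    pending = {n: 0 for n in node_ids}  # number of unprocessed incoming pipes
    for source, target in pipes:
        downstream[source].append(target)
        pending[target] += 1

    ready = sorted(n for n, count in pending.items() if count == 0)
    levels = []
    while ready:
        levels.append(ready)
        next_ready = []
        for node in ready:
            for nxt in downstream[node]:
                pending[nxt] -= 1
                if pending[nxt] == 0:  # all upstream nodes are now scheduled
                    next_ready.append(nxt)
        ready = sorted(next_ready)

    if sum(len(level) for level in levels) != len(pending):
        raise ValueError("cycle detected: some nodes can never become ready")
    return levels

nodes = ["eeg-source", "filter", "alpha", "beta", "merger"]
pipes = [
    ("eeg-source", "filter"),
    ("filter", "alpha"),
    ("filter", "beta"),
    ("alpha", "merger"),
    ("beta", "merger"),
]
levels = execution_levels(nodes, pipes)
print(levels)  # [['eeg-source'], ['filter'], ['alpha', 'beta'], ['merger']]
```

Here `alpha` and `beta` land in the same level, so they are candidates for parallel execution, while `merger` waits for both.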
dag validation
Before executing a stream, the Instinct API validates the DAG structure to ensure:
- All node references in pipes exist in the stream definition
- No cycles are present that would create infinite processing loops
- All nodes are connected (no isolated nodes without pipes)
- Input and output port connections between nodes are compatible
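The first three checks can be sketched against the configuration format shown earlier. Port compatibility is left out because the configuration example in this document does not show port definitions. `validate_stream` is a hypothetical helper; the cycle check relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import CycleError, TopologicalSorter

def validate_stream(config):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    node_ids = {n["id"] for n in config["nodes"]}
    upstream = {nid: set() for nid in node_ids}
    connected = set()

    # Check 1: every pipe endpoint must be a defined node.
    for pipe in config["pipes"]:
        missing = [end for end in (pipe["source"], pipe["target"]) if end not in node_ids]
        if missing:
            problems.append(f"pipe {pipe['id']!r} references unknown node(s): {missing}")
            continue
        upstream[pipe["target"]].add(pipe["source"])
        connected.update((pipe["source"], pipe["target"]))

    # Check 2: every node must be attached to at least one valid pipe.
    for nid in sorted(node_ids - connected):
        problems.append(f"node {nid!r} is isolated")

    # Check 3: the graph must be acyclic.
    try:
        TopologicalSorter(upstream).prepare()
    except CycleError as err:
        problems.append(f"cycle detected: {err.args[1]}")
    return problems

good = {
    "nodes": [{"id": "eeg-source"}, {"id": "filter"}],
    "pipes": [{"id": "raw-data", "source": "eeg-source", "target": "filter"}],
}
bad = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "pipes": [
        {"id": "p1", "source": "a", "target": "b"},
        {"id": "p2", "source": "b", "target": "a"},      # closes a cycle
        {"id": "p3", "source": "a", "target": "ghost"},  # unknown node
    ],
}
print(validate_stream(good))  # []
for problem in validate_stream(bad):
    print(problem)
```

Collecting all problems instead of stopping at the first one is a design choice for the sketch: it lets a stream author fix several configuration mistakes in one pass.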
stream processing patterns
linear pipelines
The simplest DAG structure in the Instinct API is a linear pipeline where each node processes data sequentially:
EEG Source → Signal Filter → Feature Extractor → Data Storage
branching workflows
The Instinct API supports branching to process data in parallel paths:
                    → Alpha Band Analyzer →
EEG Source → Filter                         Results Merger → Visualizer
                    → Beta Band Analyzer  →
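Expressed as pipes, the branching diagram above is just a node with more than one outgoing connection. The short sketch below writes the diagram as `(source, target)` pairs (the node ids are made up for illustration) and finds the branch and merge points by counting connections.

```python
from collections import Counter

# The branching workflow above, written as (source, target) pairs.
pipes = [
    ("eeg-source", "filter"),
    ("filter", "alpha-analyzer"),
    ("filter", "beta-analyzer"),
    ("alpha-analyzer", "results-merger"),
    ("beta-analyzer", "results-merger"),
    ("results-merger", "visualizer"),
]

out_degree = Counter(source for source, _ in pipes)
in_degree = Counter(target for _, target in pipes)

branch_points = [node for node, count in out_degree.items() if count > 1]
merge_points = [node for node, count in in_degree.items() if count > 1]

print(branch_points)  # ['filter']
print(merge_points)   # ['results-merger']
```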
merging paths
Multiple processing paths can merge at a node that combines results:
EEG Channel 1 →
                → Signal Combiner → Feature Extraction → Classifier
EEG Channel 2 →
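What a merging node does with its inputs depends on the node. As one possible illustration, the sketch below pairs samples from two channels into multi-channel frames; `combine_channels` is a hypothetical stand-in for a combiner node, and real combiners may also need to align timestamps.

```python
def combine_channels(channel_1, channel_2):
    """Pair up samples from two equally long channels into multi-channel frames."""
    if len(channel_1) != len(channel_2):
        raise ValueError("channels must contain the same number of samples")
    return list(zip(channel_1, channel_2))

frames = combine_channels([1.0, 2.0, 3.0], [0.5, 0.25, 0.125])
print(frames)  # [(1.0, 0.5), (2.0, 0.25), (3.0, 0.125)]
```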
conditional routing
Data can be routed conditionally to different processing paths based on signal properties:
                → Artifact Removal → Reintegration →
EEG Source → QA                                      Analysis
                → Clean Signal -------------------->
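The routing decision at the QA node can be pictured as a function that inspects a window of samples and names the next path. The sketch below uses a peak-amplitude threshold as the signal property; the threshold value, the path names, and `route_window` itself are assumptions for illustration only.

```python
def route_window(samples, max_abs_amplitude=100.0):
    """Pick the next path for a window of samples.

    The amplitude threshold is an illustrative value, not a recommended setting.
    """
    peak = max(abs(s) for s in samples)
    return "artifact-removal" if peak > max_abs_amplitude else "analysis"

print(route_window([10.0, -20.0, 30.0]))   # analysis
print(route_window([10.0, -250.0, 30.0]))  # artifact-removal
```

Both paths end at the same analysis node, so the graph stays acyclic even though individual windows take different routes.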
stream execution
The execution of a DAG in the Instinct API involves:
- Initialization: Setting up each node based on configuration parameters
- Data Propagation: Transferring data packets through pipes from sources to sinks
- Concurrent Processing: Running independent node operations in parallel when possible
- Termination: Proper shutdown of all nodes when processing completes
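The lifecycle above can be sketched as a small single-threaded executor that pushes one packet through the graph. It is a simplified model under stated assumptions: `FunctionNode` and `run_once` are hypothetical, concurrency is omitted, and node order comes from `graphlib.TopologicalSorter` in the Python standard library.

```python
from graphlib import TopologicalSorter

class FunctionNode:
    """Hypothetical node wrapper: applies `fn` to the list of inputs it receives."""

    def __init__(self, fn):
        self.fn = fn
        self.running = False

    def start(self):
        self.running = True

    def process(self, inputs):
        return self.fn(inputs)

    def stop(self):
        self.running = False

def run_once(nodes, pipes, seed):
    """Push one packet through the graph and return each node's output."""
    upstream = {name: [] for name in nodes}
    for source, target in pipes:
        upstream[target].append(source)

    for node in nodes.values():  # Initialization
        node.start()
    outputs = {}
    try:
        # Data propagation: upstream nodes always run before downstream ones.
        for name in TopologicalSorter(upstream).static_order():
            inputs = [outputs[u] for u in upstream[name]] or [seed]
            outputs[name] = nodes[name].process(inputs)
    finally:
        for node in nodes.values():  # Termination, even if a node raised
            node.stop()
    return outputs

nodes = {
    "source": FunctionNode(lambda xs: xs[0]),
    "double": FunctionNode(lambda xs: xs[0] * 2),
    "sink": FunctionNode(lambda xs: sum(xs)),
}
pipes = [("source", "double"), ("double", "sink")]
outputs = run_once(nodes, pipes, seed=21)
print(outputs)  # {'source': 21, 'double': 42, 'sink': 42}
```

Putting the shutdown step in a `finally` block mirrors the "proper shutdown" requirement: nodes are stopped whether processing finishes normally or fails partway.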
best practices for stream design
When designing DAGs for the Instinct API:
- Optimize for Data Flow: Design stream topologies that minimize unnecessary data transformations
- Node Granularity: Create focused nodes with single responsibilities rather than complex multi-function nodes
- Error Handling: Include error handling paths to manage processing failures gracefully
- Monitoring Points: Add monitoring nodes at key points to observe pipeline performance
- Resource Allocation: Consider CPU and memory requirements when designing parallel processing paths
- Pipeline Documentation: Document the purpose and requirements of your stream design
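As an illustration of the monitoring and error-handling points, a monitoring node can be modeled as a pass-through wrapper that counts packets, counts failures, and records processing time. `MonitorNode` and its `on_error` callback are hypothetical names used for this sketch, not Instinct API components.

```python
import time

class MonitorNode:
    """Pass-through wrapper that records packet counts, failures, and timing."""

    def __init__(self, wrapped, on_error=None):
        self.wrapped = wrapped    # the processing function being observed
        self.on_error = on_error  # optional callback standing in for an error path
        self.packets = 0
        self.errors = 0
        self.total_seconds = 0.0

    def process(self, packet):
        started = time.perf_counter()
        try:
            return self.wrapped(packet)
        except Exception as exc:
            self.errors += 1
            if self.on_error is not None:
                self.on_error(packet, exc)
            return None  # nothing is forwarded downstream for a failed packet
        finally:
            self.packets += 1
            self.total_seconds += time.perf_counter() - started

failed = []
monitor = MonitorNode(lambda x: 1 / x, on_error=lambda packet, exc: failed.append(packet))
print(monitor.process(2))  # 0.5
print(monitor.process(0))  # None (division by zero is routed to on_error)
print(monitor.packets, monitor.errors, failed)  # 2 1 [0]
```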
common stream processing patterns
signal processing pipeline
EEG Source → Filtering → Artifact Removal → Feature Extraction → Classification
real-time monitoring
Headset Data → Signal Processor → Analyzer → Alert System
→ Visualization Dashboard
data collection and analysis
EEG Data → Preprocessor → Feature Extractor → Machine Learning Model → Results
→ Raw Data Storage
next steps
- Learn about Streams to understand the complete pipeline concept
- Explore Nodes to see the different processing components available
- Understand Pipes to master data flow connections between nodes
- Follow our Basic Pipeline Tutorial to build your first stream