Testing Guide¶
This guide covers how to write and run tests for Morphix.
Test framework¶
Morphix uses pytest with pytest-asyncio in asyncio_mode = "auto". All async tests are automatically detected — no need for @pytest.mark.asyncio in theory, but we explicitly mark them for clarity and to avoid surprises with fixtures.
import pytest
@pytest.mark.asyncio
async def test_something_async():
result = await some_async_function()
assert result == "expected"
Mocking¶
Use unittest.mock for mocking. For async functions, use AsyncMock:
from unittest.mock import AsyncMock, MagicMock, patch
@pytest.mark.asyncio
async def test_with_mocks():
mock_service = AsyncMock(return_value="mocked result")
with patch("agents.service.AgentsService.execute_agent", mock_service):
from orchestration.orchestrator import some_function
result = await some_function("query")
assert "mocked result" in result
mock_service.assert_called_once()
Mocking pattern: inline mocks only¶
Do not add fixtures to conftest.py. Define all mocks inline in each test module. This keeps tests self-contained and avoids fixture-ordering issues.
# tests/test_my_feature.py
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
@pytest.mark.asyncio
async def test_feature_a():
"""Each test defines its own mocks — no conftest.py fixtures."""
mock_db = AsyncMock()
mock_db.execute.return_value = MagicMock()
with patch("core.database.get_async_session", return_value=mock_db):
# test code here
pass
@pytest.mark.asyncio
async def test_feature_b():
"""Mocks are scoped to each test function."""
mock_llm = AsyncMock(return_value="response")
# ...
Testing tools¶
Always use a fresh ToolsRegistry() instance, not the global tools_registry:
from tools.registry import ToolsRegistry
@pytest.mark.asyncio
async def test_my_tool():
reg = ToolsRegistry()
@reg.register("my_tool")
async def my_tool(param: str = "default", **kwargs) -> str:
return f"got: {param}"
tool = reg.get_tool("my_tool")
assert tool is not None
result = await tool(param="hello")
assert result == "got: hello"
This pattern prevents test pollution — each test gets its own isolated registry.
Testing agents¶
Use AgentsRegistry() directly:
from agents.registry import AgentsRegistry
@pytest.mark.asyncio
async def test_agent_registration():
reg = AgentsRegistry()
reg.register_workspace_agent(
"test_agent",
AsyncMock(return_value="agent output"),
{"name": "test_agent", "type": "agent", "tools": []},
)
agent = reg.get_agent("test_agent")
assert agent is not None
profile = reg.get_profile("test_agent")
assert profile["type"] == "agent"
Testing WorkflowOrchestrator¶
Tests that import WorkflowOrchestrator need extensive internal patching. The orchestrator touches many subsystems (LLM, tools, database, agents). Use a helper fixture pattern within your test module:
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
def mock_orchestrator_deps():
"""Create all mocks needed for WorkflowOrchestrator tests."""
return {
"TaskAnalyzer": AsyncMock(return_value={
"primary_type": "development",
"requires_full_orchestration": True,
}),
"decompose_task": AsyncMock(return_value=[
{"description": "Create app.py", "agent": "developer"},
{"description": "Write tests", "agent": "developer"},
]),
"agent_router": AsyncMock(return_value="developer"),
"WorkflowSupervisor": MagicMock(),
"execute_subtask_safe": AsyncMock(return_value={
"status": "completed",
"result": "file created",
"files_written": ["app.py"],
}),
"ResultAggregator": AsyncMock(return_value="Final aggregated result"),
"finalize_workflow": AsyncMock(),
"emit_system": AsyncMock(),
"emit_stats": AsyncMock(),
}
@pytest.mark.asyncio
async def test_full_orchestration():
mocks = mock_orchestrator_deps()
with (
patch("orchestration.workflows.orchestrator.TaskAnalyzer", mocks["TaskAnalyzer"]),
patch("orchestration.workflows.orchestrator.decompose_task", mocks["decompose_task"]),
patch("orchestration.workflows.orchestrator.agent_router", mocks["agent_router"]),
patch(
"orchestration.workflows.orchestrator.WorkflowSupervisor",
mocks["WorkflowSupervisor"],
),
patch(
"orchestration.workflows.orchestrator.execute_subtask_safe",
mocks["execute_subtask_safe"],
),
patch(
"orchestration.workflows.orchestrator.ResultAggregator",
mocks["ResultAggregator"],
),
patch(
"orchestration.workflows.orchestrator.finalize_workflow",
mocks["finalize_workflow"],
),
patch("orchestration.workflows.orchestrator.emit_system", mocks["emit_system"]),
patch("orchestration.workflows.orchestrator.emit_stats", mocks["emit_stats"]),
):
from orchestration.workflows.orchestrator import WorkflowOrchestrator
from orchestration.events import Session, WorkflowContext, WorkflowEvents
ctx = WorkflowContext(
query="Build a hello world app",
conversation_history=[],
workspace="main",
project_root="test_project",
)
mock_events = MagicMock(spec=WorkflowEvents)
session = Session(context=ctx, events=mock_events)
result = await WorkflowOrchestrator._run_full_orchestration(
query=ctx.query,
conversation_history=ctx.conversation_history,
task_analysis=mocks["TaskAnalyzer"].return_value,
ctx=ctx,
events=mock_events,
project_root=ctx.project_root,
workspace=ctx.workspace,
allowed_agents=["developer"],
workflow_allowed_tools=["file_manager"],
start_time=0,
)
assert result is not None
assert "Final aggregated result" in result
Running tests¶
# Run all tests
poetry run pytest
# Run a single test file
poetry run pytest tests/test_file_manager.py
# Run a single test function
poetry run pytest tests/test_file_manager.py::test_write_and_read
# Run tests matching a pattern
poetry run pytest -k "file_manager"
# Run with verbose output
poetry run pytest -v
# Run and stop on first failure
poetry run pytest -x
# Run only the last failed tests
poetry run pytest --lf
# Run with coverage report
poetry run pytest --cov=core --cov=llm --cov=agents --cov=tools --cov=orchestration --cov-report=term-missing
Coverage¶
Coverage runs on the five main source directories:
poetry run pytest \
--cov=core \
--cov=llm \
--cov=agents \
--cov=tools \
--cov=orchestration \
--cov-report=term-missing
Target: maintaining or improving coverage across all five directories.
Test structure¶
tests/
├── conftest.py # Per-test engine isolation only
├── test_file_manager.py # One file per module under test
├── test_hello_world.py # New tool tests
├── test_workflow_orchestrator.py # Orchestrator tests (heavy patching)
├── test_agent_loop.py # Agent loop tests
├── test_template_consistency.py # Guard tests for template validity
└── ...
Each test file maps roughly to one source module. The filename convention is test_<module_name>.py.
Template consistency tests¶
Always add a template consistency test when creating new agents, tools, or workflows:
def test_all_workflow_tools_exist():
"""Every tool referenced by a workflow template must be registered."""
from tools.specs import TOOL_DEFINITIONS
from core.path_resolver import paths
import yaml
workflows_dir = paths.templates_dir() / "workflows"
for wf_file in workflows_dir.glob("*.yaml"):
workflow = yaml.safe_load(wf_file.read_text())
allowed_tools = workflow.get("tools", {}).get("allowed", [])
for tool_name in allowed_tools:
assert tool_name in TOOL_DEFINITIONS, (
f"Tool '{tool_name}' in {wf_file.name} not found in TOOL_DEFINITIONS"
)
Known test considerations¶
test_development_route flake¶
Under full-suite load (~676 tests with function-scoped asyncio loops), test_development_route can intermittently raise OSError: [Errno 22]. This is an environmental artifact of pytest-asyncio creating/destroying event loops per test function — not a product bug. The test passes reliably in isolation. The DB engine has loop-hardening (core/database.py) that prevents cross-loop asyncpg reuse.
AsyncMock vs MagicMock¶
- Use
AsyncMockfor async functions and coroutines. - Use
MagicMockfor classes, instances, and sync functions. patch()auto-createsAsyncMockfor async targets when used as a context manager with async functions.
Test isolation between runs¶
The conftest.py contains a per-test engine reset fixture:
@pytest.fixture(autouse=True)
async def reset_engine_after_test():
yield
from core.database import dispose_engine
dispose_engine()
This ensures each test gets a fresh database engine, preventing cross-test connection pool contamination.