AI MathService Lab

projects, hypotheses, experiments, simulation jobs

New project

Worker API

The local server will fetch tasks using a pull model.
POST /api/worker/claim
POST /api/worker/jobs/{id}/complete
Header: X-Worker-Token
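
The pull model above can be sketched as a small client that builds the two POST requests. This is a minimal illustration, not the service's actual client: the base URL, the token value, and the JSON payload fields (`worker`, `result`) are assumptions, since the page lists only the paths and the header name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: the real host is not shown on the page
WORKER_TOKEN = "change-me"          # assumption: placeholder for the real token


def build_request(path, payload, token=WORKER_TOKEN, base_url=BASE_URL):
    """Build a JSON POST request carrying the X-Worker-Token header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Worker-Token": token},
    )


def claim_request(worker_name):
    # Hypothetical payload: the claim body format is not documented on the page.
    return build_request("/api/worker/claim", {"worker": worker_name})


def complete_request(job_id, result):
    # Hypothetical payload: the completion body format is not documented either.
    return build_request(f"/api/worker/jobs/{job_id}/complete", {"result": result})


def send(request, timeout=10.0):
    """Send a prepared request and decode the JSON response, if there is one."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw) if raw else None
```

A worker loop would call `send(claim_request(...))`, run the returned job, and then call `send(complete_request(...))`. How the server signals an empty queue is not shown here, so that case is left out.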

OpenRouter

API key configured. Test agent requests can now be run.

Projects

Project | Status | Hypotheses | Experiments | Queue | Results
MVP Smoke Project (mvp-smoke-1780175317) | active, priority 90 | 1 | 1 | 0 queued / 0 running | 1
MVP Smoke Project (mvp-smoke) | active, priority 90 | 1 | 1 | 0 queued / 1 running | 0
AI Feature Smoke (ai-feature-1780177760) | active, priority 75 | 1 | 1 | 2 queued / 0 running | 0

Recent simulation jobs

ID | Project | Experiment | Status | Seed | Worker
1 | MVP Smoke Project | Smoke experiment | running | 42 | smoke-worker
3 | AI Feature Smoke | Test impact of bonus frequency on RTP volatility | queued | 42 |
4 | AI Feature Smoke | Test impact of bonus frequency on RTP volatility | queued | 42 |
2 | MVP Smoke Project | Smoke experiment | done | 42 | smoke-worker

Model profiles

354 models, last sync 2026-05-30 22:15:33.091695+00:00
Profile | Model | Temperature | Max tokens | Purpose
cheap_fast
balanced
strong_reasoning
critical

OpenRouter catalog

Model ID | Name | Context | Prompt | Completion
AI21: Jamba Large 1.7 256000 0.000002000000 0.000008000000
AionLabs: Aion-1.0 131072 0.000004000000 0.000008000000
AionLabs: Aion-1.0-Mini 131072 7.00000E-7 0.000001400000
AionLabs: Aion-2.0 131072 8.00000E-7 0.000001600000
AionLabs: Aion-RP 1.0 (8B) 32768 8.00000E-7 0.000001600000
AlfredPros: CodeLLaMa 7B Instruct Solidity 4096 8.00000E-7 0.000001200000
AllenAI: Olmo 3 32B Think 65536 1.50000E-7 5.00000E-7
Amazon: Nova 2 Lite 1000000 3.00000E-7 0.000002500000
Amazon: Nova Lite 1.0 300000 6.0000E-8 2.40000E-7
Amazon: Nova Micro 1.0 128000 3.5000E-8 1.40000E-7
Amazon: Nova Premier 1.0 1000000 0.000002500000 0.000012500000
Amazon: Nova Pro 1.0 300000 8.00000E-7 0.000003200000
Magnum v4 72B 32768 0.000003000000 0.000005000000
Anthropic: Claude 3.5 Haiku 200000 8.00000E-7 0.000004000000
Anthropic: Claude 3 Haiku 200000 2.50000E-7 0.000001250000
Anthropic: Claude Haiku 4.5 200000 0.000001000000 0.000005000000
Anthropic Claude Haiku Latest 200000 0.000001000000 0.000005000000
Anthropic: Claude Opus 4 200000 0.000015000000 0.000075000000
Anthropic: Claude Opus 4.1 200000 0.000015000000 0.000075000000
Anthropic: Claude Opus 4.5 200000 0.000005000000 0.000025000000
Anthropic: Claude Opus 4.6 1000000 0.000005000000 0.000025000000
Anthropic: Claude Opus 4.6 (Fast) 1000000 0.000030000000 0.000150000000
Anthropic: Claude Opus 4.7 1000000 0.000005000000 0.000025000000
Anthropic: Claude Opus 4.7 (Fast) 1000000 0.000030000000 0.000150000000
Anthropic: Claude Opus 4.8 1000000 0.000005000000 0.000025000000
Anthropic: Claude Opus 4.8 (Fast) 1000000 0.000010000000 0.000050000000
Anthropic: Claude Opus Latest 1000000 0.000005000000 0.000025000000
Anthropic: Claude Sonnet 4 1000000 0.000003000000 0.000015000000
Anthropic: Claude Sonnet 4.5 1000000 0.000003000000 0.000015000000
Anthropic: Claude Sonnet 4.6 1000000 0.000003000000 0.000015000000
Anthropic Claude Sonnet Latest 1000000 0.000003000000 0.000015000000
Arcee AI: Coder Large 32768 5.00000E-7 8.00000E-7
Arcee AI: Maestro Reasoning 131072 9.00000E-7 0.000003300000
Arcee AI: Spotlight 131072 1.80000E-7 1.80000E-7
Arcee AI: Trinity Large Thinking 262144 2.20000E-7 8.50000E-7
Arcee AI: Trinity Mini 131072 4.5000E-8 1.50000E-7
Arcee AI: Virtuoso Large 131072 7.50000E-7 0.000001200000
Baidu: ERNIE 4.5 21B A3B 131072 7.0000E-8 2.80000E-7
Baidu: ERNIE 4.5 21B A3B Thinking 131072 7.0000E-8 2.80000E-7
Baidu: ERNIE 4.5 300B A47B 131072 2.80000E-7 0.000001100000
Baidu: ERNIE 4.5 VL 28B A3B 131072 1.40000E-7 5.60000E-7
Baidu: ERNIE 4.5 VL 424B A47B 131072 4.20000E-7 0.000001250000
ByteDance Seed: Seed 1.6 262144 2.50000E-7 0.000002000000
ByteDance Seed: Seed 1.6 Flash 262144 7.5000E-8 3.00000E-7
ByteDance Seed: Seed-2.0-Lite 262144 2.50000E-7 0.000002000000
ByteDance Seed: Seed-2.0-Mini 262144 1.00000E-7 4.00000E-7
ByteDance: UI-TARS 7B 128000 1.00000E-7 2.00000E-7
Venice: Uncensored (free) 32768
Cohere: Command A 256000 0.000002500000 0.000010000000
Cohere: Command R (08-2024) 128000 1.50000E-7 6.00000E-7
Cohere: Command R7B (12-2024) 128000 3.7500E-8 1.50000E-7
Cohere: Command R+ (08-2024) 128000 0.000002500000 0.000010000000
Deep Cogito: Cogito v2.1 671B 128000 0.000001250000 0.000001250000
DeepSeek: DeepSeek V3 131072 2.28800E-7 9.14400E-7
DeepSeek: DeepSeek V3 0324 163840 2.00000E-7 7.70000E-7
DeepSeek: DeepSeek V3.1 163840 2.10000E-7 7.90000E-7
DeepSeek: R1 163840 7.00000E-7 0.000002500000
DeepSeek: R1 0528 163840 5.00000E-7 0.000002150000
DeepSeek: R1 Distill Llama 70B 131072 7.00000E-7 8.00000E-7
DeepSeek: R1 Distill Qwen 32B 128000 2.90000E-7 2.90000E-7
DeepSeek: DeepSeek V3.1 Terminus 163840 2.70000E-7 9.50000E-7
DeepSeek: DeepSeek V3.2 131072 2.52000E-7 3.78000E-7
DeepSeek: DeepSeek V3.2 Exp 163840 2.70000E-7 4.10000E-7
DeepSeek: DeepSeek V4 Flash 1048576 9.8300E-8 1.96600E-7
DeepSeek: DeepSeek V4 Flash (free) 1048576
DeepSeek: DeepSeek V4 Pro 1048576 4.35000E-7 8.70000E-7
EssentialAI: Rnj 1 Instruct 32768 1.50000E-7 1.50000E-7
Google: Gemini 2.0 Flash 1048576 1.00000E-7 4.00000E-7
Google: Gemini 2.0 Flash Lite 1048576 7.5000E-8 3.00000E-7
Google: Gemini 2.5 Flash 1048576 3.00000E-7 0.000002500000
Google: Nano Banana (Gemini 2.5 Flash Image) 32768 3.00000E-7 0.000002500000
Google: Gemini 2.5 Flash Lite 1048576 1.00000E-7 4.00000E-7
Google: Gemini 2.5 Flash Lite Preview 09-2025 1048576 1.00000E-7 4.00000E-7
Google: Gemini 2.5 Pro 1048576 0.000001250000 0.000010000000
Google: Gemini 2.5 Pro Preview 06-05 1048576 0.000001250000 0.000010000000
Google: Gemini 2.5 Pro Preview 05-06 1048576 0.000001250000 0.000010000000
Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) 131072 5.00000E-7 0.000003000000
Google: Gemini 3.1 Flash Lite 1048576 2.50000E-7 0.000001500000
Google: Gemini 3.1 Flash Lite Preview 1048576 2.50000E-7 0.000001500000
Google: Gemini 3.1 Pro Preview 1048576 0.000002000000 0.000012000000
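
The catalog mixes plain decimal and scientific notation in its price columns. The sketch below shows one way to read both forms and estimate the cost of a run. It assumes the Prompt and Completion columns are prices per token; the page does not state the unit or currency. The token counts are made-up sample values.

```python
from decimal import Decimal


def estimate_cost(prompt_tokens, completion_tokens, prompt_price, completion_price):
    """Estimate the cost of one run, assuming both prices are per token."""
    return (Decimal(prompt_price) * prompt_tokens
            + Decimal(completion_price) * completion_tokens)


# Decimal parses both notations used in the catalog to the same value.
assert Decimal("0.000002000000") == Decimal("2E-6")

# Prices copied from the "Amazon: Nova Micro 1.0" row; token counts are hypothetical.
cost = estimate_cost(1000, 500, "3.5000E-8", "1.40000E-7")
print(f"{cost:.8f}")
```

`Decimal` is used instead of `float` so that very small per-token prices add up without binary rounding noise.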

Agent tasks

ID | Project | Type | Profile | Status | Error
7 | AI Feature Smoke | coding_plan | cheap_fast | done |
6 | AI Feature Smoke | coding_plan | cheap_fast | failed | Expecting ',' delimiter: line 42 column 164 (char 3657)
5 | AI Feature Smoke | coding_plan | cheap_fast | failed | Expecting ',' delimiter: line 43 column 170 (char 3802)
4 | AI Feature Smoke | coding_plan | cheap_fast | failed | Expecting ',' delimiter: line 41 column 173 (char 3654)
3 | AI Feature Smoke | coding_plan | cheap_fast | failed | Expecting ',' delimiter: line 114 column 6 (char 3806)
2 | AI Feature Smoke | plan_experiment | cheap_fast | done |
1 | AI Feature Smoke | generate_hypotheses | cheap_fast | done |
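
The errors in the failed rows have the same format as messages raised by Python's `json` module when it meets malformed JSON. Some results under Agent runs are also wrapped in Markdown code fences. The following is a minimal sketch of a tolerant parser that strips an optional fence and returns the decode error instead of raising. The function name `parse_model_json` is hypothetical, and this is not necessarily how the service parses responses.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built from single backticks
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_model_json(text):
    """Return (data, None) on success or (None, error_message) on failure."""
    match = FENCED_BLOCK.search(text)
    # Use the fenced payload when a fence is present, else the whole text.
    candidate = match.group(1) if match else text.strip()
    try:
        return json.loads(candidate), None
    except json.JSONDecodeError as exc:
        # str(exc) has the form "message: line L column C (char P)".
        return None, str(exc)


fenced = FENCE + 'json\n{"hypotheses": []}\n' + FENCE
print(parse_model_json(fenced))
print(parse_model_json('{"a": 1 "b": 2}'))
```

Returning the error message lets the caller store it in the Error column or send it back to the model in a retry prompt. A retry step is a possible extension, not something shown on this page.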

Agent runs

ID | Project | Profile | Model | Tokens | Result
4 | AI Feature Smoke | cheap_fast | openai/gpt-4.1-mini-2025-04-14 | 649 / 1423 |
{
  "title": "Python CLI for Simulation Results Summary",
  "requirements": [
    {"id": "R1", "text": "Read simulation results from a CSV file", "priority": "must"},
    {"id": "R2", "text": "Compute statistical summaries (mean, standard deviation, quantiles) for selected metrics", "priority": "must"},
    {"id": "R3", "text": "Export the computed summary statistics to a JSON file", "priority": "must"},
    {"id": "R4", "text": "Provide a command-line interface to specify input CSV, output JSON, and metrics to summarize", "priority": "must"},
    {"id": "R5", "text": "Include unit tests covering CSV reading, statistics computation, and JSON export", "priority": "must"},
    {"id": "R6", "text": "Handle invalid input files or missing metrics gracefully with clear error messages", "priority": "should"},
    {"id": "R7", "text": "Allow configuration of quantile levels via CLI or config", "priority": "could"}
  ],
  "architecture": {
    "summary": "Modular CLI utility that reads CSV data, computes statistics on selected metrics, and writes a JSON summary output.",
    "components": [
      {"name": "CSVReader", "responsibility": "Load and validate CSV data from file"},
      {"name": "StatisticsCalculator", "responsibility": "Compute mean, std, and quantiles for specified metrics"},
      {"name": "JSONExporter", "responsibility": "Serialize computed statistics into JSON format and write to file"},
      {"name": "CLIHandler", "responsibility": "Parse command line arguments and coordinate workflow"},
      {"name": "ErrorHandler", "responsibility": "Manage errors and provide user-friendly messages"}
    ],
    "data_flow": [
      "CLIHandler parses input arguments",
      "CSVReader loads CSV data",
      "StatisticsCalculator computes summaries for selected metrics",
      "JSONExporter writes summary JSON output",
      "CLIHandler reports success or errors"
    ]
  },
  "files": [
    {"path": "cli.py", "purpose": "Entry point and CLI argument parsing", "main_symbols": ["main", "parse_args"]},
    {"path": "csv_reader.py", "purpose": "Load and validate CSV simulation results", "main_symbols": ["CSVReader"]},
    {"path": "statistics.py", "purpose": "Compute statistical summaries for metrics", "main_symbols": ["StatisticsCalculator"]},
    {"path": "json_exporter.py", "purpose": "Export computed statistics to JSON file", "main_symbols": ["JSONExporter"]},
    {"path": "errors.py", "purpose": "Define custom exceptions and error handling utilities", "main_symbols": ["InvalidInputError"]},
    {"path": "tests/test_csv_reader.py", "purpose": "Unit tests for CSV reading and validation", "main_symbols": ["test_csv_loading", "test_invalid_csv"]},
    {"path": "tests/test_statistics.py", "purpose": "Unit tests for statistics computations", "main_symbols": ["test_mean_std_quantiles"]},
    {"path": "tests/test_json_exporter.py", "purpose": "Unit tests for JSON export functionality", "main_symbols": ["test_json_output"]}
  ],
  "implementation_steps": [
    {"step": 1, "title": "Setup project structure and dependencies", "details": "Create project folders, initialize Python environment, add requirements (e.g. pandas, numpy)"},
    {"step": 2, "title": "Implement CSVReader", "details": "Develop CSVReader class to load CSV, validate presence of required columns, and handle errors"},
    {"step": 3, "title": "Implement StatisticsCalculator", "details": "Create functions to compute mean, std deviation, and configurable quantiles for given metrics"},
    {"step": 4, "title": "Implement JSONExporter", "details": "Serialize statistics dictionary to JSON and write to output file"},
    {"step": 5, "title": "Implement CLIHandler", "details": "Use argparse to parse input CSV path, output JSON path, and metrics list; invoke components in order"},
    {"step": 6, "title": "Add error handling and user messages", "details": "Catch and report errors such as file not found, missing metrics, or parse errors"},
    {"step": 7, "title": "Write unit tests", "details": "Cover CSV reading, statistics calculations, and JSON export with representative test cases"},
    {"step": 8, "title": "Manual testing and validation", "details": "Run CLI with sample CSV files, verify JSON output correctness and error handling"},
    {"step": 9, "title": "Documentation and usage instructions", "details": "Document CLI usage, input/output formats, and configuration options"}
  ],
  "test_plan": [
    {"type": "unit", "target": "CSVReader", "cases": ["valid CSV loads correctly", "missing columns raises error", "empty file handling"]},
    {"type": "unit", "target": "StatisticsCalculator", "cases": ["correct mean/std for sample data", "quantiles computed accurately", "handle empty data gracefully"]},
    {"type": "unit", "target": "JSONExporter", "cases": ["valid JSON output format", "file write success", "handle write permission errors"]},
    {"type": "integration", "target": "CLI end-to-end", "cases": ["valid input produces correct JSON", "invalid input shows error message", "metrics selection works as expected"]},
    {"type": "manual", "target": "CLI usability", "cases": ["help message clarity", "error messages user-friendly", "performance on large CSV files"]}
  ],
  "risks": [
    {"risk": "Input CSV files may have inconsistent formatting or missing data", "mitigation": "Implement robust validation and clear error messages; provide example CSV format"},
    {"risk": "Large CSV files may cause high memory usage or slow processing", "mitigation": "Use efficient libraries like pandas; consider streaming if needed"},
    {"risk": "User may specify metrics not present in CSV", "mitigation": "Validate metric names early and inform user with suggestions"},
    {"risk": "Quantile calculation may be misunderstood or misconfigured", "mitigation": "Document quantile defaults and allow user configuration with validation"}
  ],
  "open_questions": [
    "What are the exact metric names expected in the CSV and their data types?",
    "Should quantile levels be fixed or configurable via CLI?",
    "Is there a preferred JSON schema or structure for the summary output?",
    "Are there any performance constraints or expected CSV file sizes?"
  ]
}
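
The core of the plan above (read a CSV, compute mean, standard deviation, and quantiles for selected metrics, emit JSON) can be sketched with the standard library alone. This is an illustration under assumptions, not the generated implementation: the column names `rtp` and `hit_rate`, the default quantile levels, and the choice of linear interpolation are all hypothetical, since the plan leaves them as open questions.

```python
import csv
import io
import json
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted data, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def summarize(csv_file, metrics, quantile_levels=(0.25, 0.5, 0.75)):
    """Compute mean, sample std, and quantiles for each requested metric."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("CSV contains no data rows")
    missing = [m for m in metrics if m not in rows[0]]
    if missing:
        raise ValueError(f"metrics not found in CSV: {missing}")
    summary = {}
    for metric in metrics:
        values = sorted(float(row[metric]) for row in rows)
        summary[metric] = {
            "mean": statistics.fmean(values),
            # Sample standard deviation (n - 1); undefined for a single value.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "quantiles": {str(q): quantile(values, q) for q in quantile_levels},
        }
    return summary


# Hypothetical sample data; real metric names are an open question in the plan.
sample = io.StringIO("rtp,hit_rate\n0.94,0.30\n0.96,0.32\n0.98,0.34\n")
print(json.dumps(summarize(sample, ["rtp"]), indent=2))
```

One detail to watch if the plan is implemented as written: a top-level file named `statistics.py` next to `cli.py` would shadow the standard-library `statistics` module on import, so a different file name or a package layout may be safer.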
3 | AI Feature Smoke | cheap_fast | openai/gpt-4.1-mini-2025-04-14 | 484 / 217 |
```json
{
  "title": "Test impact of bonus frequency on RTP volatility",
  "goal": "Evaluate if increasing bonus frequency reduces RTP volatility under fixed CPU budget",
  "parameter_space": {
    "bonus_frequency": [0.05, 0.10, 0.20]
  },
  "success_criteria": {
    "primary_metric": "RTP_volatility",
    "decision_rule": "RTP_volatility decreases as bonus_frequency increases"
  },
  "budget": {
    "max_jobs": 3,
    "max_runtime_seconds": 3600
  },
  "suggested_jobs": [
    {
      "seed": 42,
      "priority": 70,
      "parameters": {"bonus_frequency": 0.05},
      "timeout_seconds": 1200
    },
    {
      "seed": 42,
      "priority": 60,
      "parameters": {"bonus_frequency": 0.20},
      "timeout_seconds": 1200
    }
  ]
}
```
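
A plan like the one above can be checked before its jobs are queued. The sketch below compares the suggested jobs with the budget and the parameter space. The function name `check_plan` is hypothetical, and treating the sum of job timeouts as the worst-case runtime is an assumption that holds only if jobs run one after another.

```python
import json


def check_plan(plan):
    """Return a list of human-readable warnings for an experiment plan."""
    warnings = []
    jobs = plan.get("suggested_jobs", [])
    budget = plan.get("budget", {})

    max_jobs = budget.get("max_jobs")
    if max_jobs is not None and len(jobs) > max_jobs:
        warnings.append(f"{len(jobs)} jobs exceed max_jobs={max_jobs}")

    # Assumption: jobs run sequentially, so timeouts add up.
    total_timeout = sum(job.get("timeout_seconds", 0) for job in jobs)
    max_runtime = budget.get("max_runtime_seconds")
    if max_runtime is not None and total_timeout > max_runtime:
        warnings.append(f"timeouts total {total_timeout}s, budget is {max_runtime}s")

    # Report parameter values that no suggested job exercises.
    for name, values in plan.get("parameter_space", {}).items():
        used = {job.get("parameters", {}).get(name) for job in jobs}
        uncovered = [value for value in values if value not in used]
        if uncovered:
            warnings.append(f"{name}: no job covers {uncovered}")
    return warnings


# Condensed copy of the plan above (only the fields the check reads).
PLAN = json.loads("""
{
  "parameter_space": {"bonus_frequency": [0.05, 0.10, 0.20]},
  "budget": {"max_jobs": 3, "max_runtime_seconds": 3600},
  "suggested_jobs": [
    {"seed": 42, "parameters": {"bonus_frequency": 0.05}, "timeout_seconds": 1200},
    {"seed": 42, "parameters": {"bonus_frequency": 0.20}, "timeout_seconds": 1200}
  ]
}
""")
print(check_plan(PLAN))
```

For this plan the job count and timeouts fit the budget, but the check reports that `bonus_frequency` 0.10 from the parameter space has no suggested job.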
2 | AI Feature Smoke | cheap_fast | openai/gpt-4.1-mini-2025-04-14 | 226 / 88 |
```json
{
  "hypotheses": [
    {
      "title": "Higher Bonus Frequency Reduces RTP Volatility",
      "description": "Increasing the frequency of bonus events in the game math model will decrease RTP volatility by providing more frequent smaller wins, which may stabilize player experience under a fixed CPU budget.",
      "confidence": 0.35,
      "parent_id": null
    }
  ]
}
```
1 | MVP Smoke Project | cheap_fast | openai/gpt-4.1-mini-2025-04-14 | 54 / 65 |
1. The probability of winning is approximately equal for all strategies over long play.
2. The player's average winnings approach zero over a large number of rounds.
3. Changing the model parameters does not give either side a systematic advantage.