AI MathService Lab

New project

Worker API

The local server will fetch jobs using a pull model.

POST /api/worker/claim
POST /api/worker/jobs/{id}/complete
Header: X-Worker-Token
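
The pull model above can be sketched as a small worker step: claim a job, run it, report completion. This is a minimal sketch, not the service's actual client. Only the two endpoints and the X-Worker-Token header come from this page; the base URL, the token value, the payload and response field names (`worker`, `result`, `id`), and the rule that an empty claim response means "no job available" are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"  # assumption: where the service listens
TOKEN = "change-me"                 # assumption: placeholder worker token


def build_request(path: str, payload: dict, token: str = TOKEN,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated POST request for a worker endpoint."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Worker-Token": token},
        method="POST",
    )


def send(request: urllib.request.Request) -> Optional[dict]:
    """Send the request and decode a JSON body; an empty body yields None."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return json.loads(body) if body else None


def work_once(run_job: Callable[[dict], dict],
              transport: Callable[[urllib.request.Request], Optional[dict]] = send,
              worker_name: str = "smoke-worker") -> bool:
    """Claim one job, run it, and report completion.

    Returns False when the claim call hands back no job.
    """
    # Hypothetical payload shape: the page does not document the request body.
    job = transport(build_request("/api/worker/claim", {"worker": worker_name}))
    if not job:
        return False
    result = run_job(job)
    # Hypothetical response shape: the claimed job is assumed to carry an "id".
    transport(build_request(f"/api/worker/jobs/{job['id']}/complete",
                            {"result": result}))
    return True
```

A long-running worker would call `work_once` in a loop and sleep for a short interval whenever it returns False. Passing `transport` as a parameter keeps the sketch testable without a live server.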

OpenRouter

API key configured. Test agent requests can now be run.

Projects

Project	Status	Hypotheses	Experiments	Queue	Results
MVP Smoke Project mvp-smoke-1780175317	active priority 90	1	1	0 queued / 0 running	1
MVP Smoke Project mvp-smoke	active priority 90	1	1	0 queued / 1 running	0
AI Feature Smoke ai-feature-1780177760	active priority 75	1	1	2 queued / 0 running	0

Recent simulation jobs

ID	Project	Experiment	Status	Seed	Worker
1	MVP Smoke Project	Smoke experiment	running	42	smoke-worker
3	AI Feature Smoke	Test impact of bonus frequency on RTP volatility	queued	42
4	AI Feature Smoke	Test impact of bonus frequency on RTP volatility	queued	42
2	MVP Smoke Project	Smoke experiment	done	42	smoke-worker

Model profiles

Profile	Model	Temperature	Max tokens	Purpose
cheap_fast	enabled
balanced	enabled
strong_reasoning	enabled
critical	enabled

OpenRouter catalog

Name	Context (tokens)	Prompt (USD/token)	Completion (USD/token)
AI21: Jamba Large 1.7	256000	0.000002000000	0.000008000000
AionLabs: Aion-1.0	131072	0.000004000000	0.000008000000
AionLabs: Aion-1.0-Mini	131072	7.00000E-7	0.000001400000
AionLabs: Aion-2.0	131072	8.00000E-7	0.000001600000
AionLabs: Aion-RP 1.0 (8B)	32768	8.00000E-7	0.000001600000
AlfredPros: CodeLLaMa 7B Instruct Solidity	4096	8.00000E-7	0.000001200000
AllenAI: Olmo 3 32B Think	65536	1.50000E-7	5.00000E-7
Amazon: Nova 2 Lite	1000000	3.00000E-7	0.000002500000
Amazon: Nova Lite 1.0	300000	6.0000E-8	2.40000E-7
Amazon: Nova Micro 1.0	128000	3.5000E-8	1.40000E-7
Amazon: Nova Premier 1.0	1000000	0.000002500000	0.000012500000
Amazon: Nova Pro 1.0	300000	8.00000E-7	0.000003200000
Magnum v4 72B	32768	0.000003000000	0.000005000000
Anthropic: Claude 3.5 Haiku	200000	8.00000E-7	0.000004000000
Anthropic: Claude 3 Haiku	200000	2.50000E-7	0.000001250000
Anthropic: Claude Haiku 4.5	200000	0.000001000000	0.000005000000
Anthropic Claude Haiku Latest	200000	0.000001000000	0.000005000000
Anthropic: Claude Opus 4	200000	0.000015000000	0.000075000000
Anthropic: Claude Opus 4.1	200000	0.000015000000	0.000075000000
Anthropic: Claude Opus 4.5	200000	0.000005000000	0.000025000000
Anthropic: Claude Opus 4.6	1000000	0.000005000000	0.000025000000
Anthropic: Claude Opus 4.6 (Fast)	1000000	0.000030000000	0.000150000000
Anthropic: Claude Opus 4.7	1000000	0.000005000000	0.000025000000
Anthropic: Claude Opus 4.7 (Fast)	1000000	0.000030000000	0.000150000000
Anthropic: Claude Opus 4.8	1000000	0.000005000000	0.000025000000
Anthropic: Claude Opus 4.8 (Fast)	1000000	0.000010000000	0.000050000000
Anthropic: Claude Opus Latest	1000000	0.000005000000	0.000025000000
Anthropic: Claude Sonnet 4	1000000	0.000003000000	0.000015000000
Anthropic: Claude Sonnet 4.5	1000000	0.000003000000	0.000015000000
Anthropic: Claude Sonnet 4.6	1000000	0.000003000000	0.000015000000
Anthropic Claude Sonnet Latest	1000000	0.000003000000	0.000015000000
Arcee AI: Coder Large	32768	5.00000E-7	8.00000E-7
Arcee AI: Maestro Reasoning	131072	9.00000E-7	0.000003300000
Arcee AI: Spotlight	131072	1.80000E-7	1.80000E-7
Arcee AI: Trinity Large Thinking	262144	2.20000E-7	8.50000E-7
Arcee AI: Trinity Mini	131072	4.5000E-8	1.50000E-7
Arcee AI: Virtuoso Large	131072	7.50000E-7	0.000001200000
Baidu: ERNIE 4.5 21B A3B	131072	7.0000E-8	2.80000E-7
Baidu: ERNIE 4.5 21B A3B Thinking	131072	7.0000E-8	2.80000E-7
Baidu: ERNIE 4.5 300B A47B	131072	2.80000E-7	0.000001100000
Baidu: ERNIE 4.5 VL 28B A3B	131072	1.40000E-7	5.60000E-7
Baidu: ERNIE 4.5 VL 424B A47B	131072	4.20000E-7	0.000001250000
ByteDance Seed: Seed 1.6	262144	2.50000E-7	0.000002000000
ByteDance Seed: Seed 1.6 Flash	262144	7.5000E-8	3.00000E-7
ByteDance Seed: Seed-2.0-Lite	262144	2.50000E-7	0.000002000000
ByteDance Seed: Seed-2.0-Mini	262144	1.00000E-7	4.00000E-7
ByteDance: UI-TARS 7B	128000	1.00000E-7	2.00000E-7
Venice: Uncensored (free)	32768
Cohere: Command A	256000	0.000002500000	0.000010000000
Cohere: Command R (08-2024)	128000	1.50000E-7	6.00000E-7
Cohere: Command R7B (12-2024)	128000	3.7500E-8	1.50000E-7
Cohere: Command R+ (08-2024)	128000	0.000002500000	0.000010000000
Deep Cogito: Cogito v2.1 671B	128000	0.000001250000	0.000001250000
DeepSeek: DeepSeek V3	131072	2.28800E-7	9.14400E-7
DeepSeek: DeepSeek V3 0324	163840	2.00000E-7	7.70000E-7
DeepSeek: DeepSeek V3.1	163840	2.10000E-7	7.90000E-7
DeepSeek: R1	163840	7.00000E-7	0.000002500000
DeepSeek: R1 0528	163840	5.00000E-7	0.000002150000
DeepSeek: R1 Distill Llama 70B	131072	7.00000E-7	8.00000E-7
DeepSeek: R1 Distill Qwen 32B	128000	2.90000E-7	2.90000E-7
DeepSeek: DeepSeek V3.1 Terminus	163840	2.70000E-7	9.50000E-7
DeepSeek: DeepSeek V3.2	131072	2.52000E-7	3.78000E-7
DeepSeek: DeepSeek V3.2 Exp	163840	2.70000E-7	4.10000E-7
DeepSeek: DeepSeek V4 Flash	1048576	9.8300E-8	1.96600E-7
DeepSeek: DeepSeek V4 Flash (free)	1048576
DeepSeek: DeepSeek V4 Pro	1048576	4.35000E-7	8.70000E-7
EssentialAI: Rnj 1 Instruct	32768	1.50000E-7	1.50000E-7
Google: Gemini 2.0 Flash	1048576	1.00000E-7	4.00000E-7
Google: Gemini 2.0 Flash Lite	1048576	7.5000E-8	3.00000E-7
Google: Gemini 2.5 Flash	1048576	3.00000E-7	0.000002500000
Google: Nano Banana (Gemini 2.5 Flash Image)	32768	3.00000E-7	0.000002500000
Google: Gemini 2.5 Flash Lite	1048576	1.00000E-7	4.00000E-7
Google: Gemini 2.5 Flash Lite Preview 09-2025	1048576	1.00000E-7	4.00000E-7
Google: Gemini 2.5 Pro	1048576	0.000001250000	0.000010000000
Google: Gemini 2.5 Pro Preview 06-05	1048576	0.000001250000	0.000010000000
Google: Gemini 2.5 Pro Preview 05-06	1048576	0.000001250000	0.000010000000
Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)	131072	5.00000E-7	0.000003000000
Google: Gemini 3.1 Flash Lite	1048576	2.50000E-7	0.000001500000
Google: Gemini 3.1 Flash Lite Preview	1048576	2.50000E-7	0.000001500000
Google: Gemini 3.1 Pro Preview	1048576	0.000002000000	0.000012000000
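
The catalog mixes plain decimals (0.000002000000) with E-notation (7.00000E-7), so it can help to normalize prices before comparing models or estimating the cost of a run. The sketch below assumes the Prompt and Completion columns are prices per token and that an empty price cell (as on the "(free)" rows) means zero; the function names and the token counts in the example are hypothetical.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Parse a catalog price written as a plain decimal or in E-notation.

    Assumption: an empty cell, as on the "(free)" rows, means a price of zero.
    """
    text = text.strip()
    return Decimal(text) if text else Decimal(0)


def fixed(price: Decimal) -> str:
    """Render a price in fixed-point notation, without an exponent."""
    return format(price, "f")


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_price: str, completion_price: str) -> Decimal:
    """Cost of one request, assuming the prices are per token."""
    return (prompt_tokens * parse_price(prompt_price)
            + completion_tokens * parse_price(completion_price))


# Example with the "AionLabs: Aion-1.0-Mini" row and made-up token counts.
cost = estimate_cost(1000, 500, "7.00000E-7", "0.000001400000")
```

Using `Decimal` rather than `float` keeps very small per-token prices exact when they are summed over many requests.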

Agent tasks

ID	Project	Type	Profile	Status	Error
7	AI Feature Smoke	coding_plan	cheap_fast	done
6	AI Feature Smoke	coding_plan	cheap_fast	failed	Expecting ',' delimiter: line 42 column 164 (char 3657)
5	AI Feature Smoke	coding_plan	cheap_fast	failed	Expecting ',' delimiter: line 43 column 170 (char 3802)
4	AI Feature Smoke	coding_plan	cheap_fast	failed	Expecting ',' delimiter: line 41 column 173 (char 3654)
3	AI Feature Smoke	coding_plan	cheap_fast	failed	Expecting ',' delimiter: line 114 column 6 (char 3806)
2	AI Feature Smoke	plan_experiment	cheap_fast	done
1	AI Feature Smoke	generate_hypotheses	cheap_fast	done
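
The failed tasks above carry messages in the format used by Python's `json.JSONDecodeError`, which suggests the model's reply did not parse as JSON. The cause is not recorded on this page. The sketch below shows one defensive way to parse a model reply: strip an optional Markdown code fence (some replies in the Agent runs table are wrapped in one), then report the text around the failing position. The function name `parse_model_json` and the 40-character context window are hypothetical.

```python
import json
import re

# Matches a reply that is entirely wrapped in a Markdown fence, with or
# without the "json" language tag.
FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_model_json(text: str) -> dict:
    """Parse a model reply as JSON, tolerating a surrounding code fence.

    On failure, raise ValueError with the text near the failing character,
    which is easier to debug than the bare line/column message.
    """
    stripped = text.strip()
    match = FENCE.match(stripped)
    payload = match.group(1) if match else stripped
    try:
        return json.loads(payload)
    except json.JSONDecodeError as err:
        start, end = max(err.pos - 40, 0), err.pos + 40
        context = payload[start:end]
        raise ValueError(f"{err.msg} at char {err.pos}: ...{context}...") from err
```

Storing the raw reply alongside the error would make it possible to tell a truncated reply apart from one with an unescaped quote or a missing comma.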

Agent runs

ID	Project	Profile	Model	Tokens	Result
4	AI Feature Smoke	cheap_fast	openai/gpt-4.1-mini-2025-04-14	649 / 1423	{ "title": "Python CLI for Simulation Results Summary", "requirements": [ {"id": "R1", "text": "Read simulation results from a CSV file", "priority": "must"}, {"id": "R2", "text": "Compute statistical summaries (mean, standard deviation, quantiles) for selected metrics", "priority": "must"}, {"id": "R3", "text": "Export the computed summary statistics to a JSON file", "priority": "must"}, {"id": "R4", "text": "Provide a command-line interface to specify input CSV, output JSON, and metrics to summarize", "priority": "must"}, {"id": "R5", "text": "Include unit tests covering CSV reading, statistics computation, and JSON export", "priority": "must"}, {"id": "R6", "text": "Handle invalid input files or missing metrics gracefully with clear error messages", "priority": "should"}, {"id": "R7", "text": "Allow configuration of quantile levels via CLI or config", "priority": "could"} ], "architecture": { "summary": "Modular CLI utility that reads CSV data, computes statistics on selected metrics, and writes a JSON summary output.", "components": [ {"name": "CSVReader", "responsibility": "Load and validate CSV data from file"}, {"name": "StatisticsCalculator", "responsibility": "Compute mean, std, and quantiles for specified metrics"}, {"name": "JSONExporter", "responsibility": "Serialize computed statistics into JSON format and write to file"}, {"name": "CLIHandler", "responsibility": "Parse command line arguments and coordinate workflow"}, {"name": "ErrorHandler", "responsibility": "Manage errors and provide user-friendly messages"} ], "data_flow": [ "CLIHandler parses input arguments", "CSVReader loads CSV data", "StatisticsCalculator computes summaries for selected metrics", "JSONExporter writes summary JSON output", "CLIHandler reports success or errors" ] }, "files": [ {"path": "cli.py", "purpose": "Entry point and CLI argument parsing", "main_symbols": ["main", "parse_args"]}, {"path": "csv_reader.py", "purpose": "Load and validate CSV simulation results", "main_symbols": ["CSVReader"]}, {"path": "statistics.py", "purpose": "Compute statistical summaries for metrics", "main_symbols": ["StatisticsCalculator"]}, {"path": "json_exporter.py", "purpose": "Export computed statistics to JSON file", "main_symbols": ["JSONExporter"]}, {"path": "errors.py", "purpose": "Define custom exceptions and error handling utilities", "main_symbols": ["InvalidInputError"]}, {"path": "tests/test_csv_reader.py", "purpose": "Unit tests for CSV reading and validation", "main_symbols": ["test_csv_loading", "test_invalid_csv"]}, {"path": "tests/test_statistics.py", "purpose": "Unit tests for statistics computations", "main_symbols": ["test_mean_std_quantiles"]}, {"path": "tests/test_json_exporter.py", "purpose": "Unit tests for JSON export functionality", "main_symbols": ["test_json_output"]} ], "implementation_steps": [ {"step": 1, "title": "Setup project structure and dependencies", "details": "Create project folders, initialize Python environment, add requirements (e.g. pandas, numpy)"}, {"step": 2, "title": "Implement CSVReader", "details": "Develop CSVReader class to load CSV, validate presence of required columns, and handle errors"}, {"step": 3, "title": "Implement StatisticsCalculator", "details": "Create functions to compute mean, std deviation, and configurable quantiles for given metrics"}, {"step": 4, "title": "Implement JSONExporter", "details": "Serialize statistics dictionary to JSON and write to output file"}, {"step": 5, "title": "Implement CLIHandler", "details": "Use argparse to parse input CSV path, output JSON path, and metrics list; invoke components in order"}, {"step": 6, "title": "Add error handling and user messages", "details": "Catch and report errors such as file not found, missing metrics, or parse errors"}, {"step": 7, "title": "Write unit tests", "details": "Cover CSV reading, statistics calculations, and JSON export with representative test cases"}, {"step": 8, "title": "Manual testing and validation", "details": "Run CLI with sample CSV files, verify JSON output correctness and error handling"}, {"step": 9, "title": "Documentation and usage instructions", "details": "Document CLI usage, input/output formats, and configuration options"} ], "test_plan": [ {"type": "unit", "target": "CSVReader", "cases": ["valid CSV loads correctly", "missing columns raises error", "empty file handling"]}, {"type": "unit", "target": "StatisticsCalculator", "cases": ["correct mean/std for sample data", "quantiles computed accurately", "handle empty data gracefully"]}, {"type": "unit", "target": "JSONExporter", "cases": ["valid JSON output format", "file write success", "handle write permission errors"]}, {"type": "integration", "target": "CLI end-to-end", "cases": ["valid input produces correct JSON", "invalid input shows error message", "metrics selection works as expected"]}, {"type": "manual", "target": "CLI usability", "cases": ["help message clarity", "error messages user-friendly", "performance on large CSV files"]} ], "risks": [ {"risk": "Input CSV files may have inconsistent formatting or missing data", "mitigation": "Implement robust validation and clear error messages; provide example CSV format"}, {"risk": "Large CSV files may cause high memory usage or slow processing", "mitigation": "Use efficient libraries like pandas; consider streaming if needed"}, {"risk": "User may specify metrics not present in CSV", "mitigation": "Validate metric names early and inform user with suggestions"}, {"risk": "Quantile calculation may be misunderstood or misconfigured", "mitigation": "Document quantile defaults and allow user configuration with validation"} ], "open_questions": [ "What are the exact metric names expected in the CSV and their data types?", "Should quantile levels be fixed or configurable via CLI?", "Is there a preferred JSON schema or structure for the summary output?", "Are there any performance constraints or expected CSV file sizes?" ] }
3	AI Feature Smoke	cheap_fast	openai/gpt-4.1-mini-2025-04-14	484 / 217	```json { "title": "Test impact of bonus frequency on RTP volatility", "goal": "Evaluate if increasing bonus frequency reduces RTP volatility under fixed CPU budget", "parameter_space": { "bonus_frequency": [0.05, 0.10, 0.20] }, "success_criteria": { "primary_metric": "RTP_volatility", "decision_rule": "RTP_volatility decreases as bonus_frequency increases" }, "budget": { "max_jobs": 3, "max_runtime_seconds": 3600 }, "suggested_jobs": [ { "seed": 42, "priority": 70, "parameters": {"bonus_frequency": 0.05}, "timeout_seconds": 1200 }, { "seed": 42, "priority": 60, "parameters": {"bonus_frequency": 0.20}, "timeout_seconds": 1200 } ] } ```
2	AI Feature Smoke	cheap_fast	openai/gpt-4.1-mini-2025-04-14	226 / 88	```json { "hypotheses": [ { "title": "Higher Bonus Frequency Reduces RTP Volatility", "description": "Increasing the frequency of bonus events in the game math model will decrease RTP volatility by providing more frequent smaller wins, which may stabilize player experience under a fixed CPU budget.", "confidence": 0.35, "parent_id": null } ] } ```
1	MVP Smoke Project	cheap_fast	openai/gpt-4.1-mini-2025-04-14	54 / 65	1. The probability of winning is roughly equal for all strategies over long play. 2. The player's average winnings are close to zero over a large number of rounds. 3. Changing the model parameters does not give either side a systematic advantage.
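
The experiment plan returned in run 3 declares a budget, a parameter space, and a list of suggested jobs, so it can be checked before jobs are queued. The sketch below is one possible check, not the service's own validation: the function names are hypothetical, and summing `timeout_seconds` against `max_runtime_seconds` assumes the jobs run one after another. The plan literal copies only the fields the check needs from run 3.

```python
from typing import Dict, List

# Fields copied from the plan in run 3 (title, goal, success_criteria omitted).
PLAN = {
    "parameter_space": {"bonus_frequency": [0.05, 0.10, 0.20]},
    "budget": {"max_jobs": 3, "max_runtime_seconds": 3600},
    "suggested_jobs": [
        {"seed": 42, "priority": 70,
         "parameters": {"bonus_frequency": 0.05}, "timeout_seconds": 1200},
        {"seed": 42, "priority": 60,
         "parameters": {"bonus_frequency": 0.20}, "timeout_seconds": 1200},
    ],
}


def check_plan(plan: dict) -> List[str]:
    """Return a list of budget or parameter problems; empty means none found."""
    problems = []
    jobs, budget = plan["suggested_jobs"], plan["budget"]
    space = plan["parameter_space"]
    if len(jobs) > budget["max_jobs"]:
        problems.append(f"{len(jobs)} jobs exceed max_jobs={budget['max_jobs']}")
    # Assumption: jobs run sequentially, so their timeouts add up.
    total = sum(job["timeout_seconds"] for job in jobs)
    if total > budget["max_runtime_seconds"]:
        problems.append(f"timeouts sum to {total}s, over the runtime budget")
    for job in jobs:
        for name, value in job["parameters"].items():
            if value not in space.get(name, []):
                problems.append(f"{name}={value} is outside parameter_space")
    return problems


def uncovered_values(plan: dict) -> Dict[str, list]:
    """List parameter values that no suggested job exercises."""
    used: Dict[str, set] = {}
    for job in plan["suggested_jobs"]:
        for name, value in job["parameters"].items():
            used.setdefault(name, set()).add(value)
    return {name: [v for v in values if v not in used.get(name, set())]
            for name, values in plan["parameter_space"].items()}
```

For this plan `check_plan(PLAN)` finds no problems, while `uncovered_values(PLAN)` reports that `bonus_frequency` 0.10 is declared in the parameter space but not covered by any suggested job.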