Admin | tastellm

On-the-go corpus capture

Screenshot a flow. Save the taste lesson.

Built for a phone in the moment: capture mobile web flows, native app screens, onboarding, billing, settings, or any product surface, then write the feedback straight into the reviewed training corpus.

Screenshots

Use the camera, choose screenshots from Photos, paste from clipboard, or drag images here on desktop.

Feedback / taste lesson Build guidance (optional)

Product

Surface

Quality

Score / 10

URL / reference (optional) Title (optional) Tags

Ready. Add a screenshot and one crisp taste note.

Teach the taste model

Capture scored examples, nuanced rationale, transcripts, and build guidance across any software surface. Applying an example writes it into the corpus as a training example and creates a response eval regression case.

Surface type

Quality band

Score / 10

Product

Pick a product to annotate

Choose from up to 100 top products, or type your own product above.

Training title URL / reference User job-to-be-done Raw input / evidence Images / screenshots

Paste screenshots here, drag images in, or upload files that demonstrate the example.

What works

What fails

Why this score / taste rationale Build guidance for coding agents

What a 10/10 version would do

What a worse version would do

Failure modes / anti-patterns Transcript / agent conversation Tags

Corpus → training & retrieval

Edit any entry to immediately update retrieval/RAG output (re-indexed on save) and the dataset the next fine-tune run trains on. Toggle Train to include/exclude a row from training; unpublish to remove it from live retrieval. Use Audit dataset to inspect the exact chat examples generated from these rows.

Run mode Active model generates a fresh response from the active taste model (what the API returns today) and scores it; Stored example scores each case's saved response. Backend applies to active-model runs: Fine-tune scores the promoted proprietary model; OpenAI + live corpus forces RAG so corpus edits move scores before the next training run.

Eval sets

Grouped benchmarks of response eval cases. Run a whole set to score the active taste model, then reuse the same set to compare fine-tune candidates apples-to-apples. The default taste-holdout-v1 set adopts every case minted from taste training.

New set name

Kind

Add response eval case

Name Prompt/context you responded to Response text to judge (not the score) Eval set Rubric JSON (optional)

Eval run history

Past persisted runs across cases, sets, and suite runs. Use this to compare active-model runs against stored-example runs over time.

Proprietary taste model (serves the API)

Register your proprietary model and point it at an OpenAI-compatible inference endpoint to make it the default that serves /recommend and design proposals. One model for now; register more over time (generations, and low/medium/high tiers) and activate the one that should serve.

Model name

Base model

Generation

Tier

Serving provider

Inference base URL (OpenAI-compatible)

Served model id

Validating today? Use the OpenRouter preset to serve a stock Qwen through the API while your fine-tune trains — it reuses your OPENROUTER_API_KEY. For SageMaker, deploy with finetune/deploy_sagemaker_endpoint.py, then enter the endpoint name here with provider SageMaker.

Model matrix (compare base models)

Create one QLoRA run per base model with shared hyperparameters and the same dataset, so you can compare which open base model best learns your taste. Trains on AWS SageMaker.

Run name prefix

Epochs

S3 bucket (optional)

New fine-tuning run (LoRA / QLoRA on AWS SageMaker)

Builds an instruction dataset from the published taste corpus (principles, annotations, and applied taste-training examples), then trains the base model with QLoRA as a managed SageMaker training job. Cheap, fast validation before scaling up.

Run name

Method

SageMaker instance

Epochs

Learning rate

LoRA rank

S3 bucket (optional)

Base model

Raw LM vs LM + tastellm

Run the same build prompt through a raw provider, then through the same provider with tastellm guidance injected first. Use this to demo how vibe-coded or production-product outputs change when the taste API is in the loop.

Provider

Model tier

Public URL (optional)

Product context (optional) Prompt to compare

Screenshot a flow. Save the taste lesson.

Teach the taste model

Corpus → training & retrieval

Eval sets

Add response eval case

Eval run history

Proprietary taste model (serves the API)

Model matrix (compare base models)

New fine-tuning run (LoRA / QLoRA on AWS SageMaker)

Raw LM vs LM + tastellm

By user

By source

Recent events

Admin sign in

Screenshot a flow. Save the taste lesson.

Teach the taste model

Corpus → training & retrieval

Eval sets

Add response eval case

Eval run history

Proprietary taste model (serves the API)

Model matrix (compare base models)

New fine-tuning run (LoRA / QLoRA on AWS SageMaker)

Raw LM vs LM + tastellm

By user

By source

Recent events

Approve for corpus