
Scoring

Heuristic (keyword-based) scoring engine for clinical summaries.

This module provides a fast, deterministic baseline scorer that requires no API key. Each of the 8 rubric dimensions has its own scoring function that inspects the summary text for keyword hits, structural markers, sentence statistics, and word count.

Scoring flow for one role
  1. Run all 8 dimension scorers independently.
  2. Apply role-specific adjustments (_apply_role_adjustments).
  3. Clamp all scores to [1, 5].
  4. Compute a weighted overall using the role's w_prior weights.

Score scale: 1 (worst) to 5 (best) per dimension.
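Steps 2 and 3 of the flow above (role adjustment, then clamping) can be sketched as follows. The bump values here are hypothetical and do not reflect the real `_apply_role_adjustments` table:

```python
# Illustrative sketch of steps 2-3: apply a role-specific adjustment,
# then clamp each dimension score to the [1, 5] rubric scale.
# The role_bumps values are hypothetical, not the real adjustment table.

def clamp(score: int) -> int:
    """Clamp a dimension score to the 1-5 rubric scale."""
    return max(1, min(5, int(score)))

raw_scores = {
    "factual_accuracy": 4,
    "timeline_evolution": 6,
    "clarity_readability_formatting": 0,
}
role_bumps = {"timeline_evolution": -1}  # hypothetical adjustment for some role

adjusted = {dim: clamp(s + role_bumps.get(dim, 0)) for dim, s in raw_scores.items()}
print(adjusted)
# {'factual_accuracy': 4, 'timeline_evolution': 5, 'clarity_readability_formatting': 1}
```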

AgentScore dataclass

Container for a single role's scoring output.

Attributes:

    role_id (str): Which clinical role produced this score.
    scores (Dict[str, int]): Mapping of dimension_id → integer score (1-5).
    rationales (Dict[str, str] | None): Optional per-dimension textual justification.
    evidence (Dict[str, List[str]] | None): Optional per-dimension list of matched keywords/evidence.
    overall_notes (str): Free-text note about the role's perspective.
    warnings (List[str] | None): Any warnings generated during scoring (e.g. empty input).
    overall_score (float | None): Weighted average across dimensions (computed post-scoring).

Source code in src/grading_pipeline/scoring.py
@dataclass(frozen=True)
class AgentScore:
    """Container for a single role's scoring output.

    Attributes:
        role_id: Which clinical role produced this score.
        scores: Mapping of dimension_id → integer score (1-5).
        rationales: Optional per-dimension textual justification.
        evidence: Optional per-dimension list of matched keywords/evidence.
        overall_notes: Free-text note about the role's perspective.
        warnings: Any warnings generated during scoring (e.g. empty input).
        overall_score: Weighted average across dimensions (computed post-scoring).
    """
    role_id: str
    scores: Dict[str, int]
    rationales: Dict[str, str] | None = None
    evidence: Dict[str, List[str]] | None = None
    overall_notes: str = ""
    warnings: List[str] | None = None
    overall_score: float | None = None

    def to_dict(self) -> Dict:
        payload = {
            "role_id": self.role_id,
            "scores": self.scores,
            "score": self.scores,
        }
        if self.rationales is not None:
            payload["rationales"] = self.rationales
        if self.evidence is not None:
            payload["evidence"] = self.evidence
        if self.overall_notes:
            payload["overall_notes"] = self.overall_notes
        if self.overall_score is not None:
            payload["overall_score"] = self.overall_score
        if self.warnings:
            payload["warnings"] = self.warnings
        return payload

compute_overall_score(scores, weights, dimension_ids)

Compute a weighted average score across dimensions.

If total weight is zero (or all weights missing), falls back to a simple unweighted mean. Result is rounded to 2 decimal places.

Source code in src/grading_pipeline/scoring.py
def compute_overall_score(
    scores: Dict[str, int], weights: Dict[str, float], dimension_ids: List[str]
) -> float:
    """Compute a weighted average score across dimensions.

    If total weight is zero (or all weights missing), falls back to a
    simple unweighted mean.  Result is rounded to 2 decimal places.
    """
    total_weight = sum(weights.get(dim, 0.0) for dim in dimension_ids)
    if total_weight <= 0:
        total_weight = float(len(dimension_ids) or 1)
        return round(sum(scores[dim] for dim in dimension_ids) / total_weight, 2)
    weighted_sum = sum(scores[dim] * weights.get(dim, 0.0) for dim in dimension_ids)
    return round(weighted_sum / total_weight, 2)
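A quick usage sketch, with the function copied verbatim from the source above; the dimension names and weight values are illustrative:

```python
from typing import Dict, List

def compute_overall_score(
    scores: Dict[str, int], weights: Dict[str, float], dimension_ids: List[str]
) -> float:
    # Copied from the source above.
    total_weight = sum(weights.get(dim, 0.0) for dim in dimension_ids)
    if total_weight <= 0:
        total_weight = float(len(dimension_ids) or 1)
        return round(sum(scores[dim] for dim in dimension_ids) / total_weight, 2)
    weighted_sum = sum(scores[dim] * weights.get(dim, 0.0) for dim in dimension_ids)
    return round(weighted_sum / total_weight, 2)

dims = ["factual_accuracy", "clarity_readability_formatting"]
scores = {"factual_accuracy": 5, "clarity_readability_formatting": 3}

# Weighted: (5 * 0.75 + 3 * 0.25) / 1.0 = 4.5
print(compute_overall_score(scores, {"factual_accuracy": 0.75, "clarity_readability_formatting": 0.25}, dims))  # 4.5
# Empty weights trigger the fallback: unweighted mean (5 + 3) / 2 = 4.0
print(compute_overall_score(scores, {}, dims))  # 4.0
```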

score_summary_heuristic(summary, role, rubric)

Score a clinical summary using the keyword-based heuristic engine.

Runs all 8 dimension scorers, applies role-specific adjustments, clamps to [1, 5], and computes the weighted overall score.

Parameters:

    summary (str, required): The clinical summary text to evaluate.
    role (RoleProfile, required): The clinical role whose perspective to apply.
    rubric (Rubric, required): The evaluation rubric (defines which dimensions to score).

Returns:

    AgentScore: Per-dimension scores, rationales, evidence, and a weighted overall score.

Source code in src/grading_pipeline/scoring.py
def score_summary_heuristic(summary: str, role: RoleProfile, rubric: Rubric) -> AgentScore:
    """Score a clinical summary using the keyword-based heuristic engine.

    Runs all 8 dimension scorers, applies role-specific adjustments,
    clamps to [1, 5], and computes the weighted overall score.

    Args:
        summary: The clinical summary text to evaluate.
        role: The clinical role whose perspective to apply.
        rubric: The evaluation rubric (defines which dimensions to score).

    Returns:
        An ``AgentScore`` with per-dimension scores, rationales, evidence,
        and a weighted overall score.
    """
    summary = summary.strip()
    warnings: List[str] = []
    if not summary:
        warnings.append("Empty summary provided.")

    scores: Dict[str, int] = {}
    rationales: Dict[str, str] = {}
    evidence: Dict[str, List[str]] = {}

    factual_score, factual_rationale, factual_evidence = _score_factual_accuracy(summary)
    scores["factual_accuracy"] = factual_score
    rationales["factual_accuracy"] = factual_rationale
    evidence["factual_accuracy"] = factual_evidence

    chronic_score, chronic_rationale, chronic_evidence = _score_chronic_coverage(summary)
    scores["relevant_chronic_problem_coverage"] = chronic_score
    rationales["relevant_chronic_problem_coverage"] = chronic_rationale
    evidence["relevant_chronic_problem_coverage"] = chronic_evidence

    org_score, org_rationale, org_evidence = _score_organized(summary)
    scores["organized_by_condition"] = org_score
    rationales["organized_by_condition"] = org_rationale
    evidence["organized_by_condition"] = org_evidence

    timeline_score, timeline_rationale, timeline_evidence = _score_timeline(summary)
    scores["timeline_evolution"] = timeline_score
    rationales["timeline_evolution"] = timeline_rationale
    evidence["timeline_evolution"] = timeline_evidence

    recent_score, recent_rationale, recent_evidence = _score_recent_changes(summary)
    scores["recent_changes_highlighted"] = recent_score
    rationales["recent_changes_highlighted"] = recent_rationale
    evidence["recent_changes_highlighted"] = recent_evidence

    word_count = _word_count(summary)
    focus_score, focus_rationale = _score_focus_by_length(word_count)
    scores["focused_not_cluttered"] = focus_score
    rationales["focused_not_cluttered"] = focus_rationale
    evidence["focused_not_cluttered"] = [f"word_count={word_count}"]

    decision_score, decision_rationale, decision_evidence = _score_decision_usefulness(summary)
    scores["usefulness_for_decision_making"] = decision_score
    rationales["usefulness_for_decision_making"] = decision_rationale
    evidence["usefulness_for_decision_making"] = decision_evidence

    clarity_score, clarity_rationale, clarity_evidence = _score_clarity(summary)
    scores["clarity_readability_formatting"] = clarity_score
    rationales["clarity_readability_formatting"] = clarity_rationale
    evidence["clarity_readability_formatting"] = clarity_evidence

    _apply_role_adjustments(role.id, scores, rationales)

    overall_notes = f"Role perspective: {role.name}. Summary length {word_count} words."

    for dim_id in rubric.dimension_ids:
        scores[dim_id] = max(1, min(5, int(scores[dim_id])))

    overall_score = compute_overall_score(scores, role.w_prior, rubric.dimension_ids)

    return AgentScore(
        role_id=role.id,
        scores=scores,
        rationales=rationales,
        evidence=evidence,
        overall_score=overall_score,
        overall_notes=overall_notes,
        warnings=warnings or None,
    )
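Each private `_score_*` helper (bodies not shown on this page) is called as returning a `(score, rationale, evidence)` triple. A hypothetical minimal keyword scorer in that shape, with illustrative keywords and thresholds that are not the real implementation:

```python
from typing import List, Tuple

# Hypothetical example of the (score, rationale, evidence) contract the
# private _score_* helpers follow. Keywords and the hit-count-to-score
# mapping are illustrative only.
RECENT_KEYWORDS = ["recent", "new", "worsening", "improved", "since last visit"]

def score_recent_changes_demo(summary: str) -> Tuple[int, str, List[str]]:
    text = summary.lower()
    hits = [kw for kw in RECENT_KEYWORDS if kw in text]
    # Map hit count onto the 1-5 scale: no hits scores 2, saturating at 5.
    score = min(5, 2 + len(hits)) if hits else 2
    rationale = f"Matched {len(hits)} recency keyword(s)."
    return score, rationale, hits

score, rationale, evidence = score_recent_changes_demo(
    "Worsening dyspnea since last visit; new diuretic started."
)
print(score, evidence)  # 5 ['new', 'worsening', 'since last visit']
```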