Clean Data

Clean and standardize raw financial data — fix formatting, remove duplicates, normalize units, and prepare for analysis.

by @anthropics · Apache 2.0

What this skill does

Turn messy spreadsheets into clean, analysis-ready data: trim whitespace, fix inconsistent casing, convert numbers stored as text, standardize dates, and remove duplicates. Fixes are proposed in a summary table and confirmed with you before anything changes, which replaces manual cell-by-cell cleanup. Use it whenever an export looks inconsistent or needs preparation before you build reports.

Anthropic · Financial Analysis

name: clean-data-xls
description: Clean up messy spreadsheet data: trim whitespace, fix inconsistent casing, convert numbers-stored-as-text, standardize dates, remove duplicates, and flag mixed-type columns. Use when data is messy, inconsistent, or needs prep before analysis. Triggers on “clean this data”, “clean up this sheet”, “normalize this data”, “fix formatting”, “dedupe”, “standardize this column”, “this data is messy”.

Clean Data

Clean messy data in the active sheet or a specified range.

Environment

  • If running inside Excel (Office Add-in / Office JS): Use Office JS directly (Excel.run(async (context) => {...})). Read via range.values, write helper-column formulas via range.formulas = [["=TRIM(A2)"]]. The in-place vs helper-column decision in Step 4 still applies.
  • If operating on a standalone .xlsx file: Use Python/openpyxl.

Workflow

Step 1: Scope

  • If a range is given (e.g. A1:F200), use it
  • Otherwise use the full used range of the active sheet
  • Profile each column: detect its dominant type (text / number / date) and identify outliers
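
The profiling bullet above could be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `cell_type` and `profile_column` are hypothetical names, and the code assumes the column has already been read into a plain Python list (with openpyxl, for instance, via `ws.iter_rows(values_only=True)`).

```python
from collections import Counter
from datetime import date


def cell_type(value):
    """Classify one cell value as 'blank', 'number', 'date', or 'text'."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return "blank"
    if isinstance(value, bool):
        return "text"  # assumption: booleans are not treated as numbers
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):  # datetime is a subclass of date
        return "date"
    return "text"


def profile_column(values):
    """Return (dominant_type, outliers) for one column.

    outliers lists (row_offset, value) pairs whose type differs from
    the dominant type; blank cells are ignored.
    """
    types = [cell_type(v) for v in values]
    counts = Counter(t for t in types if t != "blank")
    if not counts:
        return "blank", []
    dominant = counts.most_common(1)[0][0]
    outliers = [
        (i, v)
        for i, (v, t) in enumerate(zip(values, types))
        if t not in ("blank", dominant)
    ]
    return dominant, outliers
```

For a column such as `[1, 2.5, "n/a", None, 7]`, the dominant type is number and the single outlier is the text entry at offset 2.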

Step 2: Detect issues

| Issue | What to look for |
|---|---|
| Whitespace | leading/trailing spaces, double spaces |
| Casing | inconsistent casing in categorical columns (usa / USA / Usa) |
| Number-as-text | numeric values stored as text; stray $ signs, commas, or % signs in number cells |
| Dates | mixed formats in the same column (3/8/26, 2026-03-08, March 8 2026) |
| Duplicates | exact-duplicate rows and near-duplicates (case/whitespace differences) |
| Blanks | empty cells in otherwise-populated columns |
| Mixed types | a column that’s 98% numbers but has 3 text entries |
| Encoding | mojibake (é shown as Ã©, ’ shown as â€™), non-printing characters |
| Errors | #REF!, #N/A, #VALUE!, #DIV/0! |
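
A few of the checks in the table can be expressed as small detectors. The sketch below is an illustration under stated assumptions: the function names and regular expressions are hypothetical, error cells are assumed to arrive as strings such as "#N/A", and only three date shapes (the ones named in the table) are recognized.

```python
import re

ERROR_VALUES = {"#REF!", "#N/A", "#VALUE!", "#DIV/0!"}
# Text that would be a number once $, commas, and % are stripped.
NUMERIC_TEXT = re.compile(r"^-?\$?\d[\d,]*(\.\d+)?%?$")
DATE_PATTERNS = {
    "slash": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),         # 3/8/26
    "iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),                 # 2026-03-08
    "month_name": re.compile(r"^[A-Za-z]+ \d{1,2},? \d{4}$"),  # March 8 2026
}


def detect_issues(values):
    """Return (counts, date_formats) for one column of cell values."""
    counts = {"whitespace": 0, "number_as_text": 0, "blank": 0,
              "error": 0, "non_printing": 0}
    date_formats = set()
    for value in values:
        if value is None or (isinstance(value, str) and not value.strip()):
            counts["blank"] += 1
            continue
        if not isinstance(value, str):
            continue  # real numbers and dates need no text checks
        if value in ERROR_VALUES:
            counts["error"] += 1
            continue
        if value != value.strip() or "  " in value:
            counts["whitespace"] += 1
        if any(not ch.isprintable() for ch in value):
            counts["non_printing"] += 1
        text = value.strip()
        if NUMERIC_TEXT.match(text):
            counts["number_as_text"] += 1
        for label, pattern in DATE_PATTERNS.items():
            if pattern.match(text):
                date_formats.add(label)
    # More than one label means the column mixes date formats.
    return counts, sorted(date_formats)


def normalize_key(row):
    """Row key that ignores case and whitespace differences."""
    return tuple(
        "" if c is None else " ".join(str(c).split()).casefold() for c in row
    )


def find_duplicates(rows):
    """Return (exact, near): indexes of rows that repeat an earlier row."""
    seen_exact, seen_near = set(), set()
    exact, near = [], []
    for i, row in enumerate(rows):
        raw, key = tuple(row), normalize_key(row)
        if raw in seen_exact:
            exact.append(i)
        elif key in seen_near:
            near.append(i)
        seen_exact.add(raw)
        seen_near.add(key)
    return exact, near
```

The slash pattern cannot tell month/day from day/month order, so a date such as 3/8/26 would still need the user's confirmation before conversion.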

Step 3: Propose fixes

Show a summary table before changing anything:

| Column | Issue | Count | Proposed Fix |
|---|---|---|---|
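
One way to produce that summary is to collect findings as tuples and render them as aligned text. This is a sketch only: `summary_table` and the tuple layout are assumptions for illustration, and rows with a zero count are dropped so the user sees only actionable items.

```python
def summary_table(findings):
    """Render (column, issue, count, proposed_fix) tuples as a text table."""
    header = ("Column", "Issue", "Count", "Proposed Fix")
    # Keep only findings that actually occurred, then stringify every cell.
    rows = [header] + [
        tuple(str(x) for x in f) for f in findings if f[2] > 0
    ]
    widths = [max(len(r[i]) for r in rows) for i in range(4)]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    ]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


if __name__ == "__main__":
    print(summary_table([
        ("A", "Whitespace", 12, "=TRIM(A2)"),
        ("B", "Number-as-text", 40, '=VALUE(SUBSTITUTE(B2,"$",""))'),
    ]))
```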

Step 4: Apply

  • Prefer formulas over hardcoded cleaned values — where the cleaned output can be expressed as a formula (e.g. =TRIM(A2), =VALUE(SUBSTITUTE(B2,"$","")), =UPPER(C2), =DATEVALUE(D2)), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original. This keeps the transformation transparent and auditable.
  • Only overwrite in place with computed values when the user explicitly asks for it, or when no sensible formula equivalent exists (e.g. encoding/mojibake repair)
  • For destructive operations (removing duplicates, filling blanks, overwriting originals), confirm with the user first
  • After each category of fix (whitespace → casing → number conversion → dates → dedup), show the user a sample of what changed and get confirmation before moving to the next category
  • Report a before/after summary of what changed
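
The formula-first rule above might be sketched like this. The formula strings are the ones given in the bullets; everything else (the `FORMULA_TEMPLATES` mapping, the function names, and the choice of Windows-1252 as the mis-decoding to undo) is an assumption made for illustration. Mojibake repair is shown as the case with no formula equivalent, where a computed value is written instead.

```python
# Issue name -> helper-column formula template (formulas taken from the text).
FORMULA_TEMPLATES = {
    "whitespace": "=TRIM({cell})",
    "casing": "=UPPER({cell})",
    "number_as_text": '=VALUE(SUBSTITUTE({cell},"$",""))',
    "dates": "=DATEVALUE({cell})",
}


def helper_formulas(issue, column, first_row, last_row):
    """Return one formula string per source row, or None if the issue
    has no formula equivalent and needs a computed value instead."""
    template = FORMULA_TEMPLATES.get(issue)
    if template is None:
        return None
    return [
        template.format(cell=f"{column}{row}")
        for row in range(first_row, last_row + 1)
    ]


def repair_mojibake(text):
    """Undo UTF-8 text that was mis-decoded as Windows-1252.

    Returns the input unchanged when it does not round-trip, so clean
    text is left alone.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

Calling `helper_formulas("whitespace", "A", 2, 4)` yields `=TRIM(A2)` through `=TRIM(A4)`, ready to be written into an adjacent helper column after the user confirms; an issue such as "encoding" returns `None`, which signals the in-place path.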

Install this Skill

Skills give your AI agent a consistent, structured approach to this task — better output than a one-off prompt.

npx skills add anthropics/financial-services-plugins --skill financial-analysis

Official Anthropic skill.

No terminal needed: in Claude.ai, paste the skill into custom instructions.

Details

License: Apache 2.0
Source file: financial-analysis/skills/clean-data-xls/SKILL.md
Tags: finance, financial-analysis, data-cleaning, financial-services-plugins