Best AI Agent Skills for Data Scientists in 2026

A. Frans

Published April 25, 2026

Data Science, AI Skills, Claude Code, Python, Data Analysis

What Are Agent Skills?
Quick Comparison
1. Data Formulator: Conversational Data Transformation
2. D3.js Visualization: Code-First Interactive Charts
3. ClickHouse Analytics: Analytical Queries at Scale
4. Open Notebook: Document Analysis Without a Hosted Service
5. NotebookLM MCP: Persistent Research Collections
6. Python Weekly: Python Workflow Utilities
Recommended Combinations
Security Checklist Before Installing
FAQ
Getting Started: Practical First Steps

By commonly cited estimates, data scientists spend 60–80% of their working hours on data preparation and exploration, not on building models. The right Claude Code skills don't change what data science is. They cut the prep time so that actual analysis work gets more hours.

This guide covers the top agent skills for data science work in 2026, including installation commands, real use cases, and an honest read on where each one falls short.

What Are Agent Skills?

Claude Code skills are installable extensions that give Claude domain-specific context, instructions, and tool integrations. They install with a single command and become available in your conversation right away, although MCP-based skills need a restart first.

For data scientists, the relevant skills cluster around four areas: data transformation and EDA, visualization, research and document analysis, and database querying at scale.

Quick Comparison

| Skill | Primary Use | Setup Complexity | Best For |
| --- | --- | --- | --- |
| Data Formulator | Conversational data transformation | Easy | EDA on new datasets |
| D3.js Visualization | Interactive chart generation | Medium | Web dashboards |
| ClickHouse Analytics | Large-scale analytical queries | Medium | Event/clickstream data |
| Open Notebook | Document Q&A and synthesis | Easy | Literature review |
| NotebookLM MCP | Persistent research collections | Medium | Ongoing research |
| Python Weekly | Python utilities and code gen | Easy | Workflow automation |

1. Data Formulator: Conversational Data Transformation

Data Formulator is a Microsoft Research project that brings a conversational interface to data transformation tasks. Instead of writing pandas code to reshape a dataset, you describe the transformation in plain language and the skill generates and runs the code.

```bash
/install data-formulator
```


This works best during exploratory data analysis on unfamiliar datasets. When you're still figuring out what questions to ask, the conversational back-and-forth is faster than writing transformation scripts. Describe what you want, for example "create a column flagging rows where revenue is more than 2 standard deviations below the monthly average", and Data Formulator writes and executes the pandas code.

The skill supports CSV, JSON, and Excel files locally, plus basic database connections.

Where it excels: Derived field creation, reshaping datasets for visualization, quick EDA on new data sources.

Where it falls short: Production pipelines need human-reviewed code, not auto-generated pandas. Data Formulator is an exploration tool. Data scientists using it in the Claude Code community report that EDA time on unfamiliar datasets drops by roughly half, but the code needs review before going into any automated pipeline.
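
To make the revenue-flagging request quoted in this section concrete, here is roughly what the resulting pandas code could look like. This is a hand-written sketch, not actual Data Formulator output. The column names `date` and `revenue`, the function name `flag_low_revenue`, and the reading of "monthly average" as the per-calendar-month mean (with a per-month sample standard deviation) are all assumptions.

```python
import pandas as pd


def flag_low_revenue(df: pd.DataFrame, n_std: float = 2.0) -> pd.DataFrame:
    """Flag rows whose revenue is more than n_std standard deviations
    below the mean revenue of their calendar month.

    Assumes hypothetical columns `date` and `revenue`.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    month = pd.to_datetime(out["date"]).dt.to_period("M")
    grouped = out.groupby(month)["revenue"]
    monthly_mean = grouped.transform("mean")
    monthly_std = grouped.transform("std")  # sample std (ddof=1)
    # Months with a single row have an undefined std (NaN); the comparison
    # below is then False, so those rows are never flagged.
    out["low_revenue_flag"] = out["revenue"] < monthly_mean - n_std * monthly_std
    return out
```

Code like this is exactly what deserves a review before it enters a pipeline: the choice between sample and population standard deviation, and the handling of months with one row, are decisions the prompt never stated.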

2. D3.js Visualization: Code-First Interactive Charts

D3.js Visualization integrates the JavaScript charting library directly into Claude Code's workflow. Describe the chart type and data, and the skill generates ready-to-run D3.js code.

```bash
/install d3js-visualization
```


Data scientists building web-facing dashboards benefit most. Rather than writing D3 boilerplate for each chart, you describe the visual (layout, axes, color encoding, tooltip behavior) and get working code to embed or extend.

The skill handles the standard chart types: bar, line, scatter, histogram, heatmap, area. The more interesting use case is custom visualizations: unusual encodings or hybrid layouts where D3's flexibility matters and charting library defaults won't work.

Where it excels: Interactive web dashboards, custom chart types, stakeholder-facing visualizations that go beyond matplotlib defaults.

Where it falls short: If your primary environment is Python (matplotlib, seaborn, plotly), there's less benefit unless you're specifically building for the browser. The skill generates JavaScript, not Python visualization code.
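
If your analysis lives in Python and only the chart lives in the browser, the hand-off is usually a JSON file. The sketch below shows one way to prepare that file. The function name `to_d3_records` and the `{x, y}` record shape are illustrative assumptions, not something the skill prescribes.

```python
import json
import math
from typing import Any, Iterable, Mapping


def to_d3_records(rows: Iterable[Mapping[str, Any]], x: str, y: str) -> str:
    """Serialize rows as a JSON array of {"x": ..., "y": ...} objects.

    Rows whose y value is missing or non-finite are dropped, because
    standard JSON has no representation for NaN or infinity.
    """
    records = []
    for row in rows:
        value = row.get(y)
        if value is None:
            continue
        value = float(value)
        if not math.isfinite(value):
            continue
        records.append({"x": row[x], "y": value})
    return json.dumps(records)
```

Writing the returned string to a `.json` file gives the generated D3 code a clean array of objects to load.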

3. ClickHouse Analytics: Analytical Queries at Scale

ClickHouse is a columnar database designed for fast analytical queries: aggregations over hundreds of millions of rows in seconds, not minutes. The ClickHouse Analytics skill provides optimized query generation and connection management for ClickHouse databases.

```bash
/install clickhouse-analytics
```


For data scientists working with large event streams (clickstreams, server logs, IoT sensor data, user behavior data at scale), ClickHouse is often the right backend. This skill handles connection setup, schema inspection, and query optimization without requiring deep ClickHouse expertise.

For aggregation workloads, queries commonly run 10–100x faster than on traditional relational databases. The skill makes that performance accessible without manual DBA work.

Where it excels: Clickstream analysis, time-series aggregations, user cohort queries, any analytical workload where PostgreSQL is too slow.

Where it falls short: If you're not already using ClickHouse or planning to adopt it, the skill has no value. It's a ClickHouse interface, not a database recommendation tool.
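
For a sense of the kind of query involved, here is a minimal sketch that builds a daily time-series aggregation in ClickHouse SQL. It is written by hand for illustration and is not output from the skill. The table name `events` and the columns `event_time` and `user_id` are hypothetical, and the helper only builds the SQL string, it does not connect to a database.

```python
import re

# Conservative whitelist for table and column names interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def daily_activity_sql(
    table: str,
    time_col: str = "event_time",
    user_col: str = "user_id",
    days: int = 30,
) -> str:
    """Build a ClickHouse query counting events and distinct users per day."""
    for name in (table, time_col, user_col):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        f"SELECT toDate({time_col}) AS day, "
        f"count() AS events, "
        f"uniqExact({user_col}) AS active_users\n"
        f"FROM {table}\n"
        f"WHERE {time_col} >= now() - INTERVAL {int(days)} DAY\n"
        f"GROUP BY day\n"
        f"ORDER BY day"
    )
```

Calling `daily_activity_sql("events")` returns a query you can paste into any ClickHouse client. `uniqExact` gives exact distinct counts; on very large tables the approximate `uniq` is a common, cheaper substitute.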

4. Open Notebook: Document Analysis Without a Hosted Service

Open Notebook brings a NotebookLM-style interface to Claude Code without requiring an external service. You load research papers, technical documentation, or reports, and the skill enables Q&A, summarization, and cross-document synthesis within the conversation.

```bash
/install open-notebook
```


For data scientists doing literature reviews, competitive analysis, or summarizing technical documentation, this replaces significant manual reading time. Load 8 papers on transformer architectures, ask "what are the main approaches to positional encoding," and get a synthesized answer with citations from the loaded documents.

The limitation is context window size. Very large collections may need batched loading. For most research papers and technical reports, this isn't a practical constraint.

Where it excels: One-off literature reviews, documentation analysis, synthesizing findings from multiple sources in a single session.

Where it falls short: Cross-session persistence. Every session starts fresh, so for ongoing research projects spanning multiple sessions, this adds real friction.
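
When a collection is too large for one pass, batched loading can be planned ahead of time. The sketch below groups documents so that each batch stays under a token budget. It is an illustration only: the four-characters-per-token estimate is a rough assumption rather than a real tokenizer, and the function names are hypothetical.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough size estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def batch_documents(docs: Dict[str, str], budget: int) -> List[List[str]]:
    """Group document names into batches whose estimated tokens fit the budget.

    Documents are taken in insertion order. A document that exceeds the
    budget on its own still gets a batch to itself.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, text in docs.items():
        cost = estimate_tokens(text)
        if current and used + cost > budget:
            batches.append(current)  # close the full batch
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each inner list is one loading round: load those documents, ask your questions, save the answers, then move to the next batch.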

5. NotebookLM MCP: Persistent Research Collections

NotebookLM MCP connects Claude Code to Google's NotebookLM service via the Model Context Protocol. Documents get indexed in Google's infrastructure and persist across sessions, so you don't reload them each time.

```bash
/install notebooklm-mcp
```


The key difference from Open Notebook: once your documents are loaded, they stay indexed. Return to a research project after a week and the collection is exactly where you left it. This makes NotebookLM MCP the right choice for ongoing research that spans multiple sessions or collaborators.

The tradeoff is that documents leave your machine. For proprietary research or sensitive data, review Google NotebookLM's data retention policies before loading.

Where it excels: Long-running research projects, large document collections (50+ papers), shared research between team members.

Where it falls short: Privacy-sensitive research, offline work, one-off queries where setup overhead isn't worth it.

6. Python Weekly: Python Workflow Utilities

Python Weekly focuses on Python tooling and code generation. It is not data-science-specific, but data scientists write a lot of Python, and a significant portion of that is boilerplate.

```bash
/install python-weekly
```


The most-used pattern among data scientists: generating utility functions for operations that don't vary much between projects. Data validation functions, logging setup, config file readers, CLI argument parsers. The skill generates these faster than writing from scratch, with better error handling than most developers include in boilerplate.

Where it excels: Utility function generation, Python project scaffolding, reducing boilerplate in data pipeline code.

Where it falls short: Model training, deep learning frameworks, anything requiring domain knowledge beyond general Python patterns.
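
As an example of the boilerplate in question, here is a small config file reader with the kind of error handling that often gets skipped. It is a hand-written sketch for illustration, not output from the skill, and the function name `load_config` and its `required` parameter are assumptions.

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict, Iterable

logger = logging.getLogger(__name__)


def load_config(path: str, required: Iterable[str] = ()) -> Dict[str, Any]:
    """Read a JSON config file and check that required top-level keys exist."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Re-raise with the file name so the error is actionable.
        raise ValueError(f"invalid JSON in {config_path}: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"{config_path} must contain a JSON object at the top level")
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"missing required config keys: {missing}")
    logger.info("loaded %d settings from %s", len(config), config_path)
    return config
```

A pipeline script would call something like `load_config("pipeline.json", required=["input_path"])` once at startup and fail fast with a readable message if the file is wrong.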

Recommended Combinations

The skills aren't mutually exclusive. Most data scientists install several:

For exploratory analysis work: Data Formulator + Open Notebook

For large-scale data engineering: ClickHouse Analytics + D3.js Visualization

For research-heavy roles: Open Notebook (one-time sessions) or NotebookLM MCP (ongoing projects). The choice depends on whether you need cross-session persistence.

For general Python productivity: Python Weekly is useful enough for nearly any data scientist working in Python.

See our [full list for data scientists](/best-ai-tools-for/data-scientists) for the complete tools directory.

Security Checklist Before Installing

1. Check the GitHub repository: star count, recent commits, open issues.
2. Read the SKILL.md file directly to understand what permissions the skill requests.
3. For MCP-based skills (ClickHouse Analytics, NotebookLM MCP), review what data gets sent externally.
4. Avoid skills with no recent commits and unresolved security-related issues.

Never connect a production database to a skill you haven't reviewed.
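
One small aid for steps 2 and 3 is listing every URL a SKILL.md mentions, as a starting point for asking where data might go. The sketch below does only that. It is not a security audit: it finds URLs written literally in the text and nothing else, and the function name `external_urls` is hypothetical.

```python
import re
from typing import List

# Match http(s) URLs up to whitespace, brackets, or parentheses.
_URL = re.compile(r"https?://[^\s<>()\[\]]+")


def external_urls(skill_md_text: str) -> List[str]:
    """Return the distinct http(s) URLs in the text, in order of appearance."""
    seen: List[str] = []
    for url in _URL.findall(skill_md_text):
        url = url.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen
```

Read the file's contents into a string, pass it in, and check each returned host against what the skill claims to do.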

FAQ

Do these skills work in all Claude Code environments?

Most work in the standard CLI and VS Code/JetBrains extensions. MCP-based skills (NotebookLM MCP, ClickHouse Analytics) require MCP configuration in your settings.json.


What's the difference between a skill and an MCP server?

Skills are instruction sets installed into Claude Code. MCP servers are external processes that provide tool calls (API calls, database connections). Many skills bundle an MCP server component. Data Formulator is a pure skill; ClickHouse Analytics includes an MCP server.

Do these work if I use R instead of Python?

Data Formulator and Python Weekly are Python-specific. Open Notebook and NotebookLM MCP are language-agnostic because they analyze documents, not code. ClickHouse Analytics works regardless of language; it generates SQL.

Are these skills free?

The skills themselves are free and open source. Some connect to external paid services (ClickHouse Analytics requires a ClickHouse instance). The Claude Code CLI is free to install, though using it requires a paid Claude plan or API credits.

How do I update an installed skill?

```bash
/update [skill-name]
```


Check individual skill documentation: some skills update automatically, others require a manual reinstall.

Getting Started: Practical First Steps

If you've never installed a Claude Code skill before, the process takes under 5 minutes:

1. Open Claude Code in your terminal.
2. Run the install command for the skill you want.
3. Restart Claude Code (required for MCP-based skills).
4. The skill is then available in your conversation.

For skills that require external service connections (ClickHouse Analytics, NotebookLM MCP), you'll need to add configuration to your `~/.claude/settings.json` file. Each skill's README includes the exact configuration block.
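
Hand-editing a JSON settings file is easy to get wrong, so one option is to merge the README's block programmatically and keep a backup. The sketch below is a generic JSON merge under stated assumptions: the function names are hypothetical, it assumes the settings file holds a single JSON object, and it knows nothing about which keys a particular skill expects. Take the block itself from the skill's README.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with extra merged in; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # new key, or non-dict value overrides
    return merged


def add_config_block(settings_path: str, block: Dict[str, Any]) -> Dict[str, Any]:
    """Merge block into a JSON settings file, keeping a .bak copy of the old file."""
    path = Path(settings_path).expanduser()
    current: Dict[str, Any] = {}
    if path.exists():
        current = json.loads(path.read_text(encoding="utf-8"))
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    merged = deep_merge(current, block)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Because existing keys are preserved and only the new block's keys are added or overridden, earlier configuration survives, and the `.bak` file gives you a one-step rollback.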

Start here if you're new to data science skills:

Data Formulator is the safest first install: no external connections, immediate value on any CSV or Excel file, and reversible if you don't want it. Install it, open a dataset you're already working with, and ask it to create a derived column. That's enough to understand whether conversational data transformation fits your workflow.

If you like it, add Open Notebook next. Same principle: local, no external connections, immediate value on any research you're currently doing.

Add ClickHouse Analytics and NotebookLM MCP after you've confirmed the simpler skills are working as expected in your environment.
