
Five Steps to Prep Data Like a Pro

  • Writer: Lisa Ciancarelli
  • 7 min read

How Your Data Prep Sets the Stage for Insights That Actually Get Used

The clearest path to driving action from your analysis isn't a flashier chart or a more complex model. It starts earlier — before the first formula, before the first pivot table, before you even open Excel or Google Sheets. How you prepare a dataset determines whether your insights feel trustworthy or shaky, whether your analysis takes an hour or a day, and whether future collaborators can build on your work or have to start over from scratch. In my experience, taking a few minutes to orient myself to the data and scan for what stands out has saved real time and effort. It has also helped me diagnose early whether I need contingencies, such as supplemental data pulls, to investigate my ideas.


This checklist draws on the kind of practical experience that doesn't always make it into textbooks — the lessons analysts learn the hard way, by discovering a scrambled column sort after the file is already in a client's inbox. Whether you're early in your career or a seasoned professional looking to sharpen your process, these five steps can change how confidently you walk into any analysis.


The Five-Step Checklist for Data Prep

  1. Understand the dataset before you touch it.

  2. Name columns so anyone can read them.

  3. Protect the original and create a working version.

  4. Run a fast quality-control check.

  5. Leave a short guide for future you — or anyone who picks up the file next.


None of these steps require advanced tools or technical expertise. They require a repeatable routine you can apply in the first 10 minutes with any new file.


Step 1: Understand What You're Working With

Before you write a single formula, you should be able to answer four plain-language questions about the dataset in front of you:

  • What decision or story is this data supposed to support?

  • What does one row represent — one person, one transaction, one week, one ad?

  • What are the main groups of columns — identifiers, dates, costs, outcomes?

  • What obvious problems do you see upfront — missing columns, strange codes, odd formats?


This sounds basic. It is. And it's also the step most analysts skip.

Business questions rarely arrive with clean documentation. You might receive a file from a vendor, a colleague, or a system export you didn't design. If you skip this step, you risk building a polished chart that answers the wrong question — or worse, mixes up units and sends the wrong signal to a decision-maker.


A good habit: write a one-line purpose note at the top of the file or in a small "Read Me" tab. Something like: "This workbook holds weekly streaming ad performance by platform for Q1." That single sentence does more work than you'd expect. It anchors every analysis decision that follows.


Consider what happens when a marketing coordinator opens a campaign performance export and jumps straight to clicks and conversions — only to realize mid-analysis that each row represents a platform and week combination, not a platform and creative combination. Revenue is also missing for Week 3. Catching that upfront means requesting a corrected export before building a report that undervalues the campaign. Catching it after the deck is sent? That's a much harder conversation.
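If you prefer to script this first look rather than eyeball it in a spreadsheet, the same orientation pass can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the file name and column names are hypothetical, and a real export may need encoding or delimiter options.

```python
import csv
from collections import Counter

def first_look(path):
    """Return a quick orientation summary for a CSV export:
    the column names, the row count, and how many blank values
    each column contains."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        blanks = Counter()
        n_rows = 0
        for row in reader:
            n_rows += 1
            for col in columns:
                # Treat empty or whitespace-only cells as blank.
                if not (row.get(col) or "").strip():
                    blanks[col] += 1
    return {"columns": columns, "rows": n_rows, "blanks": dict(blanks)}

# Hypothetical usage:
# summary = first_look("campaign_export.csv")
# summary["blanks"] would flag, say, a revenue column missing Week 3.
```

A summary like this answers two of the four questions immediately: what one row looks like, and which obvious problems (blanks, unexpected columns) are waiting for you.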


Step 2: Use Clear Column Naming Conventions

Most raw data exports are built for machines, not humans. Column names like ev_cst, conv, and sub_acq_dt make sense to the system that generated them. They mean nothing to the vice president opening your file at 7 a.m. before a leadership review.


Good column naming follows a simple rule: short, consistent, and descriptive enough that a new reader can guess the meaning without asking you.


A few tactics worth keeping:

  • Replace system codes with real words. sub_acq_dt becomes "Subscriber Join Date." conv becomes "Conversions – New Email Subscribers."

  • Use the same names for the same metric on every tab. "Impressions" on Tab 1 should not become "Imps" on Tab 3.

  • Add a small "Data Dictionary" tab that lists each column name with a one-line definition and any key assumptions.


That last point matters more than people realize. A conversion in one report might mean sign-ups. In another, it might mean purchases. Spell it out. When the person reviewing your work understands the definitions from the start, the meeting stays focused on decisions rather than translation.


A development manager who inherits a donor file with fields called "SRC," "ST," and "Seg" has a real problem. Renaming them to "Acquisition Source," "State," and "Donor Segment" — and adding a short note on how segments are defined — means a consultant reviewing the file six months later can understand the structure in minutes and move straight to strategy. That's not a small thing.
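For recurring exports, the renaming can be done once in code instead of by hand each time. Below is a small sketch, again using only the standard library; the mapping mirrors the examples above, but the exact codes and replacement names are assumptions you would adapt to your own system's export.

```python
import csv

# Hypothetical mapping from system codes to human-readable names.
RENAMES = {
    "sub_acq_dt": "Subscriber Join Date",
    "conv": "Conversions - New Email Subscribers",
    "ev_cst": "Event Cost",
}

def rename_columns(in_path, out_path, renames):
    """Copy a CSV, replacing known system column codes with readable
    names and leaving unrecognized columns unchanged."""
    with open(in_path, newline="") as fin, \
         open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        writer.writerow([renames.get(col, col) for col in header])
        writer.writerows(reader)
```

The `RENAMES` dictionary doubles as the start of your data dictionary: each entry is a column code paired with its plain-language meaning.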


Step 3: Protect Your Original Data

This is the rule most people only learn after something goes wrong.

Always keep at least one untouched copy of the source data. Never edit the raw file. Create a separate working version for cleaning, transforming, and analyzing. Think of them as two different things: one is evidence, the other is your lab.


A simple structure works well:

  • Save the raw export as DatasetName_raw.xlsx and leave it alone.

  • Create a working file like DatasetName_2026-04-06_analysis.xlsx.

  • Inside the working file, keep the raw data on one tab, then add separate "Prep" and "Analysis" tabs for your actual work.

  • Add a short note somewhere prominent: "Source: campaign_export_raw.xlsx, downloaded April 6, 2026."


In client work, in finance, in research — people will ask where a number came from. If you've edited the only version of a file, you may not be able to answer that question.

One painful but instructive scenario: a small e-commerce company keeps a single Sales.xlsx on a shared drive. One day, a team member sorts only the Revenue column — without expanding the selection — and scrambles every row. With no original export saved anywhere, it takes hours to rebuild the sales history from emails and order confirmations. After that week, the team formalizes a raw-plus-working-file pattern for every report they produce. The lesson was expensive. It didn't have to be.
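The raw-plus-working-file pattern above can even be automated so the raw export is never opened for writing at all. This is a sketch under the naming convention described earlier; the file names are illustrative.

```python
import shutil
from datetime import date
from pathlib import Path

def create_working_copy(raw_path):
    """Leave the raw export untouched and create a dated working
    copy alongside it for cleaning and analysis."""
    raw = Path(raw_path)
    working = raw.with_name(
        f"{raw.stem}_{date.today():%Y-%m-%d}_analysis{raw.suffix}"
    )
    # copy2 preserves timestamps; the raw file is only ever read.
    shutil.copy2(raw, working)
    return working

# Hypothetical usage:
# create_working_copy("DatasetName_raw.xlsx")
# -> DatasetName_raw_2026-04-06_analysis.xlsx
```

Because the copy is dated, you also get a lightweight version history for free: each refresh of the analysis leaves a trail of when it was built and from what.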


Step 4: Run a Fast Quality-Control (QC) Check

Quality control doesn't mean auditing every cell. It means running a targeted sanity check — enough to raise your confidence in the numbers before you share them.

Executives and clients don't usually have time to rebuild your calculations. But they can spot obvious mistakes. And when they do, your credibility drops — not just on this report, but on future ones too.


When time is tight, run at least three checks:

  • Impossible values. Scan for negative costs, future dates, or response rates above 100%.

  • Totals. Sum key metrics and compare them to the totals your source system reports. If they don't match, find out why.

  • Outliers. Sort by spend, revenue, or conversion rate. Look for any rows that seem far outside the rest of the data.
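The three checks above can be expressed as a single pass over the data. This is a minimal sketch, assuming rows arrive as dictionaries with `cost`, `impressions`, and `response_rate` fields (all hypothetical names) and that the source system reports its own impressions total for comparison.

```python
def qc_check(rows, source_total_impressions):
    """Run three quick sanity checks on a non-empty list of row
    dicts and return a list of human-readable QC notes."""
    notes = []
    # 1. Impossible values: negative costs or rates above 100%.
    for i, row in enumerate(rows):
        if row["cost"] < 0:
            notes.append(f"Row {i}: negative cost {row['cost']}")
        if row["response_rate"] > 1.0:
            notes.append(f"Row {i}: response rate above 100%")
    # 2. Totals: compare summed impressions to the source system.
    total = sum(row["impressions"] for row in rows)
    if total != source_total_impressions:
        notes.append(
            f"Impressions total {total} != source figure "
            f"{source_total_impressions}"
        )
    # 3. Outliers: flag spend far above the average (3x is an
    # arbitrary illustrative threshold).
    avg_spend = sum(r["cost"] for r in rows) / len(rows)
    for i, row in enumerate(rows):
        if row["cost"] > 3 * avg_spend:
            notes.append(
                f"Row {i}: spend {row['cost']} far above "
                f"average {avg_spend:.2f}"
            )
    return notes
```

The returned notes are exactly what belongs in a "QC Notes" section: a record of what was checked, what surfaced, and what still needs explaining.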


Document what you find. A short "QC Notes" section — even just a few lines — shows you checked your work and lets you explain what you fixed and why.


Here's a realistic scenario: an analyst builds a channel-level impressions summary minutes before a client call. She sums the impressions in her pivot table and compares the total to the advertising platform's reported figure. They don't match. A quick sort reveals "Display" and "Display Network" appear as separate channels — even though they represent the same inventory. She fixes the categorization, updates the slide, and avoids overstating performance by 20% on the call.


That three-minute check saved a significant problem. It's not glamorous work. It's just good work.


Step 5: Leave a Breadcrumb Trail for the Next Person

Projects span months. People change roles. Files without context become fragile: every handoff requires a long meeting just to explain what's inside. A short brief covering your assumptions, filters, and crosstabs keeps the work usable for the next person, and it demonstrates your value in supporting the future of the business.


A short "cover tab" at the front of the workbook solves this. It doesn't need to be long. It needs to answer four questions for a reader who has never seen the file before:


  • What is this workbook for?

  • What time period and filters apply?

  • What does each tab contain?

  • What are the key assumptions or caveats?


Keep it to under two minutes of reading. Something like: "This file tracks weekly connected TV (CTV) ad performance by platform for Q1 2026. Numbers exclude mobile app traffic. Tab 1: raw export. Tab 2: cleaned data. Tab 3: summary charts."
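If your deliverable is generated by a script rather than assembled by hand, the cover note can be generated too, so it never goes stale. A minimal sketch; the function name and fields are assumptions, structured around the four questions above.

```python
from datetime import date

def cover_note(purpose, period_and_filters, tabs, caveats):
    """Compose a short cover note answering the four questions a
    first-time reader has: purpose, period/filters, tab contents,
    and caveats."""
    lines = [
        f"Purpose: {purpose}",
        f"Period / filters: {period_and_filters}",
        "Tabs:",
        *[f"  {name}: {desc}" for name, desc in tabs.items()],
        f"Caveats: {caveats}",
        f"Last updated: {date.today():%Y-%m-%d}",
    ]
    return "\n".join(lines)
```

Paste the result into the cover tab (or a README alongside the file) each time the workbook is refreshed, and the "last updated" line stays honest by construction.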


That's it. A media planner who builds a reach and frequency summary and then adds a cover tab like this creates something reusable. Three months later, when another planner needs to update the file for the next quarter's data, they can do it in an afternoon rather than starting from scratch with a blank workbook.


The cover tab isn't about being thorough. It's about respecting the time of anyone — including yourself — who opens the file again six months from now.


Bringing It Together

Strong analysis is rarely about the most sophisticated tools or the most complex models. More often, it's about the choices made at the very start of a project: how you name things, how you protect source data, how you check your numbers, and how you leave a trail for whoever comes next.


The five steps here are small on their own — but together, they form a habit that compounds over time. Your files become easier to trust. Handoffs become faster. And the people you work with — including future you — spend energy on insight rather than archaeology.


Try It This Week

Before you close this tab, pick one spreadsheet you'll open this week. Spend 10 minutes with the checklist: write a purpose note, clean up a few column names, save a separate working file, run the three QC checks, and add a short cover tab.


Then open it again a week later. If you feel calmer and more confident — if the file feels like it's working for you rather than against you — that's your signal the habit is taking hold.


What's one step from this list you could put into practice today? Share this with someone on your team who's building their first analysis. The earlier these habits start, the easier every project gets.


Have a data prep habit of your own? Drop me a line at LisaC@quark-insights.com.
