Where Data Goes to Die (And How to Save It)

How to organize your data?

Your pilot project just got cancelled. The promising drug target didn’t pan out. The exploratory analysis is being shelved.

What happens to all that data you spent months generating? 📊

𝗧𝗵𝗲 𝗰𝗼𝗺𝗺𝗼𝗻 𝘀𝗰𝗲𝗻𝗮𝗿𝗶𝗼:

Data lives on someone’s laptop. Project gets discontinued. Person moves to different project. Data disappears into the digital void.

Sound familiar? 😅

𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀:

That “failed” pilot might contain insights valuable for future work. The cancelled project might have generated negative results that save someone else months of effort.

But only if you can find it. 🔍

𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗶𝘁 𝗳𝗶𝗿𝘀𝘁:

Before you archive anything, write it down. Create an “engineering report”:

→ Background: What were you trying to solve?

→ Research question: What hypothesis were you testing?

→ Methods: How did you generate this data?

→ Why it ended: What changed or didn’t work?

Future you will thank you. 🙏

𝗪𝗵𝗲𝗿𝗲 𝘁𝗵𝗲 𝗱𝗮𝘁𝗮 𝘀𝗵𝗼𝘂𝗹𝗱 𝗴𝗼:

🏆 Best case: Already in a database (organized and queryable)

🤷 More common: Scattered across CSV files, scripts, documents

💡 Pragmatic solution: Organized cold storage

For smaller companies, S3 bucket works well:

→ Cheap long-term storage

→ Flexible (dump everything)

→ Easy to retrieve when needed

Downside: S3 is a digital junk drawer without organization. 🗃️

𝗠𝗮𝗸𝗶𝗻𝗴 𝗶𝘁 𝘄𝗼𝗿𝗸:

→ Consistent naming conventions

→ Clear folder structure

→ README files explaining contents

→ Metadata manifest listing all datasets

𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁:

Data archiving isn’t just storage—it’s knowledge preservation. Today’s “failed” experiment might be tomorrow’s breakthrough insight, but only if someone can understand what it was and why it mattered. 💡

𝗙𝗼𝗿 𝗹𝗲𝗮𝗱𝗲𝗿𝘀𝗵𝗶𝗽:

Build data sunset procedures into project workflows. The cost of storage is trivial compared to regenerating lost datasets. 💰

𝗧𝗵𝗲 𝗵𝗮𝗿𝗱 𝘁𝗿𝘂𝘁𝗵:

Most biotech companies are terrible at this. We’re great at generating data, mediocre at organizing it, awful at preserving institutional knowledge when projects end.

It doesn’t have to be this way. ✨

What’s your experience with data from discontinued projects? Have you seen companies do this well?