Phase 1 The Conflict of Information
"Cultivation information in the cannabis industry is conflicting. While collecting data for GrowApp, I noticed that strain information generalized by name alone creates a massive intelligence gap."
In the current market, Acapulco Gold is not a single entity. In our raw collection, it appears 15 times, from 8 different seed banks and 6 different breeders. Those listings report wildly different Sativa/Indica ratios, THC percentages, and flowering cycles.
One size does not fit all when it comes to strain names: the name alone does not identify a single plant profile. Once the breeder is included alongside the name, the data finally finds its anchor.
Raw Data Sample: The Fragmentation Problem
| strain_name_raw | breeder_display | thc_max_raw | flowering_raw |
|---|---|---|---|
| Acapulco Gold Autoflowering | Seed Supreme | 23% | — |
| Acapulco Gold Feminised | Barney's Farm | — | 60-70 Days |
| Acapulco Gold (F) | Pick and Mix | 0% | 70-80 Days |
Example of conflicting data points for a single strain name across different breeders.
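To make the anchoring idea concrete, here is a minimal Python sketch that groups the three sample rows first by name alone and then by name plus breeder. The suffix list used to strip variant labels (`VARIANT_SUFFIX`) and the helper names are assumptions made for illustration; they are not taken from the GrowApp pipeline.

```python
import re
from collections import defaultdict

# Rows copied from the sample table; "\u2014" is the dash the table uses for a missing value.
RAW_ROWS = [
    {"strain_name_raw": "Acapulco Gold Autoflowering", "breeder_display": "Seed Supreme",
     "thc_max_raw": "23%", "flowering_raw": "\u2014"},
    {"strain_name_raw": "Acapulco Gold Feminised", "breeder_display": "Barney's Farm",
     "thc_max_raw": "\u2014", "flowering_raw": "60-70 Days"},
    {"strain_name_raw": "Acapulco Gold (F)", "breeder_display": "Pick and Mix",
     "thc_max_raw": "0%", "flowering_raw": "70-80 Days"},
]

# Hypothetical list of trailing variant labels; a real catalogue would need more.
VARIANT_SUFFIX = re.compile(r"\s*(autoflowering|feminised|feminized|\(f\))\s*$", re.IGNORECASE)


def base_name(raw_name):
    """Strip one trailing variant label and lowercase the remaining name."""
    return VARIANT_SUFFIX.sub("", raw_name).strip().lower()


def group_by_name(rows):
    """Name-only key: every breeder's listing lands in the same bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[base_name(row["strain_name_raw"])].append(row)
    return dict(groups)


def group_by_name_and_breeder(rows):
    """Composite key: (name, breeder) keeps each breeder's listing separate."""
    groups = defaultdict(list)
    for row in rows:
        key = (base_name(row["strain_name_raw"]), row["breeder_display"].strip().lower())
        groups[key].append(row)
    return dict(groups)


by_name = group_by_name(RAW_ROWS)
by_name_and_breeder = group_by_name_and_breeder(RAW_ROWS)
```

With the name-only key, all three rows collapse into one "acapulco gold" bucket whose THC and flowering values disagree. With the composite key, each breeder's listing stays in its own bucket, so no two sources have to be averaged or overwritten.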
Phase 2 The Concept
The "Source of Truth" was created to give every user a receipt: a record of where each data point came from. No more "trust me bro" logic. By building a dataset that includes breeders and verified laboratory metadata, we provide a transparent audit trail for every strain ID.
Phase 3 The Production
Using a 100-hour Gemini Flash pipeline, we mapped 200 columns to reconcile these conflicts. From parentage lineage slugs to filial generation tracking, every "whoops" and "trash" record was filtered out so that only clean, enterprise-grade data remains.
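The actual filtering rules are not spelled out here, so the following Python sketch is only an illustration of what one cleaning step could look like for the raw columns shown in Phase 1. The function names, the set of missing-value markers, and the flag labels are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed markers for "no value"; "\u2014" is the dash used in the sample table.
MISSING = {"", "\u2014", "-", "n/a"}


def parse_percent(raw: str) -> Optional[float]:
    """Turn a string like '23%' into 23.0; return None if missing or malformed."""
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None


def parse_day_range(raw: str) -> Optional[Tuple[int, int]]:
    """Turn a string like '60-70 Days' into (60, 70); return None if missing or malformed."""
    text = raw.strip()
    if text.lower() in MISSING:
        return None
    match = re.fullmatch(r"(\d+)\s*-\s*(\d+)\s*days?", text, re.IGNORECASE)
    if not match:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low, high) if low <= high else None


def review_flags(thc_max_raw: str, flowering_raw: str) -> List[str]:
    """Illustrative rule set: list the reasons a record needs a second look."""
    flags = []
    thc = parse_percent(thc_max_raw)
    if thc is None:
        flags.append("thc_missing")
    elif thc == 0:
        flags.append("thc_zero")  # a 0% maximum may be a placeholder, not a measurement
    if parse_day_range(flowering_raw) is None:
        flags.append("flowering_missing")
    return flags
```

Applied to the Pick and Mix row from the sample table, `review_flags("0%", "70-80 Days")` returns `["thc_zero"]`. Flagging rather than silently dropping a record keeps the reason for every exclusion visible, which fits the audit-trail goal described in Phase 2.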