1. The "Black Box" Problem in Media
We live in an era of information opacity. Modern media has a sourcing problem. A headline claims "Study Shows AI Is Biased," but the article never links to the actual study. A review claims "Users Hate This Tool," but cites no evidence or methodology. When you read a software review in many major tech publications, you have no way of knowing whether the writer actually used the software or simply rewrote the marketing copy from the vendor's homepage.
WhichAIPick operates differently. We believe that Data Sovereignty extends to the reader. You have a right to know exactly where our conclusions come from. You have a right to audit our inputs.
This policy details the precise inputs that feed our algorithm. We classify our data into three categories: Primary (Vendor), Secondary (Synthetic), and Tertiary (User). By understanding this "Data Supply Chain," you can better evaluate the trustworthiness of our recommendations in our Software Directory.
2. Primary Data: Verified Vendor Intelligence
This is the "Hard Spec" data. In the world of physical hardware reviews, this would be the dimensions of a phone or the horsepower of a car. In AI software, it is the context window size, the parameter count, and the API rate limits.
We treat vendor-supplied data with extreme skepticism until it is verified. Marketing departments are incentivized to exaggerate. Engineering departments are incentivized to be accurate. Our job is to find the engineering truth hidden behind the marketing fluff.
Sources of Primary Data
- API Documentation: This is our "Source of Truth." Marketing pages may claim "Unlimited," but the API docs will always reveal the rate limit (e.g., "60 requests per minute"). We scrape API documentation to build our feature matrix.
- Whitepapers & Technical Reports: For foundation models (like GPT-4 or Claude 3), we analyze the arXiv papers and technical reports released by the research labs. We look for results on standard, widely scrutinized benchmarks (MMLU, HumanEval) rather than vendor-invented metrics.
- Terms of Service & Privacy Policies: We employ a "Legal Audit" script that scans these documents for keywords like "Training Data," "Data Retention," and "Third Party Sharing." This is how we determine our Privacy Scores.
- Security Audits (SOC2 / ISO 27001): We require vendors to provide proof of compliance. A badge on a footer is not enough; we ask for the date of the last audit.
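The "Legal Audit" scan described above can be sketched as a simple keyword counter. This is a minimal illustration, not the production script; the keyword list and function names here are hypothetical.

```python
import re

# Hypothetical keyword list; the real "Legal Audit" script's terms are not published.
AUDIT_KEYWORDS = ["training data", "data retention", "third party sharing"]

def audit_policy(text: str) -> dict:
    """Count case-insensitive occurrences of each audit keyword in a policy document."""
    lowered = text.lower()
    return {kw: len(re.findall(re.escape(kw), lowered)) for kw in AUDIT_KEYWORDS}

policy = (
    "We may use your inputs as training data. "
    "Data retention: 30 days. We do not engage in third party sharing."
)
hits = audit_policy(policy)  # every keyword found once in this sample policy
```

A real pipeline would also capture the surrounding sentence for human review, since keyword frequency alone cannot distinguish "we share data" from "we never share data."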
Verification Protocols
We cross-reference marketing claims against technical docs. If marketing says "Unlimited Context" but the API docs say "128k context limit," we publish the 128k number and penalize the Trust score. If a vendor claims to be "GDPR Compliant" but their servers are hosted exclusively in Virginia, USA, without Standard Contractual Clauses (SCCs), we flag this as a Compliance Risk.
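The cross-referencing rule above ("publish the documented number, penalize the inflated claim") reduces to a small comparison. This is a hedged sketch; the function and field names are illustrative, not the production schema.

```python
def reconcile_context_claim(marketing_tokens, api_doc_tokens):
    """Publish the documented limit and apply a trust penalty when marketing overstates it.
    None stands in for an 'Unlimited' marketing claim."""
    overstated = marketing_tokens is None or (
        api_doc_tokens is not None and marketing_tokens > api_doc_tokens
    )
    return {"published_limit": api_doc_tokens, "trust_penalty": overstated}

# Marketing says "Unlimited Context"; the API docs say 128k tokens.
result = reconcile_context_claim(marketing_tokens=None, api_doc_tokens=128_000)
```

When the two sources agree, no penalty applies; the documented figure is always the one published.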
3. Secondary Data: Synthetic Benchmarking
This is the most valuable data we own: proprietary data generated by our own testing labs, available nowhere else on the internet. Because AI models are non-deterministic (the same prompt can yield different answers), a single test run is statistically meaningless. We instead rely on **Synthetic Benchmarking**: running thousands of automated tests to produce a statistical average of performance.
The Test Harness
We have built a custom testing harness called "The Gauntlet." This software pipeline connects to the APIs of the tools we review and fires a randomized battery of prompts. You can read more about this in our Review Methodology.
- Latency Logs: We record the millisecond response time (Time to First Token) of models. We test this at different times of day to account for server load. A tool might be fast at 2 AM but unusable at 2 PM. Our score reflects the average.
- Fidelity Scores: Our internal scoring of image/text quality based on our "Prompt Battery". We use reference-based scoring metrics (like BLEU and ROUGE for text, and CLIP scores for images) to mathematically grade the output.
- Regression Data: We track how these scores change over time. Users often complain that "GPT-4 feels dumber today." We don't rely on feelings. We look at the regression line of our benchmark scores over the last 90 days.
Synthetic Data Generation
To test these tools without compromising user privacy, we generate **Synthetic User Data**. We do not paste real confidential emails into an AI writer to test it. Instead, we use a library of "Fake PII" (Personally Identifiable Information)—fake names, fake addresses, fake credit card numbers—to test how the AI handles sensitive data. This allows us to test "Data Leakage" safety rails without risking real harm.
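A leakage probe of the kind described above can be sketched in a few lines: plant synthetic values, then check whether the model's output echoes them. The record format and function names are illustrative assumptions, not our production harness.

```python
import random
import string

def fake_record(rng: random.Random) -> dict:
    """Generate a synthetic PII record; no real personal data is involved."""
    name = "Test " + "".join(rng.choices(string.ascii_uppercase, k=6))
    card = "4000-" + "-".join(f"{rng.randrange(10_000):04d}" for _ in range(3))
    return {"name": name, "card": card}

def leaked(ai_output: str, record: dict) -> bool:
    """Flag a data-leakage failure if any planted synthetic value reappears verbatim."""
    return any(value in ai_output for value in record.values())

rng = random.Random(42)  # seeded so the probe is reproducible across test runs
record = fake_record(rng)
```

Because the planted values are fabricated, a positive hit demonstrates the leakage pathway without exposing anyone's real information.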
4. Tertiary Data: User Telemetry & Aggregation
We aggregate anonymized user behavior to understand "Real World" usage patterns. This helps us weight our categories. If 80% of our users are searching for "Free Plan," we increase the weight of the "Free Tier Generosity" score in our algorithm.
The "Wisdom of the Crowd" vs. "The Mob"
User data is noisy. A user might give a tool a 1-star review because they forgot their password. To filter this signal, we use **Cohort Analysis**.
- Verified Users: We prioritize data from users who have authenticated via LinkedIn or corporate email.
- Usage Depth: We weight feedback higher from users who have revisited the tool page multiple times, indicating sustained interest.
- Exit Traffic: We track which outbound links are clicked. If a tool has high traffic but low click-through to the vendor, it implies the review convinced the user not to buy. This is a negative signal we incorporate into our "Satisfaction" metric.
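The cohort weighting described above can be sketched as a weighted average. The weights here are hypothetical; the production model is not published.

```python
# Hypothetical cohort weights; verified and engaged users count more than drive-by raters.
WEIGHTS = {"verified": 2.0, "repeat_visitor": 1.5, "anonymous": 1.0}

def weighted_rating(reviews):
    """reviews: list of (cohort, stars) pairs. Returns the cohort-weighted average."""
    total = sum(WEIGHTS[cohort] * stars for cohort, stars in reviews)
    weight = sum(WEIGHTS[cohort] for cohort, _ in reviews)
    return total / weight

reviews = [("verified", 4), ("repeat_visitor", 5), ("anonymous", 1)]
score = weighted_rating(reviews)  # the forgotten-password 1-star barely moves the score
```

Against a naive mean of about 3.33, the weighted score lands near 3.67: the anonymous outlier is dampened rather than discarded.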
Privacy Commitment
We never use individual user data to influence a specific review. We look at aggregate cohorts only. We do not track individual browsing history across the web. For comprehensive details, see our Editorial Ethics.
5. The Data Lifecycle: From Collection to Deletion
We believe in **Data Minimization**. We only collect the data we need, and we delete it when we don't. Here is the lifecycle of a data point at WhichAIPick:
Phase 1: Ingestion
Data enters our system via our automated scrapers (for pricing), our API harness (for benchmarks), or our user feedback forms. All incoming data is timestamped and source-tagged.
Phase 2: Normalization
Raw data is messy. One vendor lists pricing as "$20/mo", another as "$240/yr", another as "0.003 cents/token". Our normalization engine converts all these values into a standard **Monthly Cost of Ownership** metric to allow for apples-to-apples comparison. See Pricing Policy.
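The normalization step above can be sketched as a unit converter. Note the per-token case requires an assumed monthly usage figure; the 1M-token assumption below is illustrative, not our published basis.

```python
ASSUMED_TOKENS_PER_MONTH = 1_000_000  # illustrative usage assumption for per-token plans

def monthly_cost(price: float, unit: str) -> float:
    """Normalize a vendor price into a single monthly-cost figure."""
    if unit == "per_month":
        return price
    if unit == "per_year":
        return price / 12
    if unit == "per_token":
        return price * ASSUMED_TOKENS_PER_MONTH
    raise ValueError(f"unknown pricing unit: {unit}")

# "$20/mo", "$240/yr", and "0.003 cents/token" ($0.00003) become comparable:
normalized = [
    monthly_cost(20.0, "per_month"),
    monthly_cost(240.0, "per_year"),
    monthly_cost(0.00003, "per_token"),
]
```

After normalization, the first two plans are identically priced ($20/month) and the token plan works out to roughly $30/month at the assumed usage.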
Phase 3: Processing & Scoring
The normalized data is fed into our Algorithm (The "WhichAIPick Score"). This algorithm runs nightly. This means a change in a vendor's pricing today will be reflected in their score tomorrow morning.
Phase 4: Archival & Deletion
We retain historical pricing data for 24 months to power our "Price History" charts. After 24 months, granular data is aggregated into monthly averages and the raw data is purged. User telemetry data (clicks, dwell time) is anonymized after 90 days and deleted after 12 months.
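The 24-month rollup described above (granular rows collapsed into monthly averages) can be sketched as a bucketing pass. The date handling here is deliberately minimal and assumes ISO-formatted date strings.

```python
from collections import defaultdict
from statistics import mean

def aggregate_to_monthly(daily_prices):
    """Collapse (YYYY-MM-DD, price) rows into monthly averages before raw rows are purged."""
    buckets = defaultdict(list)
    for date, price in daily_prices:
        buckets[date[:7]].append(price)  # bucket on the YYYY-MM prefix
    return {month: mean(prices) for month, prices in sorted(buckets.items())}

raw = [("2024-01-03", 20.0), ("2024-01-19", 22.0), ("2024-02-02", 24.0)]
monthly = aggregate_to_monthly(raw)  # two monthly averages replace three raw rows
```

Once the monthly averages are written, the raw daily rows can be deleted while the "Price History" charts keep their long-term shape.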
6. Third-Party Subprocessors
We are a modern software company, which means we rely on cloud infrastructure. We are transparent about who handles our data, and we only work with subprocessors who meet our strict security standards.
| Subprocessor | Purpose | Location |
|---|---|---|
| Amazon Web Services (AWS) | Hosting & Compute Infrastructure | USA (Virginia) |
| Cloudflare | CDN & DDoS Protection | Global (Edge) |
| PostgreSQL (Managed) | Database Storage | USA |
| Google Analytics 4 | Anonymized Web Traffic Analytics | Global |
We do NOT share data with data brokers, ad networks, or hedge funds.
7. Algorithmic Fairness & Bias Testing
Algorithms are opinions embedded in code. Our ranking algorithm is an opinion: "We prefer safe, cheap, fast tools." We are transparent about this bias. However, we strive to eliminate **Unintended Bias**.
The "Small Vendor" Bias
Review sites often favor big incumbents (Adobe, Microsoft) because they have higher search volume. To counter this, our algorithm includes a "Discovery Boost" for new tools that score highly on technical merit but have low brand awareness. This ensures that a brilliant tool built by a 2-person team in Estonia has a chance to outrank a mediocre tool built by Google. Discover these gems in our New Arrivals.
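The "Discovery Boost" above can be sketched as a conditional score adjustment. The thresholds and boost size are hypothetical; the production formula is not published.

```python
# Hypothetical thresholds and boost size for illustration only.
MERIT_THRESHOLD = 80.0      # technical-merit score out of 100
AWARENESS_THRESHOLD = 0.2   # brand-awareness index in [0, 1]
DISCOVERY_BOOST = 5.0

def ranked_score(merit: float, awareness: float) -> float:
    """Boost high-merit, low-awareness tools so incumbents cannot win on fame alone."""
    if merit >= MERIT_THRESHOLD and awareness < AWARENESS_THRESHOLD:
        return merit + DISCOVERY_BOOST
    return merit

indie = ranked_score(merit=85.0, awareness=0.05)      # small, unknown team
incumbent = ranked_score(merit=82.0, awareness=0.90)  # household name, no boost
```

Under these assumed parameters the lesser-known tool outranks the incumbent (90 vs 82), which is exactly the behavior the boost exists to make possible.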
The "English Language" Bias
Currently, our natural language processing tests are predominantly in English. We acknowledge this limitation. We are actively working on a multi-lingual benchmark suite to fairly evaluate tools that specialize in Spanish, French, and Mandarin generation.
8. Data Limitations and Known Unknowns
We pride ourselves on accuracy, but we are engineers, not magicians. Our data has limitations that you should be aware of:
- Endpoint Variance: We test from US and EU servers. Performance in Asia, Africa, or South America may vary significantly due to CDN propagation and latency. A tool that is "Fast" for us might be "Slow" for you in Mumbai.
- Enterprise Customization: We test the off-the-shelf product. Enterprise implementations often have custom fine-tuning, dedicated hardware, and SLA guarantees that improve performance significantly. We cannot benchmark private, air-gapped instances.
- A/B Testing: Vendors constantly A/B test their models. We might be testing "Model Version A" while you are served "Model Version B." Where possible, we try to force specific model versions via API, but this is not always possible with web interfaces.
- Hidden "Fair Use" Policies: Some "Unlimited" plans have hidden caps (e.g., throttling after 2000 generations) that are not published in docs and only discovered through extreme usage. While we try to hit these limits, we cannot always trigger them in a standard test cycle.
9. How to Request Your Data
If you have created an account with WhichAIPick (for saving tools or creating alerts), you have the right to export or delete your data.
To Export: Email privacy@whichaipick.com with the subject "Data Export Request." We will provide a JSON file of your data within 14 days.
To Delete: Email privacy@whichaipick.com with the subject "Account Deletion." We will purge your data from our active database immediately and from our backups within 30 days.
10. Glossary of Data Terms
Understanding data privacy requires understanding the jargon.
- PII (Personally Identifiable Information)
- Any data that can identify you: Name, Email, IP Address, Phone Number. We strip this.
- Pseudonymization
- Replacing your name with a random ID (e.g., "User 8492"). The data is still there, but it's not linked to "John Doe."
- Differential Privacy
- A mathematical technique that adds "noise" to a dataset so that statistical trends can be analyzed without revealing any single individual's data.
- Data Lake
- A centralized repository where we store structured and unstructured data at scale. Ours is encrypted.
- Clean Room
- A secure, isolated environment where data is analyzed in place; raw data never leaves the server.
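The Differential Privacy entry above can be made concrete with the Laplace mechanism, its textbook instance. This is a minimal sketch: it assumes a counting query (sensitivity 1), so the noise scale is 1/ε.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise via the inverse-CDF method."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query: sensitivity 1, noise scale 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)  # seeded for reproducibility in this illustration
published = noisy_count(true_count=1_000, epsilon=0.5, rng=rng)
```

The published figure stays close to the true count for trend analysis, but no individual's presence or absence can be inferred from it with confidence.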
Version History
v1.4 (2026-02-19): Final Lock. Added "Limitations of Data" and Compliance Contact.
v1.3 (2026-02-19): Added Glossary of Data Terms.
v1.2 (2026-02-19): Hyper-expanded "Data Lifecycle" and "Subprocessors" sections. Added "Algorithmic Fairness" and "Synthetic Benchmarking" details.
v1.1 (2026-02-19): Added "Primary vs Secondary" data classification spectrum.
v1.0 (2024-02-01): Initial data transparency statement.