Data Pipeline Gap Analysis
Last batch loaded: 2026-04-05 17:36 UTC
Total raw records
10,295
records across all raw tables
What is counted

Records in the raw_data staging tables, before dbt. Posts, Users and Comments are deduplicated on an entity-specific key. Keywords and WhatsApp rows are counted with COUNT(*), limited to links containing "tiktok", with no deduplication.

  • Posts: 2,031 distinct records
  • Users: 1,863 distinct records
  • Comments: 89 distinct records
  • Keywords: 5,896 records
  • WhatsApp: 416 records
Posts → COUNT(DISTINCT platform || COALESCE(content_id, link))
Users → COUNT(DISTINCT platform || user_id)
Comments → COUNT(DISTINCT comment_id)
Keywords / WhatsApp → COUNT(*) WHERE link ILIKE '%tiktok%'
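The counting rules above can be mirrored in plain Python to make the deduplication keys concrete. This is a minimal sketch, assuming raw rows arrive as dicts keyed by the column names used in the rules (platform, content_id, link, user_id, comment_id). The function names are hypothetical; the dashboard's own counts come from SQL against the raw_data tables.

```python
from typing import Any, Iterable

Row = dict[str, Any]  # hypothetical raw row: column name -> value


def coalesce(*values: Any) -> Any:
    """First non-None argument, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def count_distinct(keys: Iterable[tuple]) -> int:
    """Like COUNT(DISTINCT ...): keys containing NULL (None) are skipped."""
    return len({k for k in keys if None not in k})


def count_posts(rows: Iterable[Row]) -> int:
    # platform combined with COALESCE(content_id, link); a tuple key cannot
    # collide the way plain string concatenation can
    return count_distinct(
        (r.get("platform"), coalesce(r.get("content_id"), r.get("link")))
        for r in rows
    )


def count_users(rows: Iterable[Row]) -> int:
    return count_distinct((r.get("platform"), r.get("user_id")) for r in rows)


def count_comments(rows: Iterable[Row]) -> int:
    return count_distinct((r.get("comment_id"),) for r in rows)


def count_tiktok_links(rows: Iterable[Row]) -> int:
    # Keywords / WhatsApp: COUNT(*) WHERE link ILIKE '%tiktok%', no dedup
    return sum(1 for r in rows if r.get("link") and "tiktok" in r["link"].lower())
```

The Keywords and WhatsApp rule counts every matching row, so a link that appears twice is counted twice.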
Total loaded (final)
4,191
rows in social_data_alfa after dbt
What is counted

Row counts from the dbt-managed final tables in social_data_alfa. These are the records that passed all dbt transformations.

  • Posts: 2,031 rows in final table
  • Users: 1,863 rows in final table
  • Comments: 80 rows in final table
  • Keywords: 20 rows in final table
  • WhatsApp: 197 rows in final table
Posts → social_data_alfa.posts
Users → social_data_alfa.usernames
Comments → social_data_alfa.comments
Keywords → source_log WHERE source_type = 'keyword_script'
WhatsApp → source_log WHERE source_type = 'whatsapp_script'
Gap rows detected
6,301
raw records missing from, or incomplete in, the final tables
What is counted

For each entity: raw LEFT JOIN final on the business key, counting rows where the final side is NULL. Sampled gap rows are capped at 500 per entity, so the per-reason counts in the entity breakdown can add up to less than the gap total (Keywords: 500 sampled out of 5,876).

  • Posts: 0 gap rows
  • Users: 0 gap rows
  • Comments: 9 gap rows
  • Keywords: 5,876 gap rows
  • WhatsApp: 416 gap rows
SELECT raw.* FROM raw
LEFT JOIN final ON final.<business key> = raw.<business key>
WHERE final.id IS NULL
LIMIT 500
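The query above is schematic: raw, final and the business key stand for each entity's own tables and key. A runnable sketch using SQLite's in-memory database, with hypothetical tables raw_comments and final_comments joined on comment_id, shows the anti-join and the difference between a LIMIT-capped sample and an uncapped COUNT(*).

```python
import sqlite3

# Toy stand-ins for one entity (hypothetical names): a raw staging table and
# its final table, joined on the business key comment_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_comments (comment_id TEXT, post_id TEXT);
    CREATE TABLE final_comments (id INTEGER PRIMARY KEY, comment_id TEXT);
    """
)
conn.executemany(
    "INSERT INTO raw_comments VALUES (?, ?)",
    [(f"c{i}", f"p{i}") for i in range(10)],
)
# Only c0..c6 reached the final table, so c7, c8 and c9 are gaps.
conn.executemany(
    "INSERT INTO final_comments (comment_id) VALUES (?)",
    [(f"c{i}",) for i in range(7)],
)

# Anti-join: keep raw rows whose LEFT JOIN found no final row (final.id is NULL).
ANTI_JOIN = """
    FROM raw_comments AS raw
    LEFT JOIN final_comments AS final ON final.comment_id = raw.comment_id
    WHERE final.id IS NULL
"""


def gap_sample(limit: int = 500) -> list[tuple]:
    """Gap rows, capped like the dashboard query."""
    return conn.execute(f"SELECT raw.* {ANTI_JOIN} LIMIT ?", (limit,)).fetchall()


def gap_count() -> int:
    """Uncapped count of the same anti-join."""
    return conn.execute(f"SELECT COUNT(*) {ANTI_JOIN}").fetchone()[0]


print(len(gap_sample(limit=2)), gap_count())  # 2 3
```

With a cap of 2 the sample holds two rows while the uncapped count is 3, which is the same effect the 500-row cap has on entities with large gaps.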
Avg load rate
67.5%
mean of per-entity load rates
How this is calculated

Each entity: loaded ÷ raw × 100, then the mean of all 5 entity rates. Note: every entity is weighted equally regardless of size, so Keywords at 0.3% counts as much toward the average as Posts at 100%.

  • Posts: loaded (2,031) ÷ raw (2,031) × 100 = 100.0%
  • Users: loaded (1,863) ÷ raw (1,863) × 100 = 100.0%
  • Comments: loaded (80) ÷ raw (89) × 100 = 89.9%
  • Keywords: loaded (20) ÷ raw (5,896) × 100 = 0.3%
  • WhatsApp: loaded (197) ÷ raw (416) × 100 = 47.4%
(100.0% + 100.0% + 89.9% + 0.3% + 47.4%) ÷ 5 = 67.5%
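The averaging rule above can be reproduced from the per-entity counts. For comparison only, the sketch also computes a record-weighted rate (total loaded ÷ total raw), one way to avoid the equal-weighting effect described in the note; the dashboard itself reports the unweighted mean. The dictionary and function names are illustrative.

```python
# (loaded, raw) per entity, taken from the figures above.
COUNTS = {
    "Posts": (2_031, 2_031),
    "Users": (1_863, 1_863),
    "Comments": (80, 89),
    "Keywords": (20, 5_896),
    "WhatsApp": (197, 416),
}


def per_entity_rates(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """loaded ÷ raw × 100 for each entity."""
    return {name: loaded / raw * 100 for name, (loaded, raw) in counts.items()}


def mean_load_rate(counts: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean: each entity counts once, whatever its size."""
    rates = per_entity_rates(counts)
    return sum(rates.values()) / len(rates)


def pooled_load_rate(counts: dict[str, tuple[int, int]]) -> float:
    """Record-weighted alternative: total loaded ÷ total raw × 100."""
    total_loaded = sum(loaded for loaded, _ in counts.values())
    total_raw = sum(raw for _, raw in counts.values())
    return total_loaded / total_raw * 100


for name, rate in per_entity_rates(COUNTS).items():
    print(f"{name}: {rate:.1f}%")
print(f"mean: {mean_load_rate(COUNTS):.1f}%")      # mean: 67.5%
print(f"pooled: {pooled_load_rate(COUNTS):.1f}%")  # pooled: 40.7%
```

With these counts the unweighted mean is 67.5%, while the record-weighted rate is 40.7% (4,191 ÷ 10,295), because Keywords holds most of the raw records but only 20 of them were loaded.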
Entity breakdown
Posts (tiktok_posts_meta → posts)
Raw 2,031
Loaded 2,031
Gaps detected 0
100.0% loaded
Clean

No gaps detected.

Users (tiktok_users_meta → usernames)
Raw 1,863
Loaded 1,863
Gaps detected 0
100.0% loaded
Clean

No gaps detected.

Comments (tiktok_comments_meta → comments)
Raw 89
Loaded 80
Gaps detected 9
89.9% loaded
Gaps
🔴 Fatal gaps: record was NOT inserted
Gap reason: parent post not scraped | Count (sample): 9 | Samples:
7564920220838707976 7565020009086796551 7565095443253216008 7565140281394397953 7565105121052574472 +4 more
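Each gap table in this breakdown lists a reason, a count taken from the sampled gap rows, up to five sample values and a "+N more" suffix. A sketch of how such a summary could be built, assuming the capped gap query yields (reason, value) pairs; summarize_gaps is a hypothetical helper, not part of the pipeline.

```python
from collections import defaultdict
from typing import Iterable


def summarize_gaps(
    gap_rows: Iterable[tuple[str, str]], max_samples: int = 5
) -> list[tuple[str, int, str]]:
    """Group (reason, sample_value) pairs into (reason, count, samples) rows.

    Counts cover only the rows passed in: if gap_rows came from a query
    capped with LIMIT 500, they are counts within that sample.
    """
    by_reason: dict[str, list[str]] = defaultdict(list)
    for reason, value in gap_rows:
        by_reason[reason].append(value)

    summary = []
    # Most frequent reason first; ties keep first-seen order (sorted is stable).
    for reason, values in sorted(by_reason.items(), key=lambda kv: -len(kv[1])):
        shown = values[:max_samples]
        hidden = len(values) - len(shown)
        samples = " ".join(shown) + (f" +{hidden} more" if hidden else "")
        summary.append((reason, len(values), samples))
    return summary
```

Fed nine "parent post not scraped" rows, it returns a single entry with a count of 9, the first five values and "+4 more".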
Keywords (tiktok_script_out → source_log)
Raw 5,896
Loaded 20
Gaps detected 5,876
0.3% loaded
Gaps
🔴 Fatal gaps: record was NOT inserted
Gap reason: post not scraped | Count (sample): 1 | Samples:
https://support.tiktok.com/en
🟡 Soft gaps: record inserted but incomplete
Gap reason: short URL (no video ID in link) | Count (sample): 499 | Samples:
https://vt.tiktok.com/ZShjCyS7p/ $,22 Ulta 30 QR https://vt.tiktok.com/ZShRKV2jr/ (*)https://vt.tiktok.com/ZShhYNr5P/ *(https://vt.tiktok.com/ZShhYYpkj/ https://vm.tiktok.com/ZGdaFX9sS/ +494 more
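The samples suggest why links are flagged: vm.tiktok.com and vt.tiktok.com share links carry no numeric video ID, some raw values wrap the URL in extra text such as "(*)" or "$,22 Ulta 30 QR", and some links point to a profile rather than a video. A sketch of a classifier for these cases, assuming the video ID is read from a /video/<digits> path segment. The rules and names are assumptions for illustration; the pipeline's actual check is not shown here.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed rules, for illustration only: the pipeline's real logic is not shown.
URL_RE = re.compile(r"https?://\S+")        # first URL inside a noisy raw value
VIDEO_ID_RE = re.compile(r"/video/(\d+)")   # numeric ID in a full video URL
SHORT_HOSTS = {"vm.tiktok.com", "vt.tiktok.com"}


def classify_link(raw_link: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (kind, video_id); kind is video, short, profile, other or no_url."""
    match = URL_RE.search(raw_link or "")
    if match is None:
        return ("no_url", None)
    parsed = urlparse(match.group(0))
    host = (parsed.hostname or "").lower()
    video = VIDEO_ID_RE.search(parsed.path)
    if video:
        return ("video", video.group(1))
    if host in SHORT_HOSTS:
        # Share link: the ID typically appears only after following the redirect.
        return ("short", None)
    if parsed.path.startswith("/@"):
        return ("profile", None)
    return ("other", None)
```

Under these rules a "short" result corresponds to the soft gap above, and recovering the video ID would mean resolving the share link first.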
WhatsApp (whatsapp_script_out → source_log)
Raw 416
Loaded 197
Gaps detected 416
47.4% loaded
Gaps
🔴 Fatal gaps: record was NOT inserted
Gap reason: post not scraped | Count (sample): 25 | Samples:
https://www.tiktok.com/@.hs.n.douaa?_r=1&_t=ZS-910slQDPi3W https://www.tiktok.com/@.majed.alshbh5?_r=1&_t=ZS-91U3r8J4pAA https://www.tiktok.com/@_____abn_sw______?_t=ZP-90bBLvanA2Y&_r=1 https://www.tiktok.com/@al7ariri2?_t=ZS-90scJ8io9Si&_r=1 https://www.tiktok.com/@azrael_792/video/7592955395056880914?_r=1&u_code=e1dj87l7efe3j1&preview_pb=0&sharer_language=en&_d=ef9j05j49b9a6j&share_item_id=7592955395056880914&source=h5_m&timestamp=1769700969&item_author_type=2&utm_source=copy&tt_from=copy&enable_checksum=1&utm_medium=ios&share_link_id=8A05715D-CD2A-45C3-BA72-94D8971F3BDF&user_id=7091612352138757126&sec_user_id=MS4wLjABAAAATjb70hydNFr71Sd_yT4b2OltTibX2q5lR8YeucOkVt3a69WUs1hBGqszVXSyxBXu&social_share_type=0&ug_btm=b2001&utm_campaign=client_share&link_reflow_popup_iteration_sharer=%7B%22follow_to_play_duration%22:-1,%22click_empty_to_play%22:1,%22dynamic_cover%22:1,%22profile_clickable%22:1%7D&share_app_id=1233 +20 more
🟡 Soft gaps: record inserted but incomplete
Gap reason: short URL (no video ID in link) | Count (sample): 391 | Samples:
https://vm.tiktok.com/ZNd7YaKeC/ https://vm.tiktok.com/ZS91AWAKmrBga-TyVa8/ https://vm.tiktok.com/ZS9e83RU16emy-cHj6M/ https://vm.tiktok.com/ZS9e8xgkNEyE6-49C4s/ https://vm.tiktok.com/ZSH346Nbts7rS-BaKJq/ +386 more