Scraping JSON Schema with Screaming Frog Custom Extraction

By Geoff Griffiths @mmatraining1980

In Screaming Frog, go to Configuration > Custom > Custom Extraction.

Select “Regex” from the dropdown menu, then paste this code into the box/field to its right:

<script type=\"application\/ld\+json\">(.*?)</script>

This captures whatever sits between the JSON-LD script tags.

I’m using the code below to extract product schema only. I can then export to Excel and filter for the URLs that contain product schema but don’t have aggregateRating:

<script type=\"application\/ld\+json\">(.*?"@type":\s*"Product".*?)<\/script>
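If you want to check the Product regex before running a full crawl, a quick test in Python is one way to do it. This is only a rough sketch: the sample pages and the `extract_product_schema` helper are made up for illustration, and the `re.DOTALL` flag is my addition so that `.` also matches line breaks in pretty-printed JSON-LD. Python's regex engine is not the one Screaming Frog uses, so treat a pass here as a sanity check rather than a guarantee.

```python
import re
from typing import Optional

# Regexes copied from the post. re.DOTALL is an assumption added for this
# sketch so that "." also matches newlines inside the script block.
ANY_JSON_LD = re.compile(
    r'<script type=\"application\/ld\+json\">(.*?)</script>', re.DOTALL
)
PRODUCT_JSON_LD = re.compile(
    r'<script type=\"application\/ld\+json\">(.*?"@type":\s*"Product".*?)<\/script>',
    re.DOTALL,
)


def extract_product_schema(html: str) -> Optional[str]:
    """Return the captured JSON-LD text if the Product regex matches, else None."""
    match = PRODUCT_JSON_LD.search(html)
    return match.group(1) if match else None


# Hypothetical sample pages, not taken from a real site.
product_page = (
    '<head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}'
    '</script></head>'
)
org_page = (
    '<head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", "name": "Example Ltd"}'
    '</script></head>'
)

print(extract_product_schema(product_page))  # the Product JSON-LD text
print(extract_product_schema(org_page))      # None, no Product schema here
```

One thing worth watching on real pages: in this sketch `.*?` can run past a closing script tag, so a page with a non-Product JSON-LD block followed later by a Product block will match from the first block onward.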

It turned out to be easier to use this regex to identify all the URLs that have aggregateRating fields:

"aggregateRating":\s*\{[^}]+\}


I then set up a second custom extraction to check for URLs/pages with any reviews:

"review":\s*\[\s*\{[^]]+\}
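The aggregateRating and review patterns can be tried against a sample of JSON-LD in the same way. Again, this is a sketch under assumptions: the sample schema and the `schema_flags` helper are invented for illustration, and Python's regex engine may not behave identically to Screaming Frog's.

```python
import re

# The two patterns from the post, unchanged.
AGGREGATE_RATING = re.compile(r'"aggregateRating":\s*\{[^}]+\}')
REVIEW = re.compile(r'"review":\s*\[\s*\{[^]]+\}')


def schema_flags(html: str) -> dict:
    """Report which of the two patterns match somewhere in the page source."""
    return {
        "has_aggregate_rating": AGGREGATE_RATING.search(html) is not None,
        "has_review": REVIEW.search(html) is not None,
    }


# Hypothetical JSON-LD snippets, not taken from a real site.
with_both = (
    '{"@type": "Product", "name": "Example Widget", '
    '"aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.5", "reviewCount": "12"}, '
    '"review": [{"@type": "Review", "reviewBody": "Great"}]}'
)
review_only = (
    '{"@type": "Product", "name": "Example Widget", '
    '"review": [{"@type": "Review", "reviewBody": "OK"}]}'
)

print(schema_flags(with_both))
print(schema_flags(review_only))
```

Note that the review pattern expects an array (`"review": [`). A page that marks up a single review as a plain object (`"review": {`) would not be picked up by it, so it may be worth spot-checking a few pages by hand.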

If a page had review schema but no aggregateRating, we needed to fix it.
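The same check can be run on the exported crawl data instead of filtering by hand. The sketch below assumes a CSV export with one URL column and one column per custom extraction. The column names (`Address`, `aggregateRating 1`, `review 1`) and the sample rows are assumptions for illustration, so swap in whatever headers your own export uses.

```python
import csv
import io


def pages_missing_aggregate_rating(
    csv_text: str,
    url_col: str = "Address",                # assumed URL column name
    rating_col: str = "aggregateRating 1",   # assumed extraction column name
    review_col: str = "review 1",            # assumed extraction column name
) -> list:
    """Return URLs where the review column is filled but the rating column is empty."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        has_review = bool((row.get(review_col) or "").strip())
        has_rating = bool((row.get(rating_col) or "").strip())
        if has_review and not has_rating:
            flagged.append(row[url_col])
    return flagged


# Build a small made-up export in memory so the example is self-contained.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Address", "aggregateRating 1", "review 1"])
writer.writerow([
    "https://example.com/a",
    '"aggregateRating": {"ratingValue": "4.5"}',
    '"review": [{"reviewBody": "Great"}',
])
writer.writerow(["https://example.com/b", "", '"review": [{"reviewBody": "OK"}'])
writer.writerow(["https://example.com/c", "", ""])
sample_export = buffer.getvalue()

# Pages with reviews but no aggregateRating are the ones to fix.
print(pages_missing_aggregate_rating(sample_export))
```

For a real export you would read the file's text and pass it in, rather than building the sample rows by hand.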