Saturn Parser - Extract contents from a URL
Saturn Parser is a Composer package that extracts the bits that humans care about from any URL you give it.
After a few days of work, I finished a prototype of the library. It can fetch metadata from a URL. For scraping, I used PHP's DOMDocument class. I considered Goutte, which offers more functionality, but I wanted to try it without a library first. DOMDocument has its downsides, though, and a well-maintained scraper would make things easier, so I plan to integrate the Goutte web scraper in a future update to improve the scraping capabilities. The code is still a bit rough, and the Extractors need a lot of improvement.
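Here is a minimal sketch of the DOMDocument approach: it pulls a page's title, description and Open Graph image out of an HTML string. This is not Saturn's actual code or API. The function name `extractMetadata` and the returned array keys are hypothetical, and the sketch assumes the HTML has already been downloaded.

```php
<?php
// Hypothetical sketch, NOT Saturn's API: extract basic metadata from HTML
// using PHP's built-in DOMDocument class.
function extractMetadata(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; buffer libxml errors instead of
    // emitting a warning for every unknown tag, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $meta = ['title' => null, 'description' => null, 'image' => null];

    // Use the first <title> element, if any.
    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $tag) {
        // Standard meta tags use "name"; Open Graph tags use "property".
        $key = strtolower($tag->getAttribute('name') ?: $tag->getAttribute('property'));
        $content = $tag->getAttribute('content');

        // Keep the first description found, from either source.
        if (($key === 'description' || $key === 'og:description') && $meta['description'] === null) {
            $meta['description'] = $content;
        } elseif ($key === 'og:image' && $meta['image'] === null) {
            $meta['image'] = $content;
        }
    }

    return $meta;
}
```

In practice the HTML would come from fetching the URL first, for example with `file_get_contents($url)` (which requires `allow_url_fopen` to be enabled). A dedicated scraper such as Goutte would take over both the fetching and the DOM querying.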
composer require jinas/saturn
- Code sample:
- The result looks like this:
If you have a suggestion or a question, leave a comment below. I will keep improving the package. I am currently building a web application with it and will upload it soon :)