Scraping WordPress - 4300 court rulings in exchange rate lawsuits without a line of code
It is not often that the execution of a service takes less time than estimating it, but with scraping, this can happen. See how easy it can be to retrieve data, especially from WordPress.

Daniel Gustaw
• 2 min read

It is not often that the execution of a service takes less time than its estimate, but with scraping, this can happen. Scraping is similar to hacking: depending on the security measures and the complexity of the system we are extracting data from, it can be either trivially simple or a serious challenge.
In this post, I will show how I completed a scraping job before I had even had time to estimate it. I did not write a single line of code, and the whole process took me a few minutes.
What the client needed:
The request concerned a database of court rulings published on the site
https://nawigator.bankowebezprawie.pl/pozwy-indywidualne/
The Wappalyzer plugin tells us that the site runs on WordPress, an old technology that is usually scraping-friendly, since choosing it tends to indicate there is no budget for anti-scraping measures.
The table updates dynamically, and pagination does not change the URL. This is typical of DataTables, a jQuery plugin.
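Wappalyzer recognizes technologies by matching known fingerprints in a page. A crude version of the same idea can be sketched in Python. The marker patterns below are illustrative assumptions (typical WordPress asset paths and DataTables file names), not Wappalyzer's real rules, and the sample HTML is made up.

```python
import re

# Illustrative fingerprints only (an assumption, not Wappalyzer's real rule set).
MARKERS = {
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
    "jQuery": [r"jquery(\.min)?\.js"],
    "DataTables": [r"dataTables(\.min)?\.js", r'class="[^"]*\bdataTable'],
}


def detect_technologies(html: str) -> set[str]:
    """Return the names of technologies whose markers appear in the HTML."""
    found = set()
    for name, patterns in MARKERS.items():
        # A single matching pattern is enough to report the technology.
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            found.add(name)
    return found


# Made-up page fragment resembling a WordPress site that loads DataTables.
sample = (
    '<script src="/wp-includes/js/jquery/jquery.min.js"></script>'
    '<script src="/wp-content/plugins/example/jquery.dataTables.min.js"></script>'
)
print(sorted(detect_technologies(sample)))  # ['DataTables', 'WordPress', 'jQuery']
```

Plain substring matching like this is easy to fool, which is why dedicated tools maintain large, versioned fingerprint databases.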
On the DataTables website, we can find the same table, just with slightly different styling.
These clues are enough to suggest that the data for the table is loaded from a single endpoint. A quick look at the network traffic reveals nothing interesting, but the page source does: the table data is embedded directly in the HTML.
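Selecting the text by hand was enough here, but the step can also be automated. Below is a minimal Python sketch. It assumes that the rows are assigned to a JavaScript variable as a JSON-compatible array literal. The variable name `rulings` and the sample fields are hypothetical and are not taken from the actual site.

```python
import json
import re


def extract_js_array(html: str, var_name: str) -> list:
    """Find `var_name = [...]` in the page source and parse the array as JSON.

    Assumes the array literal is valid JSON (double-quoted strings and keys,
    no trailing commas); a looser JavaScript literal would need another parser.
    """
    match = re.search(rf"\b{re.escape(var_name)}\s*=\s*\[", html)
    if match is None:
        raise ValueError(f"variable {var_name!r} not found in page source")
    start = match.end() - 1  # index of the opening '['
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it (the closing ';' and the rest of the script).
    data, _end = json.JSONDecoder().raw_decode(html, start)
    return data


# Hypothetical variable name and fields, for illustration only.
page = '<script>var rulings = [{"court": "A", "year": 2021}];</script>'
rows = extract_js_array(page, "rulings")
print(len(rows), rows[0]["year"])  # 1 2021
```

`raw_decode` is used instead of `json.loads` so the parser stops at the end of the array rather than failing on the JavaScript that follows it.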
The rest of the job came down to selecting those few thousand lines of text and saving them to a JSON file. Optionally, for the end user's convenience, the file can be converted to CSV or XLSX, for example with an online converter.
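The conversion can also be done locally with Python's standard library. This is a minimal sketch. It assumes that `pc.json` holds a list of flat objects, which the post does not confirm. Nested values would need to be flattened first.

```python
import csv
import io


def records_to_csv(records: list[dict]) -> str:
    """Convert a list of flat JSON objects to CSV text.

    Columns are the union of all keys in first-seen order; a record that
    lacks a key gets an empty cell in that column.
    """
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in missing keys; every key is in fieldnames, so no extras.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

To use it, load the file with `json.load` and write the returned text to a file opened with `newline=""`. This prevents the `\r\n` line endings produced by the `csv` module from being doubled on Windows.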
Links to the downloaded data:
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json.xlsx
Finally, I would like to emphasize that although this data is freely available, the people who structure it do so as volunteers, working toward a goal set by the association:
B) collecting information about unfair practices of entrepreneurs and other cases of legal violations by these entities, and developing and publicly sharing information, articles, reports, and opinions in this regard.
https://rejestr.io/krs/573742/stowarzyszenie-stop-bankowemu-bezprawiu
If you benefit from their work, I encourage you to support them on their website:
https://www.bankowebezprawie.pl/darowizna/