Scraping WordPress - 4300 court rulings in exchange rate lawsuits without a line of code
It is not often that delivering a service takes less time than pricing it, but with scraping, this can happen. See how easy it can be to retrieve data, especially from WordPress.

Daniel Gustaw
• 2 min read

It is not often that the execution of a service takes less time than its estimation, but with scraping, this can happen. Scraping is similar to hacking in that, depending on the security measures and the complexity of the system from which we are extracting data, it can be either trivially simple or pose a serious challenge.
In this post, I will show how I performed the scraping service before I had time to estimate it. I did not write a single line of code, and the whole process took me a few minutes.
What the client needed:
The inquiry was about the database of court judgments from the site
https://nawigator.bankowebezprawie.pl/pozwy-indywidualne/
Thanks to the Wappalyzer plugin, we can see that the site runs on WordPress - an ancient technology that is usually friendly to scraping, as choosing it often indicates a lack of budget for any anti-scraping measures.
The table reloads in real time. Pagination does not change the URL. This is typical of the DataTables package, which is a jQuery plugin.
On the page of this plugin, we will find the same table, just with slightly modified styles:
These clues are enough to suggest that the data for the table is loaded from a single endpoint. A quick look at the network traffic does not show anything interesting, but viewing the page source does.
The rest of the service was just a matter of selecting those few thousand lines of text and saving them in a json file. Optionally, for the convenience of the end user, the file can be converted to csv or xlsx, for example with an online converter.
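I did this step by hand, but if the same job had to be repeated, the copy-and-paste could be scripted. Below is a minimal sketch in Python, assuming the rows sit in the page source as a valid JSON array inside an inline script tag. The exact markup of the page is not reproduced in this post, so the function names and the "largest array wins" heuristic are illustrative assumptions, not a description of the site.

```python
import json
import re


def extract_embedded_arrays(html: str) -> list:
    """Return every non-empty JSON array that decodes from inline <script> blocks."""
    decoder = json.JSONDecoder()
    found = []
    # Deliberately simple pattern for inline scripts; this is not a full HTML parser.
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, flags=re.S | re.I)
    for script in scripts:
        pos = 0
        while True:
            start = script.find("[", pos)
            if start == -1:
                break
            try:
                # raw_decode parses one JSON value starting at `start`
                # and returns it together with the index where it ended.
                value, end = decoder.raw_decode(script, start)
            except json.JSONDecodeError:
                pos = start + 1  # not JSON here, try the next bracket
                continue
            if isinstance(value, list) and value:
                found.append(value)
            pos = end  # skip past the decoded value, including nested arrays
    return found


def largest_array(html: str) -> list:
    """Assumption: the table data is the longest array embedded in the page."""
    return max(extract_embedded_arrays(html), key=len, default=[])


def save_rows(html: str, path: str) -> int:
    """Write the detected rows to a json file and return how many were saved."""
    rows = largest_array(html)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle, ensure_ascii=False, indent=2)
    return len(rows)
```

If the page stored the rows as a JavaScript object literal with unquoted keys rather than strict JSON, this sketch would find nothing and manual selection would still be the faster route.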
Links to downloaded data:
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json.xlsx
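The csv conversion mentioned earlier can also be done locally instead of on a web page. The sketch below uses only the Python standard library and assumes that the json file holds a list of flat objects (one object per ruling). The field layout of pc.json is not described in this post, so treat that as an assumption, and the function names are mine.

```python
import csv
import io
import json


def records_to_csv(records: list) -> str:
    """Render a list of flat dicts as csv text with a header row."""
    # Collect column names in first-seen order so rows with extra keys are not lost.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",           # value used when a row lacks a column
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def convert_file(json_path: str, csv_path: str) -> None:
    """Read a json list of objects and write it out as csv."""
    with open(json_path, encoding="utf-8") as source:
        records = json.load(source)
    with open(csv_path, "w", encoding="utf-8", newline="") as target:
        target.write(records_to_csv(records))
```

Calling convert_file("pc.json", "pc.csv") on a downloaded copy would produce a file that opens in any spreadsheet program. Nested values, if there are any, would need flattening first.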
At the end, I would like to emphasize that although access to this data is free, the people working on its structuring are doing so on a voluntary basis to achieve the goal set by the association:
B) collecting information about unfair practices of entrepreneurs and other cases of legal violations by these entities, and developing and publicly sharing information, articles, reports, and opinions in this regard.
https://rejestr.io/krs/573742/stowarzyszenie-stop-bankowemu-bezprawiu
If you want to benefit from their work, I encourage you to support them on their website
https://www.bankowebezprawie.pl/darowizna/