Scraping WordPress - 4300 court rulings in exchange rate lawsuits without a line of code
Delivering a service rarely takes less time than pricing it, but with scraping, this can happen. See how easy it can be to retrieve data, especially from WordPress.
Daniel Gustaw
• 2 min read
It is not often that the execution of a service takes less time than its estimation, but with scraping, this can happen. Scraping is similar to hacking in that, depending on the security measures and the complexity of the system from which we are extracting data, it can be either trivially simple or pose a serious challenge.
In this post, I will show how I performed the scraping service before I had time to estimate it. I did not write a single line of code, and the whole process took me a few minutes.
What the client needed:
The inquiry was about the database of court judgments from the site
https://nawigator.bankowebezprawie.pl/pozwy-indywidualne/
Thanks to the Wappalyzer plugin, we can see that the site runs on WordPress, an old technology that is usually easy to scrape, since choosing it often suggests there is no budget for anti-scraping measures.
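The same kind of check can be approximated by hand. The sketch below is not how Wappalyzer works internally; it is only a simple heuristic that looks for two common WordPress fingerprints in a page's HTML: the `generator` meta tag and the `/wp-content/` or `/wp-includes/` asset paths. The function name `looks_like_wordpress` is made up for this illustration.

```python
import re


def looks_like_wordpress(html: str) -> bool:
    """Heuristic: report whether the HTML carries common WordPress fingerprints."""
    # Many WordPress sites expose a meta tag such as
    # <meta name="generator" content="WordPress 6.x">.
    generator = re.search(
        r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress',
        html,
        flags=re.IGNORECASE,
    )
    # Themes, plugins and core scripts are normally served from these directories.
    asset_paths = re.search(r"/wp-(content|includes)/", html)
    return bool(generator or asset_paths)
```

A site can hide both fingerprints, so a negative result proves nothing; a positive one is a reasonably strong hint.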
The table updates in real time, and pagination does not change the URL. This is typical behavior of the DataTables package, which is a jQuery plugin.
On the page of this plugin, we will find the same table, just with slightly modified styles.
These are sufficient clues to suggest that the data for the table is loaded all at once from a single source. A quick analysis of network traffic does not show anything interesting, but viewing the page source does: the table's data is embedded there directly.
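I copied the data by hand, but for a repeatable version the same step could be scripted. The sketch below assumes that the rows sit in an inline script as a valid JSON literal assigned to a variable. The variable name `tableData` and the sample HTML are hypothetical, invented for this illustration, and the real page may name and structure things differently.

```python
import json
import re


def extract_embedded_json(html: str, variable_name: str):
    """Return the JSON value assigned to `variable_name` in an inline script."""
    # Locate "<variable_name> =" and remember where the assigned value starts.
    match = re.search(re.escape(variable_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{variable_name!r} not found in page source")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the semicolon, the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Made-up sample standing in for the real page source.
sample_html = """
<script>
var tableData = [["row 1", "bank A"], ["row 2", "bank B"]];
</script>
"""

rows = extract_embedded_json(sample_html, "tableData")
```

This works only when the embedded literal is strict JSON (double quotes, no trailing commas). A looser JavaScript literal would need a more forgiving parser.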
The rest of the service was just a matter of selecting those few thousand lines of text and saving them in a json file. Optionally, for the convenience of the end user, they can be converted to csv or xlsx, for example with an online converter.
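If an online converter is not an option, the csv conversion is also a few lines with Python's standard library. This is a minimal sketch that assumes the json file holds an array of rows, each row being an array of cell values. The function name `json_rows_to_csv` is hypothetical, and the actual structure of the saved file may differ.

```python
import csv
import io
import json


def json_rows_to_csv(json_text: str) -> str:
    """Convert a JSON array of rows (arrays of cells) into CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    # csv.writer takes care of quoting cells that contain commas or quotes.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

If the file turns out to hold an array of objects instead, `csv.DictWriter` would be the matching tool. When writing the result to disk, open the file with `newline=""` so that the `csv` module controls the line endings.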
Links to downloaded data:
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json
https://preciselab.fra1.digitaloceanspaces.com/blog/scraping/pc.json.xlsx
At the end, I would like to emphasize that although access to this data is free, the people working on its structuring are doing so on a voluntary basis to achieve the goal set by the association:
B) collecting information about unfair practices of entrepreneurs and other cases of legal violations by these entities, and developing and publicly sharing information, articles, reports, and opinions in this regard.
https://rejestr.io/krs/573742/stowarzyszenie-stop-bankowemu-bezprawiu
If you want to benefit from their work, I encourage you to support them on their website.