Handling data trees in any application can be a challenge, and a sample data tree is convenient to practise on. It turns out you can find a lot of data on trees, but not much on data trees. A surprising number of municipalities publish open data, in JSON, on the trees they own, with diameter, age and location (example here). These collections lack depth, however, so they are not very useful as sample trees. So why not create a fresh data tree? The topic of the data tree is not relevant.

Wikipedia has an API for data retrieval that can be used to mine for categories. Each Wikipedia article has at least one category and each category can have one or more subcategories. A tree is born. The API is simple to use; this URL

https://en.wikipedia.org/w/api.php?action=categorytree&category=car

returns JSON with all subcategories of the car category. Unexpectedly, the JSON result looks like this:

{"categorytree":{"*":"DOM-tree"}}

Here “DOM-tree” stands for an actual HTML DOM tree, delivered as a string. The star key is unhelpful, as it is not a recognised PHP property name, and extracting the categories requires converting the string to a DOM document and querying it with XPath. Having done that, collecting all categories down to a certain depth requires iteration. This process appears simple but is not. The blog post at kvz.io here explains it all, but basically you have to create a flat tree first and then explode it into a real tree.
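
The two steps, pulling category names out of the "*" payload and exploding a flat tree into a nested one, can be sketched roughly as follows. This is a minimal sketch in Python rather than PHP, and it swaps the DOM document and XPath approach for the standard library's html.parser. The sample response is made up for illustration: it assumes every subcategory shows up as a link in the HTML, while the real markup carries more elements and attributes. The function names and the "/" path delimiter are likewise hypothetical, not taken from the demo on GitHub.

```python
import json
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects the text of every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </a>.
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.names.append("".join(self._buffer).strip())
            self._in_link = False


def extract_categories(response_text):
    """Return the link texts found in the HTML stored under the "*" key."""
    payload = json.loads(response_text)
    fragment = payload["categorytree"]["*"]  # "*" is an ordinary dict key here
    collector = LinkTextCollector()
    collector.feed(fragment)
    collector.close()
    return collector.names


def explode_tree(flat_paths, delimiter="/"):
    """Turn delimiter-joined paths (the flat tree) into nested dicts."""
    tree = {}
    for path in flat_paths:
        node = tree
        for part in path.split(delimiter):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return tree


# Made-up response in the shape shown above; not real API output.
sample_response = json.dumps({
    "categorytree": {
        "*": '<div><a href="/wiki/Category:Cars_by_brand">Cars by brand</a></div>'
             '<div><a href="/wiki/Category:Car_safety">Car safety</a></div>'
    }
})

children = extract_categories(sample_response)
flat = ["car"] + ["car/" + name for name in children]  # the flat tree
tree = explode_tree(flat)
print(json.dumps(tree, indent=2))
```

Going deeper means repeating the extraction for each child and appending its path to the flat list before exploding. A category name that itself contains the delimiter would be split into two levels, so a real implementation would need a delimiter that cannot occur in names.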

The complete data tree demo can be found on GitHub. The new sample tree is here.
