TrickJarrett.com

And so the weekend begins

6/23/2023 5:55 pm

After a good, productive workday, the wife and I headed to our local plant business and bought some new plants for the yard, both flowering and fruiting. From there we came home and did some gardening before turning to some other chores.

I gave the car a light cleanout and moved some stuff around, and am now taking a breather before starting dinner.

I've also been fiddling more with Wikindle. I solved the issue of needing to find new articles to download. First off, it can now take in a list of page categories and pull all the articles in each category. The goal is not to recreate Wikipedia on my local machine, but I do want my corpus of articles to be large enough that it covers the "normal" things people look up. I also don't want bad articles, so I'm currently limiting the categories to ones which Wikipedia maintains for quality.

As I write this, it's in the process of making the pull. The corpus has ballooned from the 8,000 articles this morning to almost 55,000.

Currently it is pulling from four categories to get that number (well, aside from the extra 100 it is pulling for being popular).
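
For anyone curious what a category pull can look like, here is a minimal sketch, not Wikindle's actual code. It assumes the MediaWiki Action API's `list=categorymembers` query and follows its `cmcontinue` paging. The names `category_query`, `titles_in_categories`, and the injected `fetch` callable are hypothetical, and the category names are left to the caller.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Public endpoint for English Wikipedia's Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_query(category: str, cont: Optional[str] = None) -> Dict[str, str]:
    """Build the query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": "0",  # namespace 0 = articles only
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont
    return params


def titles_in_categories(
    categories: Iterable[str],
    fetch: Callable[[Dict[str, str]], dict],
) -> List[str]:
    """Collect unique article titles across categories.

    `fetch` is a hypothetical callable that sends the parameters to the API
    and returns the decoded JSON response as a dict.
    """
    seen = set()
    for category in categories:
        cont = None
        while True:
            data = fetch(category_query(category, cont))
            for member in data["query"]["categorymembers"]:
                seen.add(member["title"])
            # The API includes a "continue" block while more pages remain.
            cont = data.get("continue", {}).get("cmcontinue")
            if not cont:
                break
    return sorted(seen)
```

With the `requests` library, `fetch` could be as small as `lambda params: requests.get(API_URL, params=params).json()`. Collecting titles into a set keeps an article that sits in two categories from being downloaded twice.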

The download process still needs work. I'm still not getting images from articles, and I know some things are not translating smoothly, especially in the math sections.

The next action items as I see them:

First, figure out images. I'm not sure where they are being filtered out of the text, and then I need to be able to pull them down and convert each tag to markdown's image syntax.
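
The tag conversion half of that could be sketched roughly as below. This is an illustration only: it assumes the `<img>` tags survive into the HTML being converted, that attributes are double-quoted, and that protocol-relative `//` sources should become `https:`. The function name `img_to_markdown` is made up for the example, and downloading the image files is not covered.

```python
import re
from html import unescape

# Matches a whole <img ...> tag; assumes no ">" inside attribute values.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches name="value" pairs; single-quoted or bare attributes are ignored.
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def img_to_markdown(html: str) -> str:
    """Replace each <img> tag with markdown's ![alt](src) form."""

    def replace(match):
        attrs = {
            name.lower(): unescape(value)
            for name, value in ATTR.findall(match.group(0))
        }
        src = attrs.get("src")
        if not src:
            return ""  # nothing to point at, so drop the tag
        if src.startswith("//"):
            src = "https:" + src  # assumption: make protocol-relative URLs absolute
        alt = attrs.get("alt", "")
        return f"![{alt}]({src})"

    return IMG_TAG.sub(replace, html)
```

A regex is enough for a sketch like this, but a real HTML parser would cope better with unusual attribute quoting.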

Second, I need to dig into other conversions from HTML to markdown and look for other articles with import issues.

Third, I also want to identify categories of articles I don't want. For example, I'm not going to go to this document for information about state roads in New Jersey (which are currently in the corpus). So I'll need to add document filtering and a blacklist of articles so they don't get re-added.
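
One possible shape for that filter, as a sketch under assumptions: it works on article titles rather than categories, combining an exact-title blacklist with a list of regex patterns. The function name `filter_corpus` and the example pattern are hypothetical.

```python
import re
from typing import Iterable, List, Set


def filter_corpus(
    titles: Iterable[str],
    blocked_titles: Set[str],
    blocked_patterns: Iterable[str],
) -> List[str]:
    """Return the titles that are neither blacklisted nor pattern-matched."""
    compiled = [re.compile(p, re.IGNORECASE) for p in blocked_patterns]
    kept = []
    for title in titles:
        if title in blocked_titles:
            continue  # explicitly blacklisted article
        if any(p.search(title) for p in compiled):
            continue  # matches an unwanted family of articles
        kept.append(title)
    return kept


# Example: drop one specific article plus anything that looks like a NJ route.
wanted = filter_corpus(
    ["New Jersey Route 18", "Tomato", "Pluto"],
    blocked_titles={"Pluto"},
    blocked_patterns=[r"^New Jersey Route \d+"],
)
```

Running the same filter on every new pull, before anything is downloaded, is what would keep removed articles from being re-added.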

Fourth, re-add cross linking via markdown/wiki text for articles which exist in my Wikindle.
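
A rough sketch of how that cross linking might work, assuming the article text still carries wiki-style `[[Target]]` or `[[Target|label]]` links and that each local article is stored as a markdown file named after its title. Both are assumptions, as is the name `link_local_articles`.

```python
import re
from typing import Set

# [[Target]] or [[Target|label]]
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def link_local_articles(text: str, local_titles: Set[str]) -> str:
    """Turn wiki links into markdown links when the target exists locally."""

    def replace(match):
        target = match.group(1).strip()
        label = match.group(2) or target
        if target in local_titles:
            # Hypothetical file naming: spaces become underscores, ".md" suffix.
            filename = target.replace(" ", "_") + ".md"
            return f"[{label}]({filename})"
        return label  # no local copy, so keep just the visible text

    return WIKILINK.sub(replace, text)
```

Links to articles outside the local corpus collapse to plain text, so nothing points at a page that isn't there.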

And lastly, once this is all figured out, I will need to figure out the whole "putting it on the Kindle" part, or putting it on some other similar long-lasting device. The real nerdy thing would be building my own e-ink device or something. We'll see.