No More Google Analytics And Upcoming Coding Plans
Google Analytics is mainly for my own validation at seeing the handful of people who come by. But I removed it, and am instead testing a more privacy friendly alternative called Clicky.
I am taking the next week off of work. I am not yet sure of my coding plans for that week. For Glowbug (this blog's backend system), I have three things sitting front-of-mind: automated email newsletter generation, archive pages for tags (if you wanted to see all the entries I have about NASA, for example, you can't currently), and making the publish functionality smarter (currently when I make a post, the site rebuilds everything; I think I can do better than that).
I also have a few other coding projects in mind, one new one which I've codenamed 'BEHEMOTH' - But, it's definitely named that because I think it will turn into a big project and I'm not sure I'm ready to tackle it yet.
Edit:
A few more ideas / thoughts while I was making quiche this morning. I wonder if I should add the ability to thread posts? For example, I could do a post about the Supreme Court in the morning, and then later in the day thread a link to another post under it. Or, for example, this edit would be its own post and would instead thread under the above post. Not sure it's actually better, and I would need to figure out how it is handled for things like the RSS feed, etc. But an interesting thought.
It would also be interesting to thread under a post from a previous day, etc. But again, how is that represented / displayed?
Also, while I like the backend's Markdown text editor I use to compose entries, it does not work well for me on mobile. I need to decide what I'm going to do. I am toying with the idea of making a simple mobile app that lets me post to the blog, primarily as a coding exercise, but it would also let me more easily share links to the blog from other apps, where the workflow is currently rather convoluted.
Question I woke up pondering today: How can I measure the popularity of words or phrases comparatively through third-party tools? I don't want to build a massive database and derive it myself. I am looking to find smart ways to use tools that already exist.
The first use case I'm thinking about for this is, again, the automated tweets I put out about what I'm writing about. I used very simple logic to determine which tags it chooses to use as an example: the most frequently used on the site. The logic being that that means they are things I (theoretically) write about more commonly.
But if my intent is to use those keywords to draw people's attention, and given how often I'm writing about current events, the ability to check whether a keyword is "hot" right now would be useful for the automation.
I have two thoughts:
- Twitter's Trending Data - I can ping the Twitter trending API and see if any of the keywords appear as part of the trending topics
- Google Trends - It looks like there is an unofficial API for Google Trends which would also prove useful in the comparison weighting
So, at some point I'll dig in and play around with these.
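A minimal sketch of how that comparison weighting might work, assuming the trending topics have already been fetched from one of those services and handed over as plain strings. The function name, the `boost` factor, and the sample data are all made up for illustration; nothing here calls a real API.

```python
def rank_tags_by_trend(tag_counts, trending_topics, boost=2.0):
    """Rank tags by site-wide frequency, boosting tags found in a trending topic.

    tag_counts: dict of tag -> number of posts using it (hypothetical data).
    trending_topics: list of topic strings fetched elsewhere (assumed input).
    boost: arbitrary multiplier for "hot" tags, an assumption to tune.
    """
    trending = [topic.lower() for topic in trending_topics]
    scored = []
    for tag, count in tag_counts.items():
        # A tag counts as "hot" if it appears inside any trending topic.
        is_hot = any(tag.lower() in topic for topic in trending)
        scored.append((count * (boost if is_hot else 1.0), tag))
    # Highest score first; alphabetical order breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored]


tag_counts = {"nasa": 12, "supreme court": 9, "baseball": 15}
trending = ["Supreme Court ruling", "#WorldCup"]
print(rank_tags_by_trend(tag_counts, trending))
```

With this sample data the trending match lifts "supreme court" above the more frequently used "baseball".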
Adding Screenshots to the morning TrickBot Tweet & Talking about Tags
Alright, wrapping up my coding for the night. Spent tonight finally figuring out how to post images on tweets via the API. Specifically so that the morning TrickBot tweet, updating followers on my posts, would include a screenshot of the top of the frontpage.
I've had the screenshot stuff worked out for a week or so, thanks to screenshotmachine.com. They have a free API which permits 100 calls a month. And since I am planning to do it once daily, the math there works out in my favor.
I had been using the old TwitterAPIExchange PHP library. I can't say definitively that the issue I was dealing with was because of it, but after I switched to the http://TwitterOauth.com library, it was a fairly quick changeover and it worked perfectly almost immediately.
I think the next bigger project for me with Glowbug is to really give the taxonomy and tags a rework. What I have works fine, but I think it can be better. A few ideas / questions:
- Do my tags need hierarchy? If they had them, would I add parent tags when a child tag is picked?
- Should I tag a post if the term is in the body or title of the post? I think yes, if I am going to create tag-based pages. However, if not, then tags purely serve search functionality, and the answer becomes no, since searching the text would already find the term.
- I need to completely rewrite the taxonomy management page in my admin tools. It's awful but functional. I want it to be better.
- Need better identification of duplicate tags and handling/fixing of them.
- And of course, I need to set up archive pages for tags
All that said, these are going to sit on the task list for a while. I have two new bigger ideas in mind. I'm taking the week of the 4th of July off, and while I plan to spend much of it with family - I also intend to dive into a new project.
Edit: I just thought of another feature I need to add to the automated tweets. I need to thread replies which link to posts I make which have direct links. I don't do this often, but usually for longer posts. Adding it to the backlog.
Thinking about my coding projects to work on. Just in the order they're in my head, not necessarily in the order I'll tackle them.
To Listen
This is the new script I hammered out last night for taking a YT playlist and making a podcast. It still needs more work. So I want to hammer on it a bit more, smooth it out and fix some bugs with it.
Glowbug
There are some quality of life bug fixes I need to do. I also need to finish implementing 'Chapters' - though, as I did it, I began questioning if it was something I honestly needed.
Pick'em
This is the most pressing, because right now it isn't sending emails. I had been sending emails via Gmail using SMTP, but Google has closed that functionality. Ostensibly they did that for security reasons, but honestly I think they were just having too many people like me using it programmatically. So I need to investigate solutions.
Kontakto
I also started a very early version of this. A web app designed to be used on mobile. I need to get back in it and finish the functionality. Even though it's a web app, I think I have a way for it to send notifications to my phone.
Clerk 2
Not the Kevin Smith movie. But I have a simple app that I call Clerk which I use for tracking weight and some other daily stuff. I am planning to rebuild it from the ground up using the PHP Laravel framework. It's, overall, a pretty simple project and I think it would be a good project for getting used to the framework.
Next Project
I think the next "big" project from the ground up will be the collection manager. I've been thinking a lot about it recently and really trying to delve into the database design as that is going to be critical to this project.
I don't know how much I'll get to work on projects this weekend. I'm hoping 8-ish hours in the morning and evenings. We'll see.
Today is a Tuesday which VERY much feels like a Monday. I have come to resent short weeks as I find when Monday is a day off it ends up making the rest of the week feel rougher.
That said, today I've made the decision that this week is the last week I can let myself focus on Glowbug as a project. I have a number of ideas and things I want to add, but ultimately it's at a solid point in development and I want to move onto other projects. I'm not sure which one I'll go to next, but it will have been almost 2.5 weeks of me focusing on Glowbug and it's time I move to another project.
As is tradition and simply part of my morning routine, this morning has been spent with a bit more coding.
Today's coding has been mostly cleaning up code. I moved every configuration variable, API key, etc. out of the code and into a new file which is not stored in Git. I should have done this a long time ago, but oh well.
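For illustration, here is a small sketch of that pattern: settings live in a JSON file that is kept out of version control (for example by listing it in `.gitignore`) and loaded at startup. The file format, the key names, and the `load_settings` helper are assumptions for the example, not the blog's actual setup.

```python
import json
from pathlib import Path

# Hypothetical key names; the real settings are not described in the post.
REQUIRED_KEYS = ("twitter_api_key", "screenshot_api_key")


def load_settings(path, required=REQUIRED_KEYS):
    """Load secrets from a JSON file that is excluded from the repository."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early and clearly if a key is missing, rather than at first use.
    missing = [key for key in required if key not in settings]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return settings
```

Checking for required keys up front means a forgotten setting shows up at startup instead of halfway through a cron run.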
Last night I fixed the code for publishing the blog. When I first wrote it, I used entirely relative server paths. It worked perfectly for normal publishing. But with my recent updates and changes, I was trying to publish from a different script and it was dumping all the files in the wrong location. Now, that is fixed.
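A rough sketch of the idea behind that fix, assuming the fix amounts to resolving every output file against one fixed site root instead of whatever directory the calling script runs from. The `output_path` helper and the example paths are hypothetical.

```python
from pathlib import Path


def output_path(site_root, relative_url):
    """Resolve a page's output file against a fixed root directory.

    The result is the same no matter which script, or which working
    directory, triggers the publish.
    """
    root = Path(site_root).resolve()
    target = (root / relative_url.lstrip("/")).resolve()
    # Refuse anything that would land outside the site root.
    if target != root and root not in target.parents:
        raise ValueError(f"{relative_url!r} escapes the site root")
    return target


print(output_path("/tmp/site", "2022/07/index.html"))
```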
If it works, this embed is going to not work:
Also this morning I'm attempting to update my twitter bot code. Primarily, the intent is to delete my previous automated tweet when I make a new one. It should be set up to do it. But we'll see when the cron runs in a little bit.
HTML to Markdown
We're officially a Markdown blog! For those unfamiliar with Markdown, it's a 'markup language.' That is, like HTML, it defines how a document will look - using a very simple system. I'll dive into why I'm doing this later in the post. For you all, you should see (almost) no change on the blog, most everything I discuss here is relating to the backend management of it.
A few days ago, I quickly worked out the code which would take every entry in the blog and convert them from HTML to Markdown. I also took the opportunity to fix some content errors in the site; though it's not all fixed. Before running the conversion, I did a lot of testing to be confident it wouldn't screw up the database - but, having done that, I still took an export of the database as backup before running it. You can never be too careful when dealing with something that could potentially ruin your website.
This morning, after the entries in the database were converted, I switched my admin interface to a different editor which supports Markdown. I am currently going with Editor.md which looks to be a markdown editor made for the Chinese market. It's quite robust, though I have a few things I still need to figure out. Such as, when writing this post, I went to use a modal and the UI popped up in Chinese! Whoops. Thankfully it was a quick fix as they provide a bit of code which fully translates the UI into English.
Once the editor was in place, and I had confirmed all of my normal admin functionality worked, I then finished implementing the publishing which would turn the Markdown into HTML. Thankfully the Parsedown library for PHP is robust and simple. It only took a few lines of code to implement it.
That was it. Overall, very straightforward, and now my blog is future-proofed. The future-proofing is one of the major upsides of using Markdown.
The editor I was using before was called Quill, and while it's a very powerful rich-text editor, the HTML it created was not perfect. This isn't a problem limited to Quill; many rich-text editors struggle with this. I still remember the nightmare code that Microsoft Word would produce if you saved your document as an HTML file. So, with Quill, I was having to do code cleanups and figure out workarounds for some of its quirks.
Ultimately, this is not a huge thing. Many other blogs store their posts in HTML. I just like knowing it is in Markdown for flexibility. And, making Markdown the core structure of the backend enables me to potentially do some other things for inputting new entries. Dave Winer, the father of RSS, has a neat thing he put together where he can tweet a thread in markdown and his blog will pick it up and import it as an entry. I don't currently plan to do that with this blog, but it's an example of where making Markdown the editorial core enables some greater flexibility.
Glowbug coding morning
I have, for the past 3 days, been banging my head against the wall.
I utilize an app called 'Wallabag' which is a self-hosted version of Pocket, for storing articles for later reading.
The issue I'm running into is that I want to hook a bit of code I'm writing up to my Wallabag. Wallabag has a built-in API setup. It's documented and works for a number of things already. I'm attempting to create my own. But, for some unknown reason, the path /oauth/v2/token, which works on their hosted setup and for other people who are self hosting, is not working for me.
I've been working through possible reasons. I even contacted my webhost to see if there was possibly some sort of intra-server limitation that would stop it. But, no luck.
My current theory is that there is something messed up with my install. I installed it via a 3rd-party tool called Scriptaculous. So, in attempting to fix that, I went about installing it their prescribed way. And, because this problem has become a constant series of obstructions, I ended up with errors as part of the install. Fun!
When I ran into the wall on it, I turned to some other bits of coding. Namely, getting me closer to the transition to Markdown rather than HTML on the backend. I have just finished the script which will convert all the posts from HTML to markdown, as well as fixing some other issues.
Tonight I'll finish adding the Markdown parsing to publication and then change the admin entry editor from Quill to a Markdown one. Once that's done, we'll run the conversion.
Upcoming Glowbug Work
Now that I hit the main milestones for Pick'em, I have spent much of the past week or so working on stuff for Glowbug. It's really a tool for me. I've talked about the work I've already done. I'm listing these in no particular order:
Add ability to set custom URL path rather than use the hash - Some entries on this blog get their own page, which have a direct link to them. The vast majority of posts here don't and are only accessible through the day archives. But, up to now, if a page gets its own URL, it's a generated hash. When I started this blog engine, I didn't care about the URLs and just wanted it to be quick and easy. I've reached the point now where I'm ready to be able to add a bit more curation and want to be able to manually define a page's URL path.
Add Chapters for posts - This is an old old idea I've had for blogs. Especially blogs which are life journals. Having posts organized into chronological groupings automatically. More than just a day or week archive. But you would basically say "Chapter 1" or "Seattle" or whatever title makes sense at the time you start the chapter. And then posts would be added to it cumulatively until something occurs in your life and you turn to a new chapter. So, I want to finally implement that idea here for the life update and current event posts.
Combine multi-line quotes - This is a bit of a smaller bug, but right now, if I input a multi-paragraph quote, Glowbug renders each paragraph as its own quote block. Annoying. Also, not huge, and I might put it off as I plan on doing a larger rewrite of the post creation system, moving away from the rich HTML structure and reverting to a simpler Markdown-based content entry. We'll see.
Add pages for tags - When I first added tagging to the blog, my intent was to go back and add logic for creating pages based on tags. I still want to do this. Though, I'll need to figure out how to handle the tags with a bunch of posts. I don't particularly want to paginate, but I might have to.
Make publishing smart by identifying all the pages which need to be updated - Another thing I've been putting off. Right now, when I post a single link, the backend rebuilds the entire site. That's silly and wasteful. I want to go through and figure out how to best identify the pages which are affected by this post and update them.
Fix Suggested Tags - The suggested tag functionality I currently have gets the job done, but it's crappy. I can do better. There are some bugs with it and I just need to sit down and take on fixing it.
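To make the "smart publishing" item above a bit more concrete, here is a sketch of how the affected pages for a single post might be worked out. The URL layout (front page, feed, day archive, tag pages, optional custom path) is an assumption for illustration; tag pages in particular do not exist yet.

```python
from datetime import date


def pages_to_rebuild(post):
    """Return the output pages touched by publishing a single post.

    `post` is a dict with a `date`, optional `tags`, and an optional
    `slug` for posts that get their own page. All paths are hypothetical.
    """
    pages = {"/index.html", "/feed.xml"}
    day = post["date"]
    pages.add(f"/{day:%Y/%m/%d}/index.html")  # the day archive
    for tag in post.get("tags", []):
        pages.add(f"/tags/{tag}/index.html")  # assumed future tag pages
    if post.get("slug"):
        pages.add(f"/{post['slug']}/index.html")  # the post's own page
    return sorted(pages)


post = {"date": date(2022, 7, 4), "tags": ["nasa"], "slug": None}
print(pages_to_rebuild(post))
```

Everything not in the returned list can be left alone, which is the whole saving over a full rebuild.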
Yesterday I polled my Twitter followers about whether the automated tweets updating them about recent posts were annoying:
[{embed}]https://twitter.com/trickjarrett/status/1528166144800018432[{/embed}]
Overall the feedback was that they were fine. Which is better than a resounding amount of people telling me they were annoying. I did get a good suggestion to add a tag which people could filter out, so, I did that. Along with that, I improved the overall coding.
First, I fixed it so the grammar of the tweets will be correct regardless of the number of posts made or tags used. Previously, days with few posts or tags would lead to some poorly worded tweets.
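As a sketch of what count-aware wording can look like, the helpers below pick singular or plural and join tags naturally for one, two, or many items. The tweet wording and function names are invented for the example and are not the bot's actual text.

```python
def join_words(items):
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def compose_tweet(post_count, tags):
    """Build a summary sentence that reads correctly for any counts."""
    noun = "post" if post_count == 1 else "posts"
    text = f"{post_count} new {noun} on the blog today"
    if tags:
        text += f", touching on {join_words(tags)}"
    return text + "."


print(compose_tweet(1, ["nasa"]))
print(compose_tweet(3, ["nasa", "politics", "baseball"]))
```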
Second, I refined the system for how it chooses tags to highlight. It's still far from perfect, but it first looks at the tags used in the day and ranks them by frequency of use. If there is hierarchy there, it goes with it. If not (aka, the tags are all used once) then it falls back to ranking tags by their frequency of use across the entire blog. Still not perfect, but better than the purely random highlight of tags.
Lastly, I cleaned up and refactored the code. In fact, as I was writing this blog post, I had to stop and go back as I realized I could further improve the refactoring of the code.
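The tag selection described above can be sketched roughly like this: rank by the day's usage when the counts differ, otherwise fall back to site-wide frequency. The data shapes, the limit of three, and the tie-breaking by site-wide count are assumptions for the example.

```python
from collections import Counter


def pick_highlight_tags(todays_post_tags, sitewide_counts, limit=3):
    """Choose which tags to mention in the daily tweet.

    todays_post_tags: list of tag lists, one per post made today.
    sitewide_counts: dict of tag -> total uses across the blog.
    """
    day_counts = Counter(tag for tags in todays_post_tags for tag in tags)
    if not day_counts:
        return []
    if len(set(day_counts.values())) > 1:
        # Today's usage has a hierarchy, so rank by it. Breaking ties on
        # site-wide use is an assumption of this sketch.
        def key(tag):
            return (-day_counts[tag], -sitewide_counts.get(tag, 0), tag)
    else:
        # Every tag was used equally often today: use site-wide frequency.
        def key(tag):
            return (-sitewide_counts.get(tag, 0), tag)
    return sorted(day_counts, key=key)[:limit]


todays = [["nasa", "space"], ["nasa"], ["politics"]]
sitewide = {"nasa": 5, "space": 2, "politics": 40}
print(pick_highlight_tags(todays, sitewide))
```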
Automated summaries and keyword identification from articles
Well, this bit of coding has been a journey.
This week I've been working on code which automates a pipeline between my Wallabag (a self-hosted web app that enables later-reading of articles online, like Mozilla's Pocket app) and my blog. While working on this, I discovered a GitHub project for 'TextRank', which takes a body of text and attempts to summarize it, as well as identify keywords in it. It is definitely not perfect, but it is useful for a first iteration of the concept.
I've been trying to integrate it into my code over the past few days to infuriatingly little success. This afternoon, I finally was able to get it - but only after getting on StackOverflow to ask about what I was missing. As I was doing so, I realized I had asked a question about the exact same issue six years ago.
I am thrilled to have found my solution, and mortified that I had forgotten about this.
So, the code now does two things:
First, it generates a summary for the text. These summaries will not always be great, but the hope is that they are a net value-add for this automation of the system. My intent is that these summaries will only be present until I come back and revise the content for the posts, either determining the summary is not needed or I replace it with something I write. We'll see.
Second, it does keyword analysis. I then take the top keywords it identifies and also keywords which already exist as tags, and add them to the new post. Again, not a perfect system, but better than nothing, and something I can iterate on.
Interestingly, I spent a summer in college working with Dr. Lonnie Harvel, during which I contributed to a paper he published titled, "Using student-generated notes as an interface to a digital repository." At the time, Georgia Tech had just rolled out lecture recording and automated transcripts of the video with time stamps, etc. We were working on stuff that would further improve that system.
My main contribution there was work on code that looked at the transcript and identified keywords. It's been so long, I don't remember the full details of what I came up with, but I do recall it being something relating to a number of different things, like frequency of word usage in the text, word length as well as number of syllables (my thinking was that the bigger words would tend to be more important.) Granted, the context there was in identifying words that would do well in being sign posts for lecture transcriptions which is slightly different than identifying the most relevant and salient keywords for taxonomy.
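Purely as an illustration of that kind of heuristic (and not a reconstruction of the original research code, whose details are long forgotten), a keyword scorer combining frequency, word length, and a crude syllable estimate might look like this. The scoring formula and the vowel-group syllable counter are assumptions.

```python
import re
from collections import Counter


def count_syllables(word):
    """Very rough estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def score_keywords(text, top_n=5):
    """Rank words by frequency weighted by length and syllable count.

    The weighting (frequency * (length + syllables)) is an arbitrary
    choice for this sketch; bigger, repeated words float to the top.
    """
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    scores = {w: n * (len(w) + count_syllables(w)) for w, n in freq.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


print(score_keywords("the transcript of the lecture transcript", top_n=2))
```

A real version would want a stop-word list at minimum, since short common words can still win on frequency alone in longer texts.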
In any case, it is interesting to come back to something I had done some research on back in college. I'm looking forward to seeing how the new implementation works on the blog and we'll see about improving and refining it from here.
Edit (12:29am): It took less than four hours before I decided to rework the system. I recalled there was a bot on Reddit which would pop up and attempt to share summaries in reply to links to articles. I tracked it down and found that it made use of another site that does summaries, smmry.com. After investigating, I found I could have the API use it up to 100 times a day, which should be plenty for my purposes. Their tool for providing keywords is slightly too opaque for my uses, currently, though I might reconsider and use it in conjunction with the current tool - though I'm not convinced that will be overly helpful yet. We'll see.
Glowbug + Wallabag
I just hacked in a new feature on the frontpage of this blog: I've integrated my Wallabag with it! Wallabag is a web application that lets you save web pages for later reading; there is a popular tool called Pocket which does the same thing. The difference between Wallabag and Pocket is that Wallabag is something I can self-host.
The first level of this integration is that you can see what articles are currently sitting unread in my Wallabag. These are things that I come across through the day and I throw them in there to come back to at some point, maybe. I make no guarantees on the quality of them as they are often saved either based purely on the subject, or maybe by reading the first paragraph or so.
I have a few further ideas:
- I could use the 'Star' function in Wallabag to grab articles from my Wallabag automatically to be published here.
- A dedicated RSS feed for my Wallabag
- An automated newsletter for the links
As I posted this morning, I had begun work on a feature for the admin panel on Glowbug. I now have a first implementation of suggested tags for posts on the blog. As I add a tag to a post, the system looks for tags used on other posts which have that tag, and then suggests them as possible tags for the post I'm working on.
It's rudimentary, but it's a start. Eventually I'll improve the system with some smarter logic.
Integrating COUNT() into a nested MySQL query
I'm working on Glowbug's admin side. One of the features I've wanted to add is for it to recommend existing tags which are similar to ones I've already entered into a post. I'm working on the query which will pull this info up and decided to turn to StackOverflow to see if there is a better way to do it than I am currently. We'll see what the responses say.
Update 9:15am - StackOverflow comes through and gives me the answer.
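The actual query and the StackOverflow answer aren't reproduced here, but the general shape of a COUNT() over a nested query for tag suggestions can be sketched. This uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `post_tags` table and its columns are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical post_tags table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE post_tags (post_id INTEGER, tag TEXT)")
    conn.executemany(
        "INSERT INTO post_tags VALUES (?, ?)",
        [(1, "nasa"), (1, "space"), (1, "science"),
         (2, "nasa"), (2, "space"), (3, "politics")],
    )
    return conn


def suggest_tags(conn, tag, limit=5):
    """Suggest tags that most often share a post with `tag`."""
    return conn.execute(
        """
        SELECT other.tag, COUNT(*) AS shared_posts
        FROM post_tags AS other
        WHERE other.post_id IN (SELECT post_id FROM post_tags WHERE tag = ?)
          AND other.tag <> ?
        GROUP BY other.tag
        ORDER BY shared_posts DESC, other.tag ASC
        LIMIT ?
        """,
        (tag, tag, limit),
    ).fetchall()


print(suggest_tags(build_demo_db(), "nasa"))
```

The inner SELECT finds every post carrying the entered tag, and the outer query counts how often each other tag appears on those posts.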
Added audio functionality
[{audio}]8bit.mp3[{/audio}]
Among my development work today, after messing with Python I came back home to Glowbug and added some functionality that has been on my list. I can now upload mp3s and have it generate the play functionality.
This evening after I finished up work I sat down to tackle the final major component of the new tagging feature for the blog: a page to manage tags. This constitutes largely a page that lists off every tag, and gives me three functions:
- Merge - The ability to essentially delete one tag and re-assign all posts which used the deleted tag into the new tag
- Rename - Self-explanatory
- Delete - Self-explanatory
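Of the three functions, merge is the one with a wrinkle: a post may already carry both tags. A sketch of one way to handle that, again using `sqlite3` as a stand-in and a hypothetical `post_tags` table with a uniqueness constraint on (post_id, tag):

```python
import sqlite3


def merge_tags(conn, old_tag, new_tag):
    """Re-assign every post tagged `old_tag` to `new_tag`, then drop `old_tag`."""
    with conn:  # one transaction: both statements apply, or neither does
        # Give each post that had the old tag the new one; posts already
        # carrying the new tag are skipped thanks to the primary key.
        conn.execute(
            "INSERT OR IGNORE INTO post_tags (post_id, tag) "
            "SELECT post_id, ? FROM post_tags WHERE tag = ?",
            (new_tag, old_tag),
        )
        conn.execute("DELETE FROM post_tags WHERE tag = ?", (old_tag,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_tags (post_id INTEGER, tag TEXT, PRIMARY KEY (post_id, tag))"
)
conn.executemany(
    "INSERT INTO post_tags VALUES (?, ?)",
    [(1, "us-politics"), (2, "us politics"), (2, "us-politics"), (3, "us politics")],
)
merge_tags(conn, "us politics", "us-politics")
print(conn.execute("SELECT * FROM post_tags ORDER BY post_id").fetchall())
```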
Knocked it out with about 2 total hours of coding for front and backend. Could have probably done it faster, but I spent some time chasing a ghost of a bug - that is, I got an error page and couldn't figure out the issue because I had forgotten to upload the updated code file. Whoops.
This morning's Glowbug coding is a long overdue feature: admin searching. This is for my use; I haven't yet decided how to give visitors search capabilities. I'm tempted to offload that to Google, but that would be a stopgap solution. Eventually I'd want to offer actual search, I think.
But for now, I can now do searching based on titles, urls, and the body text. I haven't yet done searching based on the tags. That will be slightly more complicated. I might tackle it later today, we'll see.
Late night Glowbug coding. I have had this idea for a few days. On the back end I can save drafts of posts. Primarily it's a link queue saved either from a bookmarklet when I come across a link to read later, or a link saved from my phone.
Well there is a cron job which runs every twenty minutes that, as of now, does two things.
- If a draft is older than 48 hours, it deletes it. Sometimes I save a link meaning to read it and I keep putting it off. Well, if it gets to 48 hours I lose the chance. Or if I meant to save it longer than that it should be a browser bookmark instead.
- If the system hasn't grabbed the page's title, it goes ahead and does that. It's a small thing, but it is much nicer than the sometimes opaque URLs where I don't know what I'm linking to.
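The two cron duties above can be sketched as a single pass over the drafts. The draft dictionaries, the `fetch_html` callable (which stands in for whatever actually downloads the page), and the function names are assumptions for the example.

```python
from datetime import datetime, timedelta
from html.parser import HTMLParser

MAX_DRAFT_AGE = timedelta(hours=48)


class _TitleParser(HTMLParser):
    """Collect the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = _TitleParser()
    parser.feed(html)
    return parser.title.strip() or None


def tidy_drafts(drafts, now, fetch_html):
    """Drop drafts older than 48 hours and fill in any missing titles."""
    kept = []
    for draft in drafts:
        if now - draft["saved_at"] > MAX_DRAFT_AGE:
            continue  # expired: the chance to read it has passed
        if not draft.get("title"):
            draft = {**draft, "title": extract_title(fetch_html(draft["url"]))}
        kept.append(draft)
    return kept
```

Passing `now` and `fetch_html` in as arguments keeps the logic easy to check without waiting 48 hours or hitting the network.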
Small coding thing, took me 15 minutes. The majority of which was triple checking my timeout deletion query, but otherwise I had it mapped out in my head and it went lickety-split nice and easy.
JavaScript is a weird language. Today, as I was changing stuff on the admin side, I caused a bug in the pagination of the posts: normally it outputs a selection and elides the majority of the pages, but instead it was giving me all 25 pages. The cause was that I was grabbing the current page from the URL, so JavaScript was seeing the number as a string. One 'parseInt()' call later I've got it fixed, but it took me a while to debug this one silly thing.
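The same pitfall exists outside JavaScript, since query-string values arrive as strings everywhere. Here is a sketch of the pattern in Python, with a hypothetical `page` parameter and an assumed window of two pages either side of the current one. Python would raise a TypeError on the string arithmetic rather than quietly misbehaving, but the explicit conversion is still the fix.

```python
from urllib.parse import parse_qs, urlparse


def current_page(url, default=1):
    """Read ?page= from a URL and convert it to an int explicitly."""
    values = parse_qs(urlparse(url).query).get("page")
    try:
        return int(values[0]) if values else default
    except ValueError:
        return default


def visible_pages(current, total, window=2):
    """List page numbers to show, with None marking an elided gap."""
    pages = []
    for page in range(1, total + 1):
        near_edge = page == 1 or page == total
        near_current = abs(page - current) <= window
        if near_edge or near_current:
            pages.append(page)
        elif pages[-1] is not None:
            pages.append(None)  # collapse each skipped run into one gap
    return pages


page = current_page("https://example.com/admin/posts?page=12")
print(visible_pages(page, 25))
```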
Alright, I'm calling it a wrap on development today. Probably a total of 5 hours spread over the day. But tagging is probably 80% done. The biggest feature still to do is make tags have their own templates, so people could follow the "us politics" tag or something like that. Those get complicated because long term you could end up with tags with hundreds of entries. So I'd need to paginate them, which is more complicated.
I am also working on how I go about tagging, choosing keywords and ideas. I have decided I am not going to put people's names in as tags, but I will do it for other proper nouns such as countries, etc.
I wrote the code for automated tweets to grab the tags from all of those posts and take the top 3 tags to be mentioned in the tweet in hopes of making it more interesting to people and driving more engagement. We will see how it goes.
This morning's coding project is to finally begin adding tagging. I mapped it out in my head last night into this rough "to-do" list for the devwork as follows:
- Admin UI - 70% - Mostly done, spent an hour or so figuring it out and watching tutorials online.
- Backend - 10% - I have the table structure, now I just need to make it submit the tags and then the backend to process and tag
- Admin tag suggest - 0% - Suggest as you type completion functionality of existing tags, I conceptually know how to do this but haven't done any implementation
- Modify editing post to properly handle tags - 0% - More admin-side work, if I modify a post it needs to load that post's tags.
- Templating integration - 0% - Add tags to entries, and then add tag template pages
- Going back and tagging old posts - 0%
I suspect I might get to the third item this morning, we'll see. After doing this I think I will prioritize code cleanup and organization as things like the admin javascript are getting a bit unruly.
Update: Three-ish hours of work and I'm mostly done with the first four to-dos. Next will come templating it and then going back and tagging old posts which will be arduous. But this is a good stopping point.
As Glowbug continues to be my pet project, last night I coded in functionality to enable easy sharing from my phone. I use an Android device, so the workflow (for now) makes use of Tasker and AutoShare (a plugin for it) to call a custom endpoint on the webserver. It is still not a 100% process, as it turns out Android sharing is a pretty messy thing and is not standardized as nicely as I would like. But I've got a working structure for now.
At some point I'll probably code a simple Android app that acts as a sharing bridge, solely existing to take the text and dump it into a webcall to the server. But for now, the Tasker + AutoShare functionality gets the job done.
