Wednesday, April 4, 2018

Macross news in RSS

What I did

An RSS news feed you can use with your favorite RSS reader to check out Macross-related news. It's all in Japanese though, since it's based on the Macross Portal (News) site.

RSS feeds



Source

https://gitlab.com/rikijpn/macross-news-rss
(I deleted my GitHub account, f*ck Microsoft!)

A bit about RSS

I love RSS.
I read lots of RSS feeds with Emacs Gnus, together with my mail at work, and it's just so much better than having to go to each website to look for news and stuff. Due to my Free Software beliefs I don't use SNS/microblogging services very often.
You could even use it for Twitter and the like, but I'd rather hope that someday our main SNS services will be decentralized like diaspora... sorry, that's another topic.

A bit about this code

I don't know why I even bothered using the lxml library... it would have saved me so much time to just write my own XML-like class. But in the long term I guess lxml is a lot better for maintaining real (large) XML.
RSS isn't really a super complex XML format, but rather a small one, with a lot of commonly used fields already defined.

The simple version is the one I started with, and it's good enough for small things. But this site usually has some pics I end up wanting to see, so I was clicking on the "link" part very often. I'd put up with that for an RSS feed made by the official site, but since this is all my own thing, I decided to add a "full" version, with the complete page as the "contents" of each news item, so I very rarely need to actually access the original site anymore, yay.

I put the scripts on cron to run daily and to sync the results to the web server.

The hardest part was figuring out how to write RSS... since apparently there are lots of things that aren't defined per se, but are commonly used a certain way, etc.
I ended up mimicking some site's RSS feed, which gave me the most important hints I needed:
1. You need to use the "content:encoded" tags instead of "description" for CDATA (HTML code), including source.
2. CDATA lets you put HTML in the feed! Copying sections from web pages becomes a lot easier that way...
3. You need to declare the "content" namespace, the prefix in content:encoded (some kind of XML terminology I still don't really understand, but which, thanks to the lxml library, I had to read a lot about just to make those little content:encoded tags!)
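
To make the three hints above a bit more concrete, here's a minimal sketch of a feed with one item carrying a content:encoded tag. It is not the code from the repository: it uses the standard library's xml.dom.minidom instead of lxml so it runs without installing anything, and the function name build_item_feed, the channel texts and the example.invalid URLs are all made up for illustration. The namespace URI is the one from the RSS content module.

```python
from xml.dom.minidom import Document

# Namespace URI of the RSS "content" module (declares the content: prefix).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def build_item_feed(title, link, html_body):
    """Return an RSS 2.0 document (as a str) with one item whose
    content:encoded tag holds html_body inside a CDATA section."""
    doc = Document()

    def text_node(parent, tag, text):
        # Small helper: <tag>text</tag> appended to parent.
        node = doc.createElement(tag)
        node.appendChild(doc.createTextNode(text))
        parent.appendChild(node)
        return node

    rss = doc.createElement("rss")
    rss.setAttribute("version", "2.0")
    # Hint 3: the prefix must be declared, or content:encoded is not valid XML.
    rss.setAttribute("xmlns:content", CONTENT_NS)
    doc.appendChild(rss)

    channel = doc.createElement("channel")
    rss.appendChild(channel)
    # Hypothetical channel metadata, only here to make the feed complete.
    text_node(channel, "title", "Macross news (unofficial)")
    text_node(channel, "link", "https://example.invalid/")
    text_node(channel, "description", "Example feed")

    item = doc.createElement("item")
    channel.appendChild(item)
    text_node(item, "title", title)
    text_node(item, "link", link)

    # Hints 1 and 2: raw HTML goes in content:encoded, wrapped in CDATA.
    encoded = doc.createElement("content:encoded")
    encoded.appendChild(doc.createCDATASection(html_body))
    item.appendChild(encoded)

    return doc.toxml(encoding="utf-8").decode("utf-8")


if __name__ == "__main__":
    print(build_item_feed("Concert announced",
                          "https://example.invalid/news/1",
                          "<p>New Valkyrie <img src='vf1.png'/></p>"))
```

One caveat with this sketch: a CDATA section can't contain the sequence "]]>", so HTML containing it would have to be split or escaped first.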

That's pretty much it.
All the rest you can just read on w3schools. They even have an XML namespaces section!

For the simple version you don't even need more than what w3schools has in their tutorials.


The code is a bit messy (especially the "full version" one). But basically I have one class to fetch the data, and another to make an RSS (XML) object.
So I fetch the main news titles page and make an RSS feed out of it, voila.
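
Roughly, that two-class split could look like the sketch below. This is an illustration and not the repository's code: it uses only the standard library (html.parser and xml.etree.ElementTree rather than lxml), the class names NewsTitleParser and SimpleFeed are invented, and the markup it looks for (anchors with class "news-title") is a pure assumption, since the real site's HTML will differ. Downloading the page is left out; the sketch starts from an HTML string.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class NewsTitleParser(HTMLParser):
    """Collects (title, link) pairs from <a class="news-title"> anchors.
    The class name "news-title" is a hypothetical stand-in for whatever
    the real news listing page uses."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None   # href of the anchor currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "news-title":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append(("".join(self._text).strip(), self._href))
            self._href = None


class SimpleFeed:
    """Builds a bare-bones RSS 2.0 document: titles and links only."""

    def __init__(self, title, link, description):
        self.rss = ET.Element("rss", version="2.0")
        self.channel = ET.SubElement(self.rss, "channel")
        for tag, text in (("title", title), ("link", link),
                          ("description", description)):
            ET.SubElement(self.channel, tag).text = text

    def add_item(self, title, link):
        item = ET.SubElement(self.channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link

    def to_xml(self):
        return ET.tostring(self.rss, encoding="unicode")


def feed_from_html(html):
    """Parse a news listing page (as a string) and return the RSS XML."""
    parser = NewsTitleParser()
    parser.feed(html)
    # Placeholder channel metadata, not the real feed's values.
    feed = SimpleFeed("Macross news (unofficial)",
                      "https://example.invalid/news", "Example feed")
    for title, link in parser.entries:
        feed.add_item(title, link)
    return feed.to_xml()
```

Keeping the fetching side and the RSS side apart like this means a change in the site's markup only touches the parser class.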

In the "full version", since I'm fetching not only the titles but also the contents of the pages behind those titles, I save the data every time, so only the sub-pages of new titles are fetched rather than all pages every time.
Plus I added a description-fetching class, pretty similar to the one for the titles, that gets the main HTML part I want to have as contents. After checking which new news items have no contents yet, it adds them there.
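
The "only fetch what's new" idea can be sketched like this. Again, this is an assumption-laden illustration rather than what the repository does: the JSON file format, the function name update_cache and the fetch_body callback (which would download a sub-page and return the HTML part wanted as contents) are all hypothetical.

```python
import json
from pathlib import Path


def update_cache(cache_path, entries, fetch_body):
    """Add missing news entries to a JSON cache and return the cache.

    entries    -- iterable of (title, link) pairs from the titles page
    fetch_body -- callable taking a link and returning that page's HTML;
                  it is only called for links not already in the cache
    """
    path = Path(cache_path)
    if path.exists():
        cache = json.loads(path.read_text(encoding="utf-8"))
    else:
        cache = {}

    for title, link in entries:
        if link not in cache:  # news item has no contents yet
            cache[link] = {"title": title, "body": fetch_body(link)}

    # Save every time, so the next run skips everything fetched so far.
    path.write_text(json.dumps(cache, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return cache
```

With a daily cron run, this keeps the number of requests to the site down to one for the titles page plus one per new article.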


That's it

Well, I hope I never forget about another concert or new Valkyrie sale after this. You're welcome to use the RSS if you're a Macross fan too.
I'm kinda regretting not having used Lisp to write this thing, since that would probably have been faster... but I guess I should be able to write some Python too.
