Posts Tagged 'hack'

Shortener update

posted by robert
Jan 17

So for the last few weeks, as part of my learning-Python journey, I've been knocking together bits of a URL shortener service, which is complete and functional. I've just been toying with things and basically learning by trying, plus a bit of benchmarking I guess. Part of getting it ready to deploy (on App Engine) is finding a name. Obviously a short name. I came up with quite a few that I really liked; two were pure gold and another not far off. Of course the domains were squatted on, so I had to move on and keep thinking. That really gives me the shits. Two of them didn't even bother to have DNS hosted. If you buy a domain, at least have the decency to host the DNS and redirect the site to something.

Perhaps the flip side to all this silliness is that I still think URL shorteners are stupid. Remember goatse, or more recently the rickrolling phase? Link shorteners were another way of fooling people into visiting links they'd otherwise know not to visit just by looking at the URL before clicking. By using link shortening services you're taking away someone's ability to control where their browser goes (unless they know that service's way of previewing), and worse, you're making a statement that you know where they want to (or should) go for them. As for saving typing at the expense of communication speed, that's even more terrible: "big A, little b, number 7" is bloody slow if you ask me. If someone is linking to your site it should be more accessible than that; they shouldn't need to make short links. I realise some will use analytics off their shortened links, or shorten them for a reason, like Twitter and fitting into 140-character messages (ugh). The other reason they're stupid, highlighted when various URL shorteners closed (trim, cligs), is all the broken links left behind when the company goes bust. The Internet Archive is keeping a copy of the links for some of the shorteners but not all, and without the domain name they'd still be broken links. This isn't a new problem, but then URL shorteners aren't new either. As most of them introduce a single point of failure, it's good to see Google offering one too. Theirs performs well and has a pretty awesome API. Seems like it's now a solved problem. Perhaps I should have picked another very simple web service to build instead.
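Short keys like that are often made by writing a numeric database ID in base 62 (digits plus lower- and upper-case letters), which is exactly why reading one aloud turns into "big a, little b". Here's a minimal Python sketch of that common technique; it isn't necessarily what any particular shortener does, and the alphabet ordering is an arbitrary assumption.

```python
import string

# 62 symbols: digits, lower case, then upper case (ordering is an arbitrary choice).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Turn a non-negative integer ID into a short base-62 key."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)  # peel off the least significant digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(key: str) -> int:
    """Turn a short key back into the integer ID it came from."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on a bad character
    return n
```

For example, `encode(125)` gives `"21"` and `decode("21")` gives 125 back. Six characters cover 62**6 (about 56.8 billion) IDs, which is why the keys can stay so short, and why case matters when you read one out.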

As for trying to build an app using new (old) technology to solve a mostly trivial problem while learning a new language, even that was slightly misaimed. Most of them seem to just use standard technology with some decent caching in front (which is an obvious performance boost). For example, until very recently is.gd used PHP+MySQL for everything; that page has since moved (hence the archive link), and the replacement page says it's MySQL+MongoDB+PHP, which is where I was going (though using Python instead of PHP). Based on the sheer number of shorteners out there, I'm not the only one who knocked one up as something to do.
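The "decent caching in front" idea is simple enough to sketch: keep recently resolved keys in memory so popular links never touch the database. Below, `SlowStore` is a hypothetical stand-in for MySQL/MongoDB and the default capacity is an arbitrary assumption; a real deployment would more likely use a shared cache such as memcached than a per-process dictionary.

```python
from collections import OrderedDict


class SlowStore:
    """Hypothetical stand-in for the real database (a plain dict here)."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # how many times we actually hit the "database"

    def put(self, key, url):
        self._rows[key] = url

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


class CachedLookup:
    """Read-through LRU cache sitting in front of a store."""

    def __init__(self, store, capacity=1000):
        self.store = store
        self.capacity = capacity
        self._cache = OrderedDict()

    def resolve(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        url = self.store.get(key)
        if url is not None:  # misses are not cached in this sketch
            self._cache[key] = url
            if len(self._cache) > self.capacity:
                self._cache.popitem(last=False)  # evict least recently used
        return url
```

Misses aren't cached here, so a key created just after a failed lookup is found on the next request; the trade-off is that repeated lookups of bogus keys always reach the store.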

So that's sort of on the chopping block before leaving the starting gates – unless I think of a cool name soon.

Completely unrelated to that, I called up Telstra recently to activate a prepaid mobile broadband service. I'll start by saying they haven't changed anything since the last time I called up to activate a prepaid wireless broadband service. Changed n-o-t-h-i-n-g. Nada. As I knew it would be a long and painful call, I called from a landline (non-Telstra). After being probed for enough ID to get a car loan and supplying an existing customer account number (in the matching name; yes, it was my already existing account), I was asked if the service 0400 xxxxxx was my landline phone. Wow. I mean, wow. Not only is that totally unrelated to the purpose of the call, it also showed me that the person I was speaking to was clearly not in or from Australia (for the overseas reader, all mobile/cell phone numbers here start with 04xx). The script was pretty much read out word for word, too. What a pain in the ass. It's not like I'm phoning up with a complex problem; I'm not requesting the plans to build a drug lab or a nuclear warhead delivery system, nor am I asking for huge numbers to be factored into primes. Why is this so difficult? And that's beside the point anyway: the prepaid service doesn't need anything account-wise on their end. If you run out of credit it stops working, and you can use the service itself to recharge. There's no line of credit there, so why is it so hard? I'm sure they just want a name attached to the account in case you do something dodgy and law enforcement comes a-knocking. What other reason is there for them needing to know who's on the end of an activated service? It's so the man can track you. Obviously no burners or trac phones for us. So we'll have to make our own.

A furry pussy!

Clearly the man is out to get us, all of us, even if we've done nothing wrong.

Billy said something about needing more pictures, and I completely agree.


Dec 07

So I moved some PCs around the house to solve various problems, and managed to get xbmc working sweet on the ASRock ION 330 box. Even 1080p playback with digital audio passthrough works, all on the existing Fedora install.

Then I proceeded to make a cut-down installer, with the goal of at least putting a smaller disk in the machine, if not running it from a flash drive. Sure, I could use a live distro for it, but I'd like to run a few other small apps on it without the overhead of distro jumping. Also, if it's binary compatible with my desktop, other options open up too. No, I don't think compiler cache on an Atom is a good idea.

So I did a minimal install of Fedora (via kickstart) with a few select packages added, including xbmc (and thus the external repositories), on the massive assumption that the dependencies were set up right. Normally that's OK, but in this case I ended up with a gdm with no fonts and xbmc segfaulting on load. I'm not even sure any fonts were present on the box at all, and adding some specific font packages didn't make the boxes go away, though I was just guessing. To cut a long story short, I eventually did a yum groupinstall gnome-desktop, saved the list of packages it was going to install (just in case), and that got both gdm and xbmc working.

Not liking that solution, I methodically went through the list removing packages until I found the ones whose absence broke things: urw-fonts for gdm, and pulseaudio-module-x11 for xbmc's segfault in libGL.so. PulseAudio caught me by surprise, because to get passthrough audio working in the first place I'd removed ALL the pulseaudio-* packages. Xbmc uses ALSA for audio, which can do digital audio passthrough.
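That one-package-at-a-time hunt is essentially a greedy minimisation loop. Here's a Python sketch of it, where `works()` is a hypothetical stand-in for the manual step of changing the installed set and checking whether gdm and xbmc behave; the package names in the usage are only illustrative.

```python
from typing import Callable, List


def minimise(packages: List[str], works: Callable[[List[str]], bool]) -> List[str]:
    """Drop packages one at a time, keeping a removal only if things still work.

    `works` is a hypothetical test hook: in real life it means installing or
    removing packages and checking gdm and xbmc by hand.
    """
    if not works(packages):
        raise ValueError("the full package set must work to begin with")
    kept = list(packages)
    for pkg in list(packages):
        trial = [p for p in kept if p != pkg]
        if works(trial):
            kept = trial  # pkg wasn't needed, so leave it out
    return kept


# Illustrative run: pretend only these two packages matter.
needed = {"urw-fonts", "pulseaudio-module-x11"}
full = ["bitmap-fonts", "urw-fonts", "gnome-panel", "pulseaudio-module-x11", "nautilus"]
print(minimise(full, lambda pkgs: needed <= set(pkgs)))
```

The result is only 1-minimal: removing any single remaining package breaks things, but if two packages can stand in for each other the loop keeps whichever it meets last, and it costs one full test per package. Splitting the list in halves first (as delta debugging does) can need fewer tests when only a couple of packages matter.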

Since then I figured: why mess around with gdm and so on? Just replace gdm with xbmc and you're done. So off I went on a quest for Upstart knowledge, which to date I'd never had to deal with because everything just worked.

What can I say? I'm totally amazed this got into any distro, let alone Fedora. There is no single source of complete documentation at all. There's a getting started guide, which is OK, but nothing beyond that. The wiki's pages are a boilerplate joke and all appear to be out of date. The source package had a diagram of event state transitions in it, which was nice to see, but there was no other useful documentation in there at all. Even more amazing was the lack of a safe way to disable a service: either rename its .conf file or alter its start/stop conditions to use the keyword never (of course there's no list of keywords documented anywhere I could find). Holy frozen shit on a stick. Fedora's site even listed documentation as a requirement, which was marked as fixed, but didn't link to it, nor could I find it. Either way, I managed to make a service (if you could call it that) which could start and stop xbmc on instruction from initctl. But I couldn't get it to start at the end of the boot process, even using the same triggers gdm was using (with gdm disabled, of course). Even worse, while playing with these start/stop event triggers I managed to get into a situation where reboot didn't work until you ran it a second time (via ssh, as the local console had been shut down by that stage). So I bashed on and on battling it, not really making any progress and trying various ways of doing it. Eventually I moved to a SysV-style script, which I got working much faster (and correctly starting at the end of the boot process); however, xbmc was not able to output any audio at all.

To cut another very long, frustrating story short, I went back to gdm with auto login, though its auto login delay apparently can't be set to zero. I also played with using SLiM instead, which could do that, but still couldn't get audio.

Honestly, I'm not sure what's going on there and it doesn't make any sense, but my time is worth more to me, so for now that will do. I might make a list of various combinations to try if time allows.

So, to cut yet another story short (this post is now well overdue), I've produced a Fedora 14 kickstart installer that is quite minimal, blows the whole disk away and installs xbmc. On bootup it logs in automatically, and it even grabs my xbmc configuration during the install, so on its first boot it comes straight up into xbmc with working auto-mounted NFS shares and digital audio out. Win. With a bit of polish and some changed password hashes I'll upload it to share.

This post was going to be about something totally different but also broken. Maybe soon.


ZFS experiment continued

posted by robert
Jan 18

So the ZFS experiment continues. Upon the release of b129 I set off into the unknown on a voyage of dedupe, which at first held the promise of lower disk usage, faster IO and a warm fuzzy feeling deep down that you only get from awesome ideas becoming reality. Ahem.

Most sources say you need more RAM, and that is true; what they don't say is how much RAM for what size of data set, which might be more useful to home users like me. My boxes have 2 GB of RAM each, and that is not enough for dedupe, nowhere near. Not if you have 6 TB of random-ish data. I might retry when I get to 8 GB of RAM, but not before. You see, if it can't keep the whole dedupe table in RAM ALL the time, any write to a dedupe-enabled volume will force reads of the uncached parts of the table from disk, or at least seeks. So what I saw was a gradual slowdown while writing to the volume. I was determined to let it finish, see what savings I would make and then scrap it due to performance, but after waiting 16 days for the copy, I cancelled it.

The only way I found to even see the contents/size of the dedupe table (DDT) is zdb -DD, which produces output like this:

DDT-sha256-zap-duplicate: 416471 entries, size 402 on disk, 160 in core
DDT-sha256-zap-unique: 47986855 entries, size 388 on disk, 170 in core

DDT histogram (aggregated over all DDTs):

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    45.8M   5.69T   5.66T   5.66T    45.8M   5.69T   5.66T   5.66T
     2     394K   43.0G   40.3G   40.3G     821K   89.0G   83.0G   83.1G
     4    9.90K    527M    397M    402M    47.0K   2.35G   1.76G   1.79G
     8    2.06K    125M   82.4M   83.4M    21.1K   1.20G    795M    806M
    16      391   13.7M   8.54M   8.76M    7.26K    272M    162M    166M
    32       69   1.17M    776K    822K    3.08K   51.3M   32.7M   34.8M
    64       17    522K    355K    368K    1.43K   36.9M   25.1M   26.2M
   128        6    130K      7K   11.2K    1.07K   31.3M   1.50M   2.23M
   256        2      1K      1K   2.48K      833    416K    416K   1.01M
   512        4      2K      2K   4.47K    2.88K   1.44M   1.44M   3.32M
    2K        1     512     512   1.24K    2.79K   1.39M   1.39M   3.46M
 Total    46.2M   5.73T   5.70T   5.70T    46.7M   5.78T   5.74T   5.74T

dedup = 1.01, compress = 1.01, copies = 1.00, dedup * compress / copies = 1.01
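The "in core" figures give a rough answer to the how-much-RAM question. Assuming, as is common practice, that they are the average in-memory bytes per DDT entry, multiplying them by the entry counts estimates how much memory the whole table wants. A small Python sketch that pulls those numbers out of the two summary lines above:

```python
import re

# The two DDT summary lines from the zdb -DD output above.
ZDB_OUTPUT = """\
DDT-sha256-zap-duplicate: 416471 entries, size 402 on disk, 160 in core
DDT-sha256-zap-unique: 47986855 entries, size 388 on disk, 170 in core
"""

LINE = re.compile(r"^(DDT-\S+): (\d+) entries, size (\d+) on disk, (\d+) in core$")


def ddt_core_bytes(text: str) -> int:
    """Sum entries x per-entry in-core size over every DDT summary line."""
    total = 0
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            entries, in_core = int(m.group(2)), int(m.group(4))
            total += entries * in_core
    return total


core = ddt_core_bytes(ZDB_OUTPUT)
print(f"{core:,} bytes = {core / 2**30:.2f} GiB")
```

That prints `8,224,400,710 bytes = 7.66 GiB`: nearly four times the 2 GB in each box, and that's before ZFS uses any memory for anything else.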

Savings of around 80 GB with dedupe and compression (on a backup box, so no real-world performance requirement) just aren't worth needing three or more times the RAM, and possibly an SSD for L2ARC cache to speed things up. Yep, the suggestion, backed by observed behaviour, was to hook up a cheap, small (30 GB) SSD as cache to accelerate it. I don't mind that so much for a primary, but this is my backup/second-copy box, so it's not really ideal. Certainly not for 80 GB of savings, or around $5 of disk at current prices.
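As a quick check on the ~80 GB figure, here's the arithmetic from the histogram's Total row in Python. The assumptions: zdb's T suffix means TiB, and comparing referenced LSIZE (logical data) with allocated DSIZE (what actually landed on disk) captures the combined dedupe and compression savings. Each figure is rounded to two decimals, so the result is only good to about ±10 GiB.

```python
TIB = 2**40
GIB = 2**30

# "Total" row of the DDT histogram above (assuming T means TiB).
ref_lsize = 5.78 * TIB    # logical size of all referenced data
alloc_dsize = 5.70 * TIB  # on-disk size actually allocated

saved = ref_lsize - alloc_dsize
print(f"saved ~{saved / GIB:.0f} GiB ({saved / ref_lsize:.1%} of the referenced data)")
```

That prints `saved ~82 GiB (1.4% of the referenced data)`, which lines up with the overall ratio of 1.01 reported by zdb.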

My second attempt is now underway. This time I've sliced my data sets up into more volumes, which means a smaller average size: around 2 TB max per volume, which I've learned from experience at work is a good rule of thumb. Now I can enable compression+dedupe on only specific bits, hopefully where the most savings are to be made, and the rest is just stored raw. This way the savings might be similar, but without the major write speed penalty. I've also realised that for the production box, if I want screaming performance, I'll throw an SSD on there, but that means more SATA ports, which means a major change. I also need to work on power management.
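Choosing which datasets get compression+dedupe can be written down as a tiny planning script. This Python sketch is hypothetical throughout: the dataset names, sizes, estimated dedupe ratios (which in practice would have to be measured, for example with `zdb -S` on the pool) and the savings threshold are all made up. It only builds `zfs set` commands rather than running them. Note that dedupe and compression only apply to data written after the property is set, so it's best done before copying data in.

```python
import shlex

# Hypothetical datasets: name -> (size in TiB, estimated dedupe ratio).
DATASETS = {
    "tank/photos": (1.8, 1.00),
    "tank/vm-images": (0.6, 1.60),
    "tank/backups": (1.5, 1.25),
}

MAX_TIB = 2.0         # rule of thumb from the text: keep each one under ~2 TB
MIN_SAVING_TIB = 0.1  # hypothetical threshold for "worth the RAM"


def plan(datasets):
    """Return zfs commands enabling compression+dedupe only where it pays off."""
    commands = []
    for name, (size, ratio) in sorted(datasets.items()):
        if size > MAX_TIB:
            raise ValueError(f"{name} is {size} TiB; split it up first")
        saving = size - size / ratio  # TiB saved if the ratio estimate holds
        if saving >= MIN_SAVING_TIB:
            for prop in ("compression=on", "dedup=on"):
                commands.append(["zfs", "set", prop, name])
    return commands


for cmd in plan(DATASETS):
    print(shlex.join(cmd))
```

With these made-up numbers only tank/backups and tank/vm-images get the properties; tank/photos, with no duplicate data, is stored raw.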

One thing that has gone right this time is that I'm now booting off CF-to-IDE adaptors. This way the OS thinks it's on a 2 GB HDD, so booting avoids the complexity of USB boot, and it also uses less power and doesn't take up a SATA port. Of course, new boards don't have PATA any more, so I might need to get a CF-to-SATA one in future.

Another thing that must be said: Solaris's CIFS server is fast.