Another heatwave: the temperature got up to over 80 degrees this month. The flowers are rioting in the garden. Ah, to be in England, now that summer's here!
Galleries added this month.
Stories added this month.
Movies added this month.
On July 10, 190,000 pictures came in, over 20 gigabytes. This bundle took 21 hours to process. July 26 saw 200,000 pics in 22gb. That took 22 hours to process. But I'm making progress on the hardware upgrade; see below.
A new area - Download a full video. Some of the same videos you've been buying for $30, $40 or $50 on tape or DVD are now being offered at a fraction of the cost, via download. No shipping/handling to pay, nothing arrives through the post. You download it to your computer, and then you can play it as often as you like.
So far, we have three Annie Rivieccio videos, and the video of the DtV-sponsored FVF physique competition at the 2003 Arnold Expo.
The Man from the Electric Company arrived, after a couple of false starts. I'm now on three-phase electricity, which means that instead of having 100 amps, I have 300. I was using about 50, so although I didn't have a dire need just yet, I probably will in the future. Now I need to dig a trench to lay a cable out to the Valkyrie Data Shed.
The main thing this month was building the new servers for Newsthumbs. Instead of one server digesting the news and creating the thumbnails, I'm going to divide the task among four, so it should take maybe a quarter as long as before. With the current processor (a 2.4 GHz Pentium), it takes about 15 hours per day; on big days that can go up to 20 or 22, and on one day in July it took a few hours more than the 24 that is all you get in a day, so the processing couldn't actually complete that day. Instead of the 2.4 GHz Pentium, I'll be using a 3.2 GHz Pentium, running an 800 MHz FSB with PC3200 memory (faster memory should mean faster processing). These new Pentiums also have 1mb of cache instead of 512kb. And I'm using a new motherboard that lets me put 2gb of memory on the server, instead of the 1gb the previous ones took. So I'm hoping that each machine will be 30% or 40% faster than the ones I've been using.
So the servers will be 30-40% faster at processing, and I'll use four of them instead of one, so I'm hoping that the daily processing time will drop substantially. How much, I don't know, because speeding up the processing might reveal some other factor that constrains the run time. But I'm kind of hoping that the day's processing can be completed in six hours.
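Here's a rough back-of-the-envelope version of that hope, as a Python sketch. The inputs are the figures above (15 hours on a normal day, 22 on a big one, boxes 30-40% faster, four of them). The even split across servers and the `serial_fraction` knob, which stands in for the "some other factor" that can't be divided up (Amdahl's law), are assumptions, not measurements.

```python
def estimated_hours(old_hours: float, speedup_per_box: float,
                    servers: int, serial_fraction: float = 0.0) -> float:
    """Estimated daily run time, in hours, on the new team of servers."""
    # Work that takes old_hours on the 2.4 GHz box takes this long on one new box.
    one_new_box = old_hours / speedup_per_box
    # The serial part runs on one box; the rest is shared equally among servers.
    serial = one_new_box * serial_fraction
    parallel = one_new_box * (1.0 - serial_fraction) / servers
    return serial + parallel


for old in (15, 22):                  # a normal day and a big day, in hours
    for speedup in (1.3, 1.4):        # "30% or 40% faster"
        hours = estimated_hours(old, speedup, servers=4)
        print(f"{old} h today, {speedup}x faster boxes -> {hours:.1f} h")
```

With nothing unsplittable, a 15-hour day comes out at about 2.7 to 2.9 hours and a 22-hour day at about 3.9 to 4.2. If a fifth of the job turned out not to split at all (`serial_fraction=0.2`), a 15-hour day on 35%-faster boxes would rise to roughly 4.4 hours, still inside the six-hour hope.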
I needed four servers to work as a team, so I built eight. The two teams-of-four will work in parallel, but if the servers work normally, I'll never need the second team-of-four. On the other hand, if one of the first four crashes, I can switch over to the other four immediately. I'll also have a few spare servers, so that if the crash is unfixable, I can swap a spare server into the team-of-four without having to trek down to Watford. That way, I'll still have my backup team-of-four.
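The team-of-four arrangement can be sketched like this, assuming a simple rule: a team is only usable if all four of its members are up, the backup team takes over as soon as one box in the working team is down, and spares get swapped into the broken team so there's a backup again. The `Cluster` class and the server names (`nt1` and so on) are hypothetical; how the real switch-over is triggered isn't described.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    primary: list[str]          # team doing today's processing
    backup: list[str]           # idle team, ready to take over
    spares: list[str]           # loose boxes that can join either team
    down: set[str] = field(default_factory=set)  # boxes known to have crashed

    def healthy(self, team: list[str]) -> bool:
        """A team is usable only if none of its members is down."""
        return not any(name in self.down for name in team)

    def active_team(self) -> list[str]:
        """Return the team that should run, failing over if needed."""
        if self.healthy(self.primary):
            return self.primary
        if not self.healthy(self.backup):
            raise RuntimeError("both teams have a crashed member")
        # Switch over immediately: the backup team becomes the working team.
        self.primary, self.backup = self.backup, self.primary
        # Patch the broken team with spares so there is a backup again.
        self.repair(self.backup)
        return self.primary

    def repair(self, team: list[str]) -> None:
        """Replace crashed members of a team with spares, while spares last."""
        for i, name in enumerate(team):
            if name in self.down and self.spares:
                team[i] = self.spares.pop(0)


cluster = Cluster(primary=["nt1", "nt2", "nt3", "nt4"],
                  backup=["nt5", "nt6", "nt7", "nt8"],
                  spares=["nt9"])
cluster.down.add("nt2")          # one of the working four crashes
print(cluster.active_team())     # ['nt5', 'nt6', 'nt7', 'nt8']
print(cluster.backup)            # ['nt1', 'nt9', 'nt3', 'nt4']
```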
Each of these boxes has four 300gb hard drives and one 120gb. My original idea was to use the 400gb Hitachi drives that have just come out; they're serial ATA, so the cables are smaller and neater. Slightly faster, too. But after two weeks of waiting, my supplier couldn't get hold of any. So I went back to good old reliable Maxtor, and bought 33 parallel ATA 300gb drives (32 for use and one in case I needed it). The 120gb drives hold Linux and the swapspace, and I don't know what I'll use the other 100gb for, but the 120s are just a few pounds more expensive than the 40gb (currently the smallest you can buy) and take up the same amount of space. The motherboards are Asustek P4P800VM, which take PC3200 memory and have video and LAN onboard. They can take four memory sticks, which is how come I can put 2gb of memory in. I add a 3ware RAID card to control the 300gb drives, and I get 1.3 terabytes in a 1U box. Eight of those is just over 10 terabytes. Hefty.
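The capacity sums, using the drive counts above. These are raw, decimal gigabytes as drive makers quote them. The text doesn't say which RAID level the 3ware card runs; with RAID 5, for example, one drive's worth of space would go to parity, leaving 900gb of data drives per box rather than 1200gb.

```python
DATA_DRIVES_PER_BOX = 4     # four 300gb Maxtors on the 3ware card
DATA_DRIVE_GB = 300
SYSTEM_DRIVE_GB = 120       # Linux, swap, and 100gb of "don't know yet"
BOXES = 8                   # two teams of four

per_box_gb = DATA_DRIVES_PER_BOX * DATA_DRIVE_GB + SYSTEM_DRIVE_GB
total_tb = BOXES * per_box_gb / 1000
drives_needed = BOXES * DATA_DRIVES_PER_BOX

print(per_box_gb)       # 1320  -> "1.3 terabytes in a 1U box"
print(total_tb)         # 10.56 -> "just over 10 terabytes"
print(drives_needed)    # 32    -> plus one spare makes the 33 bought
```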
In building the servers, I hit a problem: the power supplies are really designed for the Pentium 3, and when I used them with a Pentium 4, they could only support two drives. So I took the power supply out of an old 1U server that I wasn't using, and tried that. To my surprise, it worked! So I replaced the power supplies in the other three that were giving the same problem, which gives me something that works until I can buy new power supplies. These 1U power supplies aren't easy to find, and are quite expensive compared to ordinary ones.
I'm using Red Hat on all my servers. But Red Hat seems to have split into Red Hat Enterprise, which is for people who have lots of money and want to pay for support, and Fedora, which is for people like me (careful with money, and I don't need support). So I'm now installing Fedora Core 1. Core 2 is available, but I can't work out how to install it from a floppy disk. I'm hoping that will be fixed in the next version, or else I'll put some time in to work out how to install it the new way.
One of the key differences between versions turns out to be the handling of large files. By large, I mean more than 2gb (or maybe I mean more than 4gb, but it doesn't matter, because one of my files is 5gb). Core 1, and the parts of it that I use, seem to be able to handle large files. Older versions of Red Hat did very queer things with large files.
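Why 2gb is the usual wall: on 32-bit Linux, a program that isn't built with large-file support (compiled with `_FILE_OFFSET_BITS=64`) keeps file offsets in a signed 32-bit `off_t`, and the biggest value that holds is 2**31 - 1 bytes, just under 2 GiB. Tools that use an unsigned 32-bit size stop just under 4 GiB instead, which may be where the "or maybe 4gb" comes from. This is a likely explanation rather than a diagnosis of what the older Red Hat releases actually did; the small Python check just shows where a 32-bit field runs out.

```python
import struct

SIGNED_32_MAX = 2**31 - 1     # 2,147,483,647 bytes: just under 2 GiB
UNSIGNED_32_MAX = 2**32 - 1   # 4,294,967,295 bytes: just under 4 GiB
FIVE_GB = 5 * 10**9           # roughly the size of the biggest file


def fits_in_32_bits(offset: int, signed: bool = True) -> bool:
    """True if a byte offset can be stored in a 32-bit integer field."""
    fmt = "<i" if signed else "<I"   # little-endian int32 / uint32
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:             # value out of range for the field
        return False


print(fits_in_32_bits(SIGNED_32_MAX))                  # True
print(fits_in_32_bits(SIGNED_32_MAX + 1))              # False: the 2gb wall
print(fits_in_32_bits(UNSIGNED_32_MAX, signed=False))  # True
print(fits_in_32_bits(FIVE_GB, signed=False))          # False: past 4gb too
```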
Part of the Newsthumbs upgrade involves tracking all the files. Currently there are 100 million picture files on the servers, and to track their names and locations, I store a record of about 50 bytes (on average) for each one. That comes to 5 gigabytes. Finding a database that can handle that wasn't easy.
I tried Berkeley DB, which is apparently fast and extremely widely used. The problem I ran into was that loading 100 million records into it was taking a very long time. Like, weeks. And once it's loaded, you can read from it or you can write to it, but not both at once.
So I tried an SQL database. MySQL comes with Linux, and gets rave reviews. Plus, I'd be able to write to it (to update it) while it's being read from. But. Creating the database of 100 million records was looking to take weeks, and it bombed out after ten days.
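This isn't what was run here, and it swaps in SQLite (it ships with Python) for MySQL, but it shows two habits that often make the difference between days and hours when bulk-loading any SQL database: insert in big batches inside one transaction rather than committing row by row, and build the lookup index after the data is in. The record layout (`name`, `server`, `path`) is hypothetical. For MySQL itself, the usual fast path for bulk loads is its `LOAD DATA INFILE` statement.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical record layout: (picture name, server, path on that server).
Record = tuple[str, str, str]


def batches(rows: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of up to `size` rows at a time."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_load(conn: sqlite3.Connection, rows: Iterable[Record],
              batch_size: int = 50_000) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS pics "
                 "(name TEXT, server TEXT, path TEXT)")
    with conn:  # one transaction for the whole load, committed at the end
        for chunk in batches(rows, batch_size):
            conn.executemany("INSERT INTO pics VALUES (?, ?, ?)", chunk)
    # Build the index once, after loading, instead of updating it per row.
    conn.execute("CREATE INDEX IF NOT EXISTS pics_name ON pics(name)")


conn = sqlite3.connect(":memory:")
demo = ((f"pic{i:09d}.jpg", f"old{i % 12}", f"/news/{i}")
        for i in range(100_000))
bulk_load(conn, demo)
print(conn.execute("SELECT COUNT(*) FROM pics").fetchone()[0])   # 100000
print(conn.execute("SELECT server FROM pics WHERE name = ?",
                   ("pic000000013.jpg",)).fetchone()[0])          # old1
```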
Yes, I daresay that I should be running DB2 on a mainframe, or Oracle, or something seriously expensive like that. But to try it, I'd have to buy it, and my experience of this so far tells me that these things aren't designed to handle large databases. By large, I mean 100 million records, and by "handle", I mean "load up the data into the database in less than a week of continuous processing".
Oh well. So I did what any sensible person does in this situation. When all the wheels are square, you have to reinvent the wheel, and this time make it round. I wrote my own database. Loading 100 million records into it takes several hours (tedious, but better than several weeks). Now I just have to work out how to make record retrieval fast. I can currently retrieve a record in about 15 milliseconds, which is comparable to the time for a disk access. But getting things set up for that takes 15 seconds. So I needed a way for the setup to happen only once, so that everything after that is fast. I'm working on it using mod_perl, which I've never used before, but which looks like it will give me that persistence of setup. So far it's looking good; I can use my browser to bring up pictures, and they load as fast as they would if I weren't using the database and just loaded the picture directly. That's because a picture might be 50kb, and even if you're using broadband at 500 kilobits per second, that's about a second to load the picture. The extra 15 milliseconds just isn't noticeable.
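The text doesn't describe how the home-made database is laid out, so here's one plausible design, sketched in Python rather than Perl: records sorted by picture name and written at a fixed width, so the file can be searched by position. Opening the table reads a sparse in-memory index (the first key of every 1024-record block); that's the slow setup step. After that, each lookup is one seek and one read of a single block, roughly the "one disk access" cost mentioned above. The record sizes, `STRIDE`, and all names are assumptions. Keeping `table` open and reusing it across requests is the part that mod_perl's persistent interpreter would provide on the real site.

```python
import bisect
import os
import tempfile

KEY_LEN = 40            # hypothetical: picture name, NUL-padded to 40 bytes
VAL_LEN = 24            # hypothetical: location, NUL-padded to 24 bytes
REC_LEN = KEY_LEN + VAL_LEN
STRIDE = 1024           # keep the first key of every 1024-record block in memory


def pad(text: str, width: int) -> bytes:
    raw = text.encode()
    if len(raw) > width or b"\0" in raw:
        raise ValueError(f"cannot store {text!r} in {width} bytes")
    return raw.ljust(width, b"\0")


def write_table(path: str, records) -> None:
    """Bulk load: sort once, then write fixed-width records in key order."""
    with open(path, "wb") as f:
        for key, value in sorted(records):
            f.write(pad(key, KEY_LEN) + pad(value, VAL_LEN))


class Table:
    """Slow to open (the setup step), then one seek per lookup."""

    def __init__(self, path: str):
        self.f = open(path, "rb")
        self.count = os.path.getsize(path) // REC_LEN
        self.sample = []            # first key of each STRIDE-record block
        for i in range(0, self.count, STRIDE):
            self.f.seek(i * REC_LEN)
            self.sample.append(self.f.read(KEY_LEN))

    def get(self, key: str):
        try:
            k = pad(key, KEY_LEN)
        except ValueError:
            return None             # too long to be stored, so not present
        block = bisect.bisect_right(self.sample, k) - 1
        if block < 0:
            return None             # sorts before the first key in the file
        start = block * STRIDE
        n = min(STRIDE, self.count - start)
        self.f.seek(start * REC_LEN)
        data = self.f.read(n * REC_LEN)   # the single disk read
        keys = [data[j * REC_LEN: j * REC_LEN + KEY_LEN] for j in range(n)]
        j = bisect.bisect_left(keys, k)
        if j < n and keys[j] == k:
            value = data[j * REC_LEN + KEY_LEN: (j + 1) * REC_LEN]
            return value.rstrip(b"\0").decode()
        return None


path = os.path.join(tempfile.mkdtemp(), "pics.tbl")
write_table(path, [(f"pic{i:07d}.jpg", f"old{i % 12}") for i in range(10_000)])
table = Table(path)                 # do this once per process, then reuse
print(table.get("pic0004321.jpg"))  # old1
print(table.get("nope.jpg"))        # None
```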
Oh. I forgot to say why I wanted to use a database. The thing is, there are currently about a dozen "older servers", and the way things are going, I'm adding a new one every six weeks. This means that the old Newsthumbs pictures are divided up amongst them. For example, you can't search them all at once; you have to search one server at a time. When there were only a few servers, that didn't matter so much, but by the end of next year there will be a couple of dozen older servers. So I want something that lets people view all the older servers as a single unit. The database tracks which picture is on which server, so all you have to know is that you want to look at Anita Ekberg; you don't have to wonder which of the dozen servers to look at.
I'm making good progress on this; I can already do a search across all servers. And I'm pretty sure that this is all going to work eventually; I can't see any unbridgeable gaps. But it isn't ready for real use yet: I have to get everything working properly before I can let folks use it.
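A minimal sketch of the cross-server search idea: one central directory maps each picture name to the older server that holds it, so a single query covers every server and the results come back grouped by server. The dictionary layout and server names are hypothetical, and the linear scan is only for illustration; across 100 million names you'd want an index instead.

```python
from collections import defaultdict


def search_all(directory: dict[str, str], term: str) -> dict[str, list[str]]:
    """Search every older server at once via the central directory."""
    term = term.lower()
    hits: dict[str, list[str]] = defaultdict(list)
    for name, server in directory.items():   # linear scan, fine for a demo
        if term in name.lower():
            hits[server].append(name)
    # Group by server so each picture can be fetched from the right box.
    return {server: sorted(names) for server, names in sorted(hits.items())}


directory = {
    "anita_ekberg_001.jpg": "old03",
    "anita_ekberg_002.jpg": "old11",
    "sophia_loren_001.jpg": "old03",
}
print(search_all(directory, "Ekberg"))
# {'old03': ['anita_ekberg_001.jpg'], 'old11': ['anita_ekberg_002.jpg']}
```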
The above details give you a pretty good flavour of what this is all about. I concoct great plans, but they fail for some reason of detail (400gb drives not actually available, power supply doesn't work as expected, database software impossibly slow) and I have to work out a way around the failure. This is called "bodging". You can call me a bodger. Bodging is a lot of fun.
TomNine announced that he's just bought a great new camera - we can expect to see lots more stuff with the TomNine imprimatur.
I don't make these up, although the comments on the spams are mine, of course. These are actual spams sent to me, which just strike me as funny. I don't include their contact details - go find your own spammers!
By the way, if you're using StoneColdMail (which is free to web site members), you won't see most of these spams; they'll be delivered into your "Spam" folder.
This email was sent by the Citibank server to verify your E-mail
address. You must complete this process by clicking on the link
below and entering in the small window your Citibank Debit
Card number and PIN that you use on ATM.
This is done for your protection - because some of our members
no longer have access to their email addresses and we must
So let me get this straight. I reply to an email from an unknown person, and I give you my credit card number and PIN.
MAMA went from 3 to 13 in 2 days here is my next one!
And wow, was she surprised!
CONGRATULATIONS YOU HAVE WON $500,000:00!!!
What would your family do if you died?
We've sponsored lots of the women: Nicole Bass, Andrulla Blanchette, Sheila Burgess, Christine Envall, Marilyn Perret, Peggy Schoolcraft, Larisa Hakobyan, Steph Parks.
We're also sponsoring individual events such as the Femsport Valkyrie Festival, and the New York Muscle Club, and funding athletes to go to events with grant dollars.
We're also doing free hosting and free bandwidth for many of our sponsored women. Bandwidth can mount up to a large bill when you're running a popular web site.
And we've sponsored Heather Foster, Kara Bohigian, Priscilla Ribic, KerryAnn Allen, Linda Cusmano and Jodi Miller. Also Anita Ramsey and Rhonda Dethlefs.
Nicole had a "Grand Mal" seizure a few weeks ago. When she came to, at first she couldn't remember who or where she was, and she had hurt her head when she fell. She was taken to hospital, and stayed there for a while, but now she's out and keen to get back into training.
Vilma Caez is now a member of the Family
Pam takes the lead from Boomer, but not by much
I've put up a new style of board; it includes threading, the possibility of moderation, and lots more. You can see what Tom Nine's Tussling Tenement looks like; we gave it a new coat of paint, and it looks great. Mixed wrestling sessions are a popular topic!
This month we had 3073 posts to the boards.
Most posted Board of the month
Lots of great discussions on the Politics board: the upcoming elections (they have a big election every four years in the USA), Iraq and Israel. The second most popular is Boomer's sports chat. I take the view that if we didn't have this board, sports chat would still happen, but it would be spread across several of the other boards.
Poster of the month
HomoAncient is again this month's number one poster.
Mavis is counting the number of times the message list is checked for each board. This gives a very different picture from the one above.
Most listed Board of the month
The FBB board excites the most interest, followed by Fistman's Photos.
Most read Board of the month
The Grinch got the stats.
Newsthumbs reached 100 million pictures! If you looked at one per second, every day and every night, it would take you over three years to see them all. And by then there would be a couple of hundred million more.
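A quick check of that "over three years", counting a year as 365.25 days:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # 31,557,600 seconds
pictures = 100_000_000                      # one picture per second

years = pictures / SECONDS_PER_YEAR
print(f"{years:.2f} years")                 # 3.17 years
```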
I checked the site statistics that Sandra counts up each night.
At the end of July 2004, there were about 719,000 pictures (46 gigabytes), 153 gigabytes of video, 8500 text files (mostly stories), and a total of about 199 gigabytes. The Current Newsthumbs has 3.6 million pictures; there are about 100 million pictures altogether in Newsthumbs. How many web sites do you know that have 100 million pictures?