Forum::The Projects::RCN | BOINCstats/BAM!

Rebirther: BAM!ID: 3684; Joined: 2006-08-03; Posts: 51; Credits: 0; World-rank: 0

2006-09-09 11:59:34
last modified: 2006-09-09 12:00:02

Status from the admin:
"We have two problems, we are working on them, and we will be back online on Monday at the earliest."

Hope that helps all users who, like me, are trying to upload their files.

Cori: BAM!ID: 2; Joined: 2006-01-09; Posts: 1351; Credits: 69,052,230; World-rank: 14,743

2006-09-09 14:43:22

THX. ;-)

Greetings from Cori

John Hunt: BAM!ID: 5940; Joined: 2006-09-11; Posts: 79; Credits: 21,553,905; World-rank: 32,590

2006-09-12 07:02:55

Now Tuesday and still no signs of life from the project.
If they take much longer getting the project back up and running,
they run the risk of losing crunchers for good!

Rebirther: BAM!ID: 3684; Joined: 2006-08-03; Posts: 51; Credits: 0; World-rank: 0

2006-09-12 11:45:07

Another reply from the admin:
It will take a few more days; they hope to be back this week. Stay tuned.

Lee Carre: BAM!ID: 41; Joined: 2006-04-19; Posts: 262; Credits: 299,581; World-rank: 398,549

2006-09-13 12:01:13
last modified: 2006-09-13 12:05:26

Now Tuesday and still no signs of life from the project.
If they take much longer getting the project back up and running,
they run the risk of losing crunchers for good!

not to mention losing work due to the problems of having about 100 WUs waiting to upload, and about 1400 transfers for RCN alone!

i'm becoming more tempted to cancel the lot, although boinc will see to that itself after 2 weeks

John, i noticed you had your images in the actual message
would it be possible to move your stats banner to your signature? (look in ''forum preferences'')

Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins

Rebirther: BAM!ID: 3684; Joined: 2006-08-03; Posts: 51; Credits: 0; World-rank: 0

2006-09-13 12:08:22

Now Tuesday and still no signs of life from the project.
If they take much longer getting the project back up and running,
they run the risk of losing crunchers for good!

Do not abort anything. The deadline is not important; all work will be credited!

Honza: BAM!ID: 109; Joined: 2006-05-10; Posts: 154; Credits: 8,917,267,091; World-rank: 433

2006-09-13 13:37:27

i'm becoming more tempted to cancel the lot, although boinc will see to that itself after 2 weeks

This concern is valid. Sometimes 2 weeks is a long period, sometimes a short one.
I remember a similar problem earlier, and I was able to manually edit this deadline, that is, reset the counter for another 2 weeks.

Lee Carre: BAM!ID: 41; Joined: 2006-04-19; Posts: 262; Credits: 299,581; World-rank: 398,549

2006-09-30 17:34:15
last modified: 2006-09-30 17:45:33

If they take much longer getting the project back up and running,
they run the risk of losing crunchers for good!

Do not abort anything. The deadline is not important; all work will be credited!

i didn't cancel anything in the end, everything was sent/reported within a few days after that

the problem wasn't with deadlines - it had already completed the work, so i wouldn't save any CPU time by aborting

the problem was with the fact that the boinc core client was using a significant amount of CPU time by having all these records to handle, 1400 transfers to keep retrying is quite a lot really, that was the reason behind what i said

but as someone is bound to point out, yes i know this is a problem with the boinc client, and not RCN, i appreciate the difference, but the result is still the same in the end

Rebirther: BAM!ID: 3684; Joined: 2006-08-03; Posts: 51; Credits: 0; World-rank: 0

2006-10-10 09:12:10

News about the outage:

The short version: a hardware failure. We will be back online as soon as
possible, but it will take at least another day.

The long version, for those who want to know: yesterday, as announced,
we shut down briefly to do a memory upgrade. After we booted back up,
the machine decided it had a degraded RAID. We tried to fix that, which
involves hours of resynchronization (you surely noticed the poor
performance). During this the machine got slower and slower and finally
froze. Booting again with the RAID was not possible. We deactivated the
first hard disk and booted. Then came another reboot, but this time the
machine did not come up at all. To me it looks like a mainboard failure.
The supplier says there is a supply shortage of Socket 939 boards, but
that he will know by 3 p.m. whether he will receive any today. In any
case, the machine is being picked up right now, because the Asus A8N-SLI
Premium is barely 4 months old, so this is a warranty case.

Since the board's controllers were the first to cause problems (we saw
that in the logs), I cannot yet say what has happened to the data. In
the best case, I get a mainboard, plug in the hard disks, and everything
runs as before. In the worst case, however, I have to set up the server
from scratch and restore the backup data. That way we lose only one day
of work, but it takes longer. Depending on how things go, we are now
looking at a downtime of 1 to 7 days.

In short:
Downtime of 1-7 days because the mainboard is damaged. The worst case is setting up the server from scratch, with a loss of 1 day of work.

PovAddict: BAM!ID: 115; Joined: 2006-05-10; Posts: 1013; Credits: 5,785,239; World-rank: 78,454

2006-10-10 14:48:50

News about the outage:

In short:
Downtime of 1-7 days because the mainboard is damaged. The worst case is setting up the server from scratch, with a loss of 1 day of work.

Thanks for the update.

LoneStar: BAM!ID: 628; Joined: 2006-05-22; Posts: 14; Credits: 6,482,894; World-rank: 73,091

2006-10-10 17:22:37

Lee:

It can get even worse... Right before the longer nov* units were released, one of my client machines had a long run of quick and easy tuesday results.... This caused BOINC to readjust the time-to-process estimation, and it thought it would be able to process a nov result in less than 2 minutes! It proceeded to download over 600 results to process!

I wouldn't have thought this would be a problem, but when BOINC went to reschedule the CPU, it pegged the CPU for about a minute and a half trying to sort through all those results, and since it was basically locked up at that point, child processes weren't getting heartbeats... So an RCN result would start, couldn't get heartbeat, and exited, and the exit of course forces another reschedule, another unit starts running, but has the same problem, and so on. Luckily I caught it in time and was able to shut it down and manually move a large chunk of the units out of the queue, so I could put them back in later and not lose any work.

Then the other feature I didn't know about - when BOINC connected to report the completed results, it redownloaded all the "lost results" from the server again! Nice, but caused the problem all over again. So for the last week I've been nursing the stupid thing along, finally now at a point where all the results aren't overloading the client, and the scheduler goes down. Just figures, doesn't it?

A side note about the CPU problems, I know it's not an optimal solution, but on that client I've just suspended network access, all he's got is a couple hundred more RCN's to run and a CPDN, which has a couple thousand more hours left, so I can suspend there without worry, and the client won't burn cycles managing the upload queue, it's helped quite a bit. You may be able to get the same result by just suspending RCN, if it doesn't have any more work queued (just the uploads), but I haven't tried that. And of course, that's a pain if there's several clients to deal with. But just a thought!
-D

PovAddict: BAM!ID: 115; Joined: 2006-05-10; Posts: 1013; Credits: 5,785,239; World-rank: 78,454

2006-10-10 18:47:57

LoneStar: you could manually edit the <duration_correction_factor> for RCN in client_state.xml. Edit this file at your own risk.
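A minimal sketch of what such a manual edit could look like when scripted. The sample XML below is a hypothetical, heavily trimmed stand-in for client_state.xml: the `<project>`, `<project_name>`, and `<master_url>` layout and the example URLs are assumptions, and the helper `set_dcf` is made up for illustration. It works on a string so nothing on disk is touched.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for client_state.xml. The real file
# holds far more elements; the layout shown here is an assumption.
SAMPLE_STATE = """<client_state>
  <project>
    <master_url>http://example.org/other/</master_url>
    <project_name>Other</project_name>
    <duration_correction_factor>1.000000</duration_correction_factor>
  </project>
  <project>
    <master_url>http://example.org/rcn/</master_url>
    <project_name>RCN</project_name>
    <duration_correction_factor>0.012500</duration_correction_factor>
  </project>
</client_state>"""


def set_dcf(state_xml: str, project_name: str, new_dcf: float) -> str:
    """Return state_xml with the named project's correction factor replaced."""
    root = ET.fromstring(state_xml)
    for project in root.iter("project"):
        if project.findtext("project_name") == project_name:
            node = project.find("duration_correction_factor")
            if node is None:
                raise KeyError("no duration_correction_factor for " + project_name)
            node.text = f"{new_dcf:.6f}"
            return ET.tostring(root, encoding="unicode")
    raise KeyError("project not found: " + project_name)


# Reset only the RCN factor; other projects are left untouched.
patched = set_dcf(SAMPLE_STATE, "RCN", 1.0)
```

If this were applied to a real file, it would be wise to stop the BOINC client first and keep a backup copy, since the client rewrites the file while it runs.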

LoneStar: BAM!ID: 628; Joined: 2006-05-22; Posts: 14; Credits: 6,482,894; World-rank: 73,091

2006-10-10 20:59:33

PovAddict: Yeah, I thought about doing that initially, but it's worked itself back out to normal as it's run units with longer times. I wonder if there's any setting to limit the correction factor, so that as the client gradually adjusts it, it won't go past that limit and cause a similar situation. This is the first time it's happened to me, though, so once every couple of years isn't that bad.

I also lowered the client's connect time from once a day to about once an hour, even if it happens again that should keep it from downloading sooooo many units because it'll get overcommitted faster.

I guess this could, in theory, be a problem with any project that has variable and hard-to-predict work unit lengths; if the server always says "I estimate this'll take 2 hours" and it only takes 2 minutes, BOINC will skew the correction factor in a horrendous way. But, it needs to have several shorter than predicted units in a short period of time for it to happen that badly, so maybe it's not a big concern, and just bad luck when it happens.
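The effect described above can be put into rough numbers. This is a minimal sketch under a deliberately simplified model, not BOINC's actual work-fetch logic: the client multiplies the server's estimate by its correction factor and requests as many results as fit into its cache window. The 2-hour server estimate is the hypothetical figure from the post; the correction factor of 1/64 is an illustrative assumption, chosen so the client's estimate lands just under 2 minutes. The function name is made up.

```python
def results_to_fill_cache(cache_seconds: float,
                          server_estimate_seconds: float,
                          correction_factor: float) -> int:
    """Simplified model: how many results fit into the cache window."""
    client_estimate = server_estimate_seconds * correction_factor
    return int(cache_seconds // client_estimate)


SERVER_ESTIMATE = 2 * 3600   # server claims 2 hours per result
SKEWED_FACTOR = 1 / 64       # client now expects 112.5 s per result

# Connecting once a day with the skewed factor.
one_day_cache = results_to_fill_cache(24 * 3600, SERVER_ESTIMATE, SKEWED_FACTOR)

# Connecting about once an hour with the same skewed factor.
one_hour_cache = results_to_fill_cache(3600, SERVER_ESTIMATE, SKEWED_FACTOR)

# Connecting once a day with an unskewed factor of 1.0.
healthy_cache = results_to_fill_cache(24 * 3600, SERVER_ESTIMATE, 1.0)

print(one_day_cache, one_hour_cache, healthy_cache)
```

In this model the skewed client asks for several hundred results with a one-day cache but only a few dozen with a one-hour cache, which matches the reasoning for shortening the connect interval.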

BTW, any new animations coming our way soon through RenderFarm?

WimTea: BOINCstats SOFA member; BAM!ID: 360; Joined: 2006-05-14; Posts: 70; Credits: 48,683,742; World-rank: 18,593

2006-10-12 11:07:21
last modified: 2006-10-12 11:07:46

Hi LoneStar,

Just before it went down, I responded to a thread on RCN about just this issue. Bernhard states that any client downloading more than 60 WUs in 15 min. would get its quota lowered to 25. The idea was to prevent computers from getting far too much work to finish in time, which is not unrealistic with a daily quota of 2000 WUs.
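A rule like that could be sketched as a sliding-window counter. This is a hypothetical illustration only, not RCN's server code: the class and method names are made up, and only the numbers (60 WUs, 15 minutes, quota of 25, daily quota of 2000) come from the description above.

```python
from collections import deque


class DownloadQuota:
    """Hypothetical per-host limiter modelled on the rule described above."""

    def __init__(self, daily_quota: int = 2000, burst_limit: int = 60,
                 window_seconds: int = 15 * 60, penalty_quota: int = 25):
        self.daily_quota = daily_quota
        self.burst_limit = burst_limit
        self.window_seconds = window_seconds
        self.penalty_quota = penalty_quota
        self._recent = deque()  # timestamps of downloads inside the window

    def record_download(self, timestamp: float) -> int:
        """Register one download and return the quota now in force."""
        self._recent.append(timestamp)
        # Forget downloads that have fallen out of the 15-minute window.
        while timestamp - self._recent[0] >= self.window_seconds:
            self._recent.popleft()
        # More than burst_limit downloads in the window lowers the quota.
        if len(self._recent) > self.burst_limit:
            self.daily_quota = min(self.daily_quota, self.penalty_quota)
        return self.daily_quota


# A host fetching one WU per second trips the rule on its 61st download.
host = DownloadQuota()
quotas = [host.record_download(float(t)) for t in range(61)]
```

A host that spreads its downloads out, say one every 20 seconds, never has more than 45 downloads inside the window and keeps its full quota in this sketch.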

I proposed a scenario in which downloading loads of WUs could be the result of what you have experienced: BOINC drastically lowering its estimate of the crunch time, combined with a relatively large cache (= long connect interval), and thus BOINC downloading lots of WUs.

Looks like RCN's measures didn't work for you, or BOINC didn't download 60 WUs in 15 min... Maybe you can take this up with RCN as soon as it's up again...

Regards,

Wimtea

Cori: BAM!ID: 2; Joined: 2006-01-09; Posts: 1351; Credits: 69,052,230; World-rank: 14,743

2006-10-16 21:49:52

*Sigh*

Now, who broke RCN this time?

Greetings from Cori
