Pages: 1 [2] 3 4
Tom Wilson
BAM!ID: 823
Joined: 2006-05-27
Posts: 3
Credits: 4,289,864
World-rank: 94,554

2006-08-24 02:20:19

Willy, I came across this thread. It has a couple of BOINCstats.

http://www.primegrid.com/orig/forum_thread.php?id=283


...thought you might like to know.
--Tom
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-10-22 20:54:17
last modified: 2006-10-22 20:55:12

Willy, I came across this thread. It has a couple of BOINCstats.

http://www.primegrid.com/orig/forum_thread.php?id=283

...thought you might like to know.

Hi Tom, Willy's a busy guy, so i hope he won't mind me answering for him (and clearing up any confusion)

if you mean the signature images, that's no problem at all, they're not the issue, and it's perfectly acceptable to have them in your sig

the problem is with full HTML pages on the boincstats site (details in previous posts)
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins
Honza
BAM!ID: 109
Joined: 2006-05-10
Posts: 154
Credits: 8,917,709,480
World-rank: 433

2006-10-25 08:24:50

A friend of mine is getting "Scraping not allowed".
As he can't post here, I'll re-post his message. It contains an explanation of how he is using selected data from BoincStats, which I personally do not count as "Scraping".

________
Hello.
Thanks for your work on www.boincstats.com. I use them every day.

Semioccasionally, I look at the world position of my team and country.
The detection of the world position of all projects takes a long time. I
tried to create a summary table for all projects. Now the summary page
is located on http://tygr.czu.cz/~vejpuste/boinc/statistiky.phtml
Data from www.boincstats.com is downloaded once in a while and the summary
table uses local files. Looking at the summary table makes no traffic on
www.boincstats.com. I use flags only from www.boincstats.com. Some
images and style are local.
All links from the summary table refer to www.boincstats.com.

If you would accept my summary table I will use it. I plan to download
data from www.boincstats.com bihourly for example.

Thanks and have a nice day
Libor Vejpustek
UBT - Halifax--lad
 
BAM!ID: 25
Joined: 2006-02-27
Posts: 366
Credits: 49,272
World-rank: 920,737

2006-10-25 12:46:06

Taking data from a webpage that is not meant to be taken is scraping in my book. The XML files are available from the projects, so he should use those instead of taking data from BOINCstats.

BOINCstats is here for us all to enjoy and if everyone started doing the same things then BOINCstats would be rendered useless.

That's my opinion anyway, as it's Willy's pet he's entitled to block people from doing things like this
Join us in Chat (see the forum) Click the Sig
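As a rough illustration of the XML route mentioned in the post above, here is a minimal Python sketch that reads team totals from a project's stats export instead of from BOINCstats pages. The sample document, the element names (`team`, `name`, `total_credit`, `expavg_credit`) and all the numbers are assumptions made up for illustration; check the export of the actual project before relying on any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a project's team export file.
# Element names and all numbers are assumptions for illustration only.
SAMPLE_EXPORT = """
<teams>
  <team>
    <id>1</id>
    <name>Example Team A</name>
    <total_credit>1500.5</total_credit>
    <expavg_credit>12.25</expavg_credit>
  </team>
  <team>
    <id>2</id>
    <name>Example Team B</name>
    <total_credit>900.0</total_credit>
    <expavg_credit>30.0</expavg_credit>
  </team>
</teams>
"""


def read_teams(xml_text):
    """Return a list of (name, total_credit) tuples, highest credit first."""
    root = ET.fromstring(xml_text.strip())
    teams = [
        (team.findtext("name"), float(team.findtext("total_credit")))
        for team in root.iter("team")
    ]
    return sorted(teams, key=lambda t: t[1], reverse=True)


def rank_of(teams, name):
    """1-based position of a team in the sorted list, or None if absent."""
    for position, (team_name, _) in enumerate(teams, start=1):
        if team_name == name:
            return position
    return None


if __name__ == "__main__":
    teams = read_teams(SAMPLE_EXPORT)
    print(teams)
    print(rank_of(teams, "Example Team B"))
```

A summary table built this way would only need the export files, downloaded at whatever interval the project itself allows.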
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-10-25 17:10:08
last modified: 2006-10-25 17:12:21

Thanks for your work on www.boincstats.com. I use them every day.

Data from www.boincstats.com is downloaded once in a while...

Some images and style are local.

I plan to download data from www.boincstats.com bihourly for example.

i have to agree with Halifax lad, it's still scraping
bi hourly is quite often, the stats aren't even updated anywhere near that often!

what he uses the data for is irrelevant, it's the fact that he's using an automated program to get full web pages

he should read the FAQ, in which willy says to contact him for extracting data from boincstats

so personally it's his own fault, willy isn't here to fund other people's stats gathering
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-10-25 17:40:19

I agree with myself. It is scraping and I won't allow it.

The fact that his IP is blocked can only mean that he has scraped a lot of pages already. It's not something I do after 10 pages.

As per the first post in this thread: an IP block is indefinite.

He was warned.
Please do not PM, IM or email me for support (they will go unread/ignored). Use the forum for support.
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-10-25 18:47:01

I agree with myself. It is scraping and I won't allow it.

The fact that his IP is blocked can only mean that he has scraped a lot of pages already. It's not something I do after 10 pages.

As per the first post in this thread: an IP block is indefinite.

He was warned.

out of interest, what do you do regarding dynamic IPs?
such as the case with most home users via an ISP
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-10-25 19:04:12


out of interest, what do you do regarding dynamic IPs?
such as the case with most home users via an ISP


Not much. But I have other methods in place to block scrapers.

BTW: In the Netherlands most users on cable or ADSL have a "dynamic" IP address that never changes, well, maybe once in a few years.

Many ADSL users have a fixed IP address.
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-10-25 20:20:52


out of interest, what do you do regarding dynamic IPs?
such as the case with most home users via an ISP


Not much. But I have other methods in place to block scrapers.

BTW: In the Netherlands most users on cable or ADSL have a "dynamic" IP address that never changes, well, maybe once in a few years.

well yes, a dynamic address doesn't have to change, but the point was that it can
as in, what's to stop a user changing their address somehow, and then just rescraping?
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-10-25 21:07:17

what's to stop a user changing their address somehow, and then just rescraping?


That is for me to know, and for the scrapers to guess.
Honza
BAM!ID: 109
Joined: 2006-05-10
Posts: 154
Credits: 8,917,709,480
World-rank: 433

2006-10-26 09:50:08

Thanks for the response and opinions - I'll let Libor know.
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-10-27 19:21:21

Unfortunately I found another scraper today. From IP 84.48.95.164 over 17000 pages were requested and over 450MB in bandwidth used since the 10th of this month. 99.9% of the pages requested were from “Team Norway”.

You guessed it, I did it: IP blocked.
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-10-28 12:02:12
last modified: 2006-10-28 12:07:21

Unfortunately I found another scraper today. From IP 84.48.95.164 over 17000 pages were requested and over 450MB in bandwidth used since the 10th of this month. 99.9% of the pages requested were from “Team Norway”.

You guessed it, I did it: IP blocked.

"another one down, another one down, another one bites the dust"

is there a way of detecting scrapers that don't hammer the servers and request lots of pages in one go?
i'm guessing that's the only way at the moment, because i'm assuming most of them are smart enough to change the user agent to appear like a regular generic browser
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-11-04 22:16:10

One more: IP 64.72.119.194, 7263 pages, 109.93 MB bandwidth on the first 3 days of this month. Last month 20094 pages and 338.99 MB. Almost all pages from one user (MARULA BAR CALDAS DA RAINHA) and one team (Portugal@Home).

IP is blocked.

Don't think you won't be caught!
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-11-08 23:22:22

IP 211.3.149.204: 7818 pages, 331.54 MB. In two days time. Randomly loading pages every second, up to five pages per second.
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-11-09 09:27:03

for comparison, what does an average user generate?
that is, a regular user browsing pages in the typical manner
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-11-09 10:01:19
last modified: 2006-11-09 10:01:49

An average is hard to give, it depends on the person's interests. Some only view their personal stats, others the team pages as well, and some also check out the country, the project totals etc.

A 'heavy' user views around 3000 pages (this includes the forum) and uses around 100MB a month.

I do not have a problem if someone goes up to 400MB or 10000 pages a month, he/she might have an interest in a lot of stats or is a forum addict. But it is easy to see whether it's a person viewing those pages or an automated process.

Take for example this heavy user (his name is Willy): 12248 pages and 98MB in October.
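To put the numbers quoted so far in this thread side by side, here is a small Python sketch that converts them to pages per day and kilobytes per page. It is only arithmetic on the posted figures and is not the detection method used on the site, which has not been disclosed. The day counts are approximations read off the post dates, "over 17000 pages" is taken as 17000, and 1 MB is assumed to be 1024 KB.

```python
# Figures quoted in this thread: (label, pages, megabytes, days observed).
# The day counts are approximations read off the post dates.
TRAFFIC = [
    ("heavy user (typical month)", 3000, 100.0, 30),
    ("Willy, October", 12248, 98.0, 31),
    ("scraper reported 2006-10-27", 17000, 450.0, 18),
    ("scraper reported 2006-11-04", 7263, 109.93, 3),
    ("scraper reported 2006-11-08", 7818, 331.54, 2),
]


def pages_per_day(pages, days):
    """Average number of pages requested per day."""
    return pages / days


def kb_per_page(megabytes, pages):
    """Average transfer per page in KB, assuming 1 MB = 1024 KB."""
    return megabytes * 1024 / pages


def summarize(rows):
    """Return (label, pages/day, KB/page) for each row, rounded to 1 decimal."""
    return [
        (label, round(pages_per_day(pages, days), 1), round(kb_per_page(mb, pages), 1))
        for label, pages, mb, days in rows
    ]


if __name__ == "__main__":
    for label, rate, size in summarize(TRAFFIC):
        print(f"{label:30s} {rate:8.1f} pages/day {size:6.1f} KB/page")
```

On these figures the heavy user averages about 100 pages a day, while the three reported scrapers come out at very roughly 900 to 3,900 pages a day. A daily average alone would not separate the two groups reliably, which fits the remark above that totals up to 10000 pages a month are not a problem in themselves.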
Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2006-11-09 14:32:56

An average is hard to give, it depends on the person's interests. Some only view their personal stats, others the team pages as well, and some also check out the country, the project totals etc.

A 'heavy' user views around 3000 pages (this includes the forum) and uses around 100MB a month.

I do not have a problem if someone goes up to 400MB or 10000 pages a month, he/she might have an interest in a lot of stats or is a forum addict. But it is easy to see whether it's a person viewing those pages or an automated process.

Take for example this heavy user (his name is Willy): 12248 pages and 98MB in October.
thanks, i know it's hard to give a true average, but i was just after an idea of what kind of traffic a person generates compared to a bot

do you use an automated monitoring system to detect real/bot requests?

and out of interest, what's the server's monthly data-transfer allowance (bandwidth/month)?
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-11-09 15:18:16

do you use an automated monitoring system to detect real/bot requests?

Yes

and out of interest, what's the server's monthly data-transfer allowance (bandwidth/month)?

250GB
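For a sense of scale, the sketch below sets the traffic figures quoted earlier in the thread against the 250GB monthly allowance. It assumes 1 GB = 1024 MB and a 30-day month, and it assumes that a scraper would keep up its reported rate for the whole month; none of these assumptions come from the thread.

```python
ALLOWANCE_GB = 250   # monthly allowance stated in the thread
MB_PER_GB = 1024     # assumption: binary units
MONTH_DAYS = 30      # assumption: length of a month


def monthly_mb(megabytes, days, month_days=MONTH_DAYS):
    """Scale an observed transfer to a full month at the same rate."""
    return megabytes / days * month_days


def share_of_allowance(mb_per_month):
    """Fraction of the monthly allowance used by the given traffic."""
    return mb_per_month / (ALLOWANCE_GB * MB_PER_GB)


if __name__ == "__main__":
    heavy_user = monthly_mb(100.0, 30)   # 'heavy' user: 100MB a month
    scraper = monthly_mb(331.54, 2)      # scraper: 331.54 MB in two days
    print(f"heavy user: {share_of_allowance(heavy_user):.4%} of allowance")
    print(f"scraper:    {share_of_allowance(scraper):.4%} of allowance")
```

Under these assumptions a single scraper running at the fastest reported rate would use close to 2% of the allowance, about fifty times the share of one heavy user.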
billvelek
BAM!ID: 13491
Joined: 2006-11-29
Posts: 4
Credits: 90,427
World-rank: 714,300

2006-11-29 14:42:06
last modified: 2006-11-29 14:49:09

Willy, I just joined this website, and this is my first post. I have never heard of a 'scraper', but I can understand and appreciate your point of view. However, upon reading the first post in this thread, I feel compelled to suggest to you that you are going too far if you are permanently wiping out team stats. Or maybe I just don't understand what you are talking about.

For instance, I formed the HomeBrewers team on the World Community Grid last year and was the team captain until I resigned a few months ago, although I am still a member of that team. Our team, as is the case with all others on the WCG, is open for anyone to join at any time without our knowledge or approval, and I can tell you, frankly, that I don't know anything about the other members of our team or what they are doing. Our team engages in friendly competition in stats, and managed to make it almost to the Top-100 at WCG before our 1st birthday, and we are now ranked 100 there, and soon to be 99 and rising. I know that many on our team use BOINC, although I just started myself a week or so ago, so this is all new stuff to me.

But if someone on our team -- and it could be ANYONE there -- were to 'scrape' your site, and do it without the knowledge or approval of our team captain or any of our team members, it is patently unfair to the rest of us to have our team stats permanently deleted as a punishment for something we would know nothing about, and probably couldn't control even if we did. For instance, as a former team captain, I know of no way to eject a member from a team; I suppose it MIGHT be possible to appeal to the administrators at WCG, but how long would that take, if it would ever, in fact, be done? And how are teams to learn that a member is scraping in the first place? Is there some way to detect this?

Anyway, I know how disappointed I would be if our team standing were to suddenly drop from 100 to 6,857 for some unknown reason, and it just isn't fair that it could happen under those circumstances. I sincerely hope you will reconsider and at least give the team some advance notice so that the team can attempt to do something about it -- such as approaching the administrators to try to eject a team member.

Thanks for listening.
Bill Velek - My non-profit project 2plus2is4.com lobbies for laws to require schools to teach our children about grid computing, about how safe and secure it is, and about the many interesting ways it can be used to help humanity; spread the word!
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9442
Credits: 353,172,950
World-rank: 4,874

2006-11-29 21:48:33

Clearing the stats history is not equal to erasing credits. That is something I can't do because credits are determined by the projects, and not by BOINCstats.

I also clear the team stats when the site is scraped to produce stats for the team. It is up to the team founder to keep his team members from scraping BS. Most (if not all) scrapers so far had the scraped stats hosted at the team's website, or they were used for things like team signatures.
It is the responsibility of the team founder to tell his members about the 'punishment' for scraping.

In most cases I'm able to track down the user and/or team, in which case I issue a warning first, before taking action.
billvelek
BAM!ID: 13491
Joined: 2006-11-29
Posts: 4
Credits: 90,427
World-rank: 714,300

2006-11-30 20:53:28
last modified: 2006-11-30 20:54:34

Thanks for the reply.

Well, if credits aren't affected, then I guess I shouldn't be concerned. However, just for clarification, you said that it is the team founder's (me regarding "HomeBrewers" on WCG, although I'm no longer 'Captain') responsibility to keep his team members from scraping, and to warn them of the consequences. I'll immediately post a message on our Team Thread at WCG to inform whichever members happen to read that, including our team captain, but that is the only means of communication that I have with the members. We do have a team website, but I don't think our captain -- Fred B -- has any stats displayed, but my post in WCG should bring this issue to his attention.

Anyway, I can appreciate your motives.

While I'm here, since you appear to be the owner of BOINC STATS, would you be interested in providing a link to my website which is lobbying for laws to require our schools to teach our children about grid computing, and how it can help the world? Take a look at http://www.2plus2is4.com and let me know what you think.

Cheers.
ThomasT
BAM!ID: 4199
Joined: 2006-08-18
Posts: 1
Credits: 2,828,899
World-rank: 121,139

2007-02-22 16:23:26

Hello,

I also wanted to build such a scraper, and I hope I noticed in time that it is forbidden. When I think about it properly, that is also logical! My apologies.

Is it somehow possible to get 5 numbers?

Apologies for the Babel Fish English.

Yours sincerely Thomas
maudeve
Translator
BAM!ID: 2595
Joined: 2006-06-26
Posts: 17
Credits: 141,498,102
World-rank: 9,098

2007-03-05 10:58:28

Willy,
maybe a way to meet this kind of need while still avoiding scraping is to offer some specific "services" for users and teams who like to "use" BoincStats numbers.
An example of what I am thinking of is the services provided by XE (www.xe.com), which give you a specific clean enquiry page ( http://www.xe.com/pca/input.cgi ) that you can link as a popup tool or insert in your webpage ("clean" means that this tool doesn't refer to all the services available on the website but just the enquiry on the rates; note that this page carried advertising until some time ago).
XE also provides information on how to integrate other specific services directly in your webpage ( http://www.xe.com/ucc/customize.php )
I'm not a web developer but it seems something not too complex to develop, that can give statistics to user/team web pages and generate advertising (...so badly needed money) for BoincStats

Lee Carre
 
BAM!ID: 41
Joined: 2006-04-19
Posts: 262
Credits: 299,581
World-rank: 398,557

2007-03-05 21:41:40

Willy,
maybe a way to meet this kind of need while still avoiding scraping is to offer some specific "services" for users and teams who like to "use" BoincStats numbers...

I think Willy already offers data from BOINCstats in XML format

Index :: Announcements :: Scrapers ye be warned!