Eventually /dev/null

Random thoughts from an emerging Programmer

Be Careful With Approving Comments

Have you ever received comments you don’t understand because they’re written in a language you don’t speak? Have you ever wondered why someone writes a comment in Russian on a post that is written in, let’s say, English, German, or French? (I’m not talking about comments on .de, .fr, or .ru blogs written in that blog’s language, where the blog owner speaks it too. There a commenter might reasonably prefer their native language, since knowing it better helps avoid misunderstandings.)
What could be the reason for this? In general, they should be able to write some basic English, German, or French, since they apparently read the post well enough to add their own comment or opinion. So why don’t they?
Let’s try to find the reason. Below I’ll show you how I handle comments in languages I don’t understand, using an example I received the other day:

The Comment

So let’s take a look at the comment (to avoid promoting this spammer/hacker, I replaced some data):
1000 форумов 2 доллара 5000 форумов 8 долларов 10000 форумов 13 долларов 50000 форумов 50 долларов
Бонус предложение для тех кто закажет 20000 форумов через неделю повторная отправка
Рефпредложение: человек который приведет мне клиента будет получать 10% от заказа клиенка!!!
Обращаться в асю 3пять3-8шесть7-0ноль1 мыло mymail(гав)example.com

That one made me curious: it contains quite a lot of numbers, as well as an additional email address that doesn’t match the one entered in the email field. Let’s check the comment by translating it into our native language or some other language we understand.
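Those two red flags (lots of numbers and an extra address that doesn’t match the email field) can also be checked automatically before translating anything. Here’s a minimal Python sketch, assuming you have the comment text and the address from the form’s email field. The function names and the list of "@" substitutes (like the "(гав)", Russian for "woof", in this comment) are my own assumptions, not part of any blog software.

```python
import re
from typing import List

# Words spammers put in place of "@" to dodge simple filters; "(гав)" ("woof")
# is the one from the comment above, the others are assumed examples.
AT_SIGNS = r"(?:@|\(at\)|\(dog\)|\(гав\))"
EMAIL_RE = re.compile(r"([\w.+-]+)\s*" + AT_SIGNS + r"\s*([\w-]+(?:\.[\w-]+)+)",
                      re.IGNORECASE)


def extra_contacts(body: str, form_email: str) -> List[str]:
    """Return addresses found in the comment text that differ from the email field."""
    own = form_email.strip().lower()
    found = []
    for name, domain in EMAIL_RE.findall(body):
        address = f"{name}@{domain}".lower()
        if address != own:
            found.append(address)
    return found


def digit_share(body: str) -> float:
    """Share of non-space characters that are digits; price lists score high."""
    chars = [c for c in body if not c.isspace()]
    return sum(c.isdigit() for c in chars) / len(chars) if chars else 0.0
```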

Translating the Comment

In general, I recommend translating it into your native language, as that’s usually the one you understand best. On the other hand, your chosen translator may not offer your language (or its dictionary may be quite limited), or it may not be able to translate directly between these two languages.
You should avoid having the text translated more than once before it reaches a language you understand. The general problem with automatic translation is that it doesn’t pick the best sentence structure and word choice, so after two or three translation steps you could end up with nonsense text (which wouldn’t help). The best approach is probably to translate it into English, and if you don’t understand some of the English words, translate those into your native language.
For the example above we get something like this:
1000 forums 2 dollars of 5000 forums of 8 dollars of 10000 forums of 13 dollars of 50000 forums of 50 dollars the Bonus the offer for those who will order 20000 forums in a week repeated sending Рефпредложение: the person which will result to me the client will receive 10 % from the order клиенка!!! To address in асю 3пять3-8шесть7-0ноль1 soap mymail (гав) example.com
That makes quite a bit more sense now, doesn’t it? It seems to be a price list for spamming forums, and we even learn that we’d get 10% of the profit from something!

Translate unknown words

Now we can be fairly sure that this is a spam comment, but as you can see, some words weren’t translated, like клиенка (these can sometimes be important). So let’s have them translated as well; don’t we want to know how to receive our 10%?
If you’re using a good translator, you should have the option to transliterate unknown words into the target language’s alphabet. For our клиенка we get something like klienka, which sounds like client. Let’s guess that we receive 10% of the money the client pays for his order.

Drop It or Keep It?

Now you should have enough information to decide whether it’s a spam comment or a legit one. If it’s spammy, the decision isn’t hard. If it’s a legit comment, I advise keeping the original comment and adding the translation below it. If you like, you can also improve the comment, but state explicitly where you made changes!

Some good online Translators

Where can I get my text translated into another language?
Just search for something like "translate Russian to English" (with your language pair) and you should find some useful results. A good translator is PROMT, which translates whole texts (not word for word) for a number of languages; another one is Babelfish. If you need single words translated into your language, search for a dictionary for that language pair.

Conclusion

As you can see, it’s better to check comments in other languages for spam as well (these often get past spam filters). If you can’t get a comment translated, it’s usually better to hold it back or drop it. From my point of view, it’s better to have one or two legit comments fewer than to have a single spammy one.
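If you want such comments to land in the moderation queue instead of going live, a rough signal is how much of the text is written in Latin letters. A minimal Python sketch, assuming an English/German/French blog; the 50% threshold and the function names are hypothetical and would need tuning for your site:

```python
import unicodedata


def script_share(text: str, script: str = "LATIN") -> float:
    """Share of letters whose Unicode character name starts with the given script."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hits = sum(unicodedata.name(c, "").startswith(script) for c in letters)
    return hits / len(letters)


def needs_manual_review(text: str, min_latin: float = 0.5) -> bool:
    """Hold back comments that are mostly written in a non-Latin script."""
    return script_share(text) < min_latin
```

Umlauts like ü still count as Latin, so German comments pass; a comment with no letters at all (only numbers) also ends up held back.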

How to Prevent Most of the SPAM

Spam is a really big problem on the Internet; today almost no one can avoid it. As an Internet consumer (someone who doesn’t run their own communication platforms) you may not notice the problem much, maybe you haven’t really noticed it at all. But it’s definitely a real problem: spam is everywhere.
You get it with your daily emails as soon as you’ve entered your email address once into some untrustworthy mailing list or application, and from that day on it never stops. If you’re lucky, that’s your only contact with spam; as soon as you run a blog, forum, and so on, your contact with spam gets much bigger. So how can I protect these platforms from spam?

Registered Users only

One widely used option is to allow posting only for registered users. This keeps out every spambot that doesn’t have a routine for registering on this platform (or for registering in general). Additionally, many registration processes require the email address to be validated; if that doesn’t happen, the account can’t be used. Again, this stops many spambots from posting spam: many bots do have a registration routine, but they don’t use valid email accounts and/or don’t complete the steps needed to activate the platform account.
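The validation step usually works by mailing the user a link containing a token that only the platform can create. Below is a minimal Python sketch of such a token, assuming a secret key kept on the server and a 48-hour expiry; both, like the function names, are illustrative assumptions rather than how any particular forum software does it:

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server secret; a real site would load a fixed value from its config.
SECRET_KEY = secrets.token_bytes(32)


def make_token(email: str, issued_at: Optional[int] = None) -> str:
    """Create the token that goes into the activation link mailed to the user."""
    if issued_at is None:
        issued_at = int(time.time())
    payload = f"{email.strip().lower()}|{issued_at}".encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"


def verify_token(email: str, token: str, max_age: int = 48 * 3600,
                 now: Optional[int] = None) -> bool:
    """Activate the account only if the token belongs to this email and hasn't expired."""
    if now is None:
        now = int(time.time())
    try:
        issued_str, signature = token.split(".", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False
    if not 0 <= now - issued_at <= max_age:
        return False
    expected = make_token(email, issued_at).split(".", 1)[1]
    return hmac.compare_digest(signature, expected)
```

A bot that registers with a fake address never receives the link, so its account simply never becomes active.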

Why Don’t the Bots Do This?

Mostly because there are still enough platforms out there that don’t use such a protection mechanism. Sadly, this method may also drive away some users who would like to post a comment but aren’t willing to create a user account just for one or two contributions. The reason they’re not willing (you probably know it, as you surely feel similarly) is that they’re afraid of getting spam in their inbox, just because they registered at some small website that passes their email address on to a spam mailer.
Fortunately, there’s a way to accept contributions from non-members without a big load of spam to fight.

Spam Blocking and Labeling

One reason wasn’t mentioned explicitly above, but you could read it between the lines: maybe you, the platform owner, still want to reach the users who don’t want to register at your site just for a single post. This doesn’t have to be a disadvantage for you. There are many solutions out there that are really nice and widely used, and therefore proven to work well.
In general I see three different kinds of applications, which mostly differ in how much work you’ll have with them later on:

Install and Maintain Yourself

This kind of software is simply installed, and then the user has to add the words that will block a contribution. This kind of blocking was more common in the early days of spam, when no services were available to check messages for spam. You have to maintain the filter list yourself, as spam messages change regularly. At some point you reach a dead end: the spam messages no longer contain any words you can block without potentially blocking legit posts. This kind of app doesn’t work too well any more, since spam changes regularly and sometimes doesn’t even look like spam to a human at first glance. This problem is more likely to occur on websites that deal with common spam topics like real estate, pharmacy, and so on.
An example for phpBB can be found here.
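To see why this approach runs into a dead end, here’s a minimal Python sketch of such a word filter, assuming a hand-maintained list (the words in it are just examples, not taken from the phpBB mod). Matching whole words avoids hits inside harmless words, but trivially changed spellings slip through, while a legit real-estate question gets blocked:

```python
import re

# Hand-maintained block list (example words); in this model you keep extending it yourself.
BLOCKED_WORDS = ["cialis", "casino", "mortgage"]

# \b...\b only matches whole words, so "cialis" doesn't hit "specialist".
BLOCK_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKED_WORDS) + r")\b", re.IGNORECASE
)


def is_blocked(text: str) -> bool:
    """True if the text contains any blocked word."""
    return BLOCK_RE.search(text) is not None
```

`is_blocked("c i a l i s for sale")` returns False, while an honest "What are your mortgage rates?" on a real-estate site returns True.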

Human or Bot Tests

A quite well-working approach for deciding whether a message is spam is to ask for something only a human can do. The best example is the CAPTCHA: the user is asked to type in some letters and numbers from a picture in order to post the message (or to tag it as not spam). But you need to be careful: there are already bots that can read early or weak CAPTCHAs. On the other hand, some newer CAPTCHAs are quite hard to solve even for people without disabilities (just imagine how hard they are for people with disabilities). A good CAPTCHA method was developed by Microsoft: you’re shown 9 photos of dogs and cats and have to select only the cats (or only the dogs). The project is called Asirra (Animal Species Image Recognition for Restricting Access), and its photos are provided by Petfinder.com.
Then there are checkboxes you have to tick if you’re a human (these are no problem for bots any more, as they fill in every field). Another idea is to ask the user to add or subtract two numbers. This is quite hard for bots to solve; a newer variant is to ask questions that need to be answered.
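The arithmetic question is easy to build yourself. A minimal Python sketch, assuming the question is shown next to the form and a signed token travels in a hidden form field; the names and the HMAC signing are my own choice for the example, not a specific plugin’s method:

```python
import hashlib
import hmac
import random
import secrets
from typing import Tuple

SECRET = secrets.token_bytes(32)  # hypothetical per-site secret


def _sign(nonce: str, answer: int) -> str:
    return hmac.new(SECRET, f"{nonce}:{answer}".encode(), hashlib.sha256).hexdigest()


def new_challenge() -> Tuple[str, str]:
    """Return the question to show and a signed token for a hidden form field."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    if random.random() < 0.5:
        question, answer = f"What is {a} + {b}?", a + b
    else:
        a, b = max(a, b), min(a, b)  # keep the result non-negative
        question, answer = f"What is {a} - {b}?", a - b
    nonce = secrets.token_hex(8)
    return question, f"{nonce}:{_sign(nonce, answer)}"


def check_answer(user_input: str, token: str) -> bool:
    """Accept the post only if the typed number matches the signed answer."""
    try:
        nonce, signature = token.split(":", 1)
        answer = int(user_input.strip())
    except ValueError:
        return False
    return hmac.compare_digest(signature, _sign(nonce, answer))
```

A token on its own can be reused, so a real setup should also remember used nonces; and with only a handful of possible answers, a bot that simply guesses will still get through now and then.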

Spam Labeling as a Service

Nowadays there are web services that check your emails/comments for you. They’re used by many users and catch the latest spam quite fast, and every user can improve the rule set by reporting false positives or missed spam. The best example is the well-known Akismet.
These services are probably the future of spam blocking, as the work left for you is quite low and you get a really good ratio of caught spam to false positives.
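With Akismet, your blog sends each new comment to the service and gets a spam/not-spam verdict back. A minimal Python sketch using only the standard library, based on Akismet’s comment-check call as I understand its REST API (API key as subdomain, form-encoded fields, plain "true"/"false" answer); the key, blog URL, and helper names are placeholders:

```python
import urllib.parse
import urllib.request


def build_comment_check(api_key: str, blog_url: str, user_ip: str, user_agent: str,
                        author: str, email: str, content: str) -> urllib.request.Request:
    """Build (but don't send) an Akismet comment-check request."""
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    fields = {
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_author_email": email,
        "comment_content": content,
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def is_spam(response_body: bytes) -> bool:
    # The service answers with the literal body "true" (spam) or "false" (not spam).
    return response_body.strip() == b"true"
```

To actually send it: `with urllib.request.urlopen(req, timeout=10) as resp: spam = is_spam(resp.read())`, which of course needs a valid API key.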

Conclusion

Spam has been a really heavy problem for some years and will probably stay with us for a few more, as long as there is no worldwide anti-spam act that lets spammers be sued anywhere in the world. But we’ve had some really good protection mechanisms from the start, and fighting spam has never been easier than with services like Akismet. Don’t be one of the people who lose time and money to spam. Fight it, you won’t regret it!

Why Stealing Is Bad

I had never thought about such a step myself, maybe because I had never encountered such a thing until now.
But if you run your own webspace, you should never steal anyone’s content or bandwidth. As soon as the owner notices it, you can get into real trouble; maybe he starts a lawsuit against you (you may have ignored some copyright laws, and you caused him additional bandwidth costs). And what happens if he simply replaces the content, or redirects it to something that harms your visitors or blames you?
In the following WordPress topic you can read that someone linked to a JS file on Website_A. This JS is the output of a freely available public WP plugin; the JS code even mentions that it’s generated by a plugin. But the owner of Website_B was somehow too lazy, or wanted to save some bandwidth, so he simply linked to this JS file on Website_A.
After the owner of Website_A noticed that someone was stealing his bandwidth, he created a mod_rewrite rule that redirected requests for this JS to another JS file, which contained an alert box that popped up in front of the visitor and told him that this website steals traffic from another one. After one month, the owner of Website_B discovered the change and removed the JS.
But it’s important to note that, in theory, the owner of Website_A could have put any JS code into that file. He could have stolen the cookies of Website_B’s users or done anything else he liked; he could even have started a phishing attack.
The owner of Website_B made his website vulnerable because he was too lazy to host the script himself.
A good webmaster/site owner doesn’t steal content, as it is unethical and, maybe more importantly, dangerous!
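The topic doesn’t show the mod_rewrite rule itself, but hotlink protection like this typically decides based on the Referer header. Here’s that decision sketched in Python rather than as a rewrite rule, assuming a Referer-based check; the host names and file names are made up for illustration:

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host names of the site that owns the script.
OWN_HOSTS = {"website-a.example", "www.website-a.example"}


def choose_script(referer: Optional[str]) -> str:
    """Pick the file to serve for a request to the plugin's JS file."""
    if not referer:
        # Referers are often stripped, so requests without one get the real file.
        return "plugin.js"
    host = (urlsplit(referer).hostname or "").lower()
    return "plugin.js" if host in OWN_HOSTS else "hotlink-warning.js"
```

Whatever ends up in `hotlink-warning.js` runs on Website_B’s pages with Website_B’s cookies in reach.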

How to Get PHP5 Working on Debian 3.1

It took me ages to get my server updated to PHP5. I tried multiple ways to get it working, including compiling the sources myself… (Yes, I’m quite new to Linux and its ways of installing things, so blame me if you like, but I prefer the easy way to the hard one.)
But I got none of them working until today, when I found this post. It’s quite short, but it’s totally enough. I now have everything I need to run my PHP under version 5.
OK, it wasn’t exactly as stated in the post: I also had to install php5-mysql to get the database running again, and I didn’t need to create the symlinks to php5.conf & php5.load, but it was still quite easy, with no need to compile the source code yourself…
With this post I’m also adding a new category, Webserver, where I take public notes for everyone, though mostly they’re for my personal use in case I need them again.

http:BL - Remedy or Just Painkiller?

As you probably noticed a few weeks ago, Project Honey Pot announced that they’re now also tracking comment spammers. And they’re doing it quite successfully: the current rate of newly caught spammers is around 100 per day. Part of the success is surely due to the http:BL plugin for WordPress, which can be grabbed here and which made http:BL better known. To use this plugin you need a Project Honey Pot account, and once you have an account it’s not much work to set up your own honey pot.

http:BL the success

Many users of the http:BL WP plugin reported that their daily spam count dropped dramatically, from 100 and more to just a handful of spam comments per day; some even report that they haven’t received any spam at all since setting up this plugin and service.
My own figures are similar, though not quite as good: my spam count is still around 15-20 per day, but it dropped by around 66%.

Remedy

After just a short delay, new spammer IPs are caught, and everyone who uses the http:BL plugin profits from it, as the service (via the plugin) blocks their attempts. So your website receives no spam, and the traffic to the site can be lowered too; for some heavy-traffic pages this will have a big effect and be a real mercy. A further main aim is to collect enough information about spammer/bot networks to start lawsuits against them.
That could lower the amount of daily spam, although it’s a long and hard road to get there. So some day we may live without spam at all. It’s our remedy!

Painkiller

The other facts are: much spam comes from zombie computers, and these are just the PCs of poor, unsuspecting Internet users, who behave like normal PC users and get a different IP every time they go online.
http:BL only logs and blocks the exact spammy IP, and after a few days without spam from this IP it’s supposed to be clean. So when a zombie goes online, it usually gets an innocent IP; the user stays connected for some time and sends 10 spam posts (without knowing it), 3 of which reach the blog. The IP is now blocked… but the next time he connects to the Internet, he’ll most likely be innocent again… and there are millions of zombies out there. Now just imagine how much bigger this problem gets when IPv6 makes its way into the wider world.
http:BL is just a painkiller that makes spam bearable, at least for those who use http:BL.

From Painkiller to remedy

As you can see, there are some obvious positive effects, but these alone won’t stop spam. To win the fight against spam, we need a worldwide anti-spam act that makes it possible to sue spammers from anywhere in the world, so that they can’t hide anywhere any more. It’s also important to make Internet users more aware of keeping their anti-virus and anti-malware applications up to date, so they don’t end up as zombies. And http:BL or similar tools need to spread to a wide range of websites, to make spamming unprofitable even when spammers only use zombies, because the risk of being caught becomes too high.

Conclusion

It’s possible to get the web mostly spam-free, but achieving it will take a lot of work, and not from just a handful of website owners. Most likely we’ll have to use some protections forever, since the less we fight spammers, the more spam there will be. Whatever happens, to fight spam successfully now, you need to participate in http:BL or similar projects. Together we can stand up to spam.

Be Nice, Give Your Visitors Some Sweets

Aren’t you bugged by all the evil around you and in the world? Not even the Internet is a nice place.
Everywhere there are phishers, spammers, harvesters, viruses, trojans, and so on. But that’s no reason not to show your visitors some love by handing them some sweets (in digital form, of course!).
I mean, if no one starts being nice, the web will never change.

Who Deserves Sweets

So who deserves sweets? Those who deserve them are a minority within our web community. Although they’re just a small part of it, they have a lot of influence and visit more websites in a week than most of us do in our whole lives!
These good labourers deserve some sweets for their hard work. I mean, they visit each and every link, even links a normal user never sees! A normal user only visits your site skin-deep, but this minority finds every link! And that’s not all they do: they fill our mailboxes with lively, long-awaited content and bring comments to our blogs and forums. They really deserve something. Something SPECIAL!

How to Give Some Sweets

So how do we give them sweets? As we know, they’re the only ones who find each and every link. So why should a normal user be able to find the way to the sweets too? To prevent that, we simply put some "hidden sweet links" on our website, which only these labourers will find.
And behind this link they find paradise: some unused email addresses! They become happy and know why they’re working so hard!

They’re getting what they deserve!

After they’ve found the sweets, they’re tracked, and as soon as they use one of these unused email addresses, you can block the IPs that use it. This works not only against harvesters and spam mail; you can now also fight comment spammers this way. Project Honey Pot, which just filed a $1 billion lawsuit, also has ways to find and block comment spammers.
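The trick behind this is that every visit to the hidden page gets its own never-published address, so mail arriving at it points straight back to the visit that revealed it. A minimal Python sketch of that idea, assuming you receive the mail for a hypothetical trap domain; it illustrates the principle only, not how Project Honey Pot actually implements it:

```python
import secrets
import time
from typing import Dict, Set, Tuple

TRAP_DOMAIN = "trap.example.org"  # hypothetical domain whose incoming mail you receive

# Which visit (IP, timestamp) was shown which trap address.
issued: Dict[str, Tuple[str, float]] = {}


def trap_address(visitor_ip: str) -> str:
    """Create a fresh, never-published address for one visit to the hidden page."""
    address = f"u{secrets.token_hex(6)}@{TRAP_DOMAIN}"
    issued[address] = (visitor_ip, time.time())
    return address


def on_trap_mail(recipient: str, sender_ip: str, blocklist: Set[str]) -> bool:
    """Called when mail arrives for a trap address: block harvester and sender IPs."""
    visit = issued.get(recipient.strip().lower())
    if visit is None:
        return False
    harvester_ip, _seen_at = visit
    blocklist.update({harvester_ip, sender_ip})
    return True
```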

How it works

Setting up your own Honey Pot

Setting up your own honey pot to trap the evil is as easy as uploading an additional page or editing your blog’s template.
First you need an account.
After that you can install your own honey pot. Just specify the language the file should be written in, along with some other information. Download the file, upload it, view it, follow the instructions, and visit the link. And you’re done. Now you only need to add some links to your websites so that crawlers can find it.
If you don’t own webspace, you only need to be able to edit your template files and add Quicklinks to your theme.
Quicklinks are links to other honey pots. After these few steps you’re running your own honey pot, which catches harvesters, spam servers, and comment spammers. At Project Honey Pot you can track what your trap has caught.

How to Profit from the Data

OK, with the information so far you don’t profit in any way, except that the spammers may be sued some day. Another announcement from Project Honey Pot was http:BL. With this blacklist API you can send a visitor’s IP to Project Honey Pot, where it’s compared with the collected data. You then get a result back that tells you whether the IP is malicious and, if so, what type it is. You can also find out when the IP was last active (maybe the spammer moved on to another IP and the visitor is legit?), and how high its threat score is.
Depending on that information, you can decide either to block the access attempt or to allow it.
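As far as I know, http:BL answers through DNS: you look up `accesskey.reversed-ip.dnsbl.httpbl.org` and, if the IP is listed, get back an address of the form 127.days.threat.type, where type is a bit field (1 suspicious, 2 harvester, 4 comment spammer; 0 means search engine). A minimal Python sketch under that assumption; the thresholds in `should_block` are hypothetical and only one possible policy:

```python
import socket
from typing import Optional

# Bits of the fourth octet (visitor type); 0 on its own means "search engine".
TYPE_BITS = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build the DNS name to look up for an IPv4 address (octets reversed)."""
    reversed_ip = ".".join(reversed(ip.split(".")))
    return f"{access_key}.{reversed_ip}.dnsbl.httpbl.org"


def parse_httpbl(answer: str) -> dict:
    """Decode an answer of the form 127.days.threat.type."""
    first, days, threat, vtype = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError(f"unexpected http:BL answer: {answer}")
    if vtype == 0:
        # For search engines the third octet identifies the engine, not a threat score.
        return {"days": days, "threat": 0, "types": ["search engine"]}
    types = [name for bit, name in TYPE_BITS.items() if vtype & bit]
    return {"days": days, "threat": threat, "types": types}


def lookup(access_key: str, ip: str) -> Optional[dict]:
    """Query http:BL over DNS; None means the IP isn't listed."""
    try:
        answer = socket.gethostbyname(httpbl_query_name(access_key, ip))
    except socket.gaierror:
        return None
    return parse_httpbl(answer)


def should_block(info: Optional[dict], max_days: int = 30, min_threat: int = 25) -> bool:
    """Hypothetical policy: block listed, recently active, non-search-engine IPs."""
    if not info or "search engine" in info["types"]:
        return False
    return info["days"] <= max_days and info["threat"] >= min_threat
```

The `days` check is where the question "maybe the spammer moved on and the visitor is legit?" turns into code: an IP that hasn’t been active for a long time is let through.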

Which software does already make use of it?

There’s a good plugin for WordPress.
And for phpBB, a knowledge base article is available.

Conclusion

Try to install a honey pot on your website, or at least link to one. The more of us there are, the more impact we have. Maybe we can make the web a cleaner place to stay! Install http:BL plugins/mods to protect your website from this evil.

Having Fun With Google

Are you bored and don’t know what to do? Ever tried to have fun while working with Google?
No?
You have to try it, it can be real fun!

Google Code Search

Google made this service public in October last year. The idea behind it is to make it easier for programmers to search for code examples. The source code you get as results was crawled by Google from websites all over the web. The code you can view is open source, so viewing it doesn’t cause any copyright problems.

How to Have Fun

As every developer surely knows, getting your code working is sometimes hell. And it seems many programmers want to remember this years later, when they have to work with the code again. If you search for damn, fuck, and other four-letter words and expressions, you’ll find many funny statements. You can see some of them on this website: webthreads.de.
And don’t think you won’t find comments about people who have nothing to do with programming. Let’s see what we get by searching for George W Bush:

85: AddReply([' george ', ' bush '], [

     'Bush, this man is a war criminal, and we will see that he is brought to trial.';  
      , 'The leader of the international criminal gang of bastards.';  

790: ;; MORAL-II: Redefining somebody else’s advice is BAAAAD (to speak with

   ;; George Walker Bush), and why would you redefine your own advice anyway?  
   ;; Advice is a mechanism to facilitate function redefinition, not advice

Let’s talk with Google

Google’s search engine can already talk to you! Visit douweosinga and you’ll see that it’s possible.

How it works

It’s totally simple: on the website above you just write 3 or 4 words into a text box. The script then searches Google and, from the first search result, returns the first word that follows your terms. It then drops the first word of your word group, appends the returned word, and searches again. Again the first word following the group is taken and added to your sentence, and it also becomes part of the search term. And so on.
So you often get funny sentences, which sooner or later always end up as a mess.
Since the search results for these terms change as the web changes, your results tomorrow can be totally different from today’s.
For example, I got these from Google (the opening words are mine):
* George Bush drinks the Blood of the Jews and Poles.
* bill gates plan to Win the War on terror, as a tool for learning. and Teaching. in the Social Sciences. – That’s the answer to the question of why the hell MS products are so expensive: we’re helping to fight terrorism!
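The word-by-word procedure described above can be sketched in a few lines. Since I don’t want to imply any Google API here, this minimal Python sketch assumes a hypothetical `next_word` stand-in that looks for the phrase in a local text instead of in the first search result:

```python
from typing import List, Optional


def next_word(corpus: str, phrase: List[str]) -> Optional[str]:
    """Hypothetical stand-in for "the word after the phrase in the first search hit":
    returns the word after the first occurrence of the phrase in a local text."""
    words = corpus.split()
    target = [p.lower() for p in phrase]
    n = len(target)
    for i in range(len(words) - n):
        if [w.lower() for w in words[i:i + n]] == target:
            return words[i + n]
    return None


def talk(corpus: str, start: str, max_words: int = 10) -> str:
    sentence = start.split()
    window = sentence[:]  # the current search term
    for _ in range(max_words):
        word = next_word(corpus, window)
        if word is None:
            break
        sentence.append(word)
        window = window[1:] + [word]  # drop the first word, append the new one
    return " ".join(sentence)
```

Because the search window is only a few words long, the output can loop back on itself, which is one reason these sentences drift into a mess.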

Conclusion

As you can see, you can have a lot of fun with Google’s services. So why not try it yourself? It’s worth a look!
Here are the links again:
Google Code Search
Google Talk

HTML - This Is the Reality

In 2005, Google crawled around a billion pages to find out which tags and tag attributes are used in them.

General

For example, the study shows that the attribute xml:lang is used quite often on <html>, but it’s totally pointless in an HTML document, since no HTML processor looks at it.
It also shows that many people aren’t able to put values in quotes correctly, which would explain most of the wrong attributes and values.
You can also see that the meta name revisit-after is useless, as only one spider ever looked at it.
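Counting tag and attribute usage like this is easy to try on a small scale. A minimal Python sketch using the standard library’s `html.parser`, assuming you already have the pages as strings; it’s not how Google ran the study, just the same kind of counting:

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Iterable, Tuple


class UsageCounter(HTMLParser):
    """Counts start tags and (tag, attribute) pairs; names arrive lowercased."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()
        self.attrs: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, _value in attrs:
            self.attrs[(tag, name)] += 1


def count_usage(pages: Iterable[str]) -> Tuple[Counter, Counter]:
    tags, attrs = Counter(), Counter()
    for html in pages:
        parser = UsageCounter()  # a fresh parser per page
        parser.feed(html)
        parser.close()
        tags.update(parser.tags)
        attrs.update(parser.attrs)
    return tags, attrs
```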

Do we need this for SEO?!

What I personally found really interesting is this sentence about meta attributes:

Next we have two name values: keywords, which these days is mostly useless, ironically, and description, which is still somewhat useful.
Does that mean that parts of SEO, like meta tag SEO, are totally pointless, since these tags aren’t taken seriously any more?

The Look is the Content

As we already know, tables are misused by many websites and templates for layout, so most of the attributes used on them are purely presentational. But the same applies to the body tag: as these results show, most pages still use attributes to define styling rather than CSS.

Hyper Text Mark-up Langauge is Difficult

It’s really interesting to see how many people misspell attributes. Just one example is "language", already misspelled in the heading above, which can be found in lots of variants on the Internet. And something I didn’t know myself: you don’t even need to add the language attribute to the script tag, as there’s just one valid language for it!
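Misspelled attribute names like these can be caught by comparing them with a list of known names. A minimal Python sketch using `difflib`, assuming a small hand-picked list of HTML 4.01 attribute names and an arbitrary similarity cutoff of 0.8:

```python
import difflib
from typing import Dict, Iterable, Optional

# A small hand-picked subset of HTML 4.01 attribute names, just for the example.
KNOWN_ATTRS = ["language", "type", "src", "charset", "defer",
               "href", "class", "style", "width", "height"]


def suspicious_attrs(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each unknown attribute name to its closest known spelling (or None)."""
    report: Dict[str, Optional[str]] = {}
    for name in names:
        lowered = name.lower()
        if lowered in KNOWN_ATTRS:
            continue
        guess = difflib.get_close_matches(lowered, KNOWN_ATTRS, n=1, cutoff=0.8)
        report[name] = guess[0] if guess else None
    return report
```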

I need my own Tag attribute

The last thing I want to mention is the quite large number of self-invented tags, which come from Adobe GoLive, Microsoft Office, and others. I can’t see the sense in them, as browsers don’t support these tags, or mostly don’t. So what are they for?

Conclusion

Writing 100% valid HTML 4.01 markup is a lot of work and needs a lot of attention, not only while you write it but also after you’ve published it. Maybe HTML isn’t for everyone, and you may need an HTML expert to publish HTML code.
Take a look at the study yourself, it’s worth it.

Tip of the Week!

I’m happy to announce that my tip "Fake-Transparency for UserControls" was chosen as Tip of the Week (24.10-30.10) on Vb@rchiv.net, the biggest German website for Visual Basic.
Now I feel a bit proud, and it encourages me that my work isn’t as bad as I sometimes think it is.
I hope the tips I release in the future, and my other coding work, will build on this success.