Ravens PHP Scripts: Forums
 

 

View next topic
View previous topic
Post new topic   Reply to topic    Ravens PHP Scripts And Web Hosting Forum Index -> MySQL
Author Message
fkelly
Former Moderator in Good Standing



Joined: Aug 30, 2005
Posts: 3312
Location: near Albany NY

PostPosted: Sat Mar 13, 2010 10:19 am Reply with quote

Well, on my bicycle club web site I have a custom RN membership module that handles registration for the club (separate from Nuke Users table). In that table I have a household number field in a household table. The data was originally converted from MS Access which in turn was probably converted from index cards (in other words, Legacy stuff).

Some of the household fields have the & (ampersand) character in them. An example would be '8808R&SGIB' as a field value. I started noticing some strange test results ... I wasn't retrieving any records with this kind of code:

Code:
$sql_hh = 'SELECT '.$prefix.'_household.Household_Number FROM '.$prefix.'_household  WHERE '.$prefix.'_household.Household_Number = \''.$hh.'\'';


where $hh was a variable with the 8808R&SGIB in it. Humm. My first thought was that I had to mysql_real_escape string it. Later I realized that in constructing our $sql_hh variable we essentially do a mysql_escape_string on it already by putting in the \''.$hh.'\'' ... so essentially adding mysql_escape string was double escaping it. No good. In other words after constructing $sql_hh I was doing a separate statement to have $sql_hh = mysql_real_escape_string($sql_hh).

I was also thrown off by the fact that if I copied the sql:

Quote:
SELECT nuke_household.Household_Number FROM nuke_household WHERE nuke_household.Household_Number = '8808R&SGIB'


over to PHPmyadmin it would retrieve a record but in my module it wouldn't.

Thinking that I was double escaping the quotes I modified my SQL to just escape the $hh part:

Code:
$sql_hh = 'SELECT '.$prefix.'_household.Household_Number FROM '.$prefix.'_household  WHERE '.$prefix.'_household.Household_Number = \''.mysql_real_escape_string($hh).'\'';


but that didn't work either. Perplexed I turned to google and determined that what I really needed to do was html_entity_decode the $hh. At this point, this statement works:

Code:
$sql_hh = 'SELECT '.$prefix.'_household.Household_Number FROM '.$prefix.'_household  WHERE '.$prefix.'_household.Household_Number = \''.html_entity_decode($hh).'\'';


Now I need to go back through my entire system and see where I may be trying to retrieve that household field value and fix and test 'em all. Ugly.

Thinking further it's really too bad the software (between PHP and MYSQL) can't handle this type of thing automatically. & is a legitimate character and it can be used legitimately in database fields and you shouldn't have to do a dance of death to retrieve it nor should you have to use special functions depending on possible values in your database fields.
 
View user's profile Send private message Visit poster's website
djmaze
Subject Matter Expert



Joined: May 15, 2004
Posts: 727
Location: http://tinyurl.com/5z8dmv

PostPosted: Sat Mar 13, 2010 10:34 am Reply with quote

PHP is a language and HTML is a language.
In PHP an & is well.... an &
In HTML an & is a special language character and you need an entity to show it using htmlspecialchars()

You use html_entity_decode() on $hh so this means your PHP is screwed and is using & and not & in strings.
& is only valid in HTML so get rid of all htmlspecialchars() and html_entity_encode() calls where it shouldn't be used.
Use htmlspecialchars() only when outputting as HTML

Examples:
Code:
echo 'This is a text file & has no need for encoding';

Code:
echo '<h1>This is a html output &amp; needs encoding</h1>';


Your biggest problem might be non-professionals developing the PHP code and screwing up the definition of I/O

_________________
$ mount /dev/spoon /eat/fun auto,overclock 0 1
ERROR: there is no spoon [ Only registered users can see links on this board! Get registered or login! ] 
View user's profile Send private message Visit poster's website
fkelly







PostPosted: Sat Mar 13, 2010 10:53 am Reply with quote

Quote:
Your biggest problem might be non-professionals developing the PHP code and screwing up the definition of I/O


You very well might be right about that. I was reading Thompson and Welling as I wrote the initial code and even 5 years later I have a lot to learn. Too bad we couldn't afford to hire a professional programmer for a local bicycle club but we couldn't. Too bad we had legacy databases too ... in retrospect I probably just wouldn't use the & character inside of fields. But I didn't have time for conversions and that field is used to link tables so it wasn't just a straightforward thing to get rid of the &'s.

All that said, I've been obsessing about the need for filtering standards in RN going forward and here is another complicating case. It appears to me that if you are stuck with certain characters, such as the &, embedded in your data then you need to do some special filtering such as html_entity_decode that goes above and beyond what check_html or mysql_real_escape_string do. And since a programmer has no real control over what's in a database maybe we need to do something like this as a general rule.
 
djmaze







PostPosted: Sat Mar 13, 2010 9:31 pm Reply with quote

You are right fkelly.
There is a general rule but, it should be noted more. However, that would complicate things for newbies as they need to actually "read a manual" before screwing around Laughing

Rule of thumb: never ever trust the client/visitor
what does the server expect to receive as input?
1. text: validate data
2. html: validate data
Always validate the input data through tidy, html_decode, regular expressions and who knows what else.

php => database: no conversion
database => php: no conversion

php => html output:
#1 (text) = htmlspecialchars()
#2 (html) = no conversion

php => txt output:
#1 (text) = no conversion
#2 (html) = strip_tags(html_entity_decode())


So basically you only convert where needed.

Problem is that also phpBB converts data to be with entities. This has to do with a latin1 database (not unicode) so that all unicode characters are stored as entities inside the database.
Else, the PHP script has to convert the unicode on each page request to allow seeing the forum post in Russian for example.

If RavenNuke is UTF-8 based most of these issues get resolved.
The list of do's and dont's just gets to big when character sets get involved, that's why we converted DragonflyCMS to be fully UTF-8
 
Palbin
Site Admin



Joined: Mar 30, 2006
Posts: 2583
Location: Pittsburgh, Pennsylvania

PostPosted: Sun Mar 14, 2010 12:10 am Reply with quote

fkelly, how did $hh become encoded in the first place?

_________________
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan. 
View user's profile Send private message
fkelly







PostPosted: Sun Mar 14, 2010 8:55 am Reply with quote

Palbin: trying to reconstruct this here's what I think happened. As I said in the initial post I had legacy data with ampersands in the household field ... eg 8808&RSG or something like that. There is a household table and a members table and there can be several members with the same last name (or different ones) that belong to the same household. In my membership maintenance application I have a search by last name that brings up a list of household that match the last name, the user can then click on the appropriate household to maintain the record associated with the household.

When I first developed this application I didn't know from s*** about W3C validation. At some point I must have validated the screen and got the errors about unencoded ampersands. So I made the values that went on the submit buttons use the htmlentities($hh) where $hh was the household value pulled from the database.

All fine and good except that when you use that value to retrieve from the database to maintain the record you will be using 8808&amp;RSG instead of 8808&RSG for instance. Since only a relative few of the records in the database have a & in it and since it's not like I have a testing staff for this stuff, it got by me.

I haven't tested yet to see if our check_html function automagically takes care of this. It's obvious to me that if you are pulling any text or character data from the database you have to do some kind of filtering to make sure that any html characters are encoded. And then if you use them to retrieve from the database or if you insert them back into the database you have to remove the encoding. Exactly how this should work is to be determined. I have it fixed for my application (I think) but what we need is a more general set of recommendations or standards.
 
Palbin







PostPosted: Sun Mar 14, 2010 9:47 am Reply with quote

My concern is that some may have wanted things stored as entities so you can't do a blanket decode/encode. It is up to the individual module creator to know what is going in and coming out. I just don't see how a standard can be applied to a case like this. What really needs to happen in a case like this is the data would need to be converted either way via a script. The reason i say this I don't see how we can be responsible for third party applications.
 
fkelly







PostPosted: Sun Mar 14, 2010 10:47 am Reply with quote

I see your point, halfway anyway. If you want to store the encoded value (e.g., &amp;) then fine. You need to avoid doing a html_entity_decode() on the value before storing it then. However, if you have a non-encoded value stored in the database you need to encode it before displaying it on a web page, otherwise you'll generate a compliance error. Thinking back on what I did, probably the best way to deal with the situation would be to do something like:

Code:
echo htmlentities($hh) 


where $hh is the non-encoded value retrieved from the database. Rather than something like:

Code:
$hh = htmlentities(row['household');;

echo $hh;


especially if you are using $hh in a form where it can eventually be resubmitted and used in database retrieval or update. However if you are using $hh as the value of a submit button that gets a bit hairy ... if you have a submit button with the name of x and you say value="'.htmtentities($hh).'" ... the value of x that gets submitted will be the encoded value and you'll have to decode it before using it in the database.
 
montego
Site Admin



Joined: Aug 29, 2004
Posts: 9457
Location: Arizona

PostPosted: Sat Mar 20, 2010 10:04 am Reply with quote

djmaze, spot on and thank you!

_________________
Where Do YOU Stand?
HTML Newsletter::ShortLinks::Mailer::Downloads and more... 
View user's profile Send private message Visit poster's website
Display posts from previous:       
Post new topic   Reply to topic    Ravens PHP Scripts And Web Hosting Forum Index -> MySQL

View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You can attach files in this forum
You can download files in this forum


Powered by phpBB © 2001-2007 phpBB Group
All times are GMT - 6 Hours
 
Forums ©