Categories
PHP Programming

Search PDFs With PHP, MySQL, and PdfToText

Being able to search a PDF is a very useful feature on any web site.  The problem is that there aren’t many languages that give you the tools to do so right out of the box.  PHP is no exception to this.  If you want to search PDF files you’ll need some third-party tools and a little bit of ingenuity.

Pre-requisites

Your server will need the following configuration:

  • PHP (>=4)
  • MySQL (>=4)
  • Linux (Distro of your choice)

Step 1:  Download PdfToText

PdfToText is a command-line program written in C that quickly converts the contents of a PDF to plain text, and that is all we’re going to use it for.  You can download it at http://www.foolabs.com/xpdf/download.html.  Once you have downloaded the archive, place it somewhere in your web site directory and extract it (on most Linux systems “tar -xzf [file]” will do the trick).  Once it’s extracted, you’ll see a program called “pdftotext”, which is what we’re after.

Step 2:  Convert the PDF to Text

As an astute reader, you’ve probably noticed by now that PdfToText is not a PHP file.  So how are we going to use it?  Well, we’re going to use PHP’s “backtick” operator (the ` character, which shares a key with the ~ [tilde] on US keyboards).

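function convert_to_text($pdf) {
     //escapeshellarg() quotes the file name so it cannot inject shell commands.
     $pdf = escapeshellarg($pdf);
     //pdftotext writes the text to temp.txt; anything it prints to
     //standard output is captured and returned.
     $output = `./pdftotext {$pdf} temp.txt`;
     return $output;
}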

The backtick operator executes its contents as a shell command, captures its output, and returns that output to the caller as a string.  It’s worth noting that the backtick operator only returns what the command writes to standard output; anything written to standard error is not captured.

This is probably the hardest part of this tutorial.  There may be problems with write permissions on the directory or with file ownership (the user the web server runs as must be able to execute pdftotext and write temp.txt), but if you can get it to work, you’re all set.
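
Because every conversion writes to the same temp.txt, two uploads processed at the same moment can overwrite each other’s text.  pdftotext also accepts “-” as the output file name, in which case it writes the text to standard output, where shell_exec() (the function form of the backtick operator) can capture it.  Here is a minimal sketch under those assumptions; build_pdftotext_command() and pdf_to_text_string() are hypothetical names, and ./pdftotext is assumed to be where Step 1 left the binary.

```php
<?php
// Hypothetical helpers, not part of the original tutorial.
// Build a pdftotext command line with every argument shell-escaped.
// An output file name of "-" tells pdftotext to write the text to
// standard output instead of to a file such as temp.txt.
function build_pdftotext_command(string $pdf, string $binary = './pdftotext'): string {
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdf) . ' -';
}

// Run pdftotext and return the extracted text directly.
function pdf_to_text_string(string $pdf): string {
    $text = shell_exec(build_pdftotext_command($pdf));
    // shell_exec() returns null when the command fails or prints nothing
    // (and false on PHP 8 if the pipe cannot be opened); treat both as "".
    return is_string($text) ? $text : '';
}

echo build_pdftotext_command('uploads/report.pdf'), "\n";
```

With this approach there is no shared temporary file to read back, because pdf_to_text_string() returns the text itself.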

Step 3:  Read the Text

Now that the PDF has been converted to a text file, we need to get that text back into PHP.  To do that, we use the file_get_contents() function.

function get_text() {
     $text = file_get_contents("temp.txt");
     return $text;
}

Step 4:  Store the Data

This part of the tutorial assumes 2 things.  1) That you have a table named pdf_data, and 2) That the table has a column called pdf_contents that is full-text searchable (If you need help setting this sort of thing up, leave a comment).
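
If you don’t have that table yet, a schema along these lines would satisfy both assumptions.  It is only a sketch, written as a PHP string to match the rest of the tutorial; the id column and the LONGTEXT type are assumptions rather than requirements.  MyISAM has long supported FULLTEXT indexes, while InnoDB supports them from MySQL 5.6 onward.

```php
<?php
// Hypothetical schema matching Step 4's assumptions: a pdf_data table whose
// pdf_contents column is covered by a FULLTEXT index (the id column is extra).
$createTableSql = "CREATE TABLE pdf_data (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    pdf_contents LONGTEXT NOT NULL,
    FULLTEXT (pdf_contents)
) ENGINE=MyISAM";

// Run this once after connecting, e.g. with mysql_query($createTableSql).
echo $createTableSql, "\n";
```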

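function store_data() {
     //mysql_real_escape_string() expects an open connection (mysql_connect()).
     //The mysql_* functions were removed in PHP 7; use mysqli or PDO there.
     $text = mysql_real_escape_string(get_text());
     $query = "INSERT INTO pdf_data (pdf_contents) VALUES ('{$text}')";
     mysql_query($query);
}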

Step 5:  Search the Data

The final step is actually searching the data.  To do that, we’ll use the full-text searching capability of MySQL.

function search_data($term) {
     $term = mysql_real_escape_string($term);
     //MATCH ... AGAINST goes in the WHERE clause.
     $query = "SELECT * FROM pdf_data WHERE MATCH(pdf_contents) AGAINST ('$term')";
     $result = mysql_query($query);
     while($row = mysql_fetch_array($result)) {
          //Do stuff with returned data.
     }
}

Where “Do stuff with returned data” is, you can do whatever you want.  MySQL is going to return the rows to you in order of relevance (descending).  The most relevant result will be first, followed by the second most, and third most, and so on.

Other Notes

  • PdfToText may or may not be the best way to do this, but it is one of the simplest.  There are a handful of libraries out there for creating PDFs in PHP, but surprisingly few for something as common as reading a PDF.
  • There are binaries and source files available for PdfToText on the xpdf download page (http://www.foolabs.com/xpdf/download.html).
  • This tutorial could be expanded a lot.  If you have questions or requests, please ask!
Categories
PHP Programming Wordpress Development

$_SERVER Variables Are Unsafe For WordPress Plugins

Sometimes a plugin developer might want to submit a form back to itself.  Or perhaps they want to link back to the current page, except with a variable in the query string.  Often enough, you’ll see them do it this way.

<form method=POST action='<?=$_SERVER['PHP_SELF']?>'>

or

<a href='<?=$_SERVER['REQUEST_URI']?>'>Click Here</a>

The problem with this code is that it's easily exploitable.  Remember, REQUEST_URI and PHP_SELF both take whatever URL the visitor entered through and hand it back to you.  So how can this affect your pages?  Since the user can append anything they'd like to that entrance URL (for example, a quote followed by a <script> tag), and the code above prints it without escaping, it becomes the vector for a cross-site scripting attack.
 
So how can you point forms and links back to the current page without these variables?  For forms, just leave out the action attribute, or leave it blank.
 
<form method=POST>
<form method=POST action=''>

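And for links, an href of just the # sign points at the current page (the browser jumps to the top of the page without reloading it).  If you need to reload the current page with a different query string, a relative link that contains only the query, such as href='?view=all', works too.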

<a href='#'>Click here!</a>

If a plugin developer absolutely MUST use server variables, just make sure to escape them first with the WordPress function esc_url().

<a href='<?= esc_url( $_SERVER['PHP_SELF'] ) ?>'>Click Me!</a>

In reality, it’s bad practice to echo $_SERVER values such as PHP_SELF and REQUEST_URI into your pages at all, so avoid doing it whenever you can.
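
Outside WordPress, where esc_url() is not available, plain PHP’s htmlspecialchars() is one common way to make such a value safe to print inside an HTML attribute.  The sketch below uses a hypothetical helper named self_action_url() and a made-up malicious URL.  Note that esc_url() also rejects unsafe URL schemes such as javascript:, which htmlspecialchars() does not do.

```php
<?php
// Hypothetical helper: escape a $_SERVER value before printing it inside an
// HTML attribute. ENT_QUOTES encodes both ' and ", along with <, > and &,
// so an attacker-supplied URL cannot close the attribute and inject markup.
function self_action_url(string $uri): string {
    return htmlspecialchars($uri, ENT_QUOTES, 'UTF-8');
}

// Made-up malicious entrance URL: extra path info after the script name.
$evil = "/form.php/'><script>alert(1)</script>";

echo "<form method='post' action='" . self_action_url($evil) . "'>\n";
```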

Categories
PHP Programming Wordpress Development

WordPress Database Query Using The WPDB Class

If you’re a plugin developer or WordPress hacker, accessing the database used by a WordPress install is vital.  This can be accomplished through a few different means, but the best is by using the WPDB class that is provided.  The only requirement for using this class is that your code runs within the WordPress install (plugins, themes, etc.), where a ready-made instance is available as the global $wpdb; inside a function, declare global $wpdb; before using it.

WPDB Queries

Let’s say that you would like to run a simple query that returns all of the rows in the “posts” table.  With the WPDB class, all you need to do is execute:

$rows = $wpdb->get_results( "SELECT * FROM $wpdb->posts" );

When this code is executed, it returns the entire table “posts” ($wpdb->posts) as an array of objects into the $rows variable.  From there, it’s easy enough to iterate over the array using a foreach loop.
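
By default, get_results() returns an array of stdClass objects whose properties are named after the selected columns.  The sketch below runs that foreach loop against a hand-built stand-in for $rows; the sample rows are invented, and in a real plugin $rows would come from the query above.

```php
<?php
// Stand-in for $wpdb->get_results() output in its default OBJECT mode.
// These two rows are hypothetical sample data, not real query results.
$rows = [
    (object) ['ID' => 1, 'post_title' => 'Hello world!'],
    (object) ['ID' => 2, 'post_title' => 'Sample Page'],
];

// Read each column as an object property and collect "ID: title" strings.
$titles = [];
foreach ($rows as $row) {
    $titles[] = $row->ID . ': ' . $row->post_title;
}

echo implode("\n", $titles), "\n";
```

If you prefer associative arrays, pass ARRAY_A as the second argument to get_results() and read columns as $row['post_title'] instead.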

WPDB Insert

Inserting data into a table is easy using the WPDB class.  All you need to know are the column name(s), the table name, and data you want to store.  I’ll lead with an example:

$wpdb->insert( 'links', array( 'link_url' => 're-cycledair.wploadtest.xyz', 'visit' => 12 ), array( '%s', '%d' ) );

This example of $wpdb->insert inserts “re-cycledair.wploadtest.xyz” and “12” into the link_url and visit columns of the “links” table, respectively.  The third argument tells WPDB the format of each value, in the same order as the data array.  The first value is a string, so we use “%s”, and the second is an integer, so we use “%d”.

If you would like to know the auto-incremented ID of this insert, read the insert_id property right afterwards:

$id = $wpdb->insert_id;

WPDB Update

Updating rows in a table is also easy with the WPDB class. Here is an example of an update.

$wpdb->update( 'links', array( 'link_url' => 'wordpress.org'), array( 'ID' => 15), array( '%s'), array( '%d' ) );

As you can see, this works a lot like $wpdb->insert. The first argument is the table name. The second argument is an array of column-value pairs. The third argument is the where condition (if ID is equal to 15). The fourth argument tells the WPDB class that you are updating a string, and the fifth argument says the WHERE condition is an integer.

WPDB Prepare: Protect Against SQL Injection

One thing every WordPress developer needs to know about is SQL injection.  SQL injection is when someone is able to modify your SQL query to execute their own.  To prevent this kind of malicious attack, the WPDB class has a method called “prepare”.  “Prepare” takes a query containing placeholders plus the values to fill them with, escapes and quotes each value, and substitutes it into the query, so that your input data cannot be used in a SQL injection attack.  An example is as follows:

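// The posts table has no post_id column, so this inserts into postmeta.
// 'note' is just an example meta key.
$wpdb->query( $wpdb->prepare( "
	INSERT INTO $wpdb->postmeta
	( post_id, meta_key, meta_value )
	VALUES ( %d, %s, %s )",
	15, 'note', "this is un'safe" ) );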

As with previous examples, the “%d” and “%s” function as placeholders for the sanitized data.

With those functions and a little bit of work, you should be writing WordPress database queries with the WPDB class in no time!

Categories
Other Programming PHP Programming

Form Ajax: How to Create and Submit a Form Using Ajax

For the longest time, web developers were stuck submitting their forms the normal way: click a button, go to a processing page, redirect back.  Now, with Ajax, it is possible to submit a form without ever leaving the page.  Ajax stands for Asynchronous JavaScript and XML; in practice it means the browser sends the request in the background and updates the current page with the response.

So how do you use form Ajax?  First of all, we’re going to use a JavaScript library called jQuery.  Don’t be scared of it, though; jQuery makes JavaScript easy.  What jQuery allows us to do is use form Ajax without having to muck around with all the tedious JavaScript details (which, trust me, is a GOOD thing).  Without further ado, here is how to submit a form with Ajax.


Form Ajax Step 1:  The HTML Page.

<html>
<head>
<title>Form Ajax Tutorial</title>
<script type="text/javascript" src="jquery-1.4.2.min.js"></script>
<script type="text/javascript">
//Once the document is ready, add the following handlers
//to the web page.
$(document).ready(function() {
     //When the form with id="myform" is submitted...
     $("#myform").submit(function() {
          //Send the serialized data to formProcessor.php.
          $.post("formProcessor.php", $("#myform").serialize(),
               //Take our response, and replace whatever is in the "formResponse"
               //div with it.
               function(data) {
                    $("#formResponse").html(data);
               }
          );
          //Returning false stops the browser from submitting the form normally.
          return false;
     });
});
</script>
</head>
<body>
<h2>Form Ajax Tutorial</h2>
<p>Fill out some information</p>
<form id="myform">
<input type="text" name="firstName" value="" /><br />
<input type="text" name="lastName" value="" /><br />
<input type="submit" name="submit" value="Submit" />
</form>
<div id="formResponse">
</div>
</body>
</html>

For anyone who is used to HTML programming, this should all look very familiar.  The only confusing part is the JavaScript, so I’ll explain that here.  The first script tag just includes the jQuery library, which is crucial for form Ajax to work.  The rest of the script is explained below:

  • $(document).ready() – Adds the handlers to your web page once the document (the DOM) has finished loading; it does not wait for images.
  • $().submit() – Catches the form’s submit event (a click on the submit button or pressing Enter).  Because the handler returns false, the form is sent the Ajax way instead of the normal way.
  • $.post() – Sends our data to the processing file.  Its callback function modifies our page to contain the data that the processing file sent back.
  • $().serialize() – Turns our form data into a URL-encoded string (for example firstName=Jane&lastName=Doe) that PHP reads as ordinary $_POST values.

Form Ajax Step 2:  The Processing Page

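<?php
//Get the information that was sent from the form, escaping it with
//htmlspecialchars() so it cannot inject HTML or JavaScript into the page.
$firstName = isset($_POST['firstName']) ? htmlspecialchars($_POST['firstName'], ENT_QUOTES, 'UTF-8') : '';
$lastName = isset($_POST['lastName']) ? htmlspecialchars($_POST['lastName'], ENT_QUOTES, 'UTF-8') : '';
 
//Get the unix time stamp.
$unixTimeStamp = time();
 
//Print output for our web page to catch.
echo "Hello $firstName $lastName.  The current unix time is <b>$unixTimeStamp</b>";
?>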

The processing page is what makes form Ajax work.  It catches the data sent by the form and then prints out some information.  The page is waiting for this output, and the callback then adds it to your page.  Note:  This file is a .php file, so you need to be running a web server (or have access to one) that can process PHP files.

Form Ajax Step 3:  You’re Done!

Form Ajax used to be pretty difficult, but now that there are JavaScript libraries like jQuery, MooTools, and Scriptaculous, it’s easier than ever.

Categories
PHP Programming

Fwds.Me: A Simple URL Shortener

A while ago, there was a big debate on the internet about the value of URL shortening services like Bit.ly.  I was following the debate with some interest when I decided to create my own URL shortener.  It does only one thing, and that is shorten URLs.  Fwds.Me is still young, so the possibility of getting REALLY short URLs is very real.  If you are looking for a simple URL shortener, then look no further than Fwds.Me.


Categories
PHP Programming

Validate Email Addresses With PHP

If you’re a web programmer, there will come a time when you need to validate an email address. It’s going to happen, so just accept it. In PHP 5.2 and later, the built-in filter_var() function does this for you. However, for those of us not lucky enough to be running the latest and greatest version, we can use regular expressions.

The following PHP function will validate email addresses using regular expressions. True is returned on success, and false is returned otherwise.

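function validate_email($email) {
     //preg_match() replaces eregi(), which was deprecated in PHP 5.3 and removed in PHP 7.
     //The /i flag keeps the match case-insensitive, like eregi().
     //Note: {2,4} rejects top-level domains longer than four letters (e.g. .museum).
     return (bool) preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i', $email);
}
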
Categories
PHP Programming Wordpress Development

Simple WordPress Plugin Tutorial

Sometimes WordPress just doesn’t do what you want it to do.  When that happens, you turn to plugins for help.  But sometimes, the WordPress plugin repositories don’t have what you need either.  In those cases, it’s time to roll up your sleeves and get to work.  In this tutorial, I’m going to go through the process of creating a simple WordPress plugin from scratch.  I created this plugin as a proof of concept a while ago, and thought that it would make a great learning tool now.

Step 1:  What are we making, anyway?

This plugin does one thing only, and it does it well.  It will place a “Submit to Hacker News” button on all of your posts.  While this tutorial is only for a simple plugin, it could easily be extended to include other news aggregation services and social networks.


Step 2:  Creating the plugin.

Now that we know what we’re going to create, we need to create the plugin.  To do that, simply create a file called wp-hacker-news.php and then add the following to it.

<?php
/*
* Plugin Name: WP Hacker News
* Version: 0.1
* Description: Adds a "Submit to Hacker News" button to your posts.
* Author: Jack Slingerland
* Author URI: https://re-cycledair.com/
* Plugin URI: https://re-cycledair.com/wp-hacker-news
*/

Only the Plugin Name line is strictly required for WordPress to recognize the plugin, but the others are worth filling in.  Below is a quick run-through of what each of these means.

  • Plugin Name – This is the name of your plugin.  It is how it will appear in the WordPress back-end administration panels.
  • Version – The version number.  I always start at 0.1, and then increment to versions like 0.1.1 for small changes.
  • Description – This is the description of your plugin.  Feel free to be verbose here, as this is how people will know what your plugin does.
  • Author – This is you!  Put your name or the name of your team here.
  • Author URI – Your website.  In my case, I link it to https://re-cycledair.com.
  • Plugin URI – The web page for your plugin.  Here, I’m linking it to the original announcement I made for this plugin.

Step 3:  Creating a display function.

So you finally have a WordPress plugin.  That’s great and all, but it doesn’t do anything yet.  What you need now is a function that displays the “Submit To Hacker News” link.  To do that, add this below the header comment:

//Function to show the HN Link.
function WPHackerNews_link() {
     global $post;
     $link = urlencode(get_permalink($post->ID));
     $title = urlencode($post->post_title);
     //&amp; is how a literal & should be written inside an HTML attribute.
     $submitUrl = "http://news.ycombinator.com/submitlink?u=$link&amp;t=$title";
     //Single-quoted HTML attributes keep this double-quoted PHP string intact.
     $formattedLink = "
     <div style='float: right; margin-left: 10px; margin-bottom: 4px;'>
          <a href='$submitUrl'>
               <img src='https://re-cycledair.com/wp-content/uploads/2010/03/hn.jpg' alt='' />
          </a>
          <span style='font-size: 9px;'>
               <a href='$submitUrl'>Submit to HN</a>
          </span>
     </div>
";
 
     return $formattedLink;
}

The code above explains itself pretty well, but I’ll break it down a bit anyway.

  1. We bring the global $post variable into scope.  It holds all of the information about the post we’re currently on.
  2. Store the current post’s permalink and title in variables.
  3. Using some in-line css and good old HTML, we get the Y Combinator logo to float on the right side of the post.
  4. Return the HTML for the button to the caller of the function.
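
WPHackerNews_link() encodes the permalink and title with urlencode() and joins them by hand.  As an alternative, PHP’s http_build_query() builds the same kind of query string from an array and encodes each value the same way.  The helper below, hn_submit_url(), is hypothetical and runs outside WordPress; the submitlink URL and its u and t parameters come from the plugin code above.

```php
<?php
// Hypothetical helper: build the Hacker News submit URL for a post.
// http_build_query() URL-encodes each value (spaces become "+"),
// just as urlencode() does in WPHackerNews_link().
function hn_submit_url(string $permalink, string $title): string {
    return 'http://news.ycombinator.com/submitlink?' . http_build_query([
        'u' => $permalink,
        't' => $title,
    ]);
}

// When printing inside an href, escape the & separator for HTML.
$url = hn_submit_url('https://example.com/my-post/', 'My Post & More');
echo '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">Submit to HN</a>', "\n";
```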


Step 4:  Displaying your plugin.

Everything is going great, but now we need this to actually show up in posts.  To do that, we register a display function with WordPress.  With a bit of logic, we can make it only display on posts.

//Integrate with Wordpress.
function WPHackerNews_ContentFilter($content) {
     if(is_single()) {
          return WPHackerNews_link().$content;
     } else {
          return $content;
     }
}
 
//Add the filter
add_filter ('the_content', 'WPHackerNews_ContentFilter');

The above code isn’t self-explanatory at all, so here’s how it works.

  1. The function takes one parameter, “$content”, which WordPress passes in.  “$content” holds the content of the post that the user is on.
  2. We then check to make sure that we are on a single post with the function “is_single()”.
  3. If so, we return our button by calling “WPHackerNews_link()” and appending “$content” to it.
  4. If not, just return the original un-altered content.
  5. The final step is to use the “add_filter” function to hook our function into WordPress.  The first argument is the name of the filter hook to attach to (“the_content”), and the second argument is the function WordPress should call (“WPHackerNews_ContentFilter”).

Step 5:  You’re Done!

That concludes this tutorial on creating a simple WordPress plugin.  All you need to do now is drop this file in your wp-content/plugins directory and then activate it in the admin.  As usual, if you run into any errors or notice any problems, please let me know and I’ll help the best I can.

Categories
PHP Programming

PHP Validate Email

Every so often (ok, a lot more than that), you need to validate an email address. The obvious solution is to use regular expressions, but PHP provides a better method: the filter_var() function.

To validate an email address using PHP, simply do the following:

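$email = "jack@re-cycledair.wploadtest.xyz";
//filter_var() returns the address on success and false on failure,
//so compare strictly against false.
if(filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
     echo "Valid Email.";
} else {
     echo "Email is not valid.";
}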

Note: This only works for PHP >= 5.2