Click here to close now.

Welcome!

Apache Authors: Sematext Blog, Pat Romanski, Ruxit Blog, Nikita Ivanov, VictorOps Blog

Related Topics: ColdFusion

ColdFusion: Article

Back to Basics: When sites run into trouble, it's usually the database that's the problem, not ColdFusion

Back to Basics: When sites run into trouble, it's usually the database that's the problem, not ColdFusion

[Ed. Note: We've run articles on this in the past, but - we can never say it enough!]

Cutting ColdFusion code is fun, and therein lies ColdFusion's biggest problem. It's too easy to get carried away writing code to do things that should be happening in the database itself - or often shouldn't be happening at all. It's also just too easy to use. You can get up and running far too quickly, and get into trouble even quicker. Sometimes experienced developers get carried away trying to do too much; beginners often start to run before they can walk.

When I came across a site where a simple query of news stories was taking a quarter of a second, I wasn't sure that writing their own object-caching mechanism in ColdFusion was entirely the right approach. I called a timeout - a morning to go over the basics. One simple run of the Query Analyzer diagnosed the problem: a date field was being used in a sort clause without an index.

The query time was reduced to around 15ms - around 6ms database time plus the ODBC overhead. If you use SQL Server, the Query Analyzer is a tool you should learn to love. As a basic rule, any query that does more than select a single record using a primary as the criterion will probably benefit from some sort of index.

Using it couldn't be simpler - simply open a window, enter your SQL query text and then press Ctrl + I to run an index analysis (see Figure 1). In SQL Server versions 7 and up it's become an invaluable tool. You can also take a look at the path analysis and turn on the "Show stats time" option in the Query->Current Connection Options dialog.

It's useful, but in a publishing application there's only so much you can do to tune database performance. You can get too worried about optimization - you're pushing about large text strings and that takes time. Hard-core optimization is more relevant in financial and engineering applications where the database is summarizing and calculating vast numbers of small records.

You're more likely to achieve better results if you concentrate your efforts on your ColdFusion code. ColdFusion offers two powerful ways of caching queries: the "cached within" feature and caching in a shared scope. Many developers aren't aware that you can cache any ColdFusion object in the application scope; structs and queries can just as easily be cached as scalar variables. You can cache queries that are unlikely to change in application.cfm:

<cfif NOT IsDefined("application.courses")>
<cfquery name="courses" datasource="#application.dsn#">
SELECT * FROM COURSES
ORDER BY COURSE
</cfquery>

<cflock scope="application" type="exclusive" timeout="10">
<cfset application.courses = courses>
</cflock>
</cfif>

(The locking after the query is vitally important; see article 20370 in the ColdFusion knowledgebase.)

By putting the actual query and assignment above into a separate include file and calling it whenever the table is affected, you can create your own full-fledged cache server. Alternatively, you can convert the query to a struct keyed by the primary key of the database table and put the struct into the application scope. There are custom tags in the developer's exchange to do this, but it's almost quicker to do it yourself.

Don't be scared of using the application scope. Memory's cheap and downtime isn't. If your entire database is only half a gigabyte and you have 2GB in your server, why not load it into the application scope?

Usually it's best to mix this approach with the cached within approach. Say you have a database of two thousand suppliers, listed by region, together with the services they offer. You want to use the "startrow, endrow" parameters of <CFOUTPUT> to show fifty at a time, but this means fetching the entire listing in the query, which takes nearly 150ms.

In this case you could put the fields you need for the listing in an application-level query and then, on a separate "view" page, select the main details in a separate query that you apply the cached within parameter to. This parameter of <CFQUERY> is one of ColdFusion's best features. It can send your sysadmin into a blind panic when he or she sees CF taking up 800MB ("It's got a leak!"), but it's so simple to use compared to rival offerings.

One technique I use all the time, caching formatted output, is a little more contentious. Say, for instance, you have fields in your supplier database for TEL, FAX, EMAIL, WEB - any of which may be blank or need formatting. In a listing page of fifty records this can take time. On one of my pages it was taking too much time just to do a simple list. My solution? Rather than copy the whole query into the application scope, I created a new query (see the Query-New() function) with fields for the primary key and formatted display.

Even more contentious: sometimes I put the formatted output back into the database. I do this all the time where the database is storing XML and I need to convert to HTML. Converting a page on the fly takes around 400ms (assuming XSL doesn't get stuck in an endless loop) and displaying a cached query takes less than 1ms.

This may sound like a lash-up (I less than impressed an old college friend who followed a slightly more formal career), but this is how the really expensive application servers work. Sure, they won't store large chunks of HTML in the database; they'll store serialized Java objects, but at least the HTML will still work when somebody changes the code. (Application servers store objects in a format that depends on the object definition [your code]. When you change that definition, the entire object store can be rendered unusable. It's very problematic, and I've seen plenty of implementations of expensive object servers where almost all the data is stored in plain SQL tables.)

Note: With reference to Ben Forta's February CFDJ article ("Faster and Safer Database Queries," Vol. 4, issue 2), you can't cache a query that uses a QUERYPARAM tag. You can, though, achieve the same security against malicious URLs by ensuring that all your tables have a numeric primary key and then validating any variables used in SQL queries with an "IsNumeric()" test. It's one more reason why all database tables should have a numeric primary key.

No matter how cheap memory gets, you do still need to apply a certain amount of discipline to your coding. One of the most common causes of database problems is fetching too much data from the database. Using derived tables and other advanced SQL constructs, you should be able to avoid fetching vast amounts of query data and looping over it.

A typical derived table query looks something like this:

SELECT TOP 1 A.NEWS_ID, A.HEADLINE, A.STORY FROM NEWS AS A,
(SELECT NEWS_ID FROM NEWS_CATEGORIES_JOIN
WHERE CATEGORY_ID IN (#cat_list#)) AS B
WHERE A.NEWS_ID = B.NEWS_ID
ORDER BY A.PUBDATE DESC

This will select a record from a correctly built many-to-many join (of which much more later) when a GROUP BY can't be used (i.e., with long text fields). This is just a simple example (a subselect could be used equally well here and run as quickly); derived tables are most effective when some sort of calculation needs to be performed, perhaps a summation or a count. If you find the idea of working with tables that don't exist a little frightening, you can often achieve the same affect by using a view, a stored query that can be queried and joined just like a standard table (see Figure 2).

Often underused, views are a great tool for making both your SQL and your CF code as simple as possible. They're second nature to anyone who's come to SQL Server from Access (where they're known as queries), but experienced programmers who have been using old-style databases often prefer to do everything in one query. Using views is usually quicker than performing joins or summaries at runtime. I saw a job advertisement recently that said "must be able to write join queries without using visual tools" and I thought: "Why?" There's no benefit, save a few techy kudos.

If you do need to loop over entire tables, try to avoid fetching the data into ColdFusion. Database cursors (described by Ian Rutherford in "Using MS-SQL Stored Procedures with ColdFusion," CFDJ Vol. 4, issue 2) will run much faster than dragging data into ColdFusion, processing it, and then running an update. It also won't take up a ColdFusion thread or valuable network capacity.

Speed Isn't Everything
As important as optimization is, sometimes there's too much focus on saving a millisecond here or cutting down white space there at the expense of properly structured code and databases. The key to software engineering, as in all engineering, is to keep it simple, and the main aim in a ColdFusion development should be no different from any other project: to keep the code to a minimum.

It's this discipline of keeping things simple that's the most important part of building reliable and scalable sites. It may seem strange, but sometimes code that runs slower can make for better Web sites. Take, for example, what for me is the litmus test of good database design: the use of many-to-many joins. A typical database application might be a listing of schools and the courses they offer. The original data, as always, is in Excel, looking something like the example in Figure 3. The experienced developer will recognize the need for a separate table of courses and a third "join" table indicating which schools have which courses.

The temptation, though, is simply to import it into the database and use it as is, relying on ColdFusion's list handling to determine which courses are offered by which schools. That would be bad, but what would be even worse would be some attempt to code the courses offered as separate database fields (see Figure 4).

Why you shouldn't do this is hard to explain, although it's simple: in a SELECT query it'll run two to three times as fast as using a many-to-many join, and it's easy to program using ColdFusion Studio. You just point the wizards at the right table and five minutes later you're done. But by doing something like this, you have hard-coded "data" into the "structure" of your system. To build truly scalable systems, you need to keep a very definite division between the structure and the data. The Golden Rule is this: if your ColdFusion code is completely separate from your database, then your system will scale. If you've hard-coded data as fieldnames - you just can't do this - your data is inextricably tied up with your application.

Using ColdFusion, it isn't always possible to adhere to the strict three-tier model (database, application, presentation), but creating a division between the structure and the data is still a must. You should be able to edit the data directly without using your ColdFusion application and still see the changes reflected on your site.

So, for instance, were you to add a new course to the table of courses, your site should display that course wherever relevant. Often this doesn't happen; data fields are hard-coded and the layout of the pages is dependent on the data structure.

Sometimes you might use the ColdFusion application to enforce certain database rules - for instance, that a salesperson can have a maximum of only 10 customers. This is fine. In fact, database theorists call such a rule an application rule for the very reason that it doesn't affect the underlying data. If a salesperson had 11 clients it wouldn't affect the data in any way. But if the only way you could add or remove clients was through the ColdFusion-based Web interface, then something would be amiss.

A simple example like the one shown in Figure 4 doesn't really illustrate the necessity of relational design. Once you're bug-free, the example will run faster than a relational design, and it will scale. It's simple and, above all, that's what matters.

You wouldn't have to get much more complicated before a flat file system like this would start to break down, though. It's not just raw performance that's the issue - the extra complexity of the code and the difficulties in making changes will all contribute in different ways to a lack of scalability. If a system relies on a single overworked developer to make simple changes, then it won't be scalable no matter how optimized the code.

As with any rule, the one about keeping the data separate from the code is easily broken. It's quite possible to make such a convoluted mess of triggers, derived tables, cursor loops, and other database constructs that no amount of processing power will save you. Keep it simple, though, and it will scale.

Easy in Theory
Although it makes things simpler in the database system, relational design can, if you're not careful, make things more complicated in the Web application. Using separate tables for many-to-many joins requires extra INSERT queries, and SELECT functions often have to employ grouping, subselects, or derived tables.

Perhaps the principal reason many-to-many joins don't get used is that most database editing systems don't have a standard component for implementing them. Access doesn't (you have to create a subform or use some VB), and if you're using HTML forms, you unfortunately can't use the "query" attribute of <CFSELECT> as it doesn't support multiple values for "selected". You can, however, use a simple loop:

<cfselect name="course_id" multiple="Yes">
<cfoutput query="application.courses">
<option #iif(ListFind("#current_vals#",courses_id),
DE("selected"), DE(""))# value="#courses_id#">#course#
</cfoutput>
</cfselect>

Alternatively, you can use checkboxes; another effective device for editing many-to-many joins is a swap box. It's especially useful when you have many possible options to choose from, and it's more intuitive than multiple select boxes where the user has to control-click (see Figure 5).

There are several custom tags to do this in the developer's exchange, and my own <CF_SWAPBOX> is available from my site (see "Resources" section). It has almost identical syntax to CFSELECT, and also supports multiple values for "selected".

One thing you can't do much about is the complexity of inserting and updating the "join" tables (the third table used for many-to-many joins). You can cut down the number of queries needed by using an "insert into" with a subselect, for example:

INSERT INTO COURSES_SCHOOLS_JOIN
(COURSES_ID, SCHOOLS_ID)
SELECT COURSES_ID, #form.SCHOOLS_ID#
FROM COURSES
WHERE COURSES_ID IN (#form.courses_id#)

but this won't run any faster than using multiple inserts in the same query. Nor can you run this as a stored procedure (you can't pass a list of integer primary keys as a VARCHAR and then use it in a subselect), nor can you use the <CFQUERYPARAM> tag. It's best just to run a separate insert for each row of the join table.

Perhaps, though, the biggest obstacle to correct database design isn't a technical one, but the very familiar problem of client demands. Before you can build a sound database, you need to convince the client that that's what's needed. The temptation for clients is to think of their Web site like a magazine: "I want to put that over there, that over there." Convincing clients they need to "think data" is the hardest part of any project.

In the most difficult project I ever worked on, the editor of the site was so incensed at no longer being able to work on the HTML pages himself that he got hold of the administrator password and replaced all the ColdFusion code with HTML pages he'd "saved as" from his browser. All the dynamic functions duly broke, but because the site still looked the same as it did before, the client never really understood what had happened. As far as he was concerned, my code didn't work. The editor was there to edit pages and that was what he was doing.

The concept of changing a page not by editing the page itself, but by editing a separate data system, is a surprisingly difficult conceptual hurdle for many people. It's such a simple concept that it's easy to forget how revolutionary it is. Those who have come from a print media background aren't going to buy into the database idea until they've had that hallelujah moment and seen just what an impact information systems can have. Until that time they're going to find using a database frustrating and too inflexible.

If your client isn't of the database mind-set when you start a project, it's vital that you help him or her see the light before you start programming. Otherwise you'll end up putting in hooks and hacks to try and satisfy client demands ("sometimes we might want to do something different"). They'll come back to haunt you.

I've found that the most important ingredient of a successful site isn't the design or my code, but getting the client used to the idea of editing a database. A client who comes to you not with ideas for "new pages" but a "new section" and a sketch of the underlying data is worth more than any amount of milliseconds saved from a database query.

Resources

  • To download <CF_SWAPBOX>: www.articlemanager.com/download.cfm
  • A good guide to relational database design: Hernandez, M.J. (1997). Database Design for Mere Mortals. Addison-Wesley.
  • More Stories By Tom Peer

    Tom Peer has been in electronic publishing of one sort or another for ten years, including a stint as manager of New Scientist Online (www.newscientist.com). He specializes in taking printed publications online and has recently completed the online edition of The World Handbook of Stock Exchanges (www.exchange-handbook.com).

    Comments (0)

    Share your thoughts on this story.

    Add your comment
    You must be signed in to add a comment. Sign-in | Register

    In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


    @ThingsExpo Stories
    When it comes to the Internet of Things, hooking up will get you only so far. If you want customers to commit, you need to go beyond simply connecting products. You need to use the devices themselves to transform how you engage with every customer and how you manage the entire product lifecycle. In his session at @ThingsExpo, Sean Lorenz, Technical Product Manager for Xively at LogMeIn, will show how “product relationship management” can help you leverage your connected devices and the data they generate about customer usage and product performance to deliver extremely compelling and reliabl...
    The IoT market is projected to be $1.9 trillion tidal wave that’s bigger than the combined market for smartphones, tablets and PCs. While IoT is widely discussed, what not being talked about are the monetization opportunities that are created from ubiquitous connectivity and the ensuing avalanche of data. While we cannot foresee every service that the IoT will enable, we should future-proof operations by preparing to monetize them with extremely agile systems.
    There’s Big Data, then there’s really Big Data from the Internet of Things. IoT is evolving to include many data possibilities like new types of event, log and network data. The volumes are enormous, generating tens of billions of logs per day, which raise data challenges. Early IoT deployments are relying heavily on both the cloud and managed service providers to navigate these challenges. Learn about IoT, Big Data and deployments processing massive data volumes from wearables, utilities and other machines.
    SYS-CON Events announced today that CodeFutures, a leading supplier of database performance tools, has been named a “Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. CodeFutures is an independent software vendor focused on providing tools that deliver database performance tools that increase productivity during database development and increase database performance and scalability during production.
    The explosion of connected devices / sensors is creating an ever-expanding set of new and valuable data. In parallel the emerging capability of Big Data technologies to store, access, analyze, and react to this data is producing changes in business models under the umbrella of the Internet of Things (IoT). In particular within the Insurance industry, IoT appears positioned to enable deep changes by altering relationships between insurers, distributors, and the insured. In his session at @ThingsExpo, Michael Sick, a Senior Manager and Big Data Architect within Ernst and Young's Financial Servi...
    The major cloud platforms defy a simple, side-by-side analysis. Each of the major IaaS public-cloud platforms offers their own unique strengths and functionality. Options for on-site private cloud are diverse as well, and must be designed and deployed while taking existing legacy architecture and infrastructure into account. Then the reality is that most enterprises are embarking on a hybrid cloud strategy and programs. In this Power Panel at 15th Cloud Expo (http://www.CloudComputingExpo.com), moderated by Ashar Baig, Research Director, Cloud, at Gigaom Research, Nate Gordon, Director of T...
    “In the past year we've seen a lot of stabilization of WebRTC. You can now use it in production with a far greater degree of certainty. A lot of the real developments in the past year have been in things like the data channel, which will enable a whole new type of application," explained Peter Dunkley, Technical Director at Acision, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
    SYS-CON Events announced today that Intelligent Systems Services will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Established in 1994, Intelligent Systems Services Inc. is located near Washington, DC, with representatives and partners nationwide. ISS’s well-established track record is based on the continuous pursuit of excellence in designing, implementing and supporting nationwide clients’ mission-critical systems. ISS has completed many successful projects in Healthcare, Commercial, Manufacturing, ...
    PubNub on Monday has announced that it is partnering with IBM to bring its sophisticated real-time data streaming and messaging capabilities to Bluemix, IBM’s cloud development platform. “Today’s app and connected devices require an always-on connection, but building a secure, scalable solution from the ground up is time consuming, resource intensive, and error-prone,” said Todd Greene, CEO of PubNub. “PubNub enables web, mobile and IoT developers building apps on IBM Bluemix to quickly add scalable realtime functionality with minimal effort and cost.”
    DevOps tends to focus on the relationship between Dev and Ops, putting an emphasis on the ops and application infrastructure. But that’s changing with microservices architectures. In her session at DevOps Summit, Lori MacVittie, Evangelist for F5 Networks, will focus on how microservices are changing the underlying architectures needed to scale, secure and deliver applications based on highly distributed (micro) services and why that means an expansion into “the network” for DevOps.
    For years, we’ve relied too heavily on individual network functions or simplistic cloud controllers. However, they are no longer enough for today’s modern cloud data center. Businesses need a comprehensive platform architecture in order to deliver a complete networking suite for IoT environment based on OpenStack. In his session at @ThingsExpo, Dhiraj Sehgal from PLUMgrid will discuss what a holistic networking solution should really entail, and how to build a complete platform that is scalable, secure, agile and automated.
    We’re no longer looking to the future for the IoT wave. It’s no longer a distant dream but a reality that has arrived. It’s now time to make sure the industry is in alignment to meet the IoT growing pains – cooperate and collaborate as well as innovate. In his session at @ThingsExpo, Jim Hunter, Chief Scientist & Technology Evangelist at Greenwave Systems, will examine the key ingredients to IoT success and identify solutions to challenges the industry is facing. The deep industry expertise behind this presentation will provide attendees with a leading edge view of rapidly emerging IoT oppor...
    In the consumer IoT, everything is new, and the IT world of bits and bytes holds sway. But industrial and commercial realms encompass operational technology (OT) that has been around for 25 or 50 years. This grittier, pre-IP, more hands-on world has much to gain from Industrial IoT (IIoT) applications and principles. But adding sensors and wireless connectivity won’t work in environments that demand unwavering reliability and performance. In his session at @ThingsExpo, Ron Sege, CEO of Echelon, will discuss how as enterprise IT embraces other IoT-related technology trends, enterprises with i...
    The Internet of Things (IoT) is causing data centers to become radically decentralized and atomized within a new paradigm known as “fog computing.” To support IoT applications, such as connected cars and smart grids, data centers' core functions will be decentralized out to the network's edges and endpoints (aka “fogs”). As this trend takes hold, Big Data analytics platforms will focus on high-volume log analysis (aka “logs”) and rely heavily on cognitive-computing algorithms (aka “cogs”) to make sense of it all.
    The Internet of Everything (IoE) brings together people, process, data and things to make networked connections more relevant and valuable than ever before – transforming information into knowledge and knowledge into wisdom. IoE creates new capabilities, richer experiences, and unprecedented opportunities to improve business and government operations, decision making and mission support capabilities. In his session at @ThingsExpo, Gary Hall, Chief Technology Officer, Federal Defense at Cisco Systems, will break down the core capabilities of IoT in multiple settings and expand upon IoE for bo...
    Sensor-enabled things are becoming more commonplace, precursors to a larger and more complex framework that most consider the ultimate promise of the IoT: things connecting, interacting, sharing, storing, and over time perhaps learning and predicting based on habits, behaviors, location, preferences, purchases and more. In his session at @ThingsExpo, Tom Wesselman, Director of Communications Ecosystem Architecture at Plantronics, will examine the still nascent IoT as it is coalescing, including what it is today, what it might ultimately be, the role of wearable tech, and technology gaps stil...
    With several hundred implementations of IoT-enabled solutions in the past 12 months alone, this session will focus on experience over the art of the possible. Many can only imagine the most advanced telematics platform ever deployed, supporting millions of customers, producing tens of thousands events or GBs per trip, and hundreds of TBs per month. With the ability to support a billion sensor events per second, over 30PB of warm data for analytics, and hundreds of PBs for an data analytics archive, in his session at @ThingsExpo, Jim Kaskade, Vice President and General Manager, Big Data & Ana...
    One of the biggest impacts of the Internet of Things is and will continue to be on data; specifically data volume, management and usage. Companies are scrambling to adapt to this new and unpredictable data reality with legacy infrastructure that cannot handle the speed and volume of data. In his session at @ThingsExpo, Don DeLoach, CEO and president of Infobright, will discuss how companies need to rethink their data infrastructure to participate in the IoT, including: Data storage: Understanding the kinds of data: structured, unstructured, big/small? Analytics: What kinds and how responsiv...
    Cloudian, Inc., the leading provider of hybrid cloud storage solutions, today announced availability of Cloudian HyperStore 5.1 software. HyperStore 5.1 is an enhanced Amazon S3-compliant, plug-and-play hybrid cloud software solution that now features full Apache Hadoop integration. Enterprises can now transform big data into smart data by running Hadoop analytics on HyperStore software and appliances. This in-place analytics, with no need to offload data to other systems for Hadoop analyses, enables customers to derive meaningful business intelligence from their data quickly, efficiently and ...
    Since 2008 and for the first time in history, more than half of humans live in urban areas, urging cities to become “smart.” Today, cities can leverage the wide availability of smartphones combined with new technologies such as Beacons or NFC to connect their urban furniture and environment to create citizen-first services that improve transportation, way-finding and information delivery. In her session at @ThingsExpo, Laetitia Gazel-Anthoine, CEO of Connecthings, will focus on successful use cases.