In 2003, Nick Kew released a new module that complements Apache's mod_proxy and is essential for reverse-proxying. Since then he gets regular questions and requests for help on proxying with Apache. In this article he attempts to give a comprehensive overview of the proxying and mod_proxy_html.
This article was originally published at ApacheWeek in January 2004, and moved to ApacheTutor with minor updates in October 2006.
A proxy server is a gateway for users to the Web at large. Users configure the proxy in their browser settings, and all HTTP requests are routed via the proxy. Proxies are typically operated by ISPs and network administrators, and serve several purposes: for example,
- to speed access to the Web by caching pages fetched, so that popular pages don't have to be re-fetched for every user who views them.
- to enable controlled access to the web for users behind a firewall.
- to filter or transform web content.
A reverse proxy is a gateway for servers, and enables one web server to provide content from another transparently. As with a standard proxy, a reverse proxy may serve to improve performance of the web by caching; this is a simple way to mirror a website. But the most common reason to run a reverse proxy is to enable controlled access from the Web at large to servers behind a firewall.
The proxied server may be a webserver itself, or it may be an application server using a different protocol, or an application server with just rudimentary HTTP that needs to be shielded from the web at large. Since 2004, reverse proxying has been the preferred method of deploying Java/Tomcat applications on the Web, replacing the old mod_jk (itself a special-purpose reverse proxy module).
Proxying with Apache
The standard Apache module mod_proxy supports both types of proxy operation. Under Apache 1.x, mod_proxy only supported HTTP/1.0, but from Apache 2.0, it supports HTTP/1.1. This distinction is particularly important in a proxy, because one of the most significant changes between the two protocol versions is that HTTP/1.1 introduces rich new cache control mechanisms.
This article deals with running a reverse proxy with Apache 2. Users of earlier versions of Apache are encouraged to upgrade and take advantage of the altogether richer architecture and improved application support. At the time of writing, the reason most commonly cited for not upgrading is difficulties running PHP on Apache 2. I cannot speak from personal experience, but several well-informed sources tell me the difficulty lies with non-thread-safe code in PHP, and that it works well with Apache 2 if it is built with the non-threaded Prefork MPM.
The Apache Proxy Modules
So far, we have spoken loosely of mod_proxy. However, it's a little more complicated than that. In keeping with Apache's modular architecture, mod_proxy is itself modular, and a typical proxy server will need to enable several modules. Those relevant to proxying and this article include:
- mod_proxy: The core module deals with proxy infrastructure and configuration and managing a proxy request.
- mod_proxy_http: This handles fetching documents with HTTP and HTTPS.
- mod_proxy_ftp: This handles fetching documents with FTP.
- mod_proxy_connect: This handles the CONNECT method for secure (SSL) tunneling.
- mod_proxy_ajp: This handles the AJP protocol for Tomcat and similar backend servers.
- mod_proxy_balancer implements clustering and load-balancing over multiple backends.
- mod_cache, mod_disk_cache, mod_mem_cache: These deal with managing a document cache. To enable caching requires mod_cache and one or both of disk_cache and mem_cache.
- mod_proxy_html: This rewrites HTML links into a proxy's address space.
- mod_headers: This modifies HTTP request and response headers.
- mod_deflate: Negotiates compression with clients and backends.
Having mentioned the modules, I'm going to ignore caching for the remainder of this article. You may want to add it if you are concerned about the load on your network or origin servers, but the details are outside the scope of this article. I'm also going to ignore all non-HTTP protocols, and load balancing.
Building Apache for Proxying
With the exception of mod_proxy_html, the above are all included in the core Apache distribution. They can easily be enabled in the Apache build process. For example:
Of course, you may want other build options too, and you could just as well build the modules as static.
If you are adding proxying to an existing installation, you should use apxs instead:
This leaves mod_proxy_html, which is not included in the core distribution. mod_proxy_html is a third-party module, and requires a third-party library libxml2. At the time of writing, libxml2 is installed as standard or packaged for most operating systems. If you don't have it, you can download it from xmlsoft.org and install it yourself. For the purposes of this article, we'll assume libxml2 is installed as /usr/lib/libxml2.so, with headers in /usr/include/libxml2/libxml/.
- Check libxml2 is installed. If you have a version older than 2.5.10, then upgrade - there's a bug in earlier versions that can, in some particular cases, severely affect performance.
- Download mod_proxy_html.c from http://apache.webthing.com/
- Build mod_proxy_html with apxs:
A Reverse Proxy Scenario
Company example.com has a website at www.example.com, which has a public IP address and DNS entry, and can be accessed from anywhere on the Internet.
The company also has a couple of application servers which have private IP addresses and unregistered DNS entries, and are inside the firewall. The application servers are visible within the network, including to the webserver, as 'internal1.example.com' and 'internal2.example.com'. But because they have no public DNS entries, anyone looking at internal1.example.com from outside the company network will get a 'no such host' error.
A decision is taken to enable Web access to the application servers. But they should not be exposed to the Internet directly; instead they should be integrated with the webserver, so that http://www.example.com/app1/any-path-here is mapped internally to http://internal1.example.com/any-path-here and http://www.example.com/app2/other-path-here is mapped internally to http://internal2.example.com/other-path-here. This is a typical reverse-proxy situation.
Configuring the Proxy
As with any modules, the first thing to do is to load them in httpd.conf (this is not necessary if we build them statically into Apache).
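A minimal sketch of the module-loading lines, assuming a typical Unix layout in which Apache's shared modules live under modules/ and libxml2 is installed as /usr/lib/libxml2.so (the location assumed earlier in this article). Adjust the paths to your own installation. The two commented-out lines are modules that the scenario in this article does not need.

```apache
# Core proxy infrastructure and the HTTP/HTTPS backend handler
LoadModule proxy_module          modules/mod_proxy.so
LoadModule proxy_http_module     modules/mod_proxy_http.so
# Not required for the reverse-proxy scenario described here
#LoadModule proxy_ftp_module     modules/mod_proxy_ftp.so
#LoadModule proxy_connect_module modules/mod_proxy_connect.so
# Header manipulation and compression
LoadModule headers_module        modules/mod_headers.so
LoadModule deflate_module        modules/mod_deflate.so
# libxml2 must be loaded before mod_proxy_html, which depends on it
LoadFile   /usr/lib/libxml2.so
LoadModule proxy_html_module     modules/mod_proxy_html.so
```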
For Windows users this is slightly different: you'll need to load libxml2.dll rather than libxml2.so, and you'll probably need to load iconv.dll and zlib.dll as prerequisites to libxml2 (you can download them from zlatkovic.com, the same site that maintains Windows binaries of libxml2). The LoadFile directive is the same.
Of course, you may not need all the modules. Two that are not required in our typical scenario are shown commented out above.
Having loaded the modules, we can now configure the Proxy. But before doing so, we have an important security warning:
Do not set 'ProxyRequests On'. Setting ProxyRequests On turns your server into an Open Proxy. There are 'bots scanning the Web for open proxies. When they find you, they'll start using you to route around blocks and filters to access questionable or illegal material. At worst, they might be able to route email spam through your proxy. Your legitimate traffic will be swamped, and you'll find your server getting blocked by things like family filters.
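In configuration terms, the safe setting is the explicit one sketched below. ProxyRequests already defaults to Off, so stating it mainly guards against a stray On elsewhere in the configuration.

```apache
# Reverse proxying with ProxyPass does not need forward-proxy requests,
# so keep the forward proxy disabled.
ProxyRequests Off
```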
Of course, you may also want to run a forward proxy with appropriate security measures, but that lies outside the scope of this article. The author runs both forward and reverse proxies on the same server (but under different Virtual Hosts).
The fundamental configuration directive to set up a reverse proxy is ProxyPass. We use it to set up proxy rules for each of the application servers:
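For the scenario described above, the rules might look like the following sketch. The hostnames and paths are those of the example; the trailing slashes on both sides are deliberate, so that /app1/some-path maps to /some-path on the backend.

```apache
# Map each public path prefix onto the corresponding internal server
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/
```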
Now as soon as Apache re-reads the configuration (the recommended way to do this is with 'apachectl graceful'), proxy requests will work, so http://www.example.com/app1/some-path maps to http://internal1.example.com/some-path as required.
However, this is not the whole story. ProxyPass just sends traffic straight through. So when the application servers generate references to themselves (or to other internal addresses), they will be passed straight through to the outside world, where they won't work.
For example, an HTTP redirection often takes place when a user (or author) forgets a trailing slash in a URL. So the response to a request for http://www.example.com/app1/foo proxies to http://internal1.example.com/foo which generates a response:
But from the outside world, the net effect of this is a 'No such host' error. The proxy needs to re-map the Location header to its own address space and return a valid URL.
The command to enable such rewrites in the HTTP Headers is ProxyPassReverse. The Apache documentation suggests the form:
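Applied to the example scenario, the documented two-argument form would look roughly like this. It mirrors the ProxyPass rules, with the public path first and the backend URL second.

```apache
# Rewrite Location (and similar) response headers that point at the
# internal servers so that they point at the proxy instead
ProxyPassReverse /app1/ http://internal1.example.com/
ProxyPassReverse /app2/ http://internal2.example.com/
```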
However, there is a slightly more complex alternative form that I recommend as more robust:
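One form that fits this description, offered as a sketch rather than as the exact original listing, places the directive inside a container for each application. Inside a Location section, ProxyPassReverse takes only the backend argument, and the public path is taken from the container itself.

```apache
<Location /app1/>
    # Inside <Location>, the first (path) argument is omitted;
    # "/" here is matched against the redirect issued by the backend
    ProxyPassReverse /
</Location>

<Location /app2/>
    ProxyPassReverse /
</Location>
```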
The reason for recommending this is that a problem arises with some application servers. Suppose for example we have a redirect:
Under the HTTP specification current at the time of writing (RFC 2616), this is a violation of the protocol and so should never happen: only full URLs are permitted in Location headers. (Later revisions of HTTP relaxed this and allow relative references.) However, it is also a source of much confusion, not least because the CGI spec has a similar Location header with different semantics where relative paths are allowed. There are a lot of broken servers out there! In this instance, the first form of ProxyPassReverse will return the incorrect response
which, even allowing for error-correcting browsers, is outside the Proxy's address space and won't work. The second form fixes this to
which is still broken, but will at least work in error-correcting browsers. Most browsers will deal with this.
Fixing HTML Links
As we have seen, ProxyPassReverse remaps URLs in the HTTP headers to ensure they work from outside the company network. There is, however, a separate problem when links appear in HTML pages served. Consider the following cases:
- Case 1, a relative link: this link will be resolved by the browser and will work correctly.
- Case 2, a server-relative link to /otherfile.html: this link will be resolved by the browser to http://www.example.com/otherfile.html, which is incorrect.
- Case 3, a full URL naming an internal server: this link will resolve to 'no such host' for the browser.
The same problem of course applies to included content such as images, stylesheets, scripts or applets, and other contexts where URLs occur in HTML.
To fix this requires us to parse the HTML and rewrite the links. This is the purpose of mod_proxy_html. It works as an output filter, parsing the HTML and rewriting links as it is served. Two configuration directives are required to set it up:
- SetOutputFilter proxy-html This simply inserts the filter, to enable ProxyHTMLURLMap
- ProxyHTMLURLMap from-pattern to-pattern [flags] In its basic form, this has a similar purpose and semantics to ProxyPassReverse. Additionally, an extended form is available to enable search-and-replace rewriting of URLs within Scripts and Stylesheets.
How it works
mod_proxy_html is based on a SAX parser: specifically the HTMLparser module from libxml2 running in SAX mode (any other parse mode would of course be very much slower, especially for larger documents). It has full knowledge of all URI attributes that can occur in HTML 4 and XHTML 1. Whenever a URL is encountered, it is matched against applicable ProxyHTMLURLMap directives. If it starts with any from-pattern, that will be rewritten to the to-pattern. Rules are applied in the reverse order to their appearance in httpd.conf, and matching stops as soon as a match is found.
Here's how we set up a reverse proxy for HTML. Firstly, full links to the internal servers should be rewritten regardless of where they arise, so we have:
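For the example scenario, global rules of this kind might be sketched as follows. Neither pattern ends in a slash, so that the starts-with match covers every URL on the internal host.

```apache
# Rewrite absolute links to the internal servers wherever they occur
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2
```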
Note that in this instance we omitted the 'trailing' slash. Since the matching logic is starts-with, we use the minimal matching pattern. We have now globally fixed case 3 above.
Case 2 above requires a little more care. Because the link doesn't include the hostname, the rewrite rule must be context-sensitive. As with ProxyPassReverse above, we deal with that using a <Location> section.
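A context-sensitive rule of this kind can be sketched as below for the first application server. The container scopes the rule to responses served under /app1/, so a server-relative link such as /otherfile.html is rewritten to /app1/otherfile.html. The second server would get an equivalent block for /app2/.

```apache
<Location /app1/>
    # Insert the mod_proxy_html output filter for this application
    SetOutputFilter proxy-html
    # Rewrite server-relative links into the proxy's address space
    ProxyHTMLURLMap / /app1/
</Location>
```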
Debugging your Proxy Configuration
The above is a simple case taken from mod_proxy_html version 1. With the more complex URL mapping and rewriting enabled by Version 2, you may need a bit of help setting up a complex ruleset, perhaps involving a series of complex regexps, chained and blocking rules, etc. To help with setting up and troubleshooting your rulesets, mod_proxy_html 2 provides a 'debug' mode, in which all the 'interesting' things it does are written to the Apache error log. To analyse and fix your rulesets, set
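A sketch of the debug settings, assuming the third-party mod_proxy_html 2 described in this article (the ProxyHTMLLogVerbose directive is named later in this section). The LogLevel line is an assumption: the messages only reach the error log if the server's log level is verbose enough to let them through.

```apache
# Log each interesting rewrite decision to the Apache error log
ProxyHTMLLogVerbose On
# Assumed: make sure informational messages are not filtered out
LogLevel info
```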
Now run your testcases through your rulesets, and examine the Apache error log for details of exactly how it was processed.
Do not leave ProxyHTMLLogVerbose On for normal use. Although the effect is marginal, it is an overhead.
Extended URL Mapping
Because the extended mode is text-based, it can no longer guarantee to match exact URLs. It's up to you to devise matching rules that can pick out URLs, just as if you were writing an old-fashioned Perl or PHP regexp-based filter (though of course it's still massively more efficient than performing search-and-replace on an entire document in-memory). To help with this, ProxyHTMLExtended supports both simple text-based and regular expression search-and-replace, according to the flags. You can also use the flags to specify rules separately for HTML links, scripting events, and embedded scripts and stylesheets.
A second key consideration with extended URL mapping is that whereas an HTML link contains exactly one URL, a script or stylesheet may contain many. So instead of stopping after a successful match, the processor will apply all applicable mapping rules. This can be stopped with the L (last) flag.
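As a small illustration, extended mode and the L flag might be combined as in the sketch below. The rule itself is just the global rule for the first example server with the flag appended; check the mod_proxy_html documentation for your version before relying on any other flags.

```apache
# Also process scripting events and embedded scripts and stylesheets
ProxyHTMLExtended On
# L (last): once this rule has matched, apply no further rules
ProxyHTMLURLMap http://internal1.example.com /app1 L
```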
Dealing with multimedia content
We just set up a proxy to parse and where necessary correct HTML. But of course, the web isn't just HTML. Surely feeding non-HTML content through an HTML parser is at best inefficient, if not totally broken?
Yes indeed. mod_proxy_html deals with that by checking the Content-Type header, and removing itself from the processing chain when a document is not HTML (text/html) or XHTML (application/xhtml+xml). This happens in the filter initialisation phase, before any data are processed by the filter.
But that still leaves a problem. Consider compressed HTML:
Feeding that into an HTML parser is clearly broken!
There are two solutions to this. One is to uncompress the incoming data with mod_deflate. Uncompressing and compressing content radically reduces network traffic, but increases the processor load on the proxy. It is worthwhile if and only if bandwidth between the proxy and the backend is at a premium: this is common on the 'net at large, but unlikely to be the case on a company internal network.
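The first solution can be sketched as a filter chain, assuming a version of mod_deflate that provides the INFLATE output filter (Apache 2.2 and later). The backend response is uncompressed, passed through the HTML rewriter, and compressed again for the client.

```apache
<Location /app1/>
    # Order matters: decompress, rewrite links, then recompress
    SetOutputFilter INFLATE;proxy-html;DEFLATE
</Location>
```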
The alternative solution is to refuse to support compression. Stripping any Accept-Encoding request header does the job. So invoking mod_headers, we add a directive
This should only apply to the Proxy, so we put it inside our <Location> container.
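A sketch of that directive in place, using the /app1/ container from the example scenario. With the request header removed, the backend has no reason to send a compressed response.

```apache
<Location /app1/>
    # Requires mod_headers: drop Accept-Encoding before the request
    # is forwarded, so the backend replies uncompressed
    RequestHeader unset Accept-Encoding
</Location>
```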
A similar situation arises in the case of encrypted (https) content. But in this case, there is no such workaround: if we could decrypt the data to process it then so could any other man-in-the-middle, and the security would be worthless. This can only be circumvented by installing mod_ssl and a certificate on the proxy, so that the actual secure session is between the browser and the proxy, not the origin server.
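Terminating SSL at the proxy might be sketched as follows. This assumes mod_ssl is loaded and the server listens on port 443; the certificate and key paths are hypothetical placeholders, and the backend connection remains plain HTTP inside the firewall.

```apache
<VirtualHost *:443>
    ServerName www.example.com

    # The secure session ends here, at the proxy
    SSLEngine on
    # Hypothetical paths: substitute your own certificate and key
    SSLCertificateFile    /path/to/www.example.com.crt
    SSLCertificateKeyFile /path/to/www.example.com.key

    # Traffic to the backend is unencrypted, so it can be filtered
    ProxyPass /app1/ http://internal1.example.com/
    <Location /app1/>
        ProxyPassReverse /
    </Location>
</VirtualHost>
```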
The Complete Configuration
We are now in a position to write a complete configuration for our reverse proxy. Here is a bare minimum, that ignores extended URL mapping:
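Pulling the pieces of this article together, a bare-minimum configuration for the example scenario could look like the sketch below. Module paths and the libxml2 location are assumptions about a typical Unix install. The extra "/app1 /app1" rule is an assumption added to protect links that are already correct: because rules are applied in reverse order, it is tried first and stops "/" from rewriting them a second time.

```apache
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule headers_module    modules/mod_headers.so
LoadFile   /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so

# Never run as an open forward proxy
ProxyRequests Off

# Route requests to the internal servers
ProxyPass /app1/ http://internal1.example.com/
ProxyPass /app2/ http://internal2.example.com/

# Rewrite absolute links to the internal servers everywhere
ProxyHTMLURLMap http://internal1.example.com /app1
ProxyHTMLURLMap http://internal2.example.com /app2

<Location /app1/>
    ProxyPassReverse /
    SetOutputFilter  proxy-html
    ProxyHTMLURLMap  /     /app1/
    # Leave links that are already under /app1 unchanged
    ProxyHTMLURLMap  /app1 /app1
    RequestHeader    unset Accept-Encoding
</Location>

<Location /app2/>
    ProxyPassReverse /
    SetOutputFilter  proxy-html
    ProxyHTMLURLMap  /     /app2/
    ProxyHTMLURLMap  /app2 /app2
    RequestHeader    unset Accept-Encoding
</Location>
```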
Of course, there's more than one way to do it. Our configuration would actually have been simpler if we'd used Virtual Hosts for each application server. But that takes you beyond the realm of Apache configuration and into DNS. If you don't fully understand that (or if you think 'why can't I see my domain' is a webserver question), then please don't try using virtual hosts for this.
We haven't dealt with caching in this article. In a company-intranet situation, the connection from the proxy to the application servers is the local LAN, which is probably fast and has ample capacity. In such cases, caching at the proxy will have little effect, and can probably be omitted.
If we want to cache pages, we can of course do so with mod_cache. But that is beyond the scope of this article.
Another powerful use for a proxy is to transform the content on-the-fly according to the user's preferences. This author's flagship mod_accessibility product (from which mod_proxy_html is a spinoff) serves to transform HTML and XHTML on-demand to enhance usability and accessibility.
Filtering and Security
A reverse proxy is not the natural place for a 'family filter', but is ideal for defining access controls and imposing security restrictions. We could, for example, configure the proxy to recognise a custom header from an origin server and block content based on it. This delegates control to the application servers.
Questions and Answers
(A) It doesn't really, but it may appear to. Here are the possible causes:
Changing the FPI (the <!DOCTYPE ...> line) may affect some browsers. FIX: set the doctype explicitly if this bothers you.
mod_proxy_html has the side-effect of transforming content to utf-8 (Unicode) encoding. This should not be a problem: utf-8 is well-supported by browsers, and offers comprehensive support for internationalisation. If it appears to cause a problem, that's almost certainly a bug in the application server, or possibly a misconfigured browser. FIX: filter through mod_charset_lite to your chosen charset.
mod_proxy_html will perform some minor normalisations. If your HTML includes elements that are closed implicitly, it will explicitly close them. In other words:
will be transformed to
If this affects the rendition in your browser, it almost certainly means you are using malformed HTML and relying on error-correction in a browser. FIX: validate your HTML! The online Page Valet service will both validate and show your markup normalised by the DTD, while a companion tool AccessValet will show markup normalised by the same parser used in the proxy, and highlight other problems. Both are available at http://valet.webthing.com/
Details on load balancer stickyness
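The cookie-based approach described in this section can be sketched as below, modelled on the example in the Apache mod_proxy_balancer documentation. The backend addresses, the /test path, and the ROUTEID cookie name are illustrative. Here the web server itself sets the cookie, which requires mod_headers.

```apache
# Set the routing cookie whenever the balancer picks a new route
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://mycluster">
    # Each member declares the route value the cookie will carry
    BalancerMember "http://192.168.1.50:80" route=1
    BalancerMember "http://192.168.1.51:80" route=2
    # Name of the cookie holding the route (case-sensitive)
    ProxySet stickysession=ROUTEID
</Proxy>

ProxyPass        "/test" "balancer://mycluster"
ProxyPassReverse "/test" "balancer://mycluster"
```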
When using cookie based stickyness, you need to configure the name of the cookie that contains the information about which back-end to use. This is done via the stickysession attribute added to either ProxyPass or ProxySet. The name of the cookie is case-sensitive. The balancer extracts the value of the cookie and looks for a member worker with route equal to that value. The route must also be set in either ProxyPass or ProxySet. The cookie can either be set by the back-end, or as shown in the above example by the Apache web server itself.
Some back-ends use a slightly different form of stickyness cookie, for instance Apache Tomcat. Tomcat adds the name of the Tomcat instance to the end of its session id cookie, separated with a dot (.) from the session id. Thus if the Apache web server finds a dot in the value of the stickyness cookie, it only uses the part behind the dot to search for the route. In order to let Tomcat know about its instance name, you need to set the attribute jvmRoute inside the Tomcat configuration file conf/server.xml to the value of the route of the worker that connects to the respective Tomcat. The name of the session cookie used by Tomcat (and more generally by Java web applications based on servlets) is JSESSIONID (upper case) but can be configured to something else.
The second way of implementing stickyness is URL encoding. The web server searches for a query parameter in the URL of the request. The name of the parameter is specified again using stickysession. The value of the parameter is used to look up a member worker with route equal to that value. Since it is not easy to extract and manipulate all URL links contained in responses, generally the work of adding the parameters to each link is done by the back-end generating the content. In some cases it might be feasible to do this via the web server using mod_sed. This can have a negative impact on performance though.
The Java standards implement URL encoding slightly differently. They use a path info appended to the URL using a semicolon (;) as the separator and add the session id behind. As in the cookie case, Apache Tomcat can include the configured jvmRoute in this path info. To let Apache find this sort of path info, you need to set scolonpathdelim to On in either ProxyPass or ProxySet.
Finally you can support cookies and URL encoding at the same time, by configuring the name of the cookie and the name of the URL parameter separated by a vertical bar (|) as in the following example:
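A sketch of such a combined setup, modelled on the Apache mod_proxy_balancer documentation. The cookie name JSESSIONID and the parameter name jsessionid are the Tomcat defaults mentioned above; the backend addresses and route names are illustrative.

```apache
# Cookie name and URL parameter name, separated by a vertical bar;
# scolonpathdelim lets Apache find ";jsessionid=..." path info
ProxyPass "/test" "balancer://mycluster" stickysession=JSESSIONID|jsessionid scolonpathdelim=On

<Proxy "balancer://mycluster">
    # route must match the jvmRoute configured in each Tomcat
    BalancerMember "http://192.168.1.50:80" route=node1
    BalancerMember "http://192.168.1.51:80" route=node2
</Proxy>
```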
If the cookie and the request parameter both provide routing information for the same request, the information from the request parameter is used.