Relative links in Webserver DAT

I can make the Webserver DAT serve a simple webpage (including images, css, and JS) with onHTTPRequest as follows:

import os
from urllib.parse import urlparse

web_root = "WebRoot" # location of HTML files relative to .toe

def onHTTPRequest(webServerDAT, request, response):
	# convert request['uri'] into a local file path
	url_parsed = urlparse(request['uri'])
	path = url_parsed.path.lstrip('/').replace("/", os.path.sep) 
	path = os.path.join(web_root, path)
		
	response['statusCode'] = 200 
	response['statusReason'] = 'OK'

	try:
		with open(path, mode='r', encoding='utf-8') as file:
			response['data'] = file.read()
	except UnicodeDecodeError:
		with open(path, mode='rb') as file: 
			response['data'] = file.read()
	
	return response

(this is simplified for the sake of the forum. full version here)

But let's say I'm loading http://localhost:9980/foo/index.html from my trusty local browser.

Here is [WebRoot]/foo/index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <link rel="stylesheet" href="style.css" />
  </head>
  <body>
    Hello, World!
  </body>
</html>

In the first request that hits onHTTPRequest, request['uri'] is /foo/index.html. Everything works great.

But the second request that hits onHTTPRequest is for /style.css, and in this example, that is incorrect. It should request /foo/style.css because there is no leading slash in the tag in the HTML.

I've examined the request dict that onHTTPRequest receives, and I can't find anything that would allow me to fix this in Python. I suppose, at the very least, I'd need a way to determine that the request is relative (not provided), and the Referer (is provided).

Of course the short-term fix is just to make all of my HTML links absolute paths. But it just seems like this is something that the Webserver DAT should be able to handle.

Am I missing something here?

Hi @jeffcrouse,

I more or less rebuilt your setup, and in onHTTPRequest I just check for request['uri'] to be "/foo/index.html", at which point I respond with your HTML code.

This then comes back with a request for "/foo/style.css" - so I believe as expected?

the debug of the request is:

{   'method': 'GET',
    'uri': '/foo/index.html',
    'pars': {},
    'clientAddress': '127.0.0.1:55269',
    'serverAddress': '127.0.0.1:9980',
    'sec-ch-ua': '"Not/A)Brand";v="8", "Chromium";v="126", "Google '
                 'Chrome";v="126"',
    'Host': 'localhost:9980',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Sec-Fetch-Mode': 'navigate',
    'Upgrade-Insecure-Requests': '1',
    'sec-ch-ua-platform': '"Windows"',
    'Cache-Control': 'max-age=0',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 '
                  'Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-US,en;q=0.9,de;q=0.8',
    'data': b''} 
  (Debug - DAT:/project1/webserver1_callbacks fn:onHTTPRequest line:17)
{   'method': 'GET',
    'uri': '/foo/style.css',
    'pars': {},
    'clientAddress': '127.0.0.1:55269',
    'serverAddress': '127.0.0.1:9980',
    'sec-ch-ua': '"Not/A)Brand";v="8", "Chromium";v="126", "Google '
                 'Chrome";v="126"',
    'Host': 'localhost:9980',
    'Accept': 'text/css,*/*;q=0.1',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Sec-Fetch-Mode': 'no-cors',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 '
                  'Safari/537.36',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Dest': 'style',
    'Referer': 'http://localhost:9980/foo/index.html',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-US,en;q=0.9,de;q=0.8',
    'data': b''} 
  (Debug - DAT:/project1/webserver1_callbacks fn:onHTTPRequest line:17)
{   'method': 'GET',
    'uri': '/favicon.ico',
    'pars': {},
    'clientAddress': '127.0.0.1:55269',
    'serverAddress': '127.0.0.1:9980',
    'sec-ch-ua': '"Not/A)Brand";v="8", "Chromium";v="126", "Google '
                 'Chrome";v="126"',
    'Host': 'localhost:9980',
    'Accept': 'image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Sec-Fetch-Mode': 'no-cors',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 '
                  'Safari/537.36',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Dest': 'image',
    'Referer': 'http://localhost:9980/foo/index.html',
    'Accept-Encoding': 'gzip, deflate, br, zstd',
    'Accept-Language': 'en-US,en;q=0.9,de;q=0.8',
    'data': b''} 
  (Debug - DAT:/project1/webserver1_callbacks fn:onHTTPRequest line:17)

Generally, I would expect it to be the browser's job to request the right URI?

cheers
Markus

Hey @snaut, thanks for the reply and for taking the time to try out my code.

I still don't know what is wrong in my production code, because if I run python -m http.server 8000 from my WebRoot directory, everything works as expected. So that was leading me to believe that there was some rewriting of the request URI going on in TD...

However, when I made a new TD project from scratch to try to illustrate my point, everything worked fine. So, it must be something else causing the problem in my production code. If I figure it out, I'll come back here and explain. In the meantime, here's the (now useless) example code in case anyone else finds it helpful.
rel_link_test.zip (6.1 KB)

thanks again!

Actually, I still think there is a small issue, and you can replicate it with the following steps:

  1. download and unzip rel_link_test.zip (6.1 KB)
  2. Navigate to the WebRoot folder and run python -m http.server 8000
  3. Go to http://localhost:8000/foo in your browser
  4. Notice that the text is red, which means that it successfully loaded foo/style.css
  5. Now launch rel_link_test.toe
  6. Go to http://localhost:9980/foo in your browser
  7. Notice that the text is not red, meaning that foo/style.css was not loaded
  8. Go to http://localhost:9980/foo/index.html in your browser
  9. Notice that the text is red, meaning that foo/style.css was loaded

So the relative path issue occurs when the onHTTPRequest receives a directory path instead of a file path.

Yes, it is the browser's responsibility to request the right URI, but it's the server's responsibility to decide what to do when the user requests a directory. In NGINX you configure this with index, in Apache it's DirectoryIndex, etc. In TD, it should be the responsibility of whoever is writing the onHTTPRequest callback to specify this behavior. But it seems to be impossible to handle correctly when it comes to requests relative to the index. I'm not sure there is enough information in the request dict to handle relative links properly.

In the meantime, for my own purposes, I've decided to change this:

if os.path.isdir(path):
    path = os.path.join(path, "index.html")

to this

if os.path.isdir(path):
    response['statusCode'] = 302
    response['statusReason'] = 'Found'
    # rstrip avoids a double slash when the directory URL already ends in "/"
    response['Location'] = url_parsed.path.rstrip('/') + "/index.html"
    return response

This way, we start over with the full request.

I realize this is a very minor issue and don't expect a response. But I spent a few hours tracking it down, so I'm describing it here in case anyone else encounters it.

Hi @jeffcrouse

I found this answer that explains the issue:

The behavior is related to how web browsers resolve relative URLs based on the current path of the document. When you access a URL without a trailing slash (e.g., http://localhost:9980/foo), the browser interprets foo as a resource, not a directory. Thus, any relative paths in the document are resolved relative to the parent directory of foo, which is the root (/). This is why the request for style.css is sent to /style.css instead of /foo/style.css.

Conversely, when you access the URL with a trailing slash (e.g., http://localhost:9980/foo/), the browser interprets foo/ as a directory, and relative paths in the document are resolved accordingly, hence the request for style.css correctly goes to /foo/style.css.
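
The same resolution rule can be reproduced outside the browser. This is only an illustrative sketch using the standard library's urljoin, which follows the same base-URL rules; the resolve helper is a made-up name for this example and not part of TouchDesigner.

```python
from urllib.parse import urljoin

def resolve(page_url, href):
    """Resolve a relative href against the URL of the page that contains it."""
    return urljoin(page_url, href)

# No trailing slash: "foo" is treated as a file, so its parent "/" is the base.
print(resolve("http://localhost:9980/foo", "style.css"))
# Trailing slash: "foo/" is treated as a directory.
print(resolve("http://localhost:9980/foo/", "style.css"))
# An explicit file name inside the directory resolves like the directory does.
print(resolve("http://localhost:9980/foo/index.html", "style.css"))
```

The first call yields a URL at the server root, the other two yield a URL under /foo/, which matches the three requests seen in the steps above.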

Yeah - returning a redirect to the url with a trailing slash appended might be the best idea in this case. Would be interesting to understand how nginx for example deals with this in their code.

cheers
Markus

Hi @jeffcrouse,

checking in Python's Lib\http\server.py:

            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)
                new_parts = (parts[0], parts[1], parts[2] + '/',
                             parts[3], parts[4])
                new_url = urllib.parse.urlunsplit(new_parts)
                self.send_header("Location", new_url)
                self.send_header("Content-Length", "0")
                self.end_headers()
                return None

so sending in a URL without a slash gets a 301 with a slash added - almost the same as what you are doing.
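
Pulled out of the request handler, the redirect target could be computed roughly like this. It is a minimal sketch, not the code from server.py: with_trailing_slash is a hypothetical helper name, and it assumes the URI handed to it may carry a query string that should survive the redirect.

```python
from urllib.parse import urlsplit, urlunsplit

def with_trailing_slash(uri):
    """Return uri with '/' appended to its path, or None if it already ends in '/'."""
    parts = urlsplit(uri)
    if parts.path.endswith('/'):
        return None
    # Rebuild the URL so that any query string or fragment is kept intact.
    return urlunsplit((parts.scheme, parts.netloc, parts.path + '/',
                       parts.query, parts.fragment))

print(with_trailing_slash("/foo"))
print(with_trailing_slash("/foo?x=1"))
print(with_trailing_slash("/foo/"))
```

In a callback this would only be consulted once the path is known to be a directory; a non-None result is the value for the Location header of the 301.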

Ha - should have checked the headers earlier :) Shows exactly that behavior...

cheers
Markus

@snaut With the help of your comments and Lib\http\server.py, I doctored up my onHTTPRequest to do it the way that Python does it, i.e.: redirect if the user requests a directory without a trailing slash, and otherwise look for an index.html.

It's nowhere near as full-featured as Lib\http\server.py, but it at least behaves in a more consistent way.

import os
import mimetypes
import urllib.parse  # "import urllib" alone does not guarantee urllib.parse is loaded

mimetypes.init()

base = "WebRoot"
# return the response dictionary
def onHTTPRequest(webServerDAT, request, response):
	#debug(request)
	
	url_parsed = urllib.parse.urlparse(request['uri'])
	
	# Convert the path part of the URL into a local filepath
	path = url_parsed.path.lstrip('/')	
	path = path.replace("/", os.path.sep)
	path = os.path.join(base, path)

	# what shall we do if the user requests a directory?
	if os.path.isdir(path): 
	
		# borrowed from Lib/http/server.py
		parts = urllib.parse.urlsplit(url_parsed.path)
		
		# Directory paths have to have a trailing slash to handle relative links properly
		# So we will redirect users to the same URL with an added slash if it's missing
		if not parts.path.endswith('/'):
			new_parts = (parts[0], parts[1], parts[2] + '/', parts[3], parts[4])
			new_url = urllib.parse.urlunsplit(new_parts)
			response['statusCode'] = 301 
			response['statusReason'] = 'MOVED PERMANENTLY'
			response['Location'] = new_url
			response["Content-Length"] = "0"
			return response
		# Look for an index in the directory
		for index in "index.html", "index.htm":
			index = os.path.join(path, index)
			if os.path.isfile(index):
				path = index
				# stop at the first index file found; index.htm is only tried if index.html is missing
				break

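	# only serve regular files; a directory without an index falls through to the 404 branch
	if os.path.isfile(path):
		response['statusCode'] = 200 
		response['statusReason'] = 'OK'
		
		# guess from the resolved file path so that a directory index is typed as text/html
		content_type = mimetypes.guess_type(path)[0]
		
		if content_type is not None:
			response["Content-Type"] = content_type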
			
		try:
			with open(path, mode='r', encoding='utf-8') as file:
				response['data'] = file.read()
		except UnicodeDecodeError:
			with open(path, mode='rb') as file: 
				response['data'] = file.read()
	else:
		response['statusCode'] = 404 
		response['statusReason'] = 'NOT FOUND'
		response['data'] = "File Not Found: " + path
	
	return response
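
One thing the handler above does not guard against is a URI containing ".." segments, which could map to a file outside WebRoot. The following is a minimal sketch of one way to check for that, under the assumption that the handler would call it in place of the lstrip/replace/join lines; resolve_path is a hypothetical helper name, and returning None is just one possible convention for "answer with a 404".

```python
import os
from urllib.parse import urlparse, unquote

def resolve_path(web_root, uri):
    """Map a request URI to a file path under web_root, or None if it escapes it."""
    root = os.path.realpath(web_root)
    # Decode %xx escapes, drop the leading slash, and use the OS path separator.
    rel = unquote(urlparse(uri).path).lstrip('/').replace('/', os.path.sep)
    candidate = os.path.realpath(os.path.join(root, rel))
    # realpath collapses ".." segments, so anything outside root shows up here.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate

print(resolve_path("WebRoot", "/foo/style.css"))
print(resolve_path("WebRoot", "/../secret.txt"))
```

The helper only decides where the request points; whether that location is a file, a directory, or missing is still up to the rest of the callback.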

Hope this is useful for someone.

In case anyone is wondering why I would want to host a regular website from TD, I have a set of special URLs that the HTML interface calls when a button is clicked. I intercept these special URLs in order to interact with things in my TD project.

For example, http://localhost:9980/_do?action=start&n=5 would start timer5

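	# this fragment sits at the top of onHTTPRequest and needs "import json" in the callbacks DAT
	url_parsed = urllib.parse.urlparse(request['uri'])
	if url_parsed.path == "/_do":
		# .get avoids a KeyError when the action parameter is missing
		action = request['pars'].get('action')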
		if action == 'start' and 'n' in request['pars']:
			n = int(request['pars']['n'])
			op(f'timer{n}').par.start.pulse()
		response["Content-Type"] = 'application/json'
		response['statusCode'] = 200 
		response['statusReason'] = 'OK'
		response['data'] = json.dumps({"status": "OK"})
		return response

For me, this is better than serving a website separately because there are fewer moving parts. And, in my experience, WebSockets are great if you need to stream data continuously. But for simple trigger-type events, I'd much rather have the simplicity of a single HTTP request.

Thanks again, @snaut
