
Utilities for caching file contents

Functions:

  • read_file_and_cache – Read and cache string contents of files.
  • read_url_and_cache – Read and cache text responses from a URL.

read_file_and_cache

read_file_and_cache(filepath)

Read and cache string contents of files for quick access and to reduce IO operations.

Note

May need a "forgetting" (eviction) mechanism if too many large files are stored; one possible approach is sketched after the source listing below. Should be fine for loading JSON metamodels and SHACL constraints in Turtle format.

Source code in src/rdf_utils/caching.py
def read_file_and_cache(filepath: str) -> str:
    """Read and cache string contents of files for quick access and reducing IO operations.

    Note:
        May need "forgetting" mechanism if too many large files are stored. Should be fine
        for loading JSON metamodels and SHACL constraints in Turtle format.
    """
    if filepath in __FILE_LOADER_CACHE:
        return __FILE_LOADER_CACHE[filepath]

    with open(filepath) as infile:
        file_content = infile.read()

    if isinstance(file_content, bytes):
        file_content = file_content.decode("utf-8")

    __FILE_LOADER_CACHE[filepath] = file_content
    return file_content
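
A brief usage sketch (the file path below is a hypothetical example): the first call reads from disk, and repeated calls with the same path return the cached string without further IO.

from rdf_utils.caching import read_file_and_cache

# First call: reads the file from disk and stores its contents in the cache.
content = read_file_and_cache("models/agent.mm.json")

# Second call: returns the cached string without touching the disk.
same_content = read_file_and_cache("models/agent.mm.json")
assert content is same_content  # the very same cached object is returned

As the note above suggests, the cache grows without bound. A minimal sketch of one possible "forgetting" mechanism, assuming an LRU policy via functools.lru_cache (this helper is not part of rdf_utils):

from functools import lru_cache

@lru_cache(maxsize=32)  # evicts least-recently-used entries once 32 files are cached
def read_file_bounded(filepath: str) -> str:
    """Hypothetical size-bounded variant of read_file_and_cache."""
    with open(filepath) as infile:
        return infile.read()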

read_url_and_cache

read_url_and_cache(url, timeout=_GLOBAL_DEFAULT_TIMEOUT)

Read and cache text responses from a URL.

Parameters:

  • url (str) –

    URL to be opened with urllib

  • timeout (float, default: _GLOBAL_DEFAULT_TIMEOUT) –

    duration in seconds to wait for a response; only applies to HTTP, HTTPS & FTP. By default, socket._GLOBAL_DEFAULT_TIMEOUT is used, which usually means no timeout.

Source code in src/rdf_utils/caching.py
def read_url_and_cache(url: str, timeout: float = _GLOBAL_DEFAULT_TIMEOUT) -> str:
    """Read and cache text responses from URL

    Parameters:
        url: URL to be opened with urllib
        timeout: duration in seconds to wait for a response; only applies to HTTP,
                 HTTPS & FTP. Default: `socket._GLOBAL_DEFAULT_TIMEOUT` will be used,
                 which usually means no timeout.
    """
    if url in __URL_CONTENT_CACHE:
        return __URL_CONTENT_CACHE[url]

    with urllib.request.urlopen(url, timeout=timeout) as f:
        url_content = f.read()

    if isinstance(url_content, bytes):
        url_content = url_content.decode("utf-8")

    __URL_CONTENT_CACHE[url] = url_content
    return url_content
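
A usage sketch with an explicit timeout (the URL below is a placeholder): the first call fetches the resource over the network; later calls for the same URL are answered from the cache, so the timeout no longer applies.

from rdf_utils.caching import read_url_and_cache

# First call: fetches over HTTPS, waiting at most 10 seconds for a response.
shacl_text = read_url_and_cache("https://example.org/constraints.ttl", timeout=10.0)

# Second call: served from the cache; no network access, no timeout.
shacl_again = read_url_and_cache("https://example.org/constraints.ttl")
assert shacl_text is shacl_again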