.:: adilbenseddik's blog of humor and code ::.

Discussing Julien Phalip's approach to searching with Django

Debugging Julien's code from my previous implementation of a search box.

2014-12-06 django, python, search,

Trying to tailor Julien Phalip's approach to searching with Django to my own needs led me to build a simpler implementation of his code. It also raised questions about possible errors or limits in his method. The six-year-old snippet of code can be found here.

First, updating the code: I wanted to simplify the regex part, as I do not need the term grouping feature (e.g. "two words"). Cleaning the search terms is still useful, so why not use a single regex for that and get rid of his more complex function? Here is my single-line regex. It only splits the search string and strips unwanted spaces and punctuation from the terms.

import re

# Julien's regex function
def normalize_query(query_string,
    findterms=re.compile(r'"([^"]+)"|(\S+)').findall,
    normspace=re.compile(r'\s{2,}').sub):
    return [normspace('',(t[0] or t[1]).strip()) for t in findterms(query_string)]

string = 'foo roo "too yoo"    joo, loo. moo; hoo:'

normalize_query(string)
# OUT: ['foo', 'roo', 'too yoo', 'joo,', 'loo.', 'moo;', 'hoo:']

# My single line expression
re.compile(r'[^\s";,.:]+').findall(string)
# OUT: ['foo', 'roo', 'too', 'yoo', 'joo', 'loo', 'moo', 'hoo']

As you can see, this is all it takes to obtain a suitable list of terms for searching, and you can further refine the regex to your taste.

Second: the get_query function does not work very well with multiple query terms. It produces an AND statement at the SQL level, which excludes from the results every entry that does not contain all of the terms, and you don't want that. It should be an OR statement instead. Here is the debugging.
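
As one possible refinement, here is a minimal sketch that wraps the expression in a helper, lowercases the terms and drops duplicates while keeping their order. The name extract_terms and the extra normalization steps are my own assumptions for illustration; they are not part of Julien's snippet.

```python
import re

# Same character class as the single-line expression above, compiled once.
TERM_RE = re.compile(r'[^\s";,.:]+')

def extract_terms(search_string):
    """Return the lowercased search terms, without duplicates, in order."""
    terms = []
    for term in TERM_RE.findall(search_string):
        term = term.lower()
        if term not in terms:
            terms.append(term)
    return terms

print(extract_terms('Foo foo "too yoo"   joo, LOO. moo; hoo:'))
# OUT: ['foo', 'too', 'yoo', 'joo', 'loo', 'moo', 'hoo']
```

Since icontains lookups are case-insensitive anyway, lowercasing first only serves to spot duplicates such as "Foo" and "foo".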

# assuming that the previous function is defined

from django.db.models import Q

def get_query(query_string, search_fields):
    query = None # Query to search for every search term
    terms = normalize_query(query_string)
    for term in terms:
        or_query = None # Query to search for a given term in each field
        for field_name in search_fields:
            q = Q(**{"%s__icontains" % field_name: term})
            if or_query is None:
                or_query = q
            else:
                or_query = or_query | q
        if query is None:
            query = or_query
        else:
            query = query & or_query
    return query

# testing on whatever model
from camping.models import Service

qry = get_query('hello cruel world', ['name','description'])

print qry
# OUT: (AND: (OR: ('name__icontains', 'hello'), ('description__icontains', 'hello')), (OR: ('name__icontains', 'cruel'), ('description__icontains', 'cruel')), (OR: ('name__icontains', 'world'), ('description__icontains', 'world')))

fe = Service.objects.filter(qry)

fe.query
# OUT: <django.db.models.sql.query.Query object at 0xb6023f0c>

print fe.query
# OUT: SELECT `camping_service`.`id`, `camping_service`.`owner_id`, `camping_service`.`name`, `camping_service`.`number`, `camping_service`.`price`, `camping_service`.`pax`, `camping_service`.`picture`, `camping_service`.`description` FROM `camping_service` WHERE ((`camping_service`.`name` LIKE %hello% OR `camping_service`.`description` LIKE %hello%) AND (`camping_service`.`name` LIKE %cruel% OR `camping_service`.`description` LIKE %cruel%) AND (`camping_service`.`name` LIKE %world% OR `camping_service`.`description` LIKE %world%))

In short, nesting the two loops this way does not work for my needs. You can test it yourself: regardless of the term grouping feature, some search results won't show up because they have to satisfy the AND condition for every term. The result of the print qry command should begin with an OR statement. Try both functions on a model where you have data and compare.
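
To see the difference without touching a database, here is a small pure Python sketch. The rows and the helper names (matches, search_and, search_or) are made up for the illustration, and the substring test only approximates what icontains does at the SQL level.

```python
# Hypothetical rows standing in for a database table.
rows = [
    {'name': 'hello', 'description': 'a friendly greeting'},
    {'name': 'world tour', 'description': 'cruel weather expected'},
    {'name': 'Hello cruel world', 'description': 'all three terms'},
]

def matches(row, term, fields):
    """Mimic icontains: True if the term appears in any of the fields."""
    return any(term.lower() in row[field].lower() for field in fields)

def search_and(rows, terms, fields):
    """Keep rows where every term matches at least one field."""
    return [r for r in rows if all(matches(r, t, fields) for t in terms)]

def search_or(rows, terms, fields):
    """Keep rows where at least one term matches at least one field."""
    return [r for r in rows if any(matches(r, t, fields) for t in terms)]

terms = ['hello', 'cruel', 'world']
fields = ['name', 'description']

print(len(search_and(rows, terms, fields)))  # OUT: 1
print(len(search_or(rows, terms, fields)))   # OUT: 3
```

With the AND version only the row containing all three terms survives, while the OR version returns every row that contains at least one of them.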

Here is the code for the right query object

def get_query(search_string, fieldnames):
    query = None
    terms = re.compile(r'[^\s";,.:]+').findall(search_string)
    for term in terms:
        for fieldname in fieldnames:
            qry = Q(**{'%s__icontains' % fieldname: term})
            if query is None:
                query = qry
            else:
                query = query | qry
    return query

# testing on the same model
from camping.models import Service

qry = get_query('hello cruel world', ['name','description'])

print qry
# OUT: (OR: ('name__icontains', 'hello'), ('description__icontains', 'hello'), ('name__icontains', 'cruel'), ('description__icontains', 'cruel'), ('name__icontains', 'world'), ('description__icontains', 'world'))

fe = Service.objects.filter(qry)

fe.query
# OUT: <django.db.models.query_utils.Q object at 0xb603b4ec>

print fe.query
# OUT: SELECT `camping_service`.`id`, `camping_service`.`owner_id`, `camping_service`.`name`, `camping_service`.`number`, `camping_service`.`price`, `camping_service`.`pax`, `camping_service`.`picture`, `camping_service`.`description` FROM `camping_service` WHERE (`camping_service`.`name` LIKE %hello% OR `camping_service`.`description` LIKE %hello% OR `camping_service`.`name` LIKE %cruel% OR `camping_service`.`description` LIKE %cruel% OR `camping_service`.`name` LIKE %world% OR `camping_service`.`description` LIKE %world%)

Now that we have gotten rid of the first function and have a working query object, we can take this search thing further with an even simpler implementation. My suggestion is to use the form's cleaned_data and Django's CSRF tokens for more security and data validation.

Here is a proposal for a single-view implementation.

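# views.py
import re

from django.db.models import Q
from django.shortcuts import render

from .forms import SearchForm
from .models import Model # your model

def search(request):
    if request.method == 'POST':
        form = SearchForm(request.POST)
        if form.is_valid():
            string = form.cleaned_data['search']
            terms = re.compile(r'[^\s",;.:]+').findall(string)
            fields = ['field1', 'field2', '...'] # your field names
            query = None
            for term in terms:
                for field in fields:
                    qry = Q(**{'%s__icontains' % field: term})
                    if query is None:
                        query = qry
                    else:
                        query = query | qry
            if query is None:
                # no usable term (only punctuation was typed): nothing to find
                found_entries = Model.objects.none()
            else:
                found_entries = Model.objects.filter(query).order_by('-something')
            return render(request, 'results.html', {'found_entries': found_entries})
    else:
        form = SearchForm()
    # GET request or invalid form: display the search form (with its errors)
    return render(request, 'search.html', {'form': form})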

# forms.py
from django import forms

class SearchForm(forms.Form):
    search = forms.CharField()

# search.html
<form method="POST" action="{% url 'search_for_something' %}">
    {% csrf_token %}
    {{ form.search }}
    <button type="submit">Search</button>
</form>

# results.html
{% if found_entries %}
    {% for entry in found_entries %}
        {{ entry.whatever }} {# your model field #}
    {% endfor %}
{% endif %}

# urls.py
url(r'^search/$', search, name='search_for_something'),

And voilà! You really have no excuse to leave the search feature out of your project. Thanks to Julien for the inspiration and for his excellent work: now we all have search boxes.

You can compare this implementation with the previous one found in this post and see for yourselves if it is worth the changes. At least it won't break your code if you're working with Django 1.7 or higher.



Code, design and content by
adil.benseddik@mabcs.com
+33(0)611632168
11 Avenue de Taillebourg, 75011 Paris


Copyright ©2014