Using Requests Session Objects for web scraping

kinow @ Nov 25, 2016 01:24:03

Some months ago I had to write a Python script that would retrieve solar energy data from a web site. It basically had to make a handful of HTTP calls, parse the responses (mainly JSON), and store the results as JSON and CSV for later processing.

Since it was such a small task, I used the Requests module instead of a complete web scraper. The HTTP requests had to be made with a timeout, and also had to pass certain headers. I started customizing each call, until I learned about the Requests Session Objects.

You create a session, much as you would with an ORM such as JPA: it defines a context with certain properties, and controls behavior that is orthogonal to the individual calls.

import requests

# Seconds to wait for the server before giving up (illustrative value)
TIMEOUT = 30

def get(session, url):
    """Utility method to HTTP GET a URL"""
    response = session.get(url, timeout=TIMEOUT)
    return response

def post(session, url, data):
    """Utility method to HTTP POST a URL with data parameters"""
    response = session.post(url, data=data, timeout=TIMEOUT)
    # Raises an exception for 4xx/5xx responses, does nothing otherwise
    response.raise_for_status()
    return response

# Placeholder credentials, replace with your own
user = 'user'
password = 'secret'

with requests.Session() as s:
    # x-test header will be sent with every request
    s.headers.update({'x-test': 'true'})

    data = {'user': user, 'password': password, 'rememberne': 'false'}
    r = post(s, 'https://portal.login.url', data)

    r = get(s, 'https://portal.home.url')

Besides letting you add headers to all requests, the session object means you won’t have to worry about redirects. The library follows them by default, up to a limit of 30.
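Both behaviors can be inspected without sending anything over the network. The sketch below prepares a request through a session to see which headers would be sent, and reads the redirect limit. The URL and the x-other header are made up for illustration.

```python
import requests

with requests.Session() as s:
    # Session-wide header, as in the script above
    s.headers.update({'x-test': 'true'})

    # Build a request and prepare it through the session without sending it.
    # The URL and the extra header are placeholders.
    request = requests.Request('GET', 'https://example.org/',
                               headers={'x-other': '1'})
    prepared = s.prepare_request(request)

    # The prepared request carries both the session and the per-request headers
    print(prepared.headers['x-test'])
    print(prepared.headers['x-other'])

    # The redirect limit is an attribute of the session and can be changed
    default_limit = s.max_redirects
    s.max_redirects = 5
```

If a response chain exceeds the limit, Requests raises requests.exceptions.TooManyRedirects.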

Happy hacking!

Changing Spring Boot environment variables in the command line

kinow @ Nov 21, 2016 21:26:03

This week, while helping developers and testers experiment with a backend application, some of them found it useful to learn a simple trick to change Spring Boot properties when running the application locally (our testers build, compile, and change the code, how cool is that?).

Here’s how it works. Say you have the following settings in your application’s application.properties:

my.application.database.username=sa
my.application.database.password=notasimplepassword

And that you want to change these parameters in order to, for instance, create an application error, so that you can code and test what happens to the frontend application in that situation.

To override a property through an environment variable, you replace dots with underscores, and put all your words in upper case. So the variables above would be: MY_APPLICATION_DATABASE_USERNAME and MY_APPLICATION_DATABASE_PASSWORD.
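The naming rule is mechanical, so it is easy to script. Here is a minimal sketch in Python; the function name to_env_var is made up for this example, and it only covers the dots-and-upper-case rule described above.

```python
def to_env_var(property_name):
    """Convert a dotted property name to its environment variable form."""
    # Dots become underscores, and everything goes to upper case
    return property_name.replace('.', '_').upper()


for name in ('my.application.database.username',
             'my.application.database.password'):
    print(to_env_var(name))
```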

Furthermore, if you are on Linux or Mac OS, you do not need to edit your application.properties file. You can set the environment variables and start the application in a single command, with the following syntax.

$ MY_APPLICATION_DATABASE_USERNAME=olivei MY_APPLICATION_DATABASE_PASSWORD=7655432222a mvn clean spring-boot:run

This way your application will start with the new values.

Happy hacking!

—EDIT—

As pointed out by Stéphane Nicoll (thanks!), you could change the property values without having to use the upper case syntax.

mvn -Dmy.application.database.username=anotheruser clean spring-boot:run

And he even included a link to docs! ♥ the Internet and Open Source!

Add a header to a file with Shell script (sed)

kinow @ Nov 12, 2016 01:55:03

Today I was re-generating the documentation for a REST API written in PHP, with Laravel. To generate the documentation, one would have to call a Laravel command first. That command would create a Markdown page. And since in this project I am using Jekyll for the project site, the final step was adding a header to the file, so that Jekyll can recognize that content as a blog post.

Laravel allows you to add custom commands to your project, so I decided to write a command that would call the other command that generates the documentation, and add an extra step of adding the header to the Markdown file.

Here’s the shell script part, which allows you to add a header to a file in place (i.e. it will alter your file and save the change).

sed -i 1i"---\nlayout: page\ntitle: API Installation\n---\n\n" ./docs/documentation/api/api.md

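Here the first argument to sed, -i, means in place. Then that strange 1i means that it will insert something before line 1, i.e. once, at the top of the file. Then we have our header, and finally the file. Note that this is GNU sed syntax; the sed shipped with Mac OS handles -i and the i command differently.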
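If sed is not available, the same in-place edit is only a few lines of Python. This is a sketch, not what the Laravel command runs: the function name prepend_header is made up, the demo writes to a temporary file instead of the real api.md, and the header uses the triple-dashed lines that Jekyll expects around front matter.

```python
import tempfile
from pathlib import Path

HEADER = "---\nlayout: page\ntitle: API Installation\n---\n\n"


def prepend_header(path, header):
    """Rewrite the file at `path` so that it starts with `header`."""
    target = Path(path)
    original = target.read_text(encoding='utf-8')
    target.write_text(header + original, encoding='utf-8')


# Demo on a throwaway file standing in for the generated Markdown page
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / 'api.md'
    page.write_text('# API\n', encoding='utf-8')
    prepend_header(page, HEADER)
    result = page.read_text(encoding='utf-8')

print(result)
```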

Happy hacking!

Content negotiation with Spring Boot and React

kinow @ Nov 07, 2016 20:07:03

A few days ago I had a bug in a system built with Spring Boot and React. The frontend application was using a REST client in React, built in a similar way to what is found in the documentation, and also in blogs.

import rest from 'rest';
import mime from 'rest/interceptor/mime';
const Rest = () => rest.wrap(mime);

However, for one of the Spring Boot application endpoints, the React component was not working. The response seemed to be OK in the Network tab of the browser developer tools, but the component was failing with an error when parsing the response.

Turns out that the frontend was sending the request with the header Accept: text/plain, application/json. And Spring Boot was just using its default content negotiation and returning what the frontend requested: a plain-text version of what looked like JSON.
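To see why text/plain won, here is a deliberately simplified model of content negotiation in Python. It is not Spring's implementation, and the function names are made up; it only ranks the Accept entries by their q value, then by position, and returns the first one the server can produce.

```python
def parse_accept(header):
    """Return the media types of an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(',')):
        fields = [field.strip() for field in part.split(';')]
        media_type = fields[0].lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition('=')
            if name.strip() == 'q':
                quality = float(value)
        # Sort by descending quality, then by order of appearance
        entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def negotiate(accept_header, producible):
    """Pick the first acceptable media type the server can produce."""
    for media_type in parse_accept(accept_header):
        if media_type in producible:
            return media_type
    return None


server_types = ['application/json', 'text/plain']
print(negotiate('text/plain, application/json', server_types))
print(negotiate('application/json', server_types))
```

With both types weighted equally, the one listed first is chosen, which matches what the frontend received.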

The quick fix was to request the content as JSON in React.

import rest from 'rest';
import mime from 'rest/interceptor/mime';
const Rest = () => rest.wrap(mime , { mime: 'application/json' } );

Now we will revisit the backend so that it returns the JSON content as JSON, regardless of what the client asks for :-)

Happy hacking!

Checking the operating system type in shell script

kinow @ Nov 05, 2016 23:27:03

Last week I learned about a tool called ShellCheck, a tool for static analysis of shell scripts. It reports errors like missing double quotes, use of deprecated syntax, etc.

I decided to check some projects I contribute to, and the first issue I found was in Apache Jena:

kinow@localhost:~/Development/java/jena/jena/apache-jena/bin$ shellcheck arq

In arq line 8:
    case "$OSTYPE" in
          ^-- SC2039: In POSIX sh, OSTYPE is not supported.

So, in summary, the OSTYPE variable is not guaranteed to be available in a POSIX shell. The case in question, where OSTYPE is being used, checks for the Darwin OS type (i.e. Mac OS). Knowing how things get weird when you use different operating systems, I decided to check and learn how OSTYPE works. Here’s what I found.

  • In Ubuntu, with /bin/bash, OSTYPE works fine (linux-gnu)
  • In Ubuntu, with /bin/sh, OSTYPE is not set
  • In Mac OS, with /bin/sh, OSTYPE is set (darwin15)

I checked the shells to make sure they were not symbolic links: some distributions use a different default shell, and replace /bin/bash and/or /bin/sh with a link to another shell. Looks like Mac OS has a POSIX shell that behaves differently from Ubuntu’s.

Instead of trying to find a way to use OSTYPE, I decided to spend some time looking at how other projects do the same thing. And the best example I could find was git.

Instead of relying on OSTYPE, git uses uname.
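For comparison, the same uname-based check can be written in Python, where platform.system() reports the kernel name that uname -s prints on Unix-like systems. This is only an illustration of the idea, not code from git or Jena, and the function name is_mac_os is made up.

```python
import platform


def is_mac_os(system_name=None):
    """Return True when the reported kernel name is Darwin (Mac OS)."""
    if system_name is None:
        # On Unix-like systems this is the name `uname -s` prints,
        # e.g. 'Linux' or 'Darwin'
        system_name = platform.system()
    return system_name == 'Darwin'


print(is_mac_os())
```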

I will spend some time over the next few days working on a proposal to replace OSTYPE in the Apache Jena scripts, but I may then have to submit more changes for the other issues found by ShellCheck.

Happy hacking!