Posts tagged with ‘python’

Using Requests Session Objects for web scraping

kinow @ Nov 25, 2016 01:24:03

Some months ago I had to write a Python script to retrieve solar energy data from a web site. It was basically a handful of HTTP calls, parsing the responses (mostly JSON), and storing the results as JSON and CSV for later processing.

Since it was such a small task, I used the Requests module instead of a complete web scraper. The HTTP requests had to be made with a timeout, and also had to pass certain headers. I started customizing each call individually, until I learned about Requests Session objects.

You create a session, much as in an ORM like JPA: it defines a context with certain properties, and controls behavior that is orthogonal to each individual request.

import requests

TIMEOUT = 30  # seconds; an actual timeout, since None would wait forever

def get(session, url):
    """Utility method to HTTP GET a URL"""
    response = session.get(url, timeout=TIMEOUT)
    return response

def post(session, url, data):
    """Utility method to HTTP POST a URL with data parameters"""
    response = session.post(url, data=data, timeout=TIMEOUT)
    # raise_for_status() is a no-op on success and raises HTTPError on 4xx/5xx
    response.raise_for_status()
    return response

with requests.Session() as s:
    # the x-test header will be sent with every request made through this session
    s.headers.update({'x-test': 'true'})

    # user and password are defined elsewhere
    data = {'user': user, 'password': password, 'rememberme': 'false'}
    r = post(s, 'https://portal.login.url', data)

    r = get(s, 'https://portal.home.url')

Besides giving you the ability to add headers to all requests, the session object means you won't have to worry about redirects: Requests follows them by default, with a limit of up to 30.
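
If you want to inspect or change that behavior, here is a small sketch (the URL is the same placeholder as above):

import requests

with requests.Session() as s:
    s.max_redirects = 10              # tighten the default limit of 30
    r = s.get('https://portal.home.url', timeout=30)
    print(r.history)                  # intermediate redirect responses, if any
    print(r.url)                      # the final URL after following redirects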

Happy hacking!


Using the AWS API with Python

kinow @ Oct 04, 2016 21:15:03

Amazon Web Services provides a series of cloud services. When you access the web interface, most, if not all, of the actions you take there are actually translated into API calls.

They also provide SDKs in several programming languages. With these SDKs you are able to call the same API used by the web interface. Python has boto (nowadays boto3), which lets you automate several tasks in AWS.

But before you start using the API, you will need to set up your access key.

It is likely that over time you will have different roles, each with different permissions. You have to configure your local environment so that you can use either the command line utility (installed via pip install awscli) or boto3.

The awscli package is not strictly a boto3 dependency, but it is the easiest way to bootstrap it. After you install it, run aws configure; it will create the ~/.aws/config and ~/.aws/credentials files, which you can then tweak to support multiple roles.
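
Running it looks roughly like this (the values are placeholders):

$ aws configure
AWS Access Key ID [None]: YOUR_KEY_ID
AWS Secret Access Key [None]: YOUR_SECRET
Default region name [None]: ap-southeast-2
Default output format [None]: json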

I followed the tutorials, but got all sorts of different issues. Then, after debugging some locally installed dependencies, in particular the awscli files, I found that the following settings work for my environment.

# File: config
[profile default]
region = ap-southeast-2

[profile profile1]
region = ap-southeast-2
source_profile = default
role_arn = arn:aws:iam::123:role/Developer

[profile profile2]
region = ap-southeast-2
source_profile = default
role_arn = arn:aws:iam::456:role/Sysops
mfa_serial = arn:aws:iam::789:mfa/user@domain.blabla

and

# File: credentials
[default]
aws_access_key_id = YOUR_KEY_ID
aws_secret_access_key = YOUR_SECRET
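
A quick way to confirm a profile is being picked up is to pass the --profile flag to the CLI, for example:

aws s3 ls --profile profile2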

Once that is done, you can also confirm it is working with some S3 calls in Python.

#!/usr/bin/env python3

import os
import boto3

session = boto3.Session(profile_name='profile2')
s3 = session.resource('s3')

name = 'mysuperduperbucket'

# check whether the bucket already exists in this account
found = any(bucket.name == name for bucket in s3.buckets.all())

if not found:
    print("Creating bucket...")
    # outside us-east-1 the bucket region must be given explicitly
    s3.create_bucket(Bucket=name,
                     CreateBucketConfiguration={'LocationConstraint': session.region_name})

file_location = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'samplefile.txt')
s3.meta.client.upload_file(Filename=file_location, Bucket=name, Key='book.txt')

The AWS files in this example use MFA (multi-factor authentication) too, via the mfa_serial setting in profile2. So the first time you run this code you may be asked for an MFA token, and the resulting credentials are cached for a short time.

That’s it for today.

Happy hacking!


Processing Vaisala Radiosonde data with Python, and creating GRUAN-like NetCDF files

kinow @ Jul 12, 2016 21:18:03

One of my recent projects involved parsing a few GBs of data in a certain binary format and converting it into NetCDF files. In this post I will describe what was done, and what I learned about radiosondes, GRUAN, and other geek stuff. Ready?

Vaisala Radiosonde data

When I was told the data was in some binary format, I thought it would be similar to parsing a flat file: fixed-length entries, with the same kind of record repeated multiple times.

Well, not exactly.

The files had been generated by an instrument made by Vaisala, a Finnish company. This instrument is called a radiosonde: a device about the size of an old mobile phone that is launched into the atmosphere on a balloon.

I was lucky to be given the chance to release one of these balloons carrying a newer version of this equipment.

[Image: radiosonde balloon launch]

The balloon can carry equipment for measuring different things: air pressure, altitude, temperature, latitude, longitude, and relative humidity, among others. Instruments like the radiosonde send the data back to a ground-level station via radio, normally at short, constant intervals.
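
To give an idea of the output side, here is a minimal sketch of writing such measurements to a NetCDF file with the netCDF4 package; the variable names, units, and values are made up for illustration, not GRUAN's actual conventions:

from netCDF4 import Dataset

ds = Dataset('sounding.nc', 'w', format='NETCDF4')
ds.createDimension('time', None)       # one entry per radiosonde reading

temp = ds.createVariable('temperature', 'f4', ('time',))
temp.units = 'K'
pres = ds.createVariable('pressure', 'f4', ('time',))
pres.units = 'hPa'

temp[:] = [288.2, 287.9, 287.1]        # sample values, not real data
pres[:] = [1013.2, 1008.5, 1001.7]

ds.close()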

(Read more…)

Too many SQL variables exception in Graphite with SQLite3

kinow @ Jul 04, 2013 13:54:53

Having run Graphite for a while, today I found a rather annoying issue. We were using events, and everything had been working perfectly fine so far, but for the last 24 hours the graph was blank.

Actually, in the dashboard the graph was missing, displayed as a gray box. Using the web inspector in Google Chrome I got the graph URL, and opening the link in a new tab gave me the exception message: too many SQL variables (1).

After some research, I found out this was a limitation in SQLite: a single statement may only use a limited number of bound variables, and the events query was exceeding it. After trying to hack the code, and being concerned about running a patched version of Graphite and having to update it later, I decided to switch databases.
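
The limitation is easy to reproduce in isolation with Python's built-in sqlite3 module. A minimal sketch (the table is made up, and the default limit, 999 variables in SQLite builds of that era, is higher in modern ones, so you may need more values to trigger it):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (id INTEGER)')

# one bound variable per value; exceeding SQLITE_MAX_VARIABLE_NUMBER
# raises the same error Graphite was hitting
params = list(range(1000))
placeholders = ','.join('?' for _ in params)
try:
    conn.execute('SELECT * FROM events WHERE id IN (%s)' % placeholders, params)
except sqlite3.OperationalError as e:
    print(e)  # too many SQL variables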

But to avoid losing the graphs, users, and other settings, including the events, I migrated the SQLite database to a MySQL server. This MySQL server was already installed on the machine, since it also hosted a Zabbix server.

Here are the steps required to migrate your database from SQLite to MySQL (2).

  • Download the sqlite3-to-mysql.py Python script from http://www.redmine.org/boards/2/topics/12793
  • Stop Apache/Nginx
  • mysql -u root -p -e "create database graphite character set utf8;"
  • sqlite3 graphite.db .dump | sqlite3-to-mysql.py | mysql -u root -pyourpass graphite
  • Start Apache/Nginx again

After tailing the Graphite webapp log file,

tail -f storage/log/webapp/error.log

I noticed the Python MySQLdb module wasn't installed.

ImproperlyConfigured: Error loading MySQLdb module: No module named MySQLdb

My server was running Ubuntu 13.04, so I installed the module simply with the following command.

apt-get install python-mysqldb

Hope that helps!

1 You may have to enable *DEBUG* in your Django settings to see the exception in your browser

2 Zabbix needed some minor tweaks in order to use MariaDB (https://mariadb.org/), but you can probably give it a try too

Graphite: Broken images

kinow @ Apr 17, 2013 11:36:38

This morning I was setting up a Graphite server to collect metrics with statsd, LogStash, and jmxtrans. After following the instructions from @jgeurst, I successfully installed Graphite.

I had previously set up another test box, so I decided to take a deeper look at the settings, write a Puppet manifest, and prepare this new box to become a production server. However, when browsing the webapp, all the graphs were broken images.

After googling for a while, reading forums and bug reports, I decided to open $GRAPHITE_HOME/webapp/graphite/render/views.py and add log.rendering(...) statements (not the most elegant solution, I know).

By following the program flow, I found out it was entering a block after checking whether it should render the image remotely. This feature is turned on and off by REMOTE_RENDERING = True/False in local_settings.py.
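
In other words, unless you are actually running dedicated rendering hosts, the setting should look like this:

# local_settings.py
REMOTE_RENDERING = False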

After setting it to False, the problem was solved.