Creating Excel files from CSV with Python

When you need to extract data from a system you can bet that someone will claim they have to have it in a Microsoft Excel format rather than the simple CSV format you had in mind. I’m unlikely to need to tell you this – as you’re already here – but that there are huge and annoying differences between these two formats. However, I’m not going to list them here, instead I want to introduce you to a nice python library: openpyxl ( https://pypi.python.org/pypi/openpyxl ).

This beautiful Excel library is a great example of how a good library should be written – it takes a complex problem and hides it away behind a nicely designed interface, and having written an Excel parser myself I can assure you that was unlikely to be an easy task! Basically, this library lets you read and write Excel files and requires next to no knowledge of the Excel format.

My little piece of code takes a list of CSV files with a name and inserts them all into a new excel file, you’ll get one sheet per CSV. It also improves the display by bolding the first line (assuming these are column titles) and adjusts the width of each column depending on the size of the strings within the column.

You can find the full code here: https://bitbucket.org/akademy/csvtoexcel/src/e92654d86ba35e221332c4365c78662b48f274c0/excel_writer.py?at=master&fileviewer=file-view-default

And you can use it like this:

	ew = ExcelWriter()
	ew.convert( {
		# Specify the csv files, whether it has column titles (default: yes) and the names
		"sheets" : [
			{
				"filelocation" : "test/work.csv",
				"sheetname" : "Works"
			},
			{
				"filelocation" : "test/person.csv",
				"sheetname" : "People",
				"has_titles" : False
			}
		],
		# Specify the output name
		"outputname" : "test/test_outout_file.xlsx"
	}

openpyxl can do much more – pretty much anything you can do in Excel – check out the docs for more info: https://openpyxl.readthedocs.io

So install the library and run the code!

Subversion – folders and projects

This post is about folder structures and project management in Subversion.. If you don’t already know what Subversion is check out the previous post: http://blog.akademy.co.uk-tips/2009/09/source-control-beginning-with-subversion/

Folder Structure

The folder structure you implement in your repository can have significant impacts on how well your project gets managed, now and in the future. With the right forethought you can implement things like:

  • release version tagging,
  • multiple simultaneous version releases
  • and have code, installers and build systems all in the same place.

Tags – release version tagging

Tagging is the term used to “mark” a certain version of your files as important in some way. Most often this will be to mark public releases of your code, this enables you to go back to that specific version even after other changes and version releases have taken place. In Subversion there is no specific tagging mechanism, instead you simply take a copy of the code as it is at the moment. Now this sounds like it would use a whole lot of memory space but you need to remember that Subversion only ever records the changes between files – a tag will not not have any changes in so is virtually copied for free.

To keep track of all your tags you’ll need somewhere to put them and in subversion this means a folder. So in the top level repository folder have a folder called “Tags”. Lets assume you have another folder called “Main” for now which contains your source, to create a tag you’d just copy it, so type something like:

  1. svn copy file://your-path/TestProj/Main file://your-path/TestProj/tags/release-1.0 --message "1.0"

Branches – multiple simultaneous version releases

To work on multiple versions you’ll need to once again copy the main source code. You might need to work on two different releases if you have two major changes to the code which you want to test separately, or you need to make bug fixes in an old release but don’t want to mix up the new code, or you just want to try an idea but don’t want to mess with the main source.

You’ll need to keep this code separate also so create another folder at the top level of your repository. We’ll call this one “Branches” as you are copying the main code to subsequently change it – it branches away from the main part. Now You’ll just need to copy the main code, so type:

  1. svn copy file://your-path/TestProj/Main file://your-path/TestProj/branches/newtest --message "fixing"

This command is near identical to the tag one. This is because all you are “really” doing is making a folder and copying some files to it.

Main (or Trunk) –  code, installers and building

When you read up on subversion you’ll come across the folder “Trunk”. Once again, this is subversion so it’s just a folder, and it’s where you main source is edited from. (It’s called “Trunk”, because it’s where the “Branches” come from!). The above paragraphs have called in “Main”.

I tend to split my main folder into several others to make the full building of a program easier. Typically this will be something like “Source”, “Installers” and “Builder” – “Source” holds things like c++ files, “Installer” will hold the files needed to create an installer, and “Builder” is the files needed to automatically build all the bits into some kind of release, such as a CD; (you might like to add “Content” or “Manuals” or something else to your own). This is very useful when you need to tag something as everything needed to rebuild a specific release will be copied into that tag.

Trunk, Branches, Tags or anything.

Obviously the folders you’ll need will depend on your own project. Don’t be afraid to experiment with your own names and structures – the folders are yours to play with!

A few caveats on changing from “Trunk”, “Branches” and “Tags”

  • Some client programs can automatically create tags and branches if you stick to these folder names.
  • Much of the documentation will make reference to these folder names.

Multiple projects

Up until now I have assumed a single Project in your repository but that’s not the only way to do it. There’s two main ways to store multiple projects in Subversion, each with positives and negatives.

  • Create multiple repositories, one for each project.
  • Create a single repository with a top level folder for each project.

Multiple repositories, single projects

This is the simplest to manage. Every new project his it’s own repository and is entirely separate from any others.

Positives:

  • You can reduce the impact from a single hardware failure as repositories can be kept on separate hard drives.
  • You can backup each project separately, depending on it’s value.
  • You can easily give user access to a single project
  • User subversion errors limited to a single project.

Negatives

  • Users will need to be created separately for each repository.
  • The merging of projects is very difficult.

Single repository, multiple projects

A single repository, with top level folders for each project. Each sub folder has the usual “Trunk”, “Tags” and “Branches” in.

Positives

  • A single list of users is all that is needed.
  • A single, simple, backup procedure.
  • Merging of several projects into one is very easy.

Negatives

  • Can mistakenly give access to a user for any project.
  • Single hardware failure can wipe out all projects at once. Single backups also at risk.
  • A subversion user error in one project can impact all other projects.

Conclusion

There is no right and wrong way to store your folders. It really depends on your use, for instance I use subversion to back up my fiction writing, and it contains no folders just a list of files.

There are some best practices you should consider though, decide how you’ll make taqgs or branches if you’ll need them, and if you are going to use multiple or single project repositories.

However, if you are looking for a quick set up, I’d recommend having a single repository for a project, then a sub folder structure of “Trunk”, “Branches” and “Tags”. In side that Trunk create a folder for each of the different parts you’ll need, although different parts can be added later.

That concludes this blog. A future blog will include subversion clients amongst other things.

Subversion – source control

A good source control system is a must for almost all programming projects. For instance:

  • You can quickly check which lines across multiple files you’ve changed.
  • Revert changes to any previous date.
  • Never lose any file again.
  • Keep track of why a file was changed, by whom and when.
  • Take snap shots of release code versions.
  • Manage multiple engineers working on the same code base.
  • Automatically combine code from multiple engineers.
  • Split code development into various paths then quickly combine at a later date.

Over the years I’ve used many different source control systems – SourceSafe, CVS, Perforce, CodeSafe –  but the one I’m most impressed with is the only one I use at home and the one I encourage at work: Subversion.

Subversion is an open source, source control system (and they “eat their own dog food” – explained). It’s available on Linux, Windows and Mac and you can freely download it from their website.

If you used CVS before then you’ll probably understand how it works, but if you are more of a SourceSafe person (I feel for you) then Subversion takes a little explanation. Subversion works like this:

  1. First you create a “repository”, this will be where all the files are stored, lets call it “TestProj”, you’ll normally create this on your server so that multiple people can access it.
    1. type: svnadmin create your/path/TestProj
  2. Now locate where you want your own working copy to exist, lets call it “MyTestProj” and “Check Out” TestProj, this simply creates all the files in your repositiory (there isn’t any yet) together with some other files in a folder called “.svn” which holds the admin bits that subversion needs (you’ll see one of them in every subfolder you create).
    1. type: svn file:///your/path/TestProj MyTestProj
  3. Now create some files, say “TestProj.cpp”, “TestProj.h” and “TestProj.ico” in MyTestProj folder. These currently exist only on your computer so you need to “Add” them to Subversion. This just means you “are going to” add them, subversion won’t do anything until you subsequently “Commit” these changes, at that point you’ll get the opportunity to comment on your changes.
    1. type: svn add TestProj.cpp TestProj.h TestProj.ico
    2. type: svn commit --message "Adding first files"
  4. Now all your files are commiteed and the files are added inside the repository you created. Anyone else doing “Check Out” will see them in their working folder.

You can stick any folder or file in your working copy, and make any changes to the files you wish, nothing will happen to the Repository until you commit. You can “Update” you copy at any time to get everyone else’s changes or “Commit” you’re own changes back – by default there is no locking of files – if you change the same file as someone else then Subversion automatically merges them together –

…take a deep breath if you are a SourceSafe user, this will sound like witch craft…

– in almost all cases code can be automatically added as different lines will have been changed (this was hard for me to believe back when I first used it but believe me it’s clever enough to never cause a problem). On the rare cases (although this depends on project size, user number and frequency of updates) you’ll get a “Conflict” which means you’ve changed the same line someone else has and you’ll have to sort this out yourself before you are allowed to commit your changes.

There’s more information at:

That ends the first Subversion blog, in the future ones I’ll be looking at a folder structure you can use to make your projects easier to manage (see here), some of the client apps that exist to make working with subversion easier and more efficient, backups and a few more bits and pieces.