Project Headache #194

shansfolder · 2018-02-21T10:06:38Z

Please don't merge yet. Let's not merge this branch until all related refactoring is done and tested.

Results data structure

coveralls · 2018-02-21T10:14:08Z

Coverage decreased (-1.1%) to 91.058% when pulling fd60447 on headache into a2cd16c on master.

Make result classes JSON serializable

Adapt experiment module

gbordyugov · 2018-02-26T09:59:46Z

expan/core/experiment.py

+
+        if 'entity' not in self.data.columns:
+            raise RuntimeError("There is no 'entity' column in the data.")
+        if self.data.entity.duplicated().any():


I think filtering out those duplicates used to be Expan's job. Am I wrong?

Yes, it used to filter out duplicates. Should we still do the same here?

Data quite often contains duplicates. I think we shouldn't break things we don't have to break, and we should keep at least a resemblance of backward compatibility in terms of functionality.

Oh, I think I misunderstood.

Here, entity should not have duplicates because we do our analysis at the entity level. If there are duplicates, something went wrong when we grouped the data, and I think we should raise an error if this happens.
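A minimal sketch of the "raise instead of de-duplicate" check discussed above, assuming the data is a pandas DataFrame that should hold exactly one row per entity. `validate_entities` is a hypothetical helper name, not Expan's API. Note that `DataFrame.columns` is an attribute (an `Index`), not a method, so the membership test is written `'entity' in data.columns`.

```python
import pandas as pd


def validate_entities(data: pd.DataFrame) -> None:
    """Hypothetical helper: require exactly one row per entity.

    Raises instead of silently de-duplicating, since duplicates at the
    entity level suggest an error in an upstream grouping step.
    """
    # DataFrame.columns is an Index attribute, not a callable
    if 'entity' not in data.columns:
        raise RuntimeError("There is no 'entity' column in the data.")
    # duplicated() flags every occurrence of an entity after its first one
    dupes = data['entity'][data['entity'].duplicated()]
    if not dupes.empty:
        raise RuntimeError(
            "Duplicated entities in the data: {}".format(sorted(set(dupes.tolist()))))


if __name__ == '__main__':
    ok = pd.DataFrame({'entity': [1, 2, 3], 'variant': ['A', 'B', 'A']})
    validate_entities(ok)  # passes silently

    bad = pd.DataFrame({'entity': [1, 2, 2], 'variant': ['A', 'B', 'B']})
    try:
        validate_entities(bad)
    except RuntimeError as err:
        print(err)
```

If backward compatibility matters more for a given caller, it could de-duplicate explicitly before the check (for example with `DataFrame.drop_duplicates(subset='entity')`), which keeps that decision visible instead of hiding it inside the library; whether dropping rows is correct depends on why the duplicates exist.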

gbordyugov · 2018-02-26T10:00:45Z

expan/core/experiment.py

+
+        if test.variants.variant_column_name not in self.data.columns:
+            raise RuntimeError("There is no '{}' column in the data.".format(test.variants.variant_column_name))
+        if test.variants.treatment_name not in np.unique(self.data[test.variants.variant_column_name]):


Why not use set() instead of np.unique()? Is the latter substantially faster than the former?

Not sure. I think so... I can take a look.

With the amount of data we have, the runtime seems similar; I can't notice a difference.

But I think np.unique is better for our pd.Series because it works at the NumPy array level.

@gbordyugov @daryadedik Actually, we have to use pd.unique() instead of set() on a pandas Series (a DataFrame column).

I just ran the method on test data with 8M rows:
set() hangs indefinitely without any error message (this is also the bug Ievgen discovered in Expan Service).
np.unique() takes 10+ seconds.
pd.unique() takes 0.3 seconds.
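The gap between `np.unique()` and `pd.unique()` is consistent with how the two work: `np.unique()` sorts its input, whereas `pd.unique()` is hash-table based and returns values in order of appearance without sorting. Below is a sketch of the treatment/control checks from the `experiment.py` diff using `pd.unique()`, computing the unique values once instead of once per check, plus a small harness to time the three approaches on synthetic data. The helper names and the synthetic column (500,000 rows by default) are assumptions for illustration; the 8M-row timings above come from the thread and are not reproduced here.

```python
import timeit

import numpy as np
import pandas as pd


def check_variants(data: pd.DataFrame, variant_column: str,
                   control_name: str, treatment_name: str) -> None:
    """Hypothetical helper mirroring the variant checks in experiment.py."""
    if variant_column not in data.columns:
        raise RuntimeError("There is no '{}' column in the data.".format(variant_column))
    # pd.unique is hash-based and does not sort, unlike np.unique;
    # the handful of unique values is then put in a set for fast lookups
    variants = set(pd.unique(data[variant_column]))
    if treatment_name not in variants:
        raise RuntimeError("There is no treatment with the name '{}' in the data.".format(treatment_name))
    if control_name not in variants:
        raise RuntimeError("There is no control with the name '{}' in the data.".format(control_name))


def compare_unique(n_rows: int = 500_000, repeat: int = 3) -> dict:
    """Best-of-`repeat` seconds for set(), np.unique() and pd.unique()."""
    rng = np.random.default_rng(0)
    # object dtype, like a typical string column in a DataFrame
    column = pd.Series(rng.choice(['A', 'B'], size=n_rows)).astype(object)
    return {
        'set': min(timeit.repeat(lambda: set(column), number=1, repeat=repeat)),
        'np.unique': min(timeit.repeat(lambda: np.unique(column), number=1, repeat=repeat)),
        'pd.unique': min(timeit.repeat(lambda: pd.unique(column), number=1, repeat=repeat)),
    }


if __name__ == '__main__':
    df = pd.DataFrame({'variant': ['A', 'B', 'A', 'B']})
    check_variants(df, 'variant', control_name='A', treatment_name='B')
    for method, seconds in compare_unique().items():
        print('{:>10}: {:.3f} s'.format(method, seconds))
```

Absolute numbers will depend on the machine, the pandas/NumPy versions and the column dtype, so the harness is meant for relative comparison only.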

gbordyugov · 2018-02-26T10:02:47Z

expan/core/experiment.py

+        if test.variants.control_name not in np.unique(self.data[test.variants.variant_column_name]):
+            raise RuntimeError("There is no control with the name '{}' in the data.".format(test.variants.control_name))
+
+        if not isinstance(test.features, list):


This should be checked in the constructor of `StatisticalTest` and not here, I believe.

Good point. I will fix it.

gbordyugov · 2018-02-26T10:02:50Z

expan/core/experiment.py

+
+        if not isinstance(test.features, list):
+            raise TypeError("Features should be a list.")
+        if not all(isinstance(n, FeatureFilter) for n in test.features):


This check should also go into the constructor of the test, if I'm not mistaken.

Good point. I will fix it.
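A minimal sketch of moving the feature checks into the constructor, as suggested in both comments. `FeatureFilter` and `StatisticalTest` here are simplified stand-ins: the constructor signature (`kpi`, `features`, `variants`) and the wording of the second error message are assumptions, not Expan's actual code.

```python
from typing import List


class FeatureFilter:
    """Hypothetical stand-in: keep rows where column_name == column_value."""

    def __init__(self, column_name: str, column_value):
        self.column_name = column_name
        self.column_value = column_value


class StatisticalTest:
    """Sketch: validate arguments once, when the test object is built."""

    def __init__(self, kpi: str, features: List[FeatureFilter], variants):
        # Type checks live in the constructor, so every consumer
        # (e.g. an Experiment) can rely on a well-formed object.
        if not isinstance(features, list):
            raise TypeError("Features should be a list.")
        if not all(isinstance(f, FeatureFilter) for f in features):
            raise TypeError("All features should be of type FeatureFilter.")
        self.kpi = kpi
        self.features = features
        self.variants = variants
```

With validation at construction time, an invalid test object cannot exist, so the experiment code no longer needs to repeat these checks. Checks that depend on the data, such as whether the control and treatment names actually occur in the variant column, still belong where the data is available.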

gbordyugov · 2018-02-26T10:05:00Z

tests/tests_core/test_statistical_test.py

+        numerator = "normal_same"
+        denominator = "normal_shifted"
+        derived_kpi_name = "derived_kpi_one"
+        DerivedKPI(derived_kpi_name, numerator, denominator).make_derived_kpi(self.data)


It should be ensured that make_derived_kpi() does its work only once, either here or within the DerivedKPI.make_derived_kpi() method.

Good point. Fixed.
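One way to make `make_derived_kpi()` do its work only once is to put the guard inside the method itself. A minimal sketch, assuming the derived KPI is the plain ratio `numerator / denominator` stored as a new column named after the KPI; the class below is a simplified stand-in, not Expan's implementation.

```python
import pandas as pd


class DerivedKPI:
    """Hypothetical sketch of a ratio KPI: name = numerator / denominator."""

    def __init__(self, name: str, numerator: str, denominator: str):
        self.name = name
        self.numerator = numerator
        self.denominator = denominator

    def make_derived_kpi(self, data: pd.DataFrame) -> None:
        """Add the derived column to `data` in place, at most once."""
        # Idempotence guard: a second call is a no-op, so calling this from
        # both the test setup and library code does no duplicate work.
        if self.name in data.columns:
            return
        data[self.name] = data[self.numerator] / data[self.denominator].astype(float)


if __name__ == '__main__':
    df = pd.DataFrame({'normal_same': [1, 2, 3], 'normal_shifted': [2, 4, 6]})
    kpi = DerivedKPI('derived_kpi_one', 'normal_same', 'normal_shifted')
    kpi.make_derived_kpi(df)
    kpi.make_derived_kpi(df)  # no-op: the column already exists
    print(df)
```

The trade-off of keying on the column name is that a stale column with the same name is silently reused; if the underlying columns can change between calls, recomputing and overwriting unconditionally may be the safer choice.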

Multiple correction method module


Finish Documentation

shansfolder · 2018-03-15T17:48:42Z

@gbordyugov you can still review it in the history of the merged pull requests, but I'm afraid it's too big to review anyway. Checking out the code directly may be easier.

I'm going to merge it now, without releasing, so we can still discuss it when you're back.

ddedik and others added 7 commits February 19, 2018 17:35

first commit to new statistical test structure

3728e6b

small fix - newline

45a1dbc

Add formula field of derived kpi

d68a897

Add docstring + small update

67eaa05

Add check for input type

9e986ee

Add KPI class

99ebea3

Merge pull request #193 from zalando/results_data_structure

b77b6ce

Results data structure

shansfolder and others added 17 commits February 21, 2018 12:37

Make result classes json serializable

179723b

Fix double confidence interval fields

5470c64

Merge pull request #195 from zalando/json_serializable

90d58ab

Make result classes JSON serializable

First steps toward the biggest refactoring

625a21b

refactoring experiment, delta, features checks

53d3a52

Another round of implementation

cc04de8

One more round of implementation

e91b2c8

Improve docstrings

4f1254b

fix typos in wordings

3f4e0ab

simplified formula for derived kpi

cb86b8c

methods apply to data for KPI and FeatureFilter

a4e514e

fix naming

c8d0cd7

Small update

9953c9a

Small update

79cdafe

Fix get_variant method

39c4c68

added checks for get_variant result and fix for data subset

febcbaf

Merge pull request #196 from zalando/adapt_experiment_module

63e2d6d

Adapt experiment module

gbordyugov reviewed Feb 26, 2018


shansfolder and others added 28 commits March 6, 2018 16:23

Small fix

0a4c983

adapted unit tests for experiment module

f0c06c0

fixed csv_fetcher class, fixed unit tests

bc8ebd5

added helped methods unit tests

fe1590a

added unit tests for the reweigtening trick

e2abd26

adapted unit tests for re-weighting trick

a3dc800

refactored util for test_core, added docstring

85392ec

Intermediate step of test_experiment

f78116b

Another intermediate step

d0a94fa

Finish test_experiment

72dfe3f

Merge pull request #201 from zalando/correction_module

2d5becf

Multiple correction method module

added logger and replaced assertNumericalEqual with assertAlmostEqual

d7a01bf

added docstrings where missing

5d80948

removed a quote, minor

296ed7a

Improve docstring in sphinx

24ef447

Improve and fix docstrings

85defea

Improve and fix docstrings

3accd73

Merge branch 'documentation' of github.com:zalando/expan into documentation

cefa11c

Update tutorial.rst (half way done)

06a2dc5

Update tutorial.rst further

5fa1a7b

Merge remote-tracking branch 'origin/master' into headache

406c39b

# Conflicts: # expan/core/experiment.py

Merge remote-tracking branch 'origin/headache' into documentation

4021929

# Conflicts: # CHANGELOG.md # CHANGELOG.rst

Update changelog

3a78229

Update contributing page

38382e7

Finish doc for the new version

c649c8d

Merge remote-tracking branch 'origin/master' into headache

9c15a89

# Conflicts: # expan/core/experiment.py

Merge remote-tracking branch 'origin/headache' into documentation

823506e

Merge pull request #204 from zalando/documentation

fd60447

Finish Documentation

shansfolder merged commit 648a490 into master Mar 15, 2018
