\begin{document}
\title{Citi Bike Trip Distance Analysis for Men and Women}
\author[1]{Ruben Hambardzumyan}%
\affil[1]{NYU Center for Urban Science \& Progress}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
Citi Bike is one of the most popular public transportation options in
New York. It continuously collects millions of rows of data about its
customers and subscribers, and these data contain significant insights
into the behavioural patterns of Citi Bike's users. In particular,
considering that men are, on average, physically more developed, do they
cycle longer distances than women? This article analyses the platform's
data to address the question and discusses the results.%
\end{abstract}%
\sloppy
\subsection*{Introduction}
{\label{466701}}
Launched in May 2013, Citi Bike quickly became one of the most popular
means of transportation for people in New York. Today the platform
operates more than ten thousand bikes and handles over fifty thousand
rides per day, adding millions of rows of data every month. Data of this
volume lend themselves to statistical analyses that reveal the
behavioral patterns of the platform's users and the context behind the
data. Gender-based analyses are of particular interest for identifying
the factors behind any differences between the usage patterns of men and
women. This research asks: do men cycle longer distances than women? To
examine the question, the author formulated the following null
hypothesis: the average trip distance cycled by women in October 2016 is
the same as or less than that of men.
\subsection*{Data}
{\label{429878}}
To address the question, the Citi Bike usage data for October 2016 were
retrieved from the platform's website. The author chose October on the
premise that the weather during that month is mild enough for people to
choose cycling for commuting or traveling to a park. The following data
were needed:
\begin{itemize}
\item Start station latitude.
\item Start station longitude.
\item End station latitude.
\item End station longitude.
\item User type.
\item Gender.
\item Travel distance.
\end{itemize}
Of these, only the travel distance has to be calculated, using the given
geographical coordinates (described in the methodology section). Note
that only trips by Citi Bike Subscribers were used in the research. The
figure below shows the distributions of the calculated distances and the
observed frequencies for men and women.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/download/download}
\caption{{Distributions of the distances traveled by men and women, with the
observed frequency of each distance. The histograms suggest that men
travel longer distances.
{\label{166722}}%
}}
\end{center}
\end{figure}
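The selection described in this section can be sketched in Python as
follows. This is only an illustration, not the author's script: the
column names, the gender coding (\texttt{1} for men, \texttt{2} for
women) and the helper \texttt{load\_subscriber\_trips} are assumptions
that should be checked against the header of the actual trip file, and
the sample rows are made up.

\begin{verbatim}
```python
import csv
import io

# Assumed header names and gender coding (hypothetical until checked
# against the header row and documentation of the real trip file).
USER_TYPE, GENDER = "User Type", "Gender"
START_LAT, START_LON = "Start Station Latitude", "Start Station Longitude"
END_LAT, END_LON = "End Station Latitude", "End Station Longitude"
GENDER_LABELS = {"1": "men", "2": "women"}  # any other code is dropped


def load_subscriber_trips(csv_file):
    """Return one dict per trip made by a subscriber with a recorded gender."""
    trips = []
    for row in csv.DictReader(csv_file):
        group = GENDER_LABELS.get(row[GENDER])
        if row[USER_TYPE] != "Subscriber" or group is None:
            continue
        trips.append({
            "group": group,
            "start": (float(row[START_LAT]), float(row[START_LON])),
            "end": (float(row[END_LAT]), float(row[END_LON])),
        })
    return trips


# Tiny made-up sample standing in for the real file.
SAMPLE = io.StringIO(
    "User Type,Gender,Start Station Latitude,Start Station Longitude,"
    "End Station Latitude,End Station Longitude\n"
    "Subscriber,1,40.75,-73.99,40.76,-73.98\n"
    "Customer,0,40.70,-74.00,40.71,-74.01\n"
    "Subscriber,2,40.72,-73.95,40.73,-73.96\n"
)
trips = load_subscriber_trips(SAMPLE)
print([trip["group"] for trip in trips])
```
\end{verbatim}

Only the two Subscriber rows survive the filter; the Customer row is
dropped before any distance is computed.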
\subsection*{Methodology}
{\label{408121}}
To test the null hypothesis, Student's t-test for comparing the means of
two groups from the same population was used. The traveled distances
were calculated with a Python function that takes the geographical
coordinates of the start and end stations and, taking into account the
spherical shape of the Earth, returns the distance in kilometers between
the two points along an arc.
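The author's function is not reproduced here. One common way to compute
such an arc length is the haversine formula, shown below as a minimal
sketch. The choice of this formula, the mean Earth radius $R = 6371$ km
and the name \texttt{haversine\_km} are assumptions made for
illustration. For two points with latitudes $\varphi_1, \varphi_2$ and
longitudes $\lambda_1, \lambda_2$ (in radians), the distance is
\[
d = 2R \arcsin\sqrt{\sin^2\frac{\varphi_2-\varphi_1}{2}
  + \cos\varphi_1 \cos\varphi_2 \sin^2\frac{\lambda_2-\lambda_1}{2}} .
\]

\begin{verbatim}
```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the equator: expected to equal (pi / 2) * R.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 2))
```
\end{verbatim}

Note that this is the straight-line distance over the sphere between the
two stations, not the length of the route actually cycled.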
\subsection*{Conclusions}
{\label{198023}}
Student's t-test for comparing the means of two groups from the same
population gave a p-value of 0.118. Since the significance threshold for
the research was set at 0.05 (a 95\% confidence level), the null
hypothesis cannot be rejected: the data are consistent with men having
taken rides at least as long as those of women during October 2016.
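
The comparison reported above can be sketched with the Python standard
library alone. This is a hedged illustration, not the author's code: the
function names are hypothetical, the input lists are made up, and the
one-sided p-value uses a normal approximation to the t distribution,
which is reasonable only for large samples.

\begin{verbatim}
```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic (equal variances) and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t_stat, df


def one_sided_p_large_sample(t_stat):
    """Approximate P(T >= t_stat) with the standard normal (large df only)."""
    return 1.0 - statistics.NormalDist().cdf(t_stat)


# Made-up distances in km; the first sample plays the role of women's trips.
women = [1.0, 2.0, 3.0, 4.0, 5.0]
men = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, df = pooled_t_statistic(women, men)
ALPHA = 0.05  # significance threshold stated in the text
reject_null = one_sided_p_large_sample(t_stat) < ALPHA
print(t_stat, df, reject_null)
```
\end{verbatim}

The null hypothesis is rejected only when the p-value falls below the
threshold. An exact p-value would come from the t distribution with
\texttt{df} degrees of freedom, for instance through SciPy's
\texttt{scipy.stats.ttest\_ind}; that dependency is left out of the
sketch.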
